Handbook of the Economics of Marketing: Marketing and Economics 0444637591, 9780444637598

Handbook of the Economics of Marketing, Volume One: Marketing and Economics mixes empirical work in industrial organizat

English Pages 632 [619] Year 2019

Recommended

Travel Marketing, Tourism Economics and the Airline Product

The Handbook of Experimental Economics, Volume 2: The Handbook of Experimental Economics 9781400883172

When The Handbook of Experimental Economics first came out in 1995, the notion of economists conducting lab experiments

The Handbook of Experimental Economics 9780691213255

This book, which comprises eight chapters, presents a comprehensive critical survey of the results and methods of labora

Handbook of Knowledge and Economics 1843764040, 9781843764045

Why do societies benefit differently from knowledge? How exactly does social interaction interfere with knowledge acquis

Travel Marketing, Tourism Economics and the Airline Product: An Introduction to Theory and Practice 9783319498492, 3319498495

This book provides a comprehensive introduction to travel marketing, tourism economics and the airline product. At the s

The Content Marketing Handbook 9781613084175

The Routledge Handbook of the Economics of Ageing 0367713322, 9780367713324

Ageing populations pose some of the foremost global challenges of this century. Drawing on an international pool of scho

Economics of Banana Production and Marketing in the Tropics : A Case Study of Cameroon [1 ed.] 9789956726479, 9789956726547

In most African countries, banana production has been consigned to subsistence production. However, a few countries, esp

The Digital Marketing Handbook 9781613083819

The Oxford Handbook of Computational Economics and Finance 0199844372, 9780199844371

The Oxford Handbook of Computational Economics and Finance provides a survey of both the foundations of and recent advan

Handbook of the Economics of Marketing: Marketing and Economics
0444637591, 9780444637598

Author / Uploaded
Jean-Pierre Dubé (editor)
Peter E. Rossi (editor)

Table of contents :
Cover
Handbook of the Economics of Marketing, Volume 1
Copyright
Contributors
Preface
1 Microeconometric models of consumer demand
1 Introduction
2 Empirical regularities in shopping behavior: The CPG laboratory
3 The neoclassical derivation of an empirical model of individual consumer demand
3.1 The neoclassical model of demand with binding, non-negativity constraints
3.1.1 Estimation challenges with the neoclassical model
3.1.2 Example: Quadratic utility
3.1.3 Example: Linear expenditure system (LES)
3.1.4 Example: Translated CES utility
3.1.5 Virtual prices and the dual approach
3.1.6 Example: Indirect translog utility
3.2 The discrete/continuous product choice restriction in the neoclassical model
3.2.1 The primal problem
3.2.2 Example: Translated CES utility
3.2.3 Example: The dual problem with indirect translog utility
3.2.4 Promotion response: Empirical findings using the discrete/continuous demand model
3.3 Indivisibility and the pure discrete choice restriction in the neoclassical model
3.3.1 A neoclassical derivation of the pure discrete choice model of demand
3.3.2 The standard pure discrete choice model of demand
4 Some extensions to the typical neoclassical specifications
4.1 Income effects
4.1.1 A non-homothetic discrete choice model
4.2 Complementary goods
4.2.1 Complementarity between products within a commodity group
4.2.2 Complementarity between commodity groups (multi-category models)
Example: Perfect substitutes within a commodity group
4.3 Discrete package sizes and non-linear pricing
4.3.1 Expand the choice set
4.3.2 Models of pack size choice
5 Moving beyond the basic neoclassical framework
5.1 Stock-piling, purchase incidence, and dynamic behavior
5.1.1 Stock-piling and exogenous consumption
5.1.2 Stock-piling and endogenous consumption
5.1.3 Empirical findings with stock-piling models
5.2 The endogeneity of marketing variables
5.2.1 Incorporating the supply side: A structural approach
5.2.2 Incorporating the supply side: A reduced-form approach
5.3 Behavioral economics
5.3.1 The fungibility of income
5.3.2 Social preferences
6 Conclusions
References
2 Inference for marketing decisions
1 Introduction
2 Frameworks for inference
2.1 A brief review of statistical properties of estimators
2.2 Distributional assumptions
2.3 Likelihood and the MLE
2.4 Bayesian approaches
2.4.1 The prior
2.4.2 Bayesian computation
2.5 Inference based on stochastic search vs. gradient-based optimization
2.6 Decision theory
2.6.1 Firms' profits as a loss function
2.6.2 Valuation of information sets
2.7 Non-likelihood-based approaches
2.7.1 Method of moments approaches
2.7.2 Ad hoc approaches
2.8 Evaluating models
3 Heterogeneity
3.1 Fixed and random effects
Mixed logit models
3.2 Bayesian approach and hierarchical models
3.2.1 A generic hierarchical approach
3.2.2 Adaptive shrinkage
3.2.3 MCMC schemes
3.2.4 Fixed vs. random effects
3.2.5 First stage priors
Normal prior
Mixture of normals prior
3.2.6 Dirichlet process priors
3.2.7 Discrete first stage priors
3.2.8 Conclusions
3.3 Big data and hierarchical models
3.4 ML and hierarchical models
4 Causal inference and experimentation
4.1 The problem of observational data
4.2 The fundamental problem of causal inference
4.3 Randomized experimentation
4.4 Further limitations of randomized experiments
4.4.1 Compliance in marketing applications of RCTs
4.4.2 The Behrens-Fisher problem
4.5 Other control methods
4.5.1 Propensity scores
4.5.2 Panel data and selection on unobservables
4.5.3 Geographically based controls
4.6 Regression discontinuity designs
4.7 Randomized experimentation vs. control strategies
4.8 Moving beyond average effects
5 Instruments and endogeneity
5.1 The omitted variables interpretation of "endogeneity" bias
5.2 Endogeneity and omitted variable bias
5.3 IV methods
5.3.1 The linear case
5.3.2 Method of moments and 2SLS
5.4 Control functions as a general approach
5.5 Sampling distributions
5.6 Instrument validity
5.7 The weak instruments problem
5.7.1 Linear models
5.7.2 Choice models
5.8 Conclusions regarding the statistical properties of IV estimators
5.9 Endogeneity in models of consumer demand
5.9.1 Price endogeneity
5.9.2 Conclusions regarding price endogeneity
5.10 Advertising, promotion, and other non-price variables
5.11 Model evaluation
6 Conclusions
References
3 Economic foundations of conjoint analysis
1 Introduction
2 Conjoint analysis
2.1 Discrete choices
2.2 Volumetric choices
2.3 Computing expected demand
2.4 Heterogeneity
2.5 Market-level predictions
2.6 Indirect utility function
3 Measures of economic value
3.1 Willingness to pay (WTP)
3.1.1 WTP for discrete choice
3.1.2 WTP for volumetric choice
3.2 Willingness to buy (WTB)
3.2.1 WTB for discrete choice
3.2.2 WTB for volumetric choice
3.3 Economic price premium (EPP)
4 Considerations in conjoint study design
4.1 Demographic and screening questions
4.2 Behavioral correlates
4.3 Establishing representativeness
4.4 Glossary
4.5 Choice tasks
4.6 Timing data
4.7 Sample size
5 Practices that compromise statistical and economic validity
5.1 Statistical validity
5.1.1 Consistency
5.1.2 Using improper procedures to impose constraints on partworths
5.2 Economic validity
5.2.1 Non-economic conjoint specifications
5.2.2 Self-explicated conjoint
5.2.3 Comparing raw part-worths across respondents
5.2.4 Combining conjoint with other data
6 Comparing conjoint and transaction data
6.1 Preference estimates
6.2 Marketplace predictions
6.3 Comparison of willingness-to-pay (WTP)
7 Concluding remarks
Technical appendix: Computing expected demand for volumetric conjoint
References
4 Empirical search and consideration sets
1 Introduction
2 Theoretical framework
2.1 Set-up
2.2 Search method
2.2.1 Simultaneous search
2.2.2 Sequential search
2.2.3 Discussion
3 Early empirical literature
3.1 Consideration set literature
3.1.1 Early 1990s
3.1.2 Late 1990s and 2000s
3.1.3 2010s - present
3.1.4 Identification of unobserved consideration sets
3.2 Consumer search literature
3.2.1 Estimation of search costs for homogeneous products
3.2.2 Estimation of search costs for vertically differentiated products
4 Recent advances: Search and consideration sets
4.1 Searching for prices
4.1.1 Mehta et al. (2003)
4.1.2 Honka (2014)
4.1.3 Discussion
4.1.4 De los Santos et al. (2012)
4.1.5 Discussion
4.1.6 Honka and Chintagunta (2017)
4.2 Searching for match values
4.2.1 Kim et al. (2010) and Kim et al. (2017)
4.2.2 Moraga-González et al. (2018)
4.2.3 Other papers
5 Testing between search methods
5.1 De los Santos et al. (2012)
5.2 Honka and Chintagunta (2017)
6 Current directions
6.1 Search and learning
6.2 Search for multiple attributes
6.3 Advertising and search
6.4 Search and rankings
6.5 Information provision
6.6 Granular search data
6.7 Search duration
6.8 Dynamic search
7 Conclusions
References
5 Digital marketing
1 Reduction in consumer search costs and marketing
1.1 Pricing: Are prices and price dispersion lower online?
1.2 Placement: How do low search costs affect channel relationships?
1.3 Product: How do low search costs affect product assortment?
1.4 Promotion: How do low search costs affect advertising?
2 The replication cost of digital goods is zero
2.1 Pricing: How can non-rival digital goods be priced profitably?
2.2 Placement: How do digital channels - some of which are illegal - affect the ability of information good producers to distribute profitably?
2.3 Product: What are the motivations for providing digital products given their non-excludability?
2.4 Promotion: What is the role of aggregators in promoting digital goods?
3 Lower transportation costs
3.1 Placement: Does channel structure still matter if transportation costs are near zero?
3.2 Product: How do low transportation costs affect product variety?
3.3 Pricing: Does pricing flexibility increase because transportation costs are near zero?
3.4 Promotion: What is the role of location in online promotion?
4 Lower tracking costs
4.1 Promotion: How do low tracking costs affect advertising?
4.2 Pricing: Do lower tracking costs enable novel forms of price discrimination?
4.3 Product: How do markets where the customer's data is the `product' lead to privacy concerns?
4.4 Placement: How do lower tracking costs affect channel management?
5 Reduction in verification costs
5.1 Pricing: How willingness to pay is bolstered by reputation mechanisms
5.2 Product: Is a product's `rating' now an integral product feature?
5.3 Placement: How can channels reduce reputation system failures?
5.4 Promotion: Can verification lead to discrimination in how goods are promoted?
6 Conclusions
References
6 The economics of brands and branding
1 Introduction
2 Brand equity and consumer demand
2.1 Consumer brand equity as a product characteristic
2.2 Brand awareness, consideration, and consumer search
2.2.1 The consumer psychology view on awareness, consideration, and brand choice
Awareness
Consideration
2.2.2 Integrating awareness and consideration into the demand model
2.2.3 An econometric specification
2.2.4 Consideration and brand valuation
3 Consumer brand loyalty
3.1 A general model of brand loyalty
3.2 Evidence of brand choice inertia
3.3 Brand choice inertia, switching costs, and loyalty
3.4 Learning from experience
3.5 Brand advertising goodwill
4 Brand value to firms
4.1 Brands and market structure
4.2 Measuring brand value
4.2.1 Reduced-form approaches using price and revenue premia
4.2.2 Structural models
5 Branding and firm strategy
5.1 Brand as a product characteristic
5.2 Brands and reputation
5.3 Branding as a signal
5.4 Umbrella branding
5.4.1 Empirical evidence
5.4.2 Umbrella branding and reputation
5.4.3 Umbrella branding and product quality signaling
5.5 Brand loyalty and equilibrium pricing
5.6 Brand loyalty and early-mover advantage
6 Conclusions
References
7 Diffusion and pricing over the product life cycle
1 Introduction
Three waves
Implication for the PLC: A new perspective
An agenda for further research
Organization
2 The first wave: Models of new product diffusion as a way to capture the PLC
2.1 Models of "external" influence
2.2 Models of "internal" influence
2.3 Bass's model
2.4 What was missing in the first wave?
3 The second wave: Life cycle pricing with diffusion models
3.1 Price paths under separable diffusion specifications
3.2 Price paths under market potential specifications
3.3 Extensions to individual-level models
3.4 Discussion
3.5 What was missing in the second wave?
Competition
Consumer expectations
Open vs. closed-loop strategies
4 The third wave: Life cycle pricing from micro-foundations of dynamic demand
4.1 Dynamic life-cycle pricing problem overview
4.2 Monopoly problem
Consumer's inter-temporal choice problem
Evolution of states
Flow of profits and value function
Equilibrium
Solution
4.3 Oligopoly problem
Consumer's inter-temporal choice problem
Evolution of states
Non-purchasers
Purchasers
Putting both together
Flow of profits and value function
Equilibrium
Solution
4.4 Discussion
Inferring demand and cost parameters
Handling expectations
Discount factors
Large state spaces
4.5 Additional considerations related to durability
4.5.1 Commitments via binding contracts
4.5.2 Availability and deadlines
4.5.3 Second-hand markets
4.5.4 Renting and leasing
4.5.5 Complementary goods and network effects
4.6 Summary
5 Goods with repeat purchase
5.1 Theoretical motivations
5.2 Empirical dynamic pricing
5.2.1 State dependent utility
5.2.2 Storable goods
5.2.3 Consumer learning
5.3 Summary
6 Open areas where more work will be welcome
6.1 Life-cycle pricing while learning an unknown demand curve
6.2 Joint price and advertising over the life-cycle
6.3 Product introduction and exit
6.4 Long term impact of marketing strategies on behavior
6.5 Linking to micro-foundations
References
8 Selling and sales management
1 Selling, marketing, and economics
1.1 Selling and the economy
1.2 What exactly is selling?
1.3 Isn't selling the same as advertising?
1.4 The role of selling in economic models
1.5 What this chapter is and is not
1.6 Organization of the chapter
2 Selling effort
2.1 Characterizing selling effort
2.1.1 Selling effort is a decision variable
2.1.2 Selling effort is unobserved
2.1.3 Selling effort is multidimensional
2.1.4 Selling effort has dynamic implications
2.1.5 Selling effort interacts with other firm decisions
3 Estimating demand using proxies for effort
3.1 Salesforce size as effort
3.1.1 Recruitment as selling: Prospecting for customers
3.1.2 Discussion: Salesforce size and effort
3.2 Calls, visits, and detailing as selling effort
3.2.1 Does detailing work?
3.2.2 How does detailing work?
3.2.3 Is detailing = effort?
4 Models of effort
4.1 Effort and compensation
4.2 Effort and nonlinear contracts
4.3 Structural models
4.3.1 Effort and demand
4.3.2 The supply of effort
4.4 Remarks
5 Selling and marketing
5.1 Product
5.2 Pricing
5.3 Advertising and promotions
6 Topics in salesforce management
6.1 Understanding salespeople
6.2 Organizing the salesforce
6.2.1 Territory decisions
6.2.2 Salesforce structure
6.2.3 Decision rights
6.3 Compensating and motivating the salesforce
6.3.1 Contract elements
6.3.2 Contract shape and form
6.3.3 Dynamics
6.3.4 Other issues
7 Some other thoughts
7.1 Regulation and selling
7.2 Selling in the new world
7.3 Concluding remarks
References
9 How price promotions work: A review of practice and theory
1 Introduction
2 Theories of price promotion
2.1 Macroeconomics
2.2 Price discrimination
2.2.1 Inter-temporal price discrimination
2.2.2 Retail competition and inter-store price discrimination
2.2.3 Manufacturer (brand) competition and inter-brand price discrimination
2.3 Demand uncertainty and price promotions
2.4 Consumer stockpiling of inventory
2.5 Habit formation: Buying on promotion
2.6 Retail market power
2.7 Discussion
3 The practice of price promotion
3.1 Overview of trade promotion process
3.2 Empirical example of trade rates
3.3 Forms of trade spend
3.3.1 Off-invoice allowances
3.3.2 Bill backs
3.3.3 Scan backs
3.3.4 Advertising and display allowances
3.3.5 Markdown funds
3.3.6 Bracket pricing, or volume discounts
3.3.7 Payment terms
3.3.8 Unsaleables allowance
3.3.9 Efficiency programs
3.3.10 Slotting allowances
3.3.11 Rack share
3.3.12 Price protection
3.4 Some implications of trade promotions
3.5 Trade promotion trends
3.6 Planning and tracking: Trade promotion management systems
4 Empirical literature on price promotions
4.1 Empirical research - an update
4.1.1 Promotional pass-through
4.1.2 Long-term effects of promotion
4.1.3 Asymmetric cross-promotional effects
4.1.4 Decomposition of promotional sales
4.1.5 Advertised promotions result in increased store traffic
4.1.6 Trough after the deal
4.2 Empirical research - newer topics
4.2.1 Price promotions and category-demand
4.2.2 Cross-category effects and market baskets
4.2.3 Effectiveness of price promotion with display
4.2.4 Coupon promotions
4.2.5 Stockpiling and the timing of promotions
4.2.6 Search and price promotions
4.2.7 Targeted price promotions
4.3 Macroeconomics and price promotions
4.4 Promotion profitability
5 Getting practical
5.1 Budgets and trade promotion adjustments
5.2 Retailer vs. manufacturer goals and issues
5.3 When decisions happen: Promotion timing and adjustments
5.4 Promoted price: Pass-through
5.5 Durable goods price promotion
5.6 Private label price promotions
5.7 Price pass through
6 Summary
References
10 Marketing and public policy
1 Introduction
2 The impact of academic research on policy
3 Competition policy
3.1 Market definition and structural analysis
3.2 Economic analysis of competitive effects
3.3 A few recent examples
3.3.1 The Aetna-Humana proposed merger
3.3.2 The AT&T-DirecTV merger
3.3.3 Mergers that increase bargaining leverage
3.4 Looking forward
4 Nutrition policy
4.1 Objectives of nutrition policy
4.2 Nutrient taxes
4.2.1 The effects of taxes
4.2.2 Estimating pass-through
4.3 Restrictions to advertising
4.3.1 The mechanisms by which advertising might affect demand
4.3.2 Empirically estimating the impact of advertising
4.4 Labeling
4.5 Looking forward
5 Concluding comments
References
Index
Back Cover

Citation preview

Handbook of the Economics of Marketing, Volume 1

Edited by

Jean-Pierre Dubé
Sigmund E. Edelstone Professor of Marketing
University of Chicago Booth School of Business and N.B.E.R.
Chicago, IL, United States

Peter E. Rossi
Anderson School of Management
University of California, Los Angeles
Los Angeles, CA, United States

North-Holland is an imprint of Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2019 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-444-63759-8

For information on all North-Holland publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Zoe Kruze
Acquisition Editor: Jason Mitchell
Editorial Project Manager: Shellie Bryant
Production Project Manager: James Selvam
Designer: Greg Harris
Typeset by VTeX

Contributors

Greg M. Allenby
Fisher College of Business, Ohio State University, Columbus, OH, United States

Eric T. Anderson
Kellogg School of Management, Northwestern University, Evanston, IL, United States

Bart J. Bronnenberg
Tilburg School of Economics and Management, Tilburg University, Tilburg, The Netherlands
CEPR, London, United Kingdom

Jean-Pierre Dubé
Booth School of Business, University of Chicago, Chicago, IL, United States
NBER, Cambridge, MA, United States

Edward J. Fox
Cox School of Business, Southern Methodist University, Dallas, TX, United States

Avi Goldfarb
Rotman School of Management, University of Toronto, Toronto, ON, Canada
NBER, Cambridge, MA, United States

Rachel Griffith
Institute for Fiscal Studies and University of Manchester, Manchester, United Kingdom

Nino Hardt
Fisher College of Business, Ohio State University, Columbus, OH, United States

Elisabeth Honka
UCLA Anderson School of Management, Los Angeles, CA, United States

Ali Hortaçsu
University of Chicago, Chicago, IL, United States
NBER, Cambridge, MA, United States

Sanjog Misra
University of Chicago Booth School of Business, Chicago, IL, United States

Sridhar Moorthy
Rotman School of Management, University of Toronto, Toronto, ON, Canada

Harikesh S. Nair
Stanford Graduate School of Business, Stanford, CA, United States

Aviv Nevo
University of Pennsylvania, Philadelphia, PA, United States

Peter E. Rossi
Anderson School of Management, University of California at Los Angeles, Los Angeles, CA, United States

Catherine Tucker
MIT Sloan School of Management, Cambridge, MA, United States
NBER, Cambridge, MA, United States

Matthijs Wildenbeest
Kelley School of Business, Indiana University, Bloomington, IN, United States

Preface

This volume is the first in a new Handbook of the Economics of Marketing series. Quantitative marketing is a much younger field than either economics or statistics. While substantial parts of our understanding of consumer welfare and demand theory were laid out in the late 19th and early 20th centuries, serious inquiries into models of consumer behavior in marketing started only in the late 1960s. However, it was really during the past 25–30 years that access to remarkably detailed, granular customer and seller-level databases generated a take-off in the quantitative marketing literature. The increasing focus by the fields of empirical industrial organization (I/O) and macroeconomics on several of the key themes in marketing has highlighted the central role of marketing institutions in economic outcomes. The purpose of this handbook is to chronicle the progress in marketing research, to introduce researchers in economics to the role of marketing in our understanding of consumer and firm behavior, and to inform public policy.

While marketing and economic researchers share many of the common tools of micro-economics and econometrics, there is a fundamental distinction between the aims of the two disciplines. Most research in economics should be viewed as positive economics, namely the pursuit of explanations for given marketing phenomena such as the determinants of market structure or the pricing equilibrium prevailing in a given market. On the other hand, marketing is primarily concerned with the evaluation of firm policies and, as such, is much more of a normative field. For example, a marketing researcher may use the tools of micro-economics to develop models of consumer behavior (demand) but does not impose the restriction that firms, necessarily, behave optimally with respect to a given set of marketing instruments and information.
For example, as detailed customer-level data became available, marketing research has focused on how to use these data to develop customized advertising and promotion of products. Marketing researchers are loath to assume that firms behave optimally with respect to the use of a new source of information.

In the first chapter of this volume, Dubé considers the micro-foundations of demand with an emphasis on the challenges created by the much richer and more dis-aggregate data that are typically available in marketing applications. Researchers in economics have long fit models of aggregate demand, and the modern I/O literature emphasizes consistent models of aggregation of individual demands. In marketing, the demand literature has focused on individual, consumer-level drivers of demand starting with the access to consumption diary panels during the late 1950s. With the advent of detailed household purchase panels during the early 1980s, there was a take-off in microeconometric studies using the recent developments in random utility based choice models. However, dis-aggregate demand data present many challenges in demand modeling that stem from discreteness in these data. A substantial component of the literature seeks to accommodate aspects of discreteness which cannot be accommodated by the multinomial models that have been popular in I/O. These aspects include corner solutions with a mixture of discrete and continuous outcomes and non-mutually exclusive discrete outcomes. Marketing researchers were also the first to document the importance of unobservable consumer heterogeneity and point out that this heterogeneity is pervasive and affects more than just the subset of variables typically assumed in the empirical I/O literature. Finally, the marketing literature pioneered the study of dynamic consumer choice behavior and its implications for differences between short-run and long-run elasticities of demand.

The demands that a world with a high volume of disaggregate data places on inference are discussed by Allenby and Rossi in Chapter 2. In addition, this chapter considers the demands that a normative orientation imposes on inference. Marketing researchers were early to embrace Bayesian approaches to inference to a degree still not matched in economics. The current strong interest in machine learning methods in economics is a partial endorsement of Bayesian methods, since these highly over-parameterized models are often fit with Bayesian or approximate Bayesian methods because of the superior estimation properties of Bayesian methods. Bayesian methods have been adopted in marketing primarily because of the practical orientation of marketing researchers. Researchers in marketing are looking for methods that can work well rather than simply debating the value of an inference paradigm in the abstract.

While discrete data on demand have proliferated, challenges to valid causal inference have also arisen or become accentuated. The classic example of this is a sponsored search advertisement. Here information regarding preferences or interest in a product category is used to trigger the sponsored search ad.
Clearly, observational data suffers from a severe endogeneity bias as advertisers are selecting explicitly on unobservable preferences (typically interest in a product category). This poses a fundamental inference challenge, as any optimization or evaluation of advertising policies requires valid causal inference. Experimentation can provide one approach to obtaining valid causal estimates but comes with other limitations and challenges as discussed in the chapter.

Most economists take the point of view that revealed preference data is to be preferred to stated preference data. While certainly a reasonable point of view, it is somewhat narrow. If we can approximate the purchase environment in a survey context, it is possible to generate data that may rise to the value of revealed preference data. In addition, it is often not possible to find revealed preference data that are sufficiently informative to estimate preferences. For example, many observational datasets have very limited price variation and are subject to legitimate concerns about the endogeneity of marketing variables, such as prices. Clearly, prospective analyses of new products or new product features lack revealed preference data. Unique to marketing is the survey method of preference measurement called conjoint analysis. In Chapter 3, Allenby, Rossi, and Hardt discuss the economic foundations of conjoint analysis as well as the extension to consider both discrete and continuous outcomes.

In much of the economics and marketing literatures, demand is formulated under some sort of full information assumption. A classic example is the assumption in nearly all demand applications that prices are known with certainty. If, instead, consumers undertake costly search for price or quality information, then these search frictions must be accommodated in the demand model. Here economics and marketing intersect very closely, as marketers have always recognized that consumers make choices based on their “consideration sets,” which typically include only a small subset of the available products in any given market or product category. While theoretical implications of sequential and simultaneous search have been worked out in the economics literature, the empirical search literature has only recently taken flight due to a lack of data on the search process. In the online context, the data problem has been removed as browsing data gives us our first comprehensive measure of search. In the off-line context, data are still hard to come by. In Chapter 4, Honka, Hortaçsu, and Wildenbeest discuss the recent developments in this important area of mutual interest for both marketing and economics empirical research.

In many markets, digital media and technologies have reduced the cost of search dramatically. For example, consumers can obtain a wealth of information regarding car purchases from internet sources without visiting any dealer. In parallel, digital technologies threaten the value of stores as a logistical intermediary, serving as a fundamental source of change in many industries. In Chapter 5, Goldfarb and Tucker discuss these trends and provide insight as to their implications for marketing practice and research. In addition, digital media have fundamentally changed the way in which much advertising is conducted. Advertisers now have the ability to trigger ads as well as to track, at the individual level, the response to these ads. This opens many new possibilities for marketing policy.

Much of modern I/O has concentrated on the economic mechanisms that sustain high levels of industry concentration and supra-normal economic oligopoly profits.
One of the leading puzzles in this literature has been the persistence of concentration and dominance in markets where the leading products are differentiated primarily by their brands. Surprisingly, even the literature on pure characteristics models has typically ignored the important role of brands and branding as a source of product differentiation. In Chapter 6, Bronnenberg, Dubé, and Moorthy discuss branding which can be viewed as one of the more important sources of product differentiation. Marketers have long recognized the importance of brands and considered various mechanisms through which brand preferences are developed on the demand side. The primary source of economic value to many firms is their brand equity and, accordingly, the authors also consider the returns on branding to firm value on the supply side. At least since the 1960s, quantitative marketing has been interested in the lifecycle regularities of products as they diffuse from launch to maturity. The diffusion typically focuses on the dynamics of consumer adoption. While the early literature worked mostly with descriptive models, the recent literature has adopted a structural approach that starts with microeconomic foundations to analyze the consumer-driven mechanisms that shape a product’s diffusion over time. In Chapter 7, Nair considers the intersection of the economics and marketing literatures regarding the dynamics of consumer adoption decisions, on the demand side. The chapter also studies the corresponding supply-side dynamics of firms’ marketing decisions (e.g., entry, pricing, advertising) to control the diffusion of their products over time. Here the combination


Preface

of data and empirical methods for the estimation of dynamic models has provided a new empirical companion to the theoretical results in the economics literature. In many important markets, the primary interface between the firm and the customer is the salesforce. The activities of the salesforce represent the single largest marketing expenditure for many firms. In markets for drugs and medical devices, these activities have come under public scrutiny as potentially wasteful or even distorting of demand. The role of the salesforce is considered in Chapter 8. Misra reviews the contracting problem which is the primary contribution of the economics literature to this area. Recent empirical work in sales force compensation has emphasized dynamic considerations not present in the classical economics contracting literature. Over the last few decades, US manufacturers have increasingly allocated more of their marketing budgets away from traditional advertising toward price promotions, typically paid to downstream trade partners who handle the re-sale to end-user consumers. In consumer goods alone, annual trade promotion spending is estimated to be over $500 billion. Trade promotion funds are intended to induce temporary discounts on shelf prices, or sales, that represent one of the key sources of price variation over time in many consumer goods markets. Surprisingly, many theoretical models of pricing ignore the institutional structure of the distribution channel that leads to temporary, promotional price cuts, thereby ignoring a key source of price changes facing consumers. In Chapter 9, Anderson and Fox offer a broad overview of the managerial and institutional trade promotion practices of manufacturers and retailers. They also survey the large literature on pricing, illustrating the gaps in the theory and the opportunities for future research. 
The discussion ties together the key factors driving promotions, including the vertical channel itself, price discrimination, vertical and horizontal competition in the channel, and consumers’ ability to stockpile goods.

Marketing considerations have become vital in many analyses of public policy issues. For example, there are debates about various public policy initiatives to promote the consumption of healthier foods. One possibility endorsed by many public policy makers is to provide information on the nutritional content, especially of processed and fast foods. Another method advocated by some is to impose “vice” taxes on food products which are deemed undesirable. In Chapter 10, Nevo and Griffith point out that the evaluation of these policies involves demand modeling and consideration of the response of firms to these policies. Nutrition policy is only one example of policy evaluation where marketing considerations are important. Analysis of mergers is now based on models of demand for differentiated products pioneered in both the economics and marketing literatures. Many other public policy questions hinge on the promotion, sales, pricing, and advertising decisions of firms and, therefore, provide a strong motivation for continued progress in research at the intersection of economics and marketing.

Jean-Pierre Dubé
Sigmund E. Edelstone Professor of Marketing
University of Chicago Booth School of Business and N.B.E.R.
Chicago, IL, United States


Peter E. Rossi Anderson School of Management University of California, Los Angeles Los Angeles, CA, United States


CHAPTER

1

Microeconometric models of consumer demand✩

Jean-Pierre Dubé (a, b)
(a) Booth School of Business, University of Chicago, Chicago, IL, United States
(b) NBER, Cambridge, MA, United States
e-mail address: [email protected]

Contents
1 Introduction
2 Empirical regularities in shopping behavior: The CPG laboratory
3 The neoclassical derivation of an empirical model of individual consumer demand
  3.1 The neoclassical model of demand with binding, non-negativity constraints
    3.1.1 Estimation challenges with the neoclassical model
    3.1.2 Example: Quadratic utility
    3.1.3 Example: Linear expenditure system (LES)
    3.1.4 Example: Translated CES utility
    3.1.5 Virtual prices and the dual approach
    3.1.6 Example: Indirect translog utility
  3.2 The discrete/continuous product choice restriction in the neoclassical model
    3.2.1 The primal problem
    3.2.2 Example: Translated CES utility
    3.2.3 Example: The dual problem with indirect translog utility
    3.2.4 Promotion response: Empirical findings using the discrete/continuous demand model
  3.3 Indivisibility and the pure discrete choice restriction in the neoclassical model
    3.3.1 A neoclassical derivation of the pure discrete choice model of demand
    3.3.2 The standard pure discrete choice model of demand
4 Some extensions to the typical neoclassical specifications
  4.1 Income effects
    4.1.1 A non-homothetic discrete choice model
  4.2 Complementary goods

✩ Dubé acknowledges research support from the Kilts Center for Marketing and the Charles E. Merrill faculty research fund. I would like to thank Greg Allenby, Shirsho Biswas, Oeystein Daljord, Stefan Hoderlein, Joonhwi Joo, Kyeongbae Kim, Yewon Kim, Nitin Mehta, Olivia Natan, Peter E. Rossi, and Robert Sanders for helpful comments and suggestions.
Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.001
Copyright © 2019 Elsevier B.V. All rights reserved.


    4.2.1 Complementarity between products within a commodity group
    4.2.2 Complementarity between commodity groups (multi-category models)
  4.3 Discrete package sizes and non-linear pricing
    4.3.1 Expand the choice set
    4.3.2 Models of pack size choice
5 Moving beyond the basic neoclassical framework
  5.1 Stock-piling, purchase incidence, and dynamic behavior
    5.1.1 Stock-piling and exogenous consumption
    5.1.2 Stock-piling and endogenous consumption
    5.1.3 Empirical findings with stock-piling models
  5.2 The endogeneity of marketing variables
    5.2.1 Incorporating the supply side: A structural approach
    5.2.2 Incorporating the supply side: A reduced-form approach
  5.3 Behavioral economics
    5.3.1 The fungibility of income
    5.3.2 Social preferences
6 Conclusions
References

1 Introduction

A long literature in quantitative marketing has used the structural form of microeconometric models of demand to analyze consumer-level purchase data and conduct inference on consumer behavior. These models have played a central role in the study of some of the key marketing questions, including the measurement of brand preferences and consumer tastes for variety, the quantification of promotional response, the analysis of the launch of new products, and the design of targeted marketing strategies. In sum, the structural form of the model is critical for measuring unobserved economic aspects of consumer preferences and for simulating counter-factual marketing policies that are not observed in the data (e.g., demand for a new product that has yet to be launched).

The empirical analysis of consumer behavior is perhaps one of the key areas of overlap between the economics and marketing literatures. The application of empirical models of aggregate demand using restrictions from microeconomics dates back at least to the mid-20th century (e.g., Stone, 1954). Demand estimation plays a central role in marketing decision-making. Marketing-mix models, or models of demand that account for the causal effects of marketing decision variables, such as price, promotions, and other marketing tools, are fundamental for the quantification of different marketing decisions. Examples include the measurement of market power, the measurement of sales-response to advertising, the analysis of new product introductions, and the measurement of consumer welfare.

Historically, the data used for demand estimation typically consisted of market-level, aggregate sales quantities under different marketing conditions. In the digital age, the access to transaction-level data at the point of sale has become nearly ubiquitous. In many settings, firms can now assemble detailed, longitudinal databases tracking individual customers’ purchase behavior over time and across channels. Simply aggregating these data for the purposes of applying traditional aggregate demand estimation techniques creates several problems. First, aggregation destroys potentially valuable information about customer behavior. Besides the loss of potential statistical efficiency, aggregation eliminates the potential for individualized demand analysis. Innovative selling technologies have facilitated a more segmented and even individualized approach to marketing, requiring a more intimate understanding of the differences in demand behavior between customer segments or even between individuals. Second, aggregating individual demands across customers facing heterogeneous marketing conditions can create biases that could have adverse effects on marketing decision-making (e.g., Gupta et al., 1996).¹

Our discussion herein focuses on microeconometric models of demand designed for the analysis of individual consumer-level data. The microeconomic foundations of a demand model allow the analyst to assign a structural interpretation to the model’s parameters, which can be beneficial for assessing “consumer value creation” and for conducting counter-factual analyses. In addition, as we discuss herein, the cross-equation restrictions derived from consumer theory can facilitate more parsimonious empirical specifications of demand. Finally, the structural foundation of the econometric uncertainty as a model primitive provides a direct correspondence between the likelihood function and the underlying microeconomic theory.

Some of the earliest applications of microeconometric models to marketing data analyzed the decomposition of consumer responses to temporary promotions at the point of purchase (e.g., Guadagni and Little, 1983; Chiang, 1991; Chintagunta, 1993).
Of particular interest was the relative extent to which temporary price discounts caused consumers to switch brands, increase consumption, or strategically forward-buy to stockpile during periods of low prices. The microeconometric approach provided a parsimonious, integrated framework with which to understand the inter-relationship between these decisions and consumer preferences. At the same time, the cross-equation restrictions from consumer theory can reduce the degrees of freedom in an underlying statistical model used to predict these various components of demand.

Most of the foundational work derives from the consumption literature (see Deaton and Muellbauer, 1980b, for an extensive overview). The consumption literature often emphasizes the use of cost functions and duality concepts to simplify the implementation of the restrictions from economic theory. In this survey, we mostly focus on the more familiar direct utility maximization problem. The use of a parametric utility function facilitates the application of demand estimates to broader topics than the analysis of price and income effects, such as product quality choice, consumption indivisibilities, product positioning, and product design.

¹ Blundell et al. (1993) find that models fit to aggregate data generate systematic biases relative to models fit to household-level data, especially in the measurement of income effects.


In addition, our discussion focuses on a very granular, product-level analysis within a product category.² Unlike the macro-consumption literature, which focuses on budget allocations across broad commodity groups like food, leisure, and transportation, we focus on consumers’ brand choices within a narrow commodity group, such as the brand variants and quantities of specific laundry detergents or breakfast cereals purchased on a given shopping trip. The role of brands and branding has been shown to be central to the formation of industrial market structure (e.g., Bronnenberg et al., 2005; Bronnenberg and Dubé, 2017).

To highlight some of the differences between a broad commodity-group focus versus a granular brand-level focus, we begin the chapter with a short descriptive exercise laying out several key stylized facts for households’ shopping behavior in consumer packaged goods (hereafter CPG) product categories using the Nielsen-Kilts Homescan database. We find that the typical consumer goods category offers a wide array of differentiated product alternatives available for sale to consumers, often at different prices and under different marketing conditions. Therefore, consumer behavior involves a complex trade-off between the prices of different goods and their respective perceived qualities. Moreover, an individual household typically purchases only a limited scope of the variety available. This purchase behavior leads to the well-known “corner solutions” problem whereby expenditure on most goods is typically zero. Therefore, a satisfactory microeconometric model needs to be able to accommodate a demand system over a wide array of differentiated offerings and a high incidence of corner solutions.
The remainder of the chapter surveys the neoclassical, microeconomic foundations of models of individual demand that allow for corner solutions.³ From an econometric perspective, non-purchase behavior contains valuable information about consumers’ preferences, and the application of econometric models that impose strictly interior solutions would likely produce biased and inconsistent estimates of demand, a form of selection bias. However, models with corner solutions introduce a number of complicated computational challenges, including high-dimensional integration over truncated distributions and the evaluation of potentially complicated Jacobian matrices. The challenges associated with corner solutions have been recognized at least since Houthakker (1953) and Houthakker (1961), who discuss them as a special case of quantity rationing. We also discuss the role of discreteness both in the brand variants and quantities purchased. In particular, we explore the relationship between the popular discrete choice models of demand (e.g., logit) and the more general neoclassical models.

² We refer readers interested in a discussion of aggregation and the derivation of models designed for estimation with market-level data to the surveys by Deaton and Muellbauer (1980b), Nevo (2011), and Pakes (2014).
³ See also the following for surveys of microeconometric models: Nair and Chintagunta (2011) for marketing, Phaneuf and Smith (2005) for environmental economics, and Deaton and Muellbauer (1980b) for the consumption literature.


In a follow-up section, we discuss several important extensions of the empirical specifications used in practice. We discuss the role of income effects. For analytic tractability, many popular specifications impose homotheticity and quasi-linearity conditions that limit or eliminate income effects. We discuss non-homothetic versions of the classic discrete choice models that allow for more realistic asymmetric substitution patterns between vertically-differentiated goods. Another common restriction used in the literature is additive separability both across commodity groups and across the specific products available within a commodity group. This additivity implies that all products are gross substitutes, eliminating any scope for complementarity across goods. We discuss recent research that has analyzed settings with complementary goods. In many consumer goods categories, firms use complex non-linear pricing strategies that restrict the quantities a consumer can purchase to a small set of pre-packaged commodity bundles. We do not discuss the price discrimination itself, focusing instead on the indivisibility the commodity bundling imposes on demand behavior. In the final section of the survey, we discuss several important departures from the standard neoclassical framework. While most of the literature has focused on static models of brand choices, the timing of purchases can play an important role in understanding the impact of price promotions on demand. We discuss dynamic extensions that allow consumers to stock-pile storable goods based on their price expectations. The accommodation of purchase timing can lead to very different inferences about the price elasticity of demand. We also discuss the potential role of the supply side of the market and the resulting endogeneity biases associated with the strategic manner in which point-of-purchase marketing variables are determined by firms. 
Most of the literature on microeconometric models of demand has ignored such potential endogeneity in marketing variables. Finally, we address the emerging area of structural models of behavioral economics that challenge some of the basic elements of the neoclassical framework. We discuss recent evidence of mental accounting in the income effect that creates a surprising non-fungibility across different sources of purchasing power. We also discuss the role of social preferences and models of consumer-response to cause marketing campaigns. Several important additional extensions are covered in later chapters of this volume, including the role of consumer search, information acquisition and the formation of consideration sets (Chapter 4), the role of brands and branding (Chapter 6), and the role of durable goods and the timing of consumer adoption throughout the product life cycle (Chapter 7). Perhaps the most crucial omission herein is the discussion of taste heterogeneity, which is covered in depth in Chapter 2 of this volume. Consumer heterogeneity plays a central role in the literature on targeted marketing.


2 Empirical regularities in shopping behavior: The CPG laboratory

In this section, we document broad patterns of purchase behavior across US households in the consumer packaged goods (CPG) industry. We will use these shopping patterns in Section 3 as the basis for a microeconometric demand estimation framework derived from neoclassical consumer theory.

The CPG industry represents a valuable laboratory in which to study consumer behavior. CPG brands are widely available across store formats including grocery stores, supermarkets, discount and club stores, drug stores, and convenience stores. They are also purchased at a relatively high frequency. The average US household conducted 1.7 grocery trips per week in 2017.⁴ Most importantly, CPG spending represents a sizable portion of household budgets. In 2014, the global CPG sector was valued at $8 trillion and was predicted to grow to $14 trillion by 2025.⁵ In 2016, US households spent $407 billion on CPGs.⁶ A long literature in brand choice has used household panel data in CPG categories not only due to the economic relevance, but also due to the high quality of the data. CPG categories exhibit high-frequency price promotions that can be exploited for demand estimation purposes.

We use the Nielsen Homescan panel housed by the Kilts Center for Marketing at the University of Chicago Booth School of Business to document CPG purchase patterns. The Homescan panelists are nationally representative.⁷ The database tracks purchases in 1,011 CPG product categories, which Nielsen classifies using product module codes, for over 132,000 households between 2004 and 2012, representing over 88 million shopping trips. Examples of product modules include Carbonated Soft Drinks, Ready-to-Eat Cereals, Laundry Detergents, and Tooth Paste. We retain the 2012 transaction data to document several empirical regularities in shopping behavior.

⁴ Source: “Consumers’ weekly grocery shopping trips in the United States from 2006 to 2017,” Statista, 2017, accessed at https://www.statista.com/statistics/251728/weekly-number-of-us-grocery-shoppingtrips-per-household/ on 11/13/2017.
⁵ Source: “Three myths about growth in consumer packaged goods,” by Rogerio Hirose, Renata Maia, Anne Martinez, and Alexander Thiel, McKinsey, June 2015, accessed at https://www.mckinsey.com/industries/consumer-packaged-goods/our-insights/three-myths-about-growth-in-consumer-packagedgoods on 11/13/2017.
⁶ Source: “Consumer packaged goods (CPG) expenditure of U.S. consumers from 2014 to 2020,” Statista, 2017, accessed at https://www.statista.com/statistics/318087/consumer-packaged-goods-spending-of-usconsumers/ on 11/13/2017.
⁷ See Einav et al. (2010) for a validation study of the Homescan data.

In 2012, we observe 52,093 households making over 6.57 million shopping trips during which they purchase over 46 million products.

The typical CPG category offers a wide variety to consumers. Focusing only on the products actually purchased by Homescan panelists in 2012, the average category offers 402.8 unique products as indexed by a universal product code (UPC) and 64.4 unique brands. For instance, a brand might be any UPC-coded product with the brand name Coca Cola, whereas a UPC might be a 6-pack of 12-oz cans of Coca Cola. While the subset of available brands and sizes varies across stores and regions, these numbers reveal the extent of variety available to consumers. In addition, CPG products are sold in pre-packaged, indivisible pack sizes. The average category offers 31.9 different pack size choices. The average brand within a category is sold in 5.4 different pack sizes. Therefore, consumers face an interesting indivisibility constraint, especially if they are determined to buy a specific brand variant. Moreover, CPG firms’ widespread pre-commitment to specific sizes is suggestive of extensive use of non-linear pricing.

For the average category, we observe 39,787 trips involving at least one purchase. Households purchase a single brand and a single pack during 94.3% and 67.3% of the category-trip combinations, respectively. On average, households purchase 1.07 brands per category-trip. In sum, the discrete brand choice assumption, and to a lesser extent the discrete quantity choice assumption, seems broadly appropriate across trips at the category level. However, we do observe categories in which the contemporaneous purchase of assortments is more commonplace and many of these categories are economically large. In the Ready-to-Eat Cereals category, which ranks third overall in total household expenditures among all CPG categories, consumers purchase a single brand during only 72.6% of trips. Similarly, for Carbonated Soft Drinks and Refrigerated Yogurt, which rank fourth and tenth overall respectively, consumers purchase a single brand during only 81.5% and 86.6% of trips respectively. Therefore, case studies of some of the largest CPG categories may need to consider demand models that allow for the purchase of variety, even though only a small number of the variants is chosen on any given trip.
Similarly, we observe many categories where consumers occasionally purchase multiple packs of a product, even when only a single brand is chosen. In these cases, a demand model that accounts for the intensive margin of quantity purchased may be necessary. We also observe brand switching across time within a category, especially in some of the larger categories. For instance, during the course of the year, households purchased 7.5 brands of Ready-to-Eat Cereals (ranked 3rd), 5.9 brands of Cookies (ranked 11th), 4.7 brands of Bakery Bread (ranked 7th), and 4.6 brands of Carbonated Soft Drinks (ranked 4th). In many of the categories with more than an average of 3 brands purchased per household-year, we typically observe only one brand being chosen during an individual trip. In summary, a snapshot of a single year of CPG shopping behavior by a representative sample of consumers indicates some striking patterns. In spite of the availability of a large amount of variety, an individual consumer purchases only a very small number of variants during the course of a year, let alone on any given trip. From a modeling perspective, we observe a high incidence of corner solutions. In some of the largest product categories, consumers routinely purchase assortments, leading to complex patterns of corner solutions. In most categories, the corner solutions degenerate to a pure discrete choice scenario where a single unit of a single product is purchased. In these cases, the standard discrete choice models may be sufficient.


However, the single unit is typically one of several pre-determined pack sizes available, suggesting an important role for indivisibility on the demand side and non-linear pricing on the supply side.
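The category-trip statistics reported in this section (the share of category-trips with a single brand or a single pack, and the average number of brands per category-trip) can be computed from any household transaction file. The following is a minimal sketch on a toy dataset. The record layout `(household, trip, category, brand, packs)` and all values are hypothetical and chosen only for illustration; they are not the Homescan schema or data.

```python
from collections import defaultdict

# Hypothetical transaction records: (household, trip, category, brand, packs).
# The layout and the values are assumptions made for this illustration only.
purchases = [
    ("h1", "t1", "cereal", "BrandA", 1),
    ("h1", "t1", "cereal", "BrandB", 1),
    ("h1", "t2", "cereal", "BrandA", 2),
    ("h2", "t3", "cereal", "BrandC", 1),
    ("h2", "t3", "soda", "BrandX", 1),
    ("h2", "t4", "soda", "BrandX", 1),
]


def category_trip_summary(records):
    """Summarize purchases at the (household, trip, category) level."""
    brands = defaultdict(set)  # distinct brands bought per category-trip
    packs = defaultdict(int)   # total packs bought per category-trip
    for household, trip, category, brand, n_packs in records:
        key = (household, trip, category)
        brands[key].add(brand)
        packs[key] += n_packs
    n = len(brands)
    return {
        "category_trips": n,
        "share_single_brand": sum(len(b) == 1 for b in brands.values()) / n,
        "share_single_pack": sum(q == 1 for q in packs.values()) / n,
        "mean_brands_per_trip": sum(len(b) for b in brands.values()) / n,
    }


summary = category_trip_summary(purchases)
print(summary)
```

A high single-brand share combined with a lower single-pack share, as in the Homescan figures above, is the pattern that motivates treating brand choice as discrete while keeping the quantity margin in the model.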

3 The neoclassical derivation of an empirical model of individual consumer demand

The empirical regularities in Section 2 show that household-level demand for consumer goods exhibits a high incidence of corner solutions: purchase occasions with zero expenditure on most items in the choice set. The methods developed in the traditional literature on demand estimation (e.g., Deaton and Muellbauer, 1980b) do not accommodate zero consumption. In this section, we review the formulation of the neoclassical consumer demand problem and the corresponding challenges with the accommodation of corner solutions into an empirical framework. Our theoretical starting point is the usual static model of utility maximization whereby the consumer spends a fixed budget on a set of competing goods. Utility theory plays a particularly crucial role in accommodating the empirical prominence of corner solutions in individual-level data.

3.1 The neoclassical model of demand with binding, non-negativity constraints

We start with the premise that the analyst has access to marketing data comprising individual-level transactions. The analyst’s data include the exact vector of quantities purchased by a customer on a given shopping trip, $\hat{x} = (\hat{x}_1, \ldots, \hat{x}_{J+1})'$. An individual transaction database typically has a panel format with time-series observations (trips) for a cross section of customers. We assume that the point-of-purchase causal environment consists of prices, but the database could also include other marketing promotional variables. Our objective consists of deriving a likelihood for this observed vector of purchases from microeconomic primitives. Suppose, without loss of generality, that the consumer does not consume the first $l$ goods: $\hat{x}_j = 0$ ($j = 1, \ldots, l$), and $\hat{x}_j > 0$ ($j = l+1, \ldots, J+1$).

We use the neoclassical approach to deriving consumer demand from the assumption that each consumer maximizes a utility function $U(x; \theta, \varepsilon)$ defined over the quantity of goods consumed, $x = (x_1, \ldots, x_{J+1})'$. Since most marketing studies focus on demand behavior within a specific “product category,” we adopt the terminology of Becker (1965) and distinguish between the “commodity” (e.g., the consumption benefit of the category, such as laundry detergent), and the $j = 1, \ldots, J$ “market goods” to which we will refer as “products” (e.g., the various brands sold within the product category, such as Tide and Wisk laundry detergents). The quantities in $x$ are non-negative ($x_j \ge 0 \ \forall j$) and satisfy the consumer’s budget constraint $x'p \le y$, where $p = (p_1, \ldots, p_{J+1})'$ is a vector of strictly positive prices and $y$ is the consumer’s budget. The vector $\theta$ consists of unknown (to the researcher) parameters describing the consumer’s underlying preferences and the vector $\varepsilon$ captures unobserved (to the researcher), mean-zero, consumer-specific utility disturbances.⁸ Typically $\varepsilon$ is assumed to be known to the consumer prior to decision-making. Formally, the utility maximization problem can be written as follows

\[
V(p, y; \theta, \varepsilon) \equiv \max_{x \in \mathbb{R}^{J+1}} \left\{ U(x; \theta, \varepsilon) : x'p \le y, \ x \ge 0 \right\} \tag{1}
\]

where we assume $U(\cdot; \theta, \varepsilon)$ is a continuously-differentiable, quasi-concave, and increasing function.⁹ We can define the corresponding Lagrangian function $\mathcal{L} = U(x; \theta, \varepsilon) + \lambda_y (y - p'x) + \lambda_x' x$, where $\lambda_y$ and the vector $\lambda_x$ are Lagrange multipliers for the budget and non-negativity constraints, respectively.

A solution to (1) exists as long as the following necessary and sufficient Karush-Kuhn-Tucker (KKT) conditions hold

\[
\begin{aligned}
& \frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_j} - \lambda_y p_j + \lambda_{x,j} = 0, && j = 1, \ldots, J+1 \\
& y - p'x^* = 0, \quad (y - p'x^*) \lambda_y = 0, \quad \lambda_y > 0 \\
& x_j^* \ge 0, \quad x_j^* \lambda_{x,j} = 0, \quad \lambda_{x,j} \ge 0, && j = 1, \ldots, J+1.
\end{aligned} \tag{2}
\]

Since $U(\cdot; \theta, \varepsilon)$ is increasing, the consumer spends her entire budget (the “adding-up” condition) and at least one good will always be consumed. We define the $J+1$ good as an “essential” numeraire with corresponding price $p_{J+1} = 1$ and with preferences that are separable from those over the commodity group.¹⁰ We assume additional regularity conditions on $U(\cdot; \theta, \varepsilon)$ to ensure that an interior quantity of $J+1$ is always consumed: $\frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_{J+1}} = \lambda_y$ and $\lambda_{x,J+1} = 0$. Therefore, the model can accommodate the case where only the outside good is purchased and none of the inside goods are chosen. We can now re-write the KKT conditions as follows

\[
\begin{aligned}
& \frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_j} - \frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_{J+1}} p_j + \lambda_{x,j} = 0, && j = 1, \ldots, J \\
& y - p'x^* = 0 \\
& x_j^* \ge 0, \quad x_j^* \lambda_{x,j} = 0, \quad \lambda_{x,j} \ge 0, && j = 1, \ldots, J.
\end{aligned} \tag{3}
\]

⁸ It is straightforward to allow for additional persistent, unobserved taste heterogeneity by indexing the parameters themselves by consumer (see Chapter 2 of this volume).
⁹ These sufficient conditions ensure the existence of a demand function with a unique consumption level that maximizes utility at a given set of prices (e.g., Mas-Colell et al., 1995, Chapter 3).
¹⁰ The essential numeraire is typically interpreted as expenditures outside of the commodity group(s) of interest.


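Demand estimation consists of devising an estimator for the parameters $\theta$ based on the solution to the system (3), $x^*(p, y; \theta, \varepsilon)$. For our observed consumer, recall that $\hat{x}_j = 0$ ($j = 1, \ldots, l$) and $\hat{x}_j > 0$ ($j = l+1, \ldots, J+1$). We can now re-write the KKT conditions to account for the corner solutions (i.e., non-consumption)

\[
\begin{aligned}
\frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_j} - \frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_{J+1}} p_j &\le 0, && j = 1, \ldots, l \\
\frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_j} - \frac{\partial U(x^*; \theta, \varepsilon)}{\partial x_{J+1}} p_j &= 0, && j = l+1, \ldots, J.
\end{aligned} \tag{4}
\]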

It is instructive to consider how the KKT conditions (4) influence demand estimation. The $l+1$ to $J$ equality conditions in (4) implicitly characterize the conditional demand equations for the purchased goods. The $l$ inequality conditions in (4) give rise to the following demand regime-switching conditions, or “selection” conditions

$$
\frac{\partial U(x^*;\theta,\varepsilon)/\partial x_j}{\partial U(x^*;\theta,\varepsilon)/\partial x_{J+1}} \le p_j, \qquad j = 1,\dots,l
\tag{5}
$$

which determine whether a given product’s price is above the consumer’s reservation value, $\frac{\partial U(x^*;\theta,\varepsilon)/\partial x_j}{\partial U(x^*;\theta,\varepsilon)/\partial x_{J+1}}$ (see Lee and Pitt, 1986; Ransom, 1987, for a discussion of the switching regression interpretation). We can now see how dropping the observations with zero consumption will likely result in selection bias due to the correlation between the switching probabilities and the utility shocks, $\varepsilon$.

To complete the model, we need to allow for some separability of the utility disturbances. For instance, we can assume an additive, stochastic log-marginal utility: $\ln\!\left(\frac{\partial U(x^*;\theta,\varepsilon)}{\partial x_j}\right) = \ln \bar{U}_j(x^*;\theta) + \varepsilon_j$ for each $j$, where $\bar{U}_j(x^*;\theta)$ is deterministic.

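We also assume that $\varepsilon$ are random variables with known distribution and density, $F_\varepsilon(\varepsilon)$ and $f_\varepsilon(\varepsilon)$, respectively. We can now write the KKT conditions more compactly:

$$
\begin{aligned}
\tilde{\varepsilon}_j \equiv \varepsilon_j - \varepsilon_{J+1} &\le h_j(x^*;\theta), && j = 1,\dots,l\\
\tilde{\varepsilon}_j \equiv \varepsilon_j - \varepsilon_{J+1} &= h_j(x^*;\theta), && j = l+1,\dots,J
\end{aligned}
\tag{6}
$$

where $h_j(x^*;\theta) = \ln \bar{U}_{J+1}(x^*;\theta) - \ln \bar{U}_j(x^*;\theta) + \ln p_j$. We can now derive the likelihood function associated with the observed consumption vector, $\hat{x}$. In the case where all the goods are consumed, the density of $\hat{x}$ is

$$
f_x(\hat{x};\theta) = f_{\tilde{\varepsilon}}(\tilde{\varepsilon})\,\lvert J(\hat{x})\rvert
\tag{7}
$$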

3 The neoclassical derivation of an empirical model

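where $J(\hat{x})$ is the Jacobian of the transformation from $\tilde{\varepsilon}$ to $x$. If only the $J+1$ numeraire good is consumed, the density of $\hat{x} = (0,\dots,0)$ is

$$
f_x(\hat{x};\theta) = \int_{-\infty}^{h_J(\hat{x};\theta)} \cdots \int_{-\infty}^{h_1(\hat{x};\theta)} f_\varepsilon(\tilde{\varepsilon})\, d\tilde{\varepsilon}_1 \cdots d\tilde{\varepsilon}_J.
\tag{8}
$$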

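For the more general case in which the first $l$ goods are not consumed, the density of $\hat{x} = (0,\dots,0,\hat{x}_{l+1},\dots,\hat{x}_J)$ is

$$
f_x(\hat{x};\theta) = \int_{-\infty}^{h_l(\hat{x};\theta)} \cdots \int_{-\infty}^{h_1(\hat{x};\theta)} f_\varepsilon\big(\tilde{\varepsilon}_1,\dots,\tilde{\varepsilon}_l, h_{l+1}(\hat{x};\theta),\dots,h_J(\hat{x};\theta)\big)\,\lvert J(\hat{x})\rvert\, d\tilde{\varepsilon}_1 \cdots d\tilde{\varepsilon}_l
\tag{9}
$$

where $J(\hat{x})$ is the Jacobian of the transformation from $\tilde{\varepsilon}$ to $(x_{l+1},\dots,x_J)$ when $(x_1,\dots,x_l) = 0$. Suppose the researcher has a data sample with $i = 1,\dots,N$ independent consumer purchase observations. The sample likelihood is

$$
L(\theta \mid \hat{x}) = \prod_{i=1}^{N} f_x(\hat{x}_i;\theta).
\tag{10}
$$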

A maximum likelihood estimate of θ based on (10) is consistent and asymptotically efficient.
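To make the mixed density/mass structure of (9) and (10) concrete, the following is a minimal sketch under a simplifying assumption that is not part of the text: the differenced shocks are treated as independent N(0, sigma^2), so the integral in (9) factorizes into a product of normal CDFs (non-purchased goods) and normal densities (purchased goods). In general the differenced shocks share the numeraire shock and are correlated, so this is an illustration only. The function names and the convention of passing precomputed thresholds `h` and a log-Jacobian term are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_lik_contribution(h, consumed, sigma=1.0, log_abs_jacobian=0.0):
    """Log of the density in (9) for one observation, ASSUMING independent
    N(0, sigma^2) differenced shocks (an illustrative simplification).

    h                -- thresholds h_j(x_hat; theta) for inside goods j = 1..J
    consumed         -- booleans, True if good j has positive quantity
    log_abs_jacobian -- log |J(x_hat)| for the purchased goods
    """
    ll = log_abs_jacobian
    for h_j, bought in zip(h, consumed):
        z = h_j / sigma
        if bought:
            # equality condition in (6): density component
            ll += math.log(normal_pdf(z) / sigma)
        else:
            # inequality condition in (6): mass component
            ll += math.log(normal_cdf(z))
    return ll

def sample_log_lik(observations, sigma=1.0):
    """Log of the sample likelihood (10); each observation is (h, consumed, log|J|)."""
    return sum(log_lik_contribution(h, c, sigma, lj) for h, c, lj in observations)
```

In an actual application the thresholds and the Jacobian would be recomputed from the chosen utility specification at every trial value of the parameters, and the log-likelihood would be passed to a numerical optimizer.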

3.1.1 Estimation challenges with the neoclassical model

van Soest et al. (1993) have shown that the choice of functional form to approximate utility, $U(x)$, can influence consistency of the maximum likelihood estimator based on (10). In particular, the KKT conditions in (2) generate a unique vector $x^*(p,y;\theta,\varepsilon)$ at given $(p,y)$ for all possible $\theta$ and $\varepsilon$ as long as $U(x)$ is monotonic and strictly quasi-concave. When these conditions fail to hold, the system of KKT conditions (2) may not generate a unique solution, $x^*(p,y;\theta,\varepsilon)$. This non-uniqueness leads to the well-known coherency problem with maximum likelihood estimation (Heckman, 1978),11 which can lead to inconsistent estimates. Note that the term coherency is used slightly differently in the more recent literature on empirical games with multiple equilibria. Tamer (2003) uses the term coherency in reference to the sufficient conditions for the existence of a solution $x^*(p,y;\theta,\varepsilon)$ to the model (in this case $x^*$ satisfies the KKT conditions). He uses the term model completeness in reference to the sufficient conditions for the statistical model to have a well-defined likelihood. For our neoclassical model of demand, the econometric model would be termed “incomplete” if demand was a correspondence and, hence, there were multiple values of $x^*$ that satisfy the KKT conditions at a given $(p,y;\theta,\varepsilon)$.

11 Coherency pertains to the case where there is a unique vector $x^*$ generated by the KKT conditions corresponding to each possible value of $\varepsilon$, and there is a unique value of $\varepsilon$ that generates each possible vector $x^*$ generated by the KKT conditions.

van Soest et al. (1993) propose a set of parameter restrictions that are sufficient for coherency. For many specifications, these conditions will only ensure that the regularity of $U(x)$ holds over the set of prices and quantities observed in the data. While these conditions may suffice for estimation, failure of the global regularity condition could be problematic for policy simulations that use the demand estimates to predict outcomes outside the range of observed values in the sample. For many specifications, the parameter restrictions may not have an analytic form, and may require numerical tools to impose them. As we will see in the examples below, the literature has often relied on special functional forms, with properties like additivity and homotheticity, to ensure global regularity and to satisfy the coherency conditions. However, these specifications come at the cost of less flexible substitution patterns.

In addition to coherency concerns, maximum likelihood estimation based on Eq. (9) also involves several computational challenges. If the system of KKT conditions does not generate a closed-form expression for the conditional demand equations, it may be difficult to impose coherency conditions. In addition, the likelihood comprises a density component for the goods with non-zero consumption and a mass component for the corners at which some of the goods have an optimal demand of zero. The mass component in (9) requires evaluating an $l$-dimensional integral over a region defined implicitly by the solution to the KKT conditions (4). When consumers purchase $l$ of the alternatives, there are $\binom{J}{l}$ potential shopping baskets, and each of the observed combinations would need to be solved.
The likelihood also involves two change-of-variables from ε to ε˜ and from ε˜ to xˆ respectively, requiring the computation of a Jacobian matrix. Estimation methods are beyond the scope of this discussion. However, a number of papers have proposed methods to accommodate several of the computational challenges above including simulated maximum likelihood (Kao et al., 2001), hierarchical Bayesian algorithms that use MCMC methods based on Gibbs sampling (Millimet and Tchernis, 2008), hybrid methods that combine Gibbs sampling with Metropolis-Hastings (Kim et al., 2002), and GMM estimation (Thomassen et al., 2017). In the remainder of this section, we discuss several examples of functional forms for U (x) that have been implemented in practice.
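The two computational burdens described above, the number of candidate baskets and the multi-dimensional mass integral, can be illustrated with a small sketch. It counts baskets with a binomial coefficient and approximates an orthant probability of the form appearing in (8) with a crude frequency simulator. This is not the simulated maximum likelihood procedure of the cited papers; the function names, the lower-triangular Cholesky factor `chol` supplied by the caller, and the number of draws are all illustrative assumptions.

```python
import math
import random

def n_baskets(J, l):
    """Number of distinct baskets containing l of the J inside goods."""
    return math.comb(J, l)

def simulate_orthant_prob(h, chol, n_draws=20000, seed=0):
    """Crude frequency estimate of Pr(eps_1 <= h_1, ..., eps_l <= h_l)
    for eps ~ N(0, chol * chol'), where chol is lower triangular.
    Illustrative only: a frequency simulator is not smooth in the parameters.
    """
    rng = random.Random(seed)
    dim = len(h)
    hits = 0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # correlate the standard normal draws with the Cholesky factor
        eps = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        if all(e <= h_j for e, h_j in zip(eps, h)):
            hits += 1
    return hits / n_draws

# With ten inside goods, baskets of three goods alone number 120.
basket_count = n_baskets(10, 3)
```

Because the frequency simulator is a step function of the parameters, smooth simulators are usually preferred inside an optimizer; the sketch is meant only to show what the mass component asks the researcher to compute.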

3.1.2 Example: Quadratic utility

Due to its tractability, the quadratic utility,

$$
U(x;\theta,\varepsilon) = \sum_{j=1}^{J+1} \big(\beta_{j0} + \varepsilon_j\big)\,x_j + \frac{1}{2}\sum_{j=1}^{J+1}\sum_{k=1}^{J+1} \beta_{jk}\,x_j x_k
\tag{11}
$$

has been a popular functional form for empirical work (e.g., Wales and Woodland, 1983; Ransom, 1987; Lambrecht et al., 2007; Mehta, 2015; Yao et al., 2012;


Thomassen et al., 2017).12 The random utility shocks in (11) are “random coefficients” capturing heterogeneity across consumers in the linear utility components over the various products. Assume WLOG that the consumer forgoes consumption of goods $j = 1,\dots,l$ ($x_j = 0$) and chooses a positive quantity of goods $j = l+1,\dots,J+1$ ($x_j > 0$). The corresponding KKT conditions are

$$
\begin{aligned}
\tilde{\varepsilon}_j + \beta_{j0} + \sum_{k=1}^{J+1}\beta_{jk}x_k^* - \Big(\beta_{J+1,0} + \sum_{k=1}^{J+1}\beta_{J+1,k}x_k^*\Big)p_j &\le 0, && j = 1,\dots,l\\
\tilde{\varepsilon}_j + \beta_{j0} + \sum_{k=1}^{J+1}\beta_{jk}x_k^* - \Big(\beta_{J+1,0} + \sum_{k=1}^{J+1}\beta_{J+1,k}x_k^*\Big)p_j &= 0, && j = l+1,\dots,J
\end{aligned}
\tag{12}
$$

where $\tilde{\varepsilon}_j = \varepsilon_j - p_j\varepsilon_{J+1}$ and, by the symmetry condition, $\beta_{jk} = \beta_{kj}$. Since the quadratic utility function is homogeneous of degree zero in the parameters, we impose the normalization $\sum_{j=1}^{J+1}\beta_{j0} = 1$. We have also re-written the estimation problem in terms of differences, $\tilde{\varepsilon}$, to resolve the adding-up condition. If $\tilde{\varepsilon} \sim N(0,\Sigma)$ and the consumer purchases $\hat{x} = (0,\dots,0,\hat{x}_{l+1},\dots,\hat{x}_J)$, the corresponding likelihood is13

$$
f_x(\hat{x}) = \int_{-\infty}^{h_l(\hat{x};\theta)} \cdots \int_{-\infty}^{h_1(\hat{x};\theta)} f_\varepsilon\big(\tilde{\varepsilon}_1,\dots,\tilde{\varepsilon}_l, h_{l+1}(\hat{x};\theta),\dots,h_J(\hat{x};\theta)\big)\,\lvert J(\hat{x})\rvert\, d\tilde{\varepsilon}_1 \cdots d\tilde{\varepsilon}_l
\tag{13}
$$

where $h_j(\hat{x};\theta) = -\beta_{j0} - \sum_{k=1}^{J+1}\beta_{jk}x_k^* + \big(\beta_{J+1,0} + \sum_{k=1}^{J+1}\beta_{J+1,k}x_k^*\big)p_j$, $f_\varepsilon(\varepsilon)$ is the density corresponding to $N(0,\Sigma)$, and $J(\hat{x})$ is the Jacobian from $\tilde{\varepsilon}$ to $(x_{l+1},\dots,x_J)$.

Ransom (1987) showed that the concavity of the quadratic utility function, (11), is sufficient for coherency of the maximum likelihood problem (13), even though monotonicity may not hold globally. Concavity is ensured if the matrix of cross-price effects, $B$, where $B_{jk} = \beta_{jk}$, is symmetric and negative definite. The advantages of the quadratic utility function include the flexibility of the substitution patterns between the goods, including a potential blend of complements and substitutes. However, the specification does not scale well in the number of products $J$. The number of parameters increases quadratically with $J$ due to the cross-price effects. Moreover, the challenges in imposing global regularity could be problematic for policy simulations using the demand parameters.
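The concavity condition and the quadratic growth in parameters can both be checked mechanically. The sketch below tests whether a candidate matrix B is symmetric and negative definite by attempting a Cholesky factorization of -B, and counts the free coefficients of (11) before the normalization is imposed. The helper names and the tolerance values are assumptions made for illustration; an estimation routine would more likely impose the restriction through a reparameterization than test it after the fact.

```python
import math

def is_negative_definite(B, tol=1e-12):
    """Return True if B is symmetric and negative definite.

    Uses the fact that B is negative definite exactly when -B admits a
    Cholesky factorization with strictly positive pivots.
    """
    n = len(B)
    for i in range(n):
        for j in range(n):
            if abs(B[i][j] - B[j][i]) > 1e-10:
                return False  # symmetry (beta_jk = beta_kj) fails
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = -B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: -B is not positive definite
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def n_quadratic_params(J):
    """Coefficients in (11) with J inside goods plus the numeraire:
    J+1 linear terms and (J+1)(J+2)/2 distinct symmetric entries of B,
    counted before the normalization is imposed."""
    n = J + 1
    return n + n * (n + 1) // 2
```

For example, moving from J = 2 to J = 20 inside goods raises the count from 9 to 252 coefficients, which is the scaling problem noted above.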

12 Thomassen et al. (2017) extend the quadratic utility model to allow for discrete store choice as well as the discrete/continuous expenditure allocation decisions across grocery product categories within a visited store.
13 The density $f_{\tilde{\varepsilon}}(\tilde{\varepsilon})$ is induced by $f_\varepsilon(\varepsilon)$ and the fact that $\tilde{\varepsilon}_j = \varepsilon_j - p_j\varepsilon_{J+1}$.


3.1.3 Example: Linear expenditure system (LES)

One of the classic utility specifications in the demand estimation literature is the Stone-Geary model:

$$
U(x;\theta) = \sum_{j=1}^{J+1} \theta_j \ln\!\big(x_j - \theta_{j1}\big), \qquad \theta_j > 0.
\tag{14}
$$

Similar to the CES specification, the parameters $\theta_j$ measure the curvature of the sub-utility of each product and affect the rate of satiation. The translation parameters $\theta_{j1}$ allow for potential corner solutions. The Stone-Geary preferences have been popular in the extant literature because the corresponding demand system can be solved analytically:

$$
x_j^* = \theta_{j1} - \tilde{\theta}_j \sum_{k=1}^{J+1} \theta_{k1}\frac{p_k}{p_j} + \tilde{\theta}_j\frac{y}{p_j}, \qquad j = 1,\dots,J+1
\tag{15}
$$

where $x_j^* > \theta_{j1}$, $\forall j$, and where $\tilde{\theta}_j = \frac{\theta_j}{\sum_k \theta_k}$. The specification is often termed the “linear expenditure system” (LES) because the expenditure model is linear in prices

$$
p_j x_j^* = \theta_{j1}p_j + \tilde{\theta}_j\Big(y - \sum_k \theta_{k1}p_k\Big).
\tag{16}
$$

Corner solutions with binding non-negativity constraints can arise when $\theta_{j1} \le 0$ and, consequently, product $j$ is “inessential” (Kao et al., 2001; Du and Kamakura, 2008). Assume WLOG that the consumer forgoes consumption of goods $j = 1,\dots,l$ ($x_j = 0$) and chooses a positive quantity of goods $j = l+1,\dots,J+1$ ($x_j > 0$). If we let $\theta_j = e^{\bar{\theta}_j + \varepsilon_j}$ where $\bar{\theta}_{J+1} = 0$, then the KKT conditions are:

$$
\begin{aligned}
\tilde{\varepsilon}_j + \bar{\theta}_j - \ln\!\big(-\theta_{j1}\big) + \ln\!\Big(y - \sum_{k=1}^{J} x_k^* p_k - \theta_{J+1,1}\Big) - \ln p_j &\le 0, && j = 1,\dots,l\\
\tilde{\varepsilon}_j + \bar{\theta}_j - \ln\!\big(x_j^* - \theta_{j1}\big) + \ln\!\Big(y - \sum_{k=1}^{J} x_k^* p_k - \theta_{J+1,1}\Big) - \ln p_j &= 0, && j = l+1,\dots,J
\end{aligned}
\tag{17}
$$

where $\tilde{\varepsilon}_j = \varepsilon_j - \varepsilon_{J+1}$ and $\theta_{j1} \le 0$ for $j = 1,\dots,l$. If $\tilde{\varepsilon} \sim N(0,\Sigma)$ and the consumer purchases $\hat{x} = (0,\dots,0,\hat{x}_{l+1},\dots,\hat{x}_J)$, the corresponding likelihood is

$$
f_x(\hat{x};\theta) = \int_{-\infty}^{h_l(\hat{x};\theta)} \cdots \int_{-\infty}^{h_1(\hat{x};\theta)} f_\varepsilon\big(\tilde{\varepsilon}_1,\dots,\tilde{\varepsilon}_l, h_{l+1}(\hat{x};\theta),\dots,h_J(\hat{x};\theta)\big)\,\lvert J(\hat{x})\rvert\, d\tilde{\varepsilon}_1 \cdots d\tilde{\varepsilon}_l
\tag{18}
$$

where $h_j(\hat{x};\theta) = -\bar{\theta}_j + \ln\!\big(-\theta_{j1}\big) - \ln\!\big(y - \sum_{k=1}^{J} x_k^* p_k - \theta_{J+1,1}\big) + \ln p_j$, $f_\varepsilon(\varepsilon)$ is the density corresponding to $N(0,\Sigma)$, and $J(\hat{x})$ is the Jacobian from $\tilde{\varepsilon}$ to $(x_{l+1},\dots,x_J)$.

Some advantages of the LES specification include the fact that the utility function is globally concave, obviating the need for additional restrictions to ensure model coherency. In addition, the LES scales better than the quadratic utility as the number of parameters to be estimated grows linearly with the number of products, $J$. However, the specification does not allow for the same degree of flexibility in the substitution patterns between goods. The additive separability of the sub-utility functions associated with each good implies that the marginal utility of one good is independent of the level of consumption of all the other goods. Therefore, the goods are assumed to be strict Hicksian substitutes and any substitution between products arises through the budget constraint. The additive structure also rules out the possibility of inferior goods (see Deaton and Muellbauer, 1980b, p. 139).
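The analytic demand system in (15) and its expenditure form (16) are simple enough to evaluate directly. The following sketch computes interior LES demands and is meant only to show the mechanics; the function name and the example parameter values are hypothetical, and it does not handle the corner regimes in (17).

```python
def les_demand(theta, theta1, prices, income):
    """Interior Stone-Geary demands from (15).

    theta   -- curvature parameters theta_j > 0
    theta1  -- translation parameters theta_j1
    prices  -- prices p_j (the numeraire is included with price 1)
    income  -- budget y
    Valid only when every implied quantity exceeds its theta_j1.
    """
    total = sum(theta)
    shares = [t / total for t in theta]  # normalized theta-tilde_j
    # income left after the translation quantities are paid for
    supernumerary = income - sum(t1 * p for t1, p in zip(theta1, prices))
    return [t1 + s * supernumerary / p
            for t1, s, p in zip(theta1, shares, prices)]

# Hypothetical example with two inside goods and a numeraire.
example_prices = [2.0, 4.0, 1.0]
example_demand = les_demand([1.0, 1.0, 2.0], [1.0, 0.5, 2.0], example_prices, 20.0)
example_spend = sum(p * x for p, x in zip(example_prices, example_demand))
```

In this example the three expenditures are 5.5, 5.5, and 9, so the budget of 20 is exhausted, which is the adding-up property implied by (16).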

3.1.4 Example: Translated CES utility

Another popular specification for empirical work is the translated CES utility function (Pollak and Wales, 1992; Kim et al., 2002):

$$
U(x^*;\theta,\varepsilon) = \sum_{j=1}^{J} \psi_j\big(x_j + \gamma_j\big)^{\alpha_j} + \psi_{J+1}\,x_{J+1}^{\alpha_{J+1}}
\tag{19}
$$

where $\psi_j = \bar{\psi}_j \exp(\varepsilon_j) > 0$ is the stochastic perceived quality of a unit of product $j$, $\gamma_j \ge 0$ is a translation of the utility, $\alpha_j \in (0,1]$ is a satiation parameter, and the collection of parameters to be estimated consists of $\theta = \{\alpha_j, \gamma_j, \bar{\psi}_j\}_{j=1}^{J+1}$. This specification nests several well-known models such as the translated Cobb-Douglas or “linear expenditure system” ($\alpha_j \to 0$) and the translated Leontief ($\alpha_j \to -\infty$). For any product $j$, setting $\gamma_j = 0$ would ensure a strictly interior quantity, $x_j^* > 0$. The CES specification has also been popular due to its analytic solution when quantities demanded are strictly interior. See for instance applications to nutrition preferences by Dubois et al. (2014) and Allcott et al. (2018).

For the more general case with corner solutions, the logarithmic form of the KKT conditions associated with the translated CES utility model is

$$
\begin{aligned}
\tilde{\varepsilon}_j &\le h_j(x_j^*;\theta), && j = 1,\dots,l\\
\tilde{\varepsilon}_j &= h_j(x_j^*;\theta), && j = l+1,\dots,J
\end{aligned}
\tag{20}
$$

where $h_j(x_j^*;\theta) = \ln\!\big(\bar{\psi}_{J+1}\alpha_{J+1}\,x_{J+1}^{*\,\alpha_{J+1}-1}\big) - \ln\!\big(\bar{\psi}_j\alpha_j\,(x_j^* + \gamma_j)^{\alpha_j - 1}\big) + \ln p_j$ and $\tilde{\varepsilon}_j = \varepsilon_j - \varepsilon_{J+1}$.


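If $\tilde{\varepsilon} \sim N(0,\Sigma)$ and the consumer purchases $\hat{x} = (0,\dots,0,\hat{x}_{l+1},\dots,\hat{x}_J)$, the corresponding likelihood is

$$
f_x(\hat{x};\theta) = \int_{-\infty}^{h_l(\hat{x};\theta)} \cdots \int_{-\infty}^{h_1(\hat{x};\theta)} f_\varepsilon\big(\tilde{\varepsilon}_1,\dots,\tilde{\varepsilon}_l, h_{l+1}(\hat{x};\theta),\dots,h_J(\hat{x};\theta)\big)\,\lvert J(\hat{x})\rvert\, d\tilde{\varepsilon}_1 \cdots d\tilde{\varepsilon}_l
\tag{21}
$$

where $f_\varepsilon(\varepsilon)$ is the density corresponding to $N(0,\Sigma)$ and $J(\hat{x})$ is the Jacobian from $\tilde{\varepsilon}$ to $(x_{l+1},\dots,x_J)$. If instead we assume $\varepsilon \sim$ i.i.d. $EV(0,\sigma)$, Bhat (2005) and Bhat (2008) derive the simpler, closed-form expression for the likelihood with analytic solutions to the integrals and the Jacobian $J(\hat{x})$ in (21)

$$
f_x(\hat{x};\theta) = \frac{1}{\sigma^{J-l}}\left[\prod_{i=l+1}^{J+1} f_i\right]\left[\sum_{i=l+1}^{J+1}\frac{p_i}{f_i}\right]\left[\frac{\prod_{i=l+1}^{J+1} e^{h_i(\hat{x}_i;\theta)/\sigma}}{\Big(\sum_{j=1}^{J+1} e^{h_j(\hat{x}_j;\theta)/\sigma}\Big)^{J-l+1}}\right](J-l)!
\tag{22}
$$

where, changing the notation from above slightly, we define $h_j(x_j^*;\theta) = \bar{\psi}_j + (\alpha_j - 1)\ln\!\big(x_j^* + \gamma_j\big) - \ln p_j$ and $f_i = \frac{1-\alpha_i}{x_i^* + \gamma_i}$.

A formulation of the utility function specifies an additive model of utility over stochastic consumption needs instead of over products (Hendel, 1999; Dubé, 2004)

$$
U(x^*;\theta,\psi) = \sum_{t=1}^{T}\left(\sum_{j=1}^{J} \psi_{jt}\,x_{jt}\right)^{\alpha}.
$$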

One interpretation is that the consumer shops in anticipation of $T$ separate future consumption occasions (Walsh, 1995) where $T \sim \text{Poisson}(\lambda)$. The consumer draws the marginal utilities per unit of each product independently across the $T$ consumption occasions, $\psi_{jt} \sim F(\bar{\psi}_j)$. The estimable parameters consist of $\theta = (\lambda, \bar{\psi}_1,\dots,\bar{\psi}_J, \alpha)$. For each of the $t = 1,\dots,T$ occasions, the consumer has perfect substitutes preferences over the products and chooses a single alternative. The purchase of variety on a given trip arises from the aggregation of the choices for each of the consumption occasions. Non-purchase is handled by imposing indivisibility on the quantities, although a translation parameter like the one in (19) above could also be used if divisibility was allowed.

Like the LES specification, the translated CES model is monotonic and quasi-concave, ensuring the consistency of the likelihood. The model also scales better than the quadratic utility as the number of parameters to be estimated grows linearly with the number of products, $J$. Scalability is improved even further by projecting the perceived quality parameters, $\psi_j$, onto a lower-dimensional space of observed product characteristics (Hendel, 1999; Dubé, 2004; Kim et al., 2007). However, the translated CES specification derived above assumes the products are strict Hicksian substitutes, which limits the substitution patterns implied by the model.14 Moreover, with a large number of goods and small budget shares, the implied cross-price elasticities will be small in these models (Mehta, 2015).
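The way variety arises from aggregating single-product choices over consumption occasions can be seen in a short simulation. The sketch below is a stylized illustration, not the estimator of Hendel (1999) or Dubé (2004): it assumes log-normal occasion-specific qualities, exactly one indivisible unit per occasion, and a textbook Poisson sampler, and all function names and parameter values are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) by multiplying uniforms (Knuth's method)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_trip(psi_bar, prices, lam, sigma=1.0, seed=0):
    """Simulate one shopping trip under the multiple-occasion interpretation.

    ASSUMPTIONS (for illustration): log psi_jt = psi_bar_j + N(0, sigma^2),
    and one indivisible unit is bought per consumption occasion.
    Returns the number of occasions T and the units bought of each product.
    """
    rng = random.Random(seed)
    T = poisson_draw(lam, rng)
    units = [0] * len(prices)
    for _ in range(T):
        # perfect substitutes within an occasion: pick the highest
        # quality per dollar, i.e. the largest log(psi_jt) - log(p_j)
        best = max(range(len(prices)),
                   key=lambda j: psi_bar[j] + rng.gauss(0.0, sigma) - math.log(prices[j]))
        units[best] += 1
    return T, units
```

A trip with several occasions will typically contain more than one product even though each occasion involves a single discrete choice, and a draw of T = 0 corresponds to non-purchase.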

3.1.5 Virtual prices and the dual approach

Thus far, we have used a primal approach to derive the neoclassical model of demand with binding non-negativity constraints from a parametric model of utility. Most of the functional forms used to approximate utility in practice impose restrictions motivated by technical convenience. As we saw above, these restrictions can limit the flexibility of the demand model on factors such as substitution patterns between products and income effects. For instance, the additivity assumption resolves the global regularity concerns, but restricts the products to be strict substitutes. Accommodating more flexible substitution patterns becomes computationally difficult even for a simple specification like quadratic utility due to the coherency conditions.

The dual approach has been used to derive demand systems from less restrictive assumptions (Deaton and Muellbauer, 1980b). Lee and Pitt (1986) developed an approach that uses duality to derive demand while accounting for binding non-negativity constraints using virtual prices. The advantage of the dual approach is that a flexible functional form can be used to approximate indirect utility, and cost functions can be used to determine the relevant restrictions to ensure that the derived demand system is consistent with microeconomic principles. The trade-off from using this dual approach is that the researcher loses the direct connection between the demand parameters and their deep structural interpretation as specific aspects of “preferences.” These specifications may therefore be less suitable for marketing applications to problems such as product design, consumer quality choice, and the valuation of product features.

We begin with the consumer’s indirect utility function

$$
V(p,y;\theta,\varepsilon) = \max_{x \in \mathbb{R}^{J+1}} \big\{ U(x;\theta,\varepsilon) \mid p'x = y \big\}
\tag{23}
$$

where the underlying utility function $U(x;\theta,\varepsilon)$ is again assumed to be strictly quasi-concave, continuously differentiable, and increasing. Roy’s Identity generates a system of notional demand equations

$$
\tilde{x}_j(p,y;\theta,\varepsilon) = -\frac{\partial V(p,y;\theta,\varepsilon)/\partial p_j}{\partial V(p,y;\theta,\varepsilon)/\partial y}, \qquad \forall j.
\tag{24}
$$

14 Kim et al. (2002) apply the model to the choices between flavor variants of yogurt where consumption

complementarities are unlikely. However, this restriction would be more problematic for empirical studies of substitution between broader commodity groups.


These demand equations are notional because they do not impose non-negativity and can therefore allow for negative values. In essence, $\tilde{x}$ is a latent variable since it is negative for products that are not purchased. Note that Roy’s identity requires that prices are fixed and independent of the quantities purchased by the consumer, an assumption that fails in settings where firms use non-linear pricing such as promotional quantity discounts.15

Lee and Pitt (1986) use virtual prices to handle products with zero quantity demanded (Neary and Roberts, 1980). Suppose the consumer’s optimal consumption vector is $x^* = (0,\dots,0,x_{l+1}^*,\dots,x_{J+1}^*)$ where, as before, she does not purchase the first $l$ goods. We can define virtual prices based on Roy’s Identity in Eq. (24) that exactly set the notional demands to zero for the non-purchased goods

$$
0 = \frac{\partial V\big(\pi(\bar{p},y;\theta,\varepsilon),\bar{p},y;\theta,\varepsilon\big)}{\partial p_j}, \qquad j = 1,\dots,l
$$

where $\pi(\bar{p},y;\theta,\varepsilon) = \big(\pi_1(\bar{p},y;\theta,\varepsilon),\dots,\pi_l(\bar{p},y;\theta,\varepsilon)\big)$ is the $l$-vector of virtual prices and $\bar{p} = (p_{l+1},\dots,p_J)$. These virtual prices act like reservation prices for the non-purchased goods. We can derive the positive demands for goods $j = l+1,\dots,J+1$ by substituting the virtual prices into Roy’s identity:

$$
x_j^*(\bar{p},y;\theta,\varepsilon) = -\frac{\partial V\big(\pi(\bar{p},y;\theta,\varepsilon),\bar{p},y;\theta,\varepsilon\big)/\partial p_j}{\partial V\big(\pi(\bar{p},y;\theta,\varepsilon),\bar{p},y;\theta,\varepsilon\big)/\partial y}, \qquad j = l+1,\dots,J+1.
\tag{25}
$$

The regime switching conditions in which products $j = 1,\dots,l$ are not purchased consist of comparing virtual prices and observed prices:

$$
\pi_j(\bar{p},y;\theta,\varepsilon) \le p_j, \qquad j = 1,\dots,l.
\tag{26}
$$

Lee and Pitt (1986) demonstrate the parallel between the switching conditions based on virtual prices in (26) and the binding non-negativity constraints in the KKT conditions, (4). The demand parameters $\theta$ can then be estimated by combining the conditional demand system, (25), and the regime-switching conditions, (4). If the consumer purchases $\hat{x} = (0,\dots,0,\hat{x}_{l+1},\dots,\hat{x}_{J+1})$, the corresponding likelihood is

$$
f_x(\hat{x};\theta) = \int_{\pi_l^{-1}(p,y,p_l;\theta)}^{\infty} \cdots \int_{\pi_1^{-1}(p,y,p_1;\theta)}^{\infty} f_\varepsilon\big(\varepsilon_1,\dots,\varepsilon_l, x_{l+1}^{*-1}(\hat{x},\bar{p},y;\theta),\dots,x_J^{*-1}(\hat{x},\bar{p},y;\theta)\big)\,\lvert J(\hat{x})\rvert\, d\tilde{\varepsilon}_1 \cdots d\tilde{\varepsilon}_l
\tag{27}
$$

15 Howell et al. (2016) show how a primal approach with a parametric utility specification can be used in

the presence of non-linear pricing.

where $f_\varepsilon(\varepsilon)$ is the density corresponding to $N(0,\Sigma)$ and $J(\hat{x})$ is the Jacobian from $\tilde{\varepsilon}$ to $(x_{l+1},\dots,x_J)$. The inverse functions in (27) reflect the fact that $\pi_j^{-1}(p,y,p_j;\theta) \le \varepsilon_j$ for $j = 1,\dots,l$ and $x_j^{*-1}(\hat{x},\bar{p},y;\theta) = \varepsilon_j$ for $j = l+1,\dots,J$.

As with the primal problem, the choice of functional form for the indirect utility, $V(p,y;\theta,\varepsilon)$, can influence the coherency of the maximum likelihood estimator for $\theta$ in (27). van Soest et al. (1993) show that the uniqueness of the demand function defined by Roy’s Identity in (24) will hold if the indirect utility function $V(p,y;\theta,\varepsilon)$ satisfies the following three regularity conditions:

1. $V(p,y;\theta,\varepsilon)$ is homogeneous of degree zero.
2. $V(p,y;\theta,\varepsilon)$ is twice continuously differentiable in $p$ and $y$.
3. $V(p,y;\theta,\varepsilon)$ is regular, meaning that the Slutsky matrix is negative semi-definite.

For many popular and convenient flexible functional forms, the Slutsky matrix may fail to satisfy negativity, leading to the coherency problem. In many of these cases, such as AIDS, the virtual prices may need to be derived numerically, making it difficult to derive analytic restrictions that would ensure these regularity conditions hold. For these reasons, the homothetic translog specification discussed below has been extremely popular in practice.

3.1.6 Example: Indirect translog utility

One of the most popular implementations of the dual approach described above in Section 3.1.5 uses the translog approximation of the indirect utility function (e.g., Lee and Pitt, 1986; Millimet and Tchernis, 2008; Mehta, 2015)

$$
V(p,y;\theta,\varepsilon) = \sum_{j=1}^{J+1} \theta_j \ln\frac{p_j}{y} + \frac{1}{2}\sum_{j=1}^{J+1}\sum_{k=1}^{J+1} \theta_{jk}\,\ln\frac{p_j}{y}\,\ln\frac{p_k}{y}.
\tag{28}
$$

The econometric error is typically introduced by assuming $\theta_j = \bar{\theta}_j + \varepsilon_j$ where $\varepsilon_j \sim F(\varepsilon)$. Roy’s Identity gives us the notional expenditure share for product $j$

$$
s_j = \frac{-\theta_j - \sum_{k=1}^{J+1}\theta_{jk}\ln\frac{p_k}{y}}{1 - \sum_k\sum_l \theta_{kl}\ln\frac{p_l}{y}}.
\tag{29}
$$

van Soest and Kooreman (1990) derived slightly weaker sufficient conditions for coherency of the translog approach than van Soest et al. (1993). Following van Soest and Kooreman (1990), we impose the following additional restrictions which are sufficient for the concavity of the underlying utility function and, hence, the uniqueness of the demand system (29) for a given realization of $\varepsilon$:

$$
\begin{aligned}
\theta_{J+1} &= 1 - \sum_j \theta_j\\
\sum_k \theta_{jk} &= 0, && \forall j\\
\theta_{jk} &= \theta_{kj}, && \forall j.
\end{aligned}
\tag{30}
$$

We can re-write the expenditure share for product $j$

$$
s_j = -\theta_j - \sum_{k=1}^{J}\theta_{jk}\ln p_k.
\tag{31}
$$

We can see from (31) that an implication of the restrictions in (30) is that they also impose homotheticity on preferences. For the translog specification, Mehta (2015) derived necessary and sufficient conditions for global regularity that are even weaker than the conditions in van Soest and Kooreman (1990). These conditions allow for more flexible income effects (normal and inferior goods) and for more flexible substitution patterns (substitutes and complements), mainly by relaxing homotheticity.16
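The role of virtual prices in the homothetic translog case can be illustrated numerically. The sketch below evaluates the notional shares in (31), and when a notional share is negative it solves for the virtual price that sets that share to zero, in the spirit of (26). The parameter values are invented for illustration, and the code assumes the linear coefficients sum to minus one and each row of the symmetric second-order matrix sums to zero, so that the shares in (31) add up to one; the last good plays the role of the numeraire with a price of one.

```python
import math

def translog_shares(theta, Theta, prices):
    """Notional expenditure shares of the homothetic translog, as in (31)."""
    n = len(prices)
    return [-theta[j] - sum(Theta[j][k] * math.log(prices[k]) for k in range(n))
            for j in range(n)]

def virtual_price(j, theta, Theta, prices):
    """Price of good j at which its notional share is exactly zero,
    holding the other prices fixed (single non-purchased good)."""
    n = len(prices)
    other = sum(Theta[j][k] * math.log(prices[k]) for k in range(n) if k != j)
    return math.exp((-theta[j] - other) / Theta[j][j])

# Hypothetical parameters: theta sums to -1, rows of Theta sum to 0, Theta symmetric.
theta = [-0.05, -0.35, -0.60]
Theta = [[0.2, -0.1, -0.1],
         [-0.1, 0.2, -0.1],
         [-0.1, -0.1, 0.2]]
prices = [3.0, 1.0, 1.0]

notional = translog_shares(theta, Theta, prices)   # first share is negative
pi_1 = virtual_price(0, theta, Theta, prices)      # reservation price of good 1
# Observed shares replace the price of the non-purchased good by its virtual price.
observed = translog_shares(theta, Theta, [pi_1] + prices[1:])
```

In this example the virtual price of the first good lies below its observed price, so the switching condition holds and the good is not purchased, while the remaining two shares are positive and sum to one.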

3.2 The discrete/continuous product choice restriction in the neoclassical model

Perhaps due to their computational complexity, the application of the microeconometric models of variety discussed in Section 3 has been limited. However, the empirical regularities documented in Section 2 suggest that simpler models of discrete choice, with only a single product being chosen, could be used in many settings. Recall from Section 2 that, in the average category, a single product is chosen during 97% of trips. We now examine how our demand framework simplifies under discrete product choice. The discussion herein follows Hanemann (1984); Chiang and Lee (1992); Chiang (1991); Chintagunta (1993).

3.2.1 The primal problem

The model in this section closely follows the simple re-packaging model with varieties from Deaton and Muellbauer (1980b). In these models, the consumption utility for a given product is based on its effective quantity consumed, which scales the quantity by the product’s quality. As before, we assume the commodity group of interest comprises j = 1, ..., J substitute products. Products are treated as perfect substitutes so that, at most, a single variant is chosen. We also assume there is an additional essential numeraire good indexed as product J + 1.

16 In an empirical application to consumer purchases over several CPG product categories, Mehta (2015)

finds the proposed model fits the data better than the homothetic translog specification. However, when J = 2, the globally regular translog will exhibit the restrictive strict Hicksian substitutes property.


To capture discrete product choice within the commodity group, we assume the following bivariate utility over the commodity group and the essential numeraire:

$$
U(x^*;\theta,\psi) = \tilde{U}\!\left(\sum_{j=1}^{J}\psi_j x_j,\ \psi_{J+1}x_{J+1}\right).
\tag{32}
$$

The parameter vector $\psi = (\psi_1,\dots,\psi_{J+1})$, $\psi_j \ge 0$, measures the constant marginal utility of each of the products. In the literature, we often refer to $\psi_j$ as the “perceived quality” of product $j$. Specifying the perceived qualities as random variables, $\psi \sim F(\psi)$, introduces random utility as a potential source of heterogeneity across consumers in their perceptions of product quality. We also assume regularity conditions on $U(x^*;\theta,\psi)$ to ensure that a positive quantity of $x_{J+1}$ is always chosen. To simplify the notation, let the total commodity vector be $z_1 = \sum_{j=1}^{J}\psi_j x_j$ and let $z_2 = \psi_{J+1}x_{J+1}$ so that we can re-write utility as $\tilde{U}(z_1,z_2)$. The KKT conditions are

$$
\frac{\partial \tilde{U}(\psi'x,\psi_{J+1}x_{J+1})}{\partial z_1}\,\psi_j - \frac{\partial \tilde{U}(\psi'x,\psi_{J+1}x_{J+1})}{\partial z_2}\,\psi_{J+1}\,p_j \le 0, \qquad j = 1,\dots,J
\tag{33}
$$

where $\frac{\partial \tilde{U}(\psi'x,\psi_{J+1}x_{J+1})}{\partial z_1}$ is the marginal utility of total quality-weighted consumption within the commodity group. Because of the perfect substitutes specification, if a product within the commodity group is chosen, it will be product $k$ if

$$
\frac{p_k}{\psi_k} = \min\left\{\frac{p_j}{\psi_j}\right\}_{j=1}^{J}
$$

and, hence, $k$ exhibits the lowest price-to-quality ratio. As with the general model in Section 3, demand estimation will need to handle the regime switching, or demand selection conditions. If

$$
p_k > \frac{\dfrac{\partial \tilde{U}(z_1^*,z_2^*;\theta,\psi)}{\partial z_1}\,\psi_k}{\dfrac{\partial \tilde{U}(z_1^*,z_2^*;\theta,\psi)}{\partial z_2}\,\psi_{J+1}},
$$

then the consumer spends her entire budget on the numeraire: $x_{J+1}^* = y$. Otherwise, the consumer allocates her budget between $x_k^*$ and $x_{J+1}^*$ to equate

$$
\frac{\partial \tilde{U}(z_1^*,z_2^*;\theta,\psi)}{\partial z_1}\,\frac{\psi_k}{p_k} = \frac{\partial \tilde{U}(z_1^*,z_2^*;\theta,\psi)}{\partial z_2}\,\psi_{J+1}.
$$

We define $h_j(x^*;\theta,\psi) = p_j\,\psi_{J+1}\,\dfrac{\partial \tilde{U}(z_1^*,z_2^*;\theta,\psi)/\partial z_2}{\partial \tilde{U}(z_1^*,z_2^*;\theta,\psi)/\partial z_1}$. When none of the products are chosen, we can write the likelihood of $\hat{x} = (0,\dots,0)$ as

$$
f_x(\hat{x};\theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{h_J(\hat{x};\theta,\psi_{J+1})} \cdots \int_{-\infty}^{h_1(\hat{x};\theta,\psi_{J+1})} f_\psi(\psi)\, d\psi_1 \cdots d\psi_{J+1}.
\tag{34}
$$


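When product 1 (WLOG) is chosen, we can write the likelihood of $\hat{x} = (\hat{x}_1,0,\dots,0)$ as

$$
f_x(\hat{x};\theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{h_J(\hat{x};\theta,\psi_{J+1})} \cdots \int_{-\infty}^{h_2(\hat{x};\theta,\psi_{J+1})} f_\psi\big(h_1(\hat{x};\theta,\psi),\psi_2,\dots,\psi_{J+1}\big)\,\lvert J(\hat{x})\rvert\, d\psi_2 \cdots d\psi_{J+1}
\tag{35}
$$

where $J(\hat{x})$ is the Jacobian from $\psi_1$ to $\hat{x}_1$. The likelihood now comprises a density component for the chosen alternative $j = 1$, and a mass function for the remaining goods.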
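The two-step logic of the perfect substitutes model, first identifying the product with the lowest price-to-quality ratio and then comparing that ratio with a reservation value, can be sketched as follows. The function names are hypothetical, and the marginal utilities with respect to the two arguments of the bivariate utility are passed in as numbers that the caller is assumed to have evaluated at the corner where the whole budget goes to the numeraire.

```python
def preferred_product(prices, psi):
    """Index and value of the lowest price-to-quality ratio p_j / psi_j."""
    ratios = [p / q for p, q in zip(prices, psi)]
    k = min(range(len(ratios)), key=lambda j: ratios[j])
    return k, ratios[k]

def purchases_inside_good(prices, psi, mu_z1, mu_z2, psi_numeraire):
    """Regime check for the discrete/continuous model.

    mu_z1, mu_z2 -- marginal utilities with respect to z1 and z2, ASSUMED to be
                    evaluated at the corner where only the numeraire is bought.
    Returns True when the best product's quality-adjusted price does not
    exceed the reservation value mu_z1 / (mu_z2 * psi_numeraire).
    """
    _, ratio = preferred_product(prices, psi)
    reservation = mu_z1 / (mu_z2 * psi_numeraire)
    return ratio <= reservation
```

Only the best product's ratio matters for the purchase decision, which is why the likelihoods in (34) and (35) reduce to comparisons of each perceived quality with a single threshold per product.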

3.2.2 Example: Translated CES utility

Recall the translated CES utility function presented in Section 3.1.4 (Bhat, 2005; Kim et al., 2002):

$$
U(x^*;\theta,\varepsilon) = \sum_{j=1}^{J}\psi_j\big(x_j + \gamma_j\big)^{\alpha_j} + \psi_{J+1}\,x_{J+1}^{\alpha_{J+1}}.
$$

We can impose discrete product choice with the restrictions $\alpha_j = 1$, $\gamma_j = 0$ for $j = 1,\dots,J$, which gives us perfect substitutes utility over the brands

$$
U(x^*;\theta,\varepsilon) = \sum_{j=1}^{J}\psi_j x_j + \psi_{J+1}\,x_{J+1}^{\alpha}.
$$

Let $\psi_j = \exp\!\big(\bar{\psi}_j + \varepsilon_j\big)$, $j = 1,\dots,J$ and $\psi_{J+1} = \exp(\varepsilon_{J+1})$, where $\varepsilon_j \sim$ i.i.d. $EV(0,\sigma)$ (Deaton and Muellbauer, 1980a; Bhat, 2008). When none of the products are chosen, we can write the likelihood of $\hat{x} = (0,\dots,0)$ as

$$
f_x(\hat{x}) = \frac{1}{1 + \sum_{j=1}^{J}\exp\!\left(\dfrac{\bar{\psi}_j - \ln p_j - \ln\!\big(\alpha y^{\alpha-1}\big)}{\sigma}\right)}
$$

which is the multinomial logit model. If WLOG alternative 1 is chosen in the commodity group, the likelihood of xˆ = xˆ1 , ..., 0 is f xˆ ; θ =

J Vj −εJ +1

exp σ σ j =1 −∞ Vj −εJ +1 × exp − exp exp fε (εJ +1 ) dεJ +1 σ σ

α−1 y − xˆ1

1 σ

∞

exp

j

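where

$$
V_j \equiv \bar{\psi}_j - \ln p_j
$$

and

$$
h_j(\psi_{J+1};\hat{x}_1,p,\theta) = \ln\!\Big(\psi_{J+1}\,\alpha\,(y - \hat{x}_1)^{\alpha-1}\Big) + \ln p_j - \bar{\psi}_j
$$

and

$$
P_k \equiv \Pr\!\big(\varepsilon_k + \bar{\psi}_k - \ln p_k \ge \varepsilon_j + \bar{\psi}_j - \ln p_j,\ j = 1,\dots,J\big) = \frac{\exp\!\left(\dfrac{\bar{\psi}_k - \ln p_k}{\sigma}\right)}{\sum_{j=1}^{J}\exp\!\left(\dfrac{\bar{\psi}_j - \ln p_j}{\sigma}\right)}.
$$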
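The closed-form logit expressions in this example are straightforward to compute. The sketch below evaluates the brand choice probabilities that condition on purchase and the no-purchase probability under the extreme value assumption; the function names and the numerically stabilized softmax are implementation choices made for illustration rather than anything prescribed by the text.

```python
import math

def brand_choice_probs(psi_bar, prices, sigma=1.0):
    """Conditional brand choice probabilities P_k: a logit over
    (psi_bar_j - ln p_j) / sigma."""
    v = [(pb - math.log(p)) / sigma for pb, p in zip(psi_bar, prices)]
    m = max(v)                       # subtract the max for numerical stability
    w = [math.exp(x - m) for x in v]
    total = sum(w)
    return [x / total for x in w]

def no_purchase_prob(psi_bar, prices, alpha, income, sigma=1.0):
    """Probability that none of the inside goods is bought, using the
    threshold ln(alpha * y^(alpha - 1)) from the no-purchase likelihood."""
    c = math.log(alpha * income ** (alpha - 1.0))
    denom = 1.0 + sum(math.exp((pb - math.log(p) - c) / sigma)
                      for pb, p in zip(psi_bar, prices))
    return 1.0 / denom
```

Raising the price of one brand lowers its choice probability and raises the no-purchase probability, which is the mechanism behind the incidence and brand-switching components of the price response discussed later in this section.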

3.2.3 Example: The dual problem with indirect translog utility

Muellbauer (1974) has shown that maximizing the simple re-packaging model utility function in (32) generates a bivariate indirect utility function of the form $V\!\left(\frac{p_k}{\psi_k y}, \frac{1}{\psi_{J+1}y}\right)$ when product $k$ is the preferred product in the commodity group of interest. Following the template in Hanemann (1984), several empirical studies of discrete/continuous brand choice have been derived based on the indirect utility function and the dual virtual prices. For instance, Chiang (1991) and Arora et al. (1998) use a second-order flexible translog approximation of the indirect utility function17:

$$
V(p,y;\theta,\varepsilon) = \theta_1\ln\frac{p_k}{\psi_k y} + \theta_2\ln\frac{1}{\psi_{J+1}y} + \frac{1}{2}\theta_{11}\left[\ln\frac{p_k}{\psi_k y}\right]^2 + \frac{1}{2}\theta_{12}\ln\frac{p_k}{\psi_k y}\,\ln\frac{1}{\psi_{J+1}y} + \frac{1}{2}\theta_{22}\left[\ln\frac{1}{\psi_{J+1}y}\right]^2
\tag{36}
$$

where $\frac{p_k}{\psi_k} = \min_j\left\{\frac{p_j}{\psi_j}\right\}$, $\psi_j = \exp\!\big(\bar{\psi}_j + \varepsilon_j\big)$ for $j = 1,\dots,J$, $\psi_{J+1} = \exp(\varepsilon_{J+1})$, and $\varepsilon_j \sim$ i.i.d. $EV(0,\sigma)$. To facilitate the exposition, we impose the following restrictions to ensure coherency. However, the restrictions lead to the homothetic translog specification, which eliminates potentially interesting income effects in substitution between products (see the concerns discussed earlier in Section 3.1.5):

$$
\begin{aligned}
\theta_1 + \theta_2 &= -1\\
\theta_{11} + \theta_{12} &= 0\\
\theta_{12} + \theta_{22} &= 0.
\end{aligned}
\tag{37}
$$

Roy’s Identity gives us the notional expenditure share for product $k$

$$
s_k = -\theta_1 - \theta_{11}\ln\frac{p_k}{\psi_k} - \theta_{11}\ln\psi_{J+1}.
\tag{38}
$$

(38)

17 See Hanemann (1984) for other specifications including LES and PIGLOG preferences. See also Chintagunta (1993) for the linear expenditure system or “Stone-Geary” specification.


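From (38), we see that $\varepsilon_1 = \dfrac{\hat{s}_1 + \theta_1 + \theta_{11}\ln(p_1) - \theta_{11}\bar{\psi}_1 + \theta_{11}\varepsilon_{J+1}}{\theta_{11}}$. We can now compute the quality-adjusted virtual price (or reservation price) for purchase by setting (38) to zero: $R(\varepsilon_{J+1};\hat{s}) = \exp\!\left(-\dfrac{\theta_1 + \theta_{11}\varepsilon_{J+1}}{\theta_{11}}\right)$. If none of the products are chosen, then $\frac{p_k}{\psi_k} > R(\varepsilon_{J+1};\hat{s})$ and the likelihood of $\hat{s} = (0,\dots,0)$ is

$$
f(\hat{s};\theta) = \frac{\exp\!\left(\dfrac{\theta_1}{\sigma\theta_{11}}\right)}{\exp\!\left(\dfrac{\theta_1}{\sigma\theta_{11}}\right) + \sum_{j=1}^{J}\exp\!\left(\dfrac{\bar{\psi}_j - \ln p_j}{\sigma}\right)}.
\tag{39}
$$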

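If, WLOG, product 1 is chosen, the likelihood of $\hat{s} = (\hat{s}_1,0,\dots,0)$ is

$$
f(\hat{s}) = \int_{-\infty}^{\infty} \frac{1}{\sigma\theta_{11}}\; e^{-\frac{\hat{s}_1 + \theta_1 + \theta_{11}\ln p_1 - \theta_{11}\bar{\psi}_1 + \theta_{11}\varepsilon_{J+1}}{\theta_{11}\sigma}}\; \exp\!\left(-\frac{1}{P_1}\, e^{-\frac{\hat{s}_1 + \theta_1 + \theta_{11}\ln p_1 - \theta_{11}\bar{\psi}_1 + \theta_{11}\varepsilon_{J+1}}{\theta_{11}\sigma}}\right) f_\varepsilon(\varepsilon_{J+1})\, d\varepsilon_{J+1}
\tag{40}
$$

where

$$
P_k \equiv \Pr\!\big(\varepsilon_k + \bar{\psi}_k - \ln(p_k) \ge \varepsilon_j + \bar{\psi}_j - \ln p_j,\ j = 1,\dots,J\big) = \frac{\exp\!\left(\dfrac{\bar{\psi}_k - \ln(p_k)}{\sigma}\right)}{\sum_{j=1}^{J}\exp\!\left(\dfrac{\bar{\psi}_j - \ln p_j}{\sigma}\right)}.
$$

As discussed in Mehta et al. (2010), the distributional assumption $\varepsilon_j \sim$ i.i.d. $EV(0,\sigma)$ imposes a strong restriction on the price elasticity of the quantity purchased (conditional on purchase and brand choice), setting it very close to $-1$. This property can be relaxed by using a more flexible distribution, such as multivariate normal errors. Alternatively, allowing for unobserved heterogeneity in the parameters of the conditional budget share expression (38) would alleviate this restriction at the population level.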

3.2.4 Promotion response: Empirical findings using the discrete/continuous demand model

An empirical literature has used the discrete/continuous specification of demand to decompose the total price elasticity of demand into three components: (1) purchase incidence, (2) brand choice, and (3) quantity choice. This literature seeks to understand the underlying consumer choice mechanism that drives the observation of a large increase in quantities sold in response to a temporary price cut. In particular, the research assesses the extent to which a pure discrete brand choice analysis, focusing only on component (2) (see Section 3.3 below), might miss part of the price elasticity and misinform the researcher or the retailer. Early work typically found that brand-switching elasticities accounted for most of the total price elasticity of demand in CPG product categories (e.g., Chiang, 1991; Chintagunta, 1993), though the unconditional brand choice elasticities were found to be larger than choice elasticities that condition on purchase. More recently, Bell et al.’s (1999) empirical generalizations indicate that the relative role of the brand-switching elasticity varies across product categories. On average, they find that the quantity decision accounts for 25% of the total price elasticity of demand, suggesting that purchase acceleration effects may be larger than previously assumed. These results are based on static models in which any purchase acceleration would be associated with an increase in consumption. In Section 5.1, we extend this discussion to models that allow for forward-looking consumers to stock-pile storable consumer goods in anticipation of higher future prices.
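The three-way decomposition can be illustrated numerically. If expected demand for a brand factors as Pr(purchase incidence) × Pr(brand | purchase) × E[quantity | brand], then the log-derivatives add, so the total price elasticity is the sum of the three component elasticities. The sketch below assumes hypothetical functional forms and coefficients for each component; none of them come from the studies cited above.

```python
import math

# Hypothetical, smooth components of expected demand for one brand.
def incidence(p):      # Pr(purchase in the category)
    return 1.0 / (1.0 + math.exp(-(1.0 - 0.8 * p)))

def brand_choice(p):   # Pr(brand | purchase)
    return 1.0 / (1.0 + math.exp(-(0.5 - 1.5 * p)))

def quantity(p):       # E[quantity | brand chosen]
    return 2.0 * p ** (-0.6)

def expected_demand(p):
    return incidence(p) * brand_choice(p) * quantity(p)

def elasticity(f, p, h=1e-5):
    """Central log-difference approximation to d ln f / d ln p."""
    return ((math.log(f(p + h)) - math.log(f(p - h)))
            / (math.log(p + h) - math.log(p - h)))

p = 2.0
parts = {name: elasticity(f, p) for name, f in
         [("incidence", incidence),
          ("brand choice", brand_choice),
          ("quantity", quantity)]}
total = elasticity(expected_demand, p)
# Share of the total elasticity attributed to each decision.
shares = {name: e / total for name, e in parts.items()}
```

The `shares` dictionary is the analogue of the percentages reported in this literature, computed here for made-up primitives.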

3.3 Indivisibility and the pure discrete choice restriction in the neoclassical model

The pure discrete choice behavior documented in the empirical stylized facts in Section 2 suggests a useful restriction for our demand models. In many product categories, the consumer purchases at most one unit of a single product on a given trip. Discrete choice behavior also broadly applies to other non-CPG product domains such as automobiles, computers, and electronic devices. The combination of pure discrete choice and indivisibility simplifies the discrete product choice model in Section 3.2 by eliminating the intensive margin of quantity choice, reducing the model to one of pure product choice. Not surprisingly, pure discrete choice models have become extremely popular for modeling demand both in the context of micro data on consumer-level choices and with more macro data on aggregate market shares. We now discuss the classic pure discrete choice models of demand estimated in practice (e.g., multinomial logit and probit) and contrast them with the pure discrete choice model derived from the neoclassical models above.

3.3.1 A neoclassical derivation of the pure discrete choice model of demand

Recall from Section 3 that we defined the neoclassical economic model of consumer choice based on the following utility maximization problem:

$$V(p, y; \theta, \varepsilon) \equiv \max_{x} \left\{ U(x; \theta, \varepsilon) : x'p \le y,\ x \ge 0 \right\} \tag{41}$$

where we assume $U(\cdot; \theta, \varepsilon)$ is a continuously-differentiable, quasi-concave, and increasing function. In that problem, we assumed non-negativity and perfect divisibility, $x_j \ge 0$, for each of the $j = 1, ..., J$ products and the $J+1$ essential numeraire. We now consider the case of indivisibility on the $j = 1, ..., J$ products by adding the restriction $x_j \in \{0, 1\}$ for $j = 1, ..., J$. We also assume strong separability (i.e. additivity) of $x_{J+1}$ and perfect substitutes preferences over the $j = 1, ..., J$ products such that:

$$U\left(\sum_{j=1}^{J} \psi_j x_j,\ \psi_{J+1} x_{J+1}; \theta\right) = \sum_{j=1}^{J} \psi_j x_j + \tilde u(x_{J+1}; \psi_{J+1})$$

where $\psi_j = \bar\psi_j + \varepsilon_j$ and $\tilde u(x_{J+1}; \bar\psi_{J+1}) = u(x_{J+1}; \bar\psi_{J+1}) + \varepsilon_{J+1}$. The KKT conditions will no longer hold under indivisible quantities. The consumer’s choice problem consists of making a discrete choice among the following $J+1$ choice-specific indirect utilities:

$$\begin{aligned} v_j &= \bar\psi_j + u(y - p_j; \bar\psi_{J+1}) + \varepsilon_j + \varepsilon_{J+1} = \bar v_j + \varepsilon_j + \varepsilon_{J+1}, & x_j &= 1 \\ v_{J+1} &= u(y; \bar\psi_{J+1}) + \varepsilon_{J+1} = \bar v_{J+1} + \varepsilon_{J+1}, & x_{J+1} &= y. \end{aligned} \tag{42}$$

The probability that the consumer chooses alternative $1 \in \{1, ..., J\}$ is (WLOG)

$$\begin{aligned} \Pr(x_1 = 1) &= \Pr(v_1 \ge v_k,\ k \ne 1) \\ &= \Pr(\varepsilon_k \le \bar v_1 - \bar v_k + \varepsilon_1,\ \forall k \ne 1,\ \varepsilon_1 \ge \bar v_{J+1} - \bar v_1) \\ &= \int_{\bar v_{J+1} - \bar v_1}^{\infty} \int_{-\infty}^{\bar v_1 - \bar v_2 + x} \cdots \int_{-\infty}^{\bar v_1 - \bar v_J + x} f(x, \varepsilon_2, ..., \varepsilon_J)\, d\varepsilon_J \cdots d\varepsilon_2\, dx \end{aligned} \tag{43}$$

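where $f(\varepsilon_1, ..., \varepsilon_J)$ is the density of $(\varepsilon_1, ..., \varepsilon_J)$ and the probability of allocating the entire budget to the essential numeraire is simply $\Pr(x_{J+1} = y) = 1 - \sum_{j=1}^{J} \Pr(x_j = 1)$. If we assume $(\varepsilon_1, ..., \varepsilon_J) \sim$ i.i.d. $EV(0, 1)$, the choice probabilities in (43) become

$$\begin{aligned} \Pr(x_1 = 1) &= \frac{\exp\left(\bar\psi_1 + \tilde u(y - p_1; \bar\psi_{J+1})\right)}{\sum_{k=1}^{J} \exp\left(\bar\psi_k + \tilde u(y - p_k; \bar\psi_{J+1})\right)} \left(1 - \exp\left(-e^{-\tilde u(y; \bar\psi_{J+1})} \sum_{k=1}^{J} e^{\bar\psi_k + \tilde u(y - p_k; \bar\psi_{J+1})}\right)\right) \\ \Pr(x_{J+1} = y) &= \exp\left(-e^{-\tilde u(y; \bar\psi_{J+1})} \sum_{k=1}^{J} e^{\bar\psi_k + \tilde u(y - p_k; \bar\psi_{J+1})}\right). \end{aligned} \tag{44}$$

Suppose the researcher has a data sample with $i = 1, ..., N$ independent consumer purchase observations. A maximum likelihood estimator of the model parameters can be constructed as follows:

$$L(\theta | y) = \prod_{i=1}^{N} \left[ \Pr(x_{J+1} = y)^{y_{iJ+1}} \prod_{j=1}^{J} \Pr(x_j = 1)^{y_{ij}} \right] \tag{45}$$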

where $y_{ij}$ indicates whether observation $i$ resulted in choice alternative $j$, and $\theta = (\bar\psi_1, ..., \bar\psi_{J+1})$. While the probabilities (44) generate a tractable maximum likelihood estimator based on (45), the functional forms do not correspond to the familiar multinomial logit specification used throughout the literature on discrete choice demand (McFadden, 1981).18 To understand why the neoclassical models from earlier sections do not nest the usual discrete choice models, note that the random utilities $\varepsilon_{J+1}$ “difference out” in (43) and the model is equivalent to a deterministic utility for the decision to allocate the entire budget to the numeraire. This result arises because of the adding-up condition associated with the budget constraint, which we resolved by assuming $x^*_{J+1} > 0$, just as we did in Section 3 above.

Lee and Allenby (2014) extend the pure discrete choice model to allow for multiple discrete choice and indivisible quantities for each product. As before, assume there are $j = 1, ..., J$ products and a $J+1$ essential numeraire. To address the indivisibility of the $j = 1, ..., J$ products, assume $x_j \in \{0, 1, ...\}$ for $j = 1, ..., J$. If utility is concave, increasing, and additive,19 $U(x) = \sum_{j=1}^{J} u_j(x_j) + \alpha_{J+1}(x_{J+1})$, the consumer’s decision problem consists of selecting an optimal quantity for each of the products and the essential numeraire, subject to her budget constraint. Let $\Gamma = \left\{ (x_1, ..., x_J) \mid y - \sum_j x_j p_j \ge 0,\ x_j \in \{0, 1, ...\} \right\}$ be the set of feasible quantities that satisfy the consumer’s budget constraint, where $x_{J+1} = y - \sum_j x_j p_j$. The consumer picks an optimal quantity vector $x^* \in \Gamma$ such that $U(x^*) \ge U(x)\ \forall x \in \Gamma$. To derive a tractable demand solution, Lee and Allenby (2014) assume that utility has the following form20:

$$u_j(x) = \frac{\alpha_j}{\gamma_j} \exp(\varepsilon_j) \ln(\gamma_j x + 1).$$

The additive separability assumption is critical since it allows the optimal quantity of each brand to be determined separately. In particular, for each $j = 1, ..., J$ the optimality of $x_j^*$ is ensured if $U(x_1^*, ..., x_j^*, ..., x_J^*) \ge \max\{U(x_1^*, ..., x_j^* + \Delta, ..., x_J^*) \mid x^* \in \Gamma,\ \Delta \in \{-1, 1\}\}$. The limits of integration of the utility shocks, $\varepsilon$, can therefore be derived in closed form:

$$f_x(\hat x; \theta) = \prod_{j=1}^{J} \int_{lb_j}^{ub_j} f_\varepsilon(\varepsilon_j)\, d\varepsilon_j$$

where

$$lb_j = \ln\left(\frac{\alpha_{J+1} p_j \gamma_j}{\alpha_j}\right) - \ln\left(\ln\left(\frac{\gamma_j \hat x_j + 1}{\gamma_j (\hat x_j - 1) + 1}\right)\right) \quad \text{and} \quad ub_j = \ln\left(\frac{\alpha_{J+1} p_j \gamma_j}{\alpha_j}\right) - \ln\left(\ln\left(\frac{\gamma_j (\hat x_j + 1) + 1}{\gamma_j \hat x_j + 1}\right)\right).$$

18 Besanko et al. (1990) study the monopolistic equilibrium pricing and variety of brands supplied in a market with discrete choice demand of the form (44).

19 The tractability of this problem also requires assuming linearity in the essential numeraire $u(x_{J+1}) = \alpha x_{J+1}$ so that the derivation of the likelihood can be computed separately for each product alternative.

20 This specification is a special case of the translated CES model described earlier when the satiation parameter asymptotes to 0 (e.g., Bhat, 2008).
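The closed-form integration limits make the likelihood of an observed integer quantity vector a product of one-dimensional probabilities. The sketch below illustrates this under stated assumptions that are not from the source: the shocks are taken to be independent standard normal, the budget constraint is assumed not to bind, and all function names and parameter values are hypothetical. When the observed quantity is zero, removing a unit is infeasible, so the lower limit is set to minus infinity.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (assumed shock distribution, for illustration)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bounds(x_hat, price, alpha_j, gamma_j, alpha_out):
    """Integration limits (lb_j, ub_j) for the shock of one product."""
    base = math.log(alpha_out * price * gamma_j / alpha_j)
    # Upper limit: buying one more unit must not raise utility.
    ub = base - math.log(math.log((gamma_j * (x_hat + 1) + 1)
                                  / (gamma_j * x_hat + 1)))
    if x_hat == 0:
        lb = -math.inf  # no unit to remove, so no lower restriction
    else:
        # Lower limit: buying one less unit must not raise utility.
        lb = base - math.log(math.log((gamma_j * x_hat + 1)
                                      / (gamma_j * (x_hat - 1) + 1)))
    return lb, ub

def quantity_likelihood(x_hat, prices, alpha, gamma, alpha_out):
    """Product over products of Pr(lb_j <= eps_j <= ub_j)."""
    like = 1.0
    for x, p, a, g in zip(x_hat, prices, alpha, gamma):
        lb, ub = bounds(x, p, a, g, alpha_out)
        like *= normal_cdf(ub) - normal_cdf(lb)
    return like

# Hypothetical example: two products, observed quantities (2, 0).
example = quantity_likelihood([2, 0], [1.0, 1.5], [1.0, 0.8], [1.0, 0.5], 1.0)
```

Because the upper limit at quantity x equals the lower limit at quantity x + 1, the per-product probabilities partition the real line, which is a convenient check on any implementation.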

3.3.2 The standard pure discrete choice model of demand

Suppose as before that the consumer makes a discrete choice among the products in the commodity group. We again assume a bivariate utility over an essential numeraire and a commodity group, with perfect substitutes over the $j = 1, ..., J$ products in the commodity group: $U\left(\sum_{j=1}^{J} \psi_j x_j, x_{J+1}\right)$. If we impose indivisibility on the product quantities such that $x_j \in \{0, 1\}$, the choice problem once again becomes a discrete choice among the $j = 1, ..., J+1$ alternatives

$$\begin{aligned} v_j &= U(\psi_j, y - p_j) + \varepsilon_j, \quad j = 1, ..., J \\ v_{J+1} &= U(0, y) + \varepsilon_{J+1}. \end{aligned} \tag{46}$$

In this case, the random utility $\varepsilon_{J+1}$ does not “difference out” and hence we will end up with a different system of choice probabilities. If we again assume that $(\varepsilon_1, ..., \varepsilon_{J+1}) \sim$ i.i.d. $EV(0, 1)$, the corresponding choice probabilities have the familiar multinomial logit (MNL) form:

$$\Pr(j) \equiv \Pr(v_j \ge v_k, \text{ for } k \ne j) = \frac{\exp\left(U(\psi_j, y - p_j)\right)}{\exp\left(U(0, y)\right) + \sum_{k=1}^{J} \exp\left(U(\psi_k, y - p_k)\right)}. \tag{47}$$
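The difference between the MNL probabilities in (47) and the neoclassical probabilities in (44) can be seen by evaluating both at the same mean utilities. The following is a minimal sketch with hypothetical mean utilities; the function names are illustrative only. Both systems imply the same odds between any two inside products, but they assign different probabilities to the outside option.

```python
import math

def mnl_probs(v_inside, v_outside):
    """Equation (47): the outside option has its own EV shock."""
    m = max(v_inside + [v_outside])
    w = [math.exp(v - m) for v in v_inside]
    w_out = math.exp(v_outside - m)
    denom = w_out + sum(w)
    return [x / denom for x in w], w_out / denom

def neoclassical_probs(v_inside, v_outside):
    """Equation (44): the numeraire shock differences out."""
    s = sum(math.exp(v) for v in v_inside)
    p_out = math.exp(-math.exp(-v_outside) * s)
    return [math.exp(v) / s * (1.0 - p_out) for v in v_inside], p_out

# Hypothetical mean utilities for two inside goods and the outside option.
v_in, v_out = [0.3, -0.2], 0.5
mnl_in, mnl_out = mnl_probs(v_in, v_out)
neo_in, neo_out = neoclassical_probs(v_in, v_out)
```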

Similarly, assuming $(\varepsilon_1, ..., \varepsilon_{J+1}) \sim N(0, \Sigma)$ would give rise to the standard multinomial probit. We now examine why we did not obtain the same system of choice probabilities as in the previous section. Unlike the derivation in the previous section, the random utilities in (46) were not specified as primitive assumptions on the underlying utility function $U\left(\sum_{j=1}^{J} \psi_j x_j, x_{J+1}\right)$. Instead, they were added on to the choice-specific values. An advantage of this approach is that it allows the researcher to be more agnostic about the exact interpretation of the errors. In the econometrics literature, $\varepsilon$ are interpreted as unobserved product characteristics, unobserved utility or tastes, measurement error or specification error. However, the probabilistic choice model has also been derived by mathematical psychologists (e.g., Luce, 1977) who interpret the shocks as psychological states, leading to potentially non-rational forms of behavior. Whereas econometricians interpret the probabilistic choice rules in (47) as the outcome of utility maximization with random utility components, mathematical psychologists interpret (47) as stochastic choice behavior (see the discussion in Anderson et al., 1992, Chapters 2.4 and 2.5). A more recent literature has derived the multinomial logit from a theory of “rational inattention.” Under rational inattention, the stochastic component of the model captures a consumer’s product uncertainty and the costs of endogenously reducing uncertainty through search (e.g., Matejka and McKay, 2015; Joo, 2018).

One approach to rationalize the system (47) is to define the $J+1$ alternative as an additional non-market good with price $p_0 = 0$, usually defined as “home production” (e.g., Anderson and de Palma, 1992). We assume the consumer always chooses at least one of the $J+1$ alternatives. In addition, we introduce a divisible, essential numeraire good, $z$, with price $p_z = 1$, so that the consumer has bivariate utility over the total consumption of the goods and over the essential numeraire: $U\left(\sum_{j=0}^{J+1} \psi_j x_j, z\right)$. The choice-specific values correspond exactly to (46) and the shock $\varepsilon_{J+1}$ is now interpreted as the random utility from home production. This model differs from the neoclassical models discussed in Sections 3 and 3.2 because we have now included an additional non-market good representing household production. For example, suppose a consumer has utility:

$$U\left(\sum_{j=0}^{J+1} \psi_j x_j, z\right) = \left(\sum_{j=0}^{J+1} \psi_j x_j\right) \exp(\alpha z)$$

where goods $j = 1, ..., J+1$ are indivisible, perfect substitutes each with perceived qualities $\psi_j = \exp(\bar\psi_j + \varepsilon_j)$, where we normalize $\bar\psi_{J+1} = 0$, and where $\alpha$ is the preference for the numeraire good. In this case, the choice-specific indirect utilities would be (in logs)

$$\begin{aligned} v_j &= \bar\psi_j + \alpha(y - p_j) + \varepsilon_j, \quad j = 1, ..., J \\ v_{J+1} &= \alpha y + \varepsilon_{J+1}. \end{aligned}$$

The MNL was first applied to marketing panel data for individual consumers by Guadagni and Little (1983). The linearity of the conditional indirect utility function explicitly rules out income effects in the substitution patterns between the inside goods. We discuss tractable specifications that allow for income effects in Section 4.1 below. If income is observed, then income effects in the substitution between the commodity group and the essential numeraire can be incorporated by allowing for non-linearity in the utility of the numeraire. For instance, if $\tilde U(x_{J+1}) = \psi_{J+1} \ln(x_{J+1})$ then we get choice probabilities21

$$\Pr(k; \theta) = \frac{\exp\left(\psi_k + \psi_{J+1} \ln(y - p_k)\right)}{\exp\left(\psi_{J+1} \ln(y)\right) + \sum_{j=1}^{J} \exp\left(\psi_j + \psi_{J+1} \ln(y - p_j)\right)}. \tag{48}$$

This specification also imposes an affordability condition by excluding any alternative for which $p_j > y$.
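A minimal sketch of the probabilities in (48), including the affordability condition, might look as follows. The function name and all parameter values are hypothetical. As an implementation assumption, alternatives priced exactly at the budget are also excluded, since the log of zero remaining income is undefined.

```python
import math

def log_numeraire_mnl(psi, psi_out, prices, y):
    """Choice probabilities of the form (48); returns (inside, outside)."""
    v_out = psi_out * math.log(y)
    # Alternatives with p_j >= y are excluded (ln(0) is undefined at p_j == y).
    v = [q + psi_out * math.log(y - p) if p < y else None
         for q, p in zip(psi, prices)]
    m = max([v_out] + [x for x in v if x is not None])
    w = [0.0 if x is None else math.exp(x - m) for x in v]
    w_out = math.exp(v_out - m)
    denom = w_out + sum(w)
    return [x / denom for x in w], w_out / denom

# Hypothetical example: the third product costs more than the budget.
psi, psi_out, prices = [1.0, 1.5, 2.0], 2.0, [2.0, 4.0, 12.0]
low_inside, low_outside = log_numeraire_mnl(psi, psi_out, prices, y=10.0)
high_inside, high_outside = log_numeraire_mnl(psi, psi_out, prices, y=20.0)
```

Raising income from 10 to 20 brings the third product into the choice set, so its probability moves from zero to a positive value.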

21 We can derive this specification from the assumption of Cobb-Douglas utility: $U(x_1, ..., x_{J+1}; \theta) = x_{J+1}^{\psi_{J+1}} \exp\left(\sum_{j=1}^{J} \psi_j x_j\right)$.


The appeal of the MNL’s closed-form specification comes at a cost for demand analysis. If $\tilde U(x_{J+1}) = \psi_{J+1} x_{J+1}$, as is often assumed in the literature, the model exhibits the well-known Independence of Irrelevant Alternatives (IIA) property. The IIA property can impose unrealistic substitution patterns in demand analysis. At the individual consumer level, the cross-price elasticity of demand with respect to $p_k$ is the same for every product $j \ne k$:

$$\frac{\partial \Pr(j)}{\partial p_k} \frac{p_k}{\Pr(j)} = \psi_{J+1} \Pr(k)\, p_k$$

so that substitution patterns between products will be driven by their prices and purchase frequencies, regardless of attributes. Moreover, a given product competes uniformly on price with all other products.

One solution is to use a non-IIA specification. For instance, error components variants of the extreme value distribution, like nested logit and the generalized extreme value distribution, can relax the IIA property within pre-determined groups of products (e.g., McFadden, 1981; Cardell, 1997).22 If we instead assume that $\varepsilon \sim N(0, \Sigma)$ with appropriately scaled covariance matrix $\Sigma$, we obtain the multinomial probit (e.g., McCulloch and Rossi, 1994; Goolsbee and Petrin, 2004). Dotson et al. (2018) parameterize the covariance matrix, $\Sigma$, using product characteristics to allow for a scalable model with correlated utility errors and, hence, stronger substitution between similar products. When consumer panel data are available, another solution is to use a random coefficients specification that allows for more flexible aggregate substitution patterns (see Chapter 2 of this volume).

In their seminal application of the multinomial logit to consumer-level scanner data, Guadagni and Little (1983) estimated demand for the ground coffee category using 78 weeks of transaction data for 2,000 households shopping in 4 Kansas City supermarkets. Interestingly, they found that brand and pack size were the most predictive attributes for consumer choices.
They also included the promotional variables “feature ad” and “in-aisle display” as additive utility shifters. These variables have routinely been found to be predictive of consumer choices. However, the structural interpretation of a marginal utility from a feature ad or a display is ambiguous. While it is possible that consumers obtain direct consumption value from a newspaper ad or a display, it seems more likely that these effects are the reduced-form of some other process such as information search. Exploring the structural foundations of the “promotion effects” remains a fruitful area for future research.
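The IIA restriction on cross-price elasticities discussed above can be checked numerically. The sketch below assumes the linear specification v_j = ψ_j + α(y − p_j), with α playing the role of the numeraire coefficient, and hypothetical parameter values. It compares finite-difference cross-price elasticities with the analytic value α · Pr(k) · p_k.

```python
import math

def mnl_linear(psi, alpha, prices, y):
    """MNL with linear numeraire utility: v_j = psi_j + alpha * (y - p_j)."""
    v = [q + alpha * (y - p) for q, p in zip(psi, prices)] + [alpha * y]
    m = max(v)
    w = [math.exp(x - m) for x in v]
    total = sum(w)
    return [x / total for x in w]  # last entry is the outside option

def cross_elasticities(psi, alpha, prices, y, k, h=1e-6):
    """Finite-difference elasticity of each Pr(j), j != k, w.r.t. p_k."""
    up = list(prices)
    up[k] += h
    dn = list(prices)
    dn[k] -= h
    base = mnl_linear(psi, alpha, prices, y)
    p_up = mnl_linear(psi, alpha, up, y)
    p_dn = mnl_linear(psi, alpha, dn, y)
    return [(p_up[j] - p_dn[j]) / (2 * h) * prices[k] / base[j]
            for j in range(len(base)) if j != k]

# Hypothetical parameter values.
psi, alpha, prices, y, k = [1.0, 0.2, -0.5], 0.8, [2.0, 1.5, 1.0], 10.0, 0
numeric = cross_elasticities(psi, alpha, prices, y, k)
analytic = alpha * mnl_linear(psi, alpha, prices, y)[k] * prices[k]
```

Every entry of `numeric`, including the one for the outside option, matches `analytic`: a price change for product k draws proportionally from all other alternatives, whatever their attributes.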

22 Misra (2005) shows that the disutility minimization formulation of the multinomial logit (or “reverse logit”) leads to a different functional form of the choice probabilities that does not exhibit the IIA property.

4 Some extensions to the typical neoclassical specifications

4.1 Income effects

Most of the empirical specifications discussed earlier imposed regularity conditions that, as a byproduct, impose strong restrictions on the income effects on demand. Since the seminal work by Engel (1857), the income elasticity of demand has been used to classify goods based on consumption behavior. Goods with a positive income elasticity are classified as Normal goods, for which consumers increase their consumption as income increases. Goods with a negative income elasticity are classified as Inferior goods, for which consumers decrease their consumption as income increases. Engel’s law is based on the empirical observation that households tend to allocate a higher proportion of their income to food as they become poorer (e.g., Engel, 1857). Accordingly, we define necessity goods and luxury goods based on whether the income elasticity of demand is less than or greater than one. Homothetic preferences restrict all products to be strict Normal goods with an income elasticity of one, thereby limiting the policy implications one can study with the model. Quasilinear preferences over the composite “outside” good restrict the income elasticity to zero, eliminating income effects entirely.

When the empirical focus is on a specific product category for a low-priced item like a CPG product, it may be convenient to assume that income effects are likely to be small and inconsequential.23 This assumption is particularly convenient when a household’s income or shopping budget is not observed. However, overly restrictive income effects can limit a model’s predicted substitution patterns, leading to potentially adverse policy implications (see McFadden’s foreword to Anderson et al. (1992) for a discussion). Even when a household’s income is static, large changes in relative prices could nevertheless create purchasing power effects.
Consider the bivariate utility function specification with perfect substitutes in the focal commodity group from Section 3.2: $\tilde U\left(\sum_{j=1}^{J} \psi_j x_j, x_{J+1}\right)$. For the products in the first commodity group, consumers will select the product $k$ where $\frac{p_k}{\psi_k} \le \frac{p_j}{\psi_j}$ for all $j \ne k$. When consumers face the same prices and have homogeneous quality perceptions, they would all be predicted to choose the same product. Changes in a consumer’s income would change the relative proportion of income spent on the commodity group and the essential numeraire. But the income change would not affect her choice of product. Therefore, homotheticity may be particularly problematic in vertically differentiated product categories where observed substitution patterns may be asymmetric between products in different quality tiers. For instance, the cross-elasticity of demand for lower-quality products with respect to premium products’ prices may be higher than the cross-elasticity of demand for higher-quality products with respect to the lower-quality products’ prices (e.g., Blattberg and Wisniewski, 1989; Pauwels et al., 2007).

23 Income effects are typically incorporated into demand analyses of high-priced, durable consumption goods like automobiles (e.g., Berry et al., 1995).


Similarly, Deaton and Muellbauer (1980b, p. 262) observed a cross-sectional income effect: “richer households systematically tend to buy different qualities than do poorer ones.” Gicheva et al. (2010) found a cross-time income effect by showing that lower-income households responded to higher gasoline prices by substituting their grocery purchases towards promotional-priced items, which could be consistent with asymmetric switching patterns if lower-quality items are more likely to be promoted. Similarly, Ma et al. (2011) found that households respond to increases in gasoline prices by substituting from national brands to lower-priced brands and to unadvertised own brands supplied by retailers, or “private labels.” These substitution patterns suggest that national brands are normal goods.24

24 Although, Dubé et al. (2017a) find highly income-inelastic demand for private label CPGs identified off the large household income shocks during the Great Recession.

4.1.1 A non-homothetic discrete choice model

Given the widespread use of the pure discrete choice models, like logit and probit, we now discuss how to incorporate income effects into these models without losing their empirical tractability. To relax the homotheticity property in the simple re-packaging model with perfect substitutes from Section 3.2 above, Deaton and Muellbauer (1980a) and Allenby and Rossi (1991) introduce rotations into the system of linear indifference curves by defining utility implicitly:

$$U(x; \theta, \varepsilon) = \tilde U\left(\sum_{j} \psi_j(\bar U, \varepsilon)\, x_j,\ x_{J+1}\right). \tag{49}$$

The marginal utilities in this specification vary with the level of total attainable utility at the current prices, $\bar U$. If we interpret $\psi(\bar U, \varepsilon)$ as “perceived quality,” then we allow the marginal value of perceived quality to vary with the level of total attainable utility. To ensure the marginal utilities are positive, Allenby and Rossi (1991) and Allenby et al. (2010) use the empirical specification

$$\psi_j(\bar U, \varepsilon) = \exp\left(\theta_{j0} - \theta_{j1} U(x; \theta) + \varepsilon_j\right)$$

where $\varepsilon_j$ is a random utility shock as before. If $\theta_{j1} > 0$, then utility is increasing and concave. The model nests the usual homothetic specification when $\theta_{j1} = 0$ for each product $j$. To see that the parameters $\theta_{j1}$ also capture differences in perceived quality, consider the relative marginal utilities:

$$\frac{\psi_k(\bar U)}{\psi_j(\bar U)} = \exp\left(\theta_{k0} - \theta_{j0} + (\theta_{j1} - \theta_{k1})\, U(x; \theta) + \varepsilon_k - \varepsilon_j\right).$$

The relative perceived quality of product $k$ increases with the level of attainable utility, $\bar U$, so long as $\theta_{k1} < \theta_{j1}$, and so $k$ would be perceived as superior to $j$. The identification of $\theta_{k0}$ comes from the average propensity to purchase product $k$ whereas the identification of $\theta_{k1}$ comes from the substitution towards $k$ in response to changes in purchasing power either through budgetary changes to $y$ or through changes in the overall price level. Demand estimation is analogous to the discrete-continuous case in Section 3.2, except for the additional calculation of $\bar U$. Consider the example of pure discrete choice with perfect substitutes and Cobb-Douglas utility as in Allenby et al. (2010):

$$U(x; \theta, \varepsilon) = \ln\left(\sum_{j} \psi_j(\bar U, \varepsilon)\, x_j\right) + \psi_{J+1} \ln x_{J+1}$$

where $x_j \in \{0, 1\}$ for $j = 1, ..., J$. The consumer chooses between a single unit of one of the $j = 1, ..., J$ products or the $J+1$ option of allocating the entire budget to the outside good with the following probabilities:

$$\Pr(x_k = 1; \theta) = \frac{\exp\left(\theta_{k0} - \theta_{k1} \bar U^k - \psi_{J+1} \ln(y - p_k)\right)}{1 + \sum_{\{j \mid j \le J \text{ and } p_j \le y\}} \exp\left(\theta_{j0} - \theta_{j1} \bar U^j - \psi_{J+1} \ln(y - p_j)\right)}$$

where $\bar U^k$ is solved numerically as the solution to the implicit equation

$$\ln \bar U^k = \theta_{k0} - \theta_{k1} \bar U^k - \psi_{J+1} \ln(y - p_k). \tag{50}$$

Maximum likelihood estimation will therefore nest the fixed-point calculation to (50) at each stage of the parameter search. In their empirical case study of margarine purchases, Allenby and Rossi (1991) find that the demand for generic margarine is considerably more elastic in the price of the leading national brand than vice versa. This finding is consistent with the earlier descriptive findings regarding asymmetric substitution patterns, the key motivating fact for the non-homothetic specification. Allenby et al. (2010) project the brand intercepts and utility rotation parameters, θj 0 and θj 1 , respectively, onto advertising to allow the firms’ marketing efforts to influence the perceived superiority of their respective brands. In an application to survey data from a choice-based conjoint survey with a randomized advertising treatment, they find that ads change the substitution patterns in the category by causing consumers to allocate more spending to higher-quality goods.
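The nested fixed-point step can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used by Allenby and Rossi (1991) or Allenby et al. (2010): the implicit equation (50) is solved by bisection, which is valid because ln(U) + θ_1 U is strictly increasing in U when θ_1 ≥ 0. The signs follow (50) as written in the text, the affordability condition is taken to be strict so the logarithm is finite, and all names and parameter values are hypothetical.

```python
import math

def solve_u_bar(theta0, theta1, psi_out, y, p):
    """Solve ln(U) = theta0 - theta1*U - psi_out*ln(y - p) for U > 0."""
    c = theta0 - psi_out * math.log(y - p)

    def g(u):
        # Strictly increasing in u when theta1 >= 0, so the root is unique.
        return math.log(u) + theta1 * u - c

    lo, hi = 1e-12, 1.0
    while g(lo) > 0.0:   # widen the bracket downward if needed
        lo /= 2.0
    while g(hi) < 0.0:   # widen the bracket upward if needed
        hi *= 2.0
    for _ in range(200):  # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def choice_probs(theta0, theta1, psi_out, y, prices):
    """Inside-good probabilities and the outside-option probability."""
    w = []
    for t0, t1, p in zip(theta0, theta1, prices):
        if p < y:  # affordability (strict, so ln(y - p) is finite)
            u = solve_u_bar(t0, t1, psi_out, y, p)
            # By (50), this exponential equals the solved utility level u.
            w.append(math.exp(t0 - t1 * u - psi_out * math.log(y - p)))
        else:
            w.append(0.0)
    denom = 1.0 + sum(w)
    return [x / denom for x in w], 1.0 / denom

# Hypothetical parameter values for two products.
u_bar = solve_u_bar(1.0, 0.3, 0.5, 10.0, 2.0)
inside, outside = choice_probs([1.0, 0.5], [0.3, 0.1], 0.5, 10.0, [2.0, 1.5])
```

In a maximum likelihood routine, `solve_u_bar` would be called for every product and observation at each trial parameter vector, which is the nesting described above.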

4.2 Complementary goods

The determination of demand complementarity figured prominently in the consumption literature (see the survey by Houthakker, 1961). But the microeconometric literature tackling demand with corner solutions has frequently used additive models that explicitly rule out complementarity and assume products are strict substitutes (Deaton and Muellbauer, 1980b, pp. 138-139). For many product categories, such as laundry detergents, ketchups, and refrigerated orange juice, the assumption of strict substitutability seems reasonable for most consumers. However, in other product categories where consumers purchase large assortments of flavors or variants, such as yogurt, carbonated soft drinks, beer, and breakfast cereals, complementarity may be an important part of choices. For a shopping basket model that accounts for the wide array of goods, complementarity seems quite plausible between broader commodity groups (e.g. pasta and pasta sauce, or cake mix and frosting).

Economists historically defined complementarity based on the supermodularity of the utility function and the increasing differences in utility associated with joint consumption. Samuelson (1974) provides a comprehensive overview of the arguments against such approaches based on the cardinality of utility. Chambers and Echenique (2009) formally prove that supermodularity is not testable with data on consumption expenditures. Accordingly, most current empirical research defines complementarity based on demand behavior, rather than as a primitive assumption about preferences.25 Perhaps the most widely-cited definition of complementarity comes from Hicks and Allen (1934) using compensated demand:

Definition 1. We say that goods j and k are complements if an increase in the price of j leads to a decrease in the compensated demand for good k, substitutes if an increase in the price of j leads to an increase in the compensated demand for good k, and independent if an increase in the price of j has no effect on the compensated demand for good k.

This definition has several advantages including symmetry and the applicability to any number of goods. However, compensated demand is unlikely to be observed in practice. Most empirical research tests for gross complementarity by testing the sign of the cross-derivatives of Marshallian demands with respect to prices. The linear indifference curves used in most pure discrete choice models eliminate any income effects, making the two definitions equivalent.
A recent literature has worked on establishing the conditions under which an empirical test for complementarity is identified with standard consumer purchase data (e.g., Samuelson, 1974; Gentzkow, 2007; Chambers et al., 2010). The definition of complementarity based on the cross-price effects on demand can be problematic in the presence of multiple goods. Samuelson (1974, p. 1255) provides the following example: ... sometimes I like tea and cream... I also sometimes take cream with my coffee. Before you agree that cream is therefore a complement to both tea and coffee, I should mention that I take much less cream in my cup of coffee than I do in my cup of tea. Therefore, a reduction in the price of coffee may reduce my demand for cream, which is an odd thing to happen between so-called complements.

25 An exception is Lee et al. (2013) who use a traditional definition of complementarity based on the sign of the cross-partial derivative of utility.


To see how this could affect a microeconometric test, consider the model of bivariate utility over a commodity group defined as products in the coffee and cream categories, and an essential numeraire that aggregates expenditures on all other goods (including tea). Even with flexible substitution patterns between coffee and cream, empirical analysis could potentially produce a positive estimate of the cross-price elasticity of demand for cream with respect to the price of coffee if cream is more complementary with tea than with coffee. On the one hand, this argument highlights the importance of multi-category models, like the ones we will discuss in Section 4.2.2 below, that consider both the intra-category and inter-category patterns of substitution. For instance, one might specify a multivariate utility over all the beverage-related categories and the products within each of the categories. The multi-category model would characterize all the direct and indirect substitution patterns between goods (Ogaki, 1990, p. 1255). On the other hand, a multi-category model increases the technical and computational burden of demand estimation dramatically. As discussed in Gentzkow (2007, p. 720), the estimated quantities in the narrower, single-category specification should be interpreted as “conditional on the set of alternative goods available in the market.” The corresponding estimates will still be correct for many marketing applications, such as the evaluation of the marginal profits of a pricing or promotional decision. The estimates will be problematic if there is a lot of variation in the composition of the numeraire that, in turn, changes the specification of utility for the commodity group of interest. Our discussion herein focuses on static theories of complementarity. While several of the models in Section 3 allow for complementarity, the literature has been surprisingly silent on the identification strategies for testing complementarity.
An exception is Gentzkow (2007), which we discuss in more detail below. A burgeoning literature has also studied the complementarities that arise over time in durable goods markets with inter-dependent demands and indirect network effects. These “platform markets” include such examples as the classic “razors & blades” and “hardware & software” cases (e.g., Ohashi, 2003; Nair et al., 2004; Hartmann and Nair, 2010; Lee, 2013; Howell and Allenby, 2017).

4.2.1 Complementarity between products within a commodity group

In most marketing models of consumer demand, products within a commodity group are assumed to be substitutes. When a single product in the commodity group is purchased on a typical trip, the perfect substitutes specification is used (see Sections 3.2 and 3.3). However, even when multiple products are purchased on a given trip, additive models that imply products are strict substitutes are still used (see for instance the translated CES model in Section 3.1.4). Even though some consumer goods products are purchased jointly, they are typically assumed to be consumed separately. There are of course exceptions. The ability to offer specific bundles of varieties of beverage flavors or beer brands could have a complementary benefit when a consumer is entertaining guests. Outside the CPG domain, Gentzkow (2007) analyzed the potential complementarities of jointly consuming digital and print versions of news.


To incorporate the definition of complementary goods into our demand framework, we begin with a discrete quantity choice model of utility over $j = 1, ..., J$ goods in a commodity group, where $x_j \in \{0, 1\}$, and an essential numeraire good $J + 1$. The goal is to test for complementarity between the goods within a given commodity group. The discussion herein closely follows Gentzkow (2007). We index all the possible commodity-group bundles the consumer could potentially purchase as $c \in P(\{1, ..., J\})$, using $c = 0$ to denote the allocation of her entire budget to the numeraire. The consumer obtains the following choice-specific utility, normalized by $u_0$:

$$
u_c = \begin{cases}
\sum_{j \in c} \left( \psi_j - \alpha p_j + \varepsilon_j \right) + \frac{1}{2} \sum_{j \in c} \sum_{k \in c, k \neq j} \Gamma_{jk}, & \text{if } c \in P(\{1, ..., J\}) \\
0, & \text{if } c = 0
\end{cases}
\tag{51}
$$

where $\Gamma$ is symmetric and $P(\{1, ..., J\})$ is the power set of the $j = 1, ..., J$ products. Assume that $\varepsilon \sim N(0, \Sigma)$. To simplify the discussion, suppose that the commodity group comprises only two goods, $j$ and $k$. The choice probabilities are then

$$
\begin{aligned}
\Pr(j) &= \int_{\{\varepsilon | u_j \geq 0, u_j \geq u_k, u_j \geq u_{jk}\}} dF(\varepsilon) \\
\Pr(k) &= \int_{\{\varepsilon | u_k \geq 0, u_k \geq u_j, u_k \geq u_{jk}\}} dF(\varepsilon) \\
\Pr(jk) &= \int_{\{\varepsilon | u_{jk} \geq 0, u_{jk} \geq u_j, u_{jk} \geq u_k\}} dF(\varepsilon).
\end{aligned}
$$

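Finally, the expected consumer demand can be computed as follows: $x_j = \Pr(j) + \Pr(jk)$ and $x_k = \Pr(k) + \Pr(jk)$. It is straightforward to show that an empirical test of complementarity between two goods, $j$ and $k$, reduces to the sign of the corresponding $\Gamma_{jk}$ element of $\Gamma$. An increase in the price $p_k$ has two effects on demand $x_j$. First, marginal consumers who would not buy the bundle but who were indifferent between buying only $j$ or only $k$ will switch to $j$. At the same time, however, marginal consumers who would not buy only $j$ or only $k$, and who are indifferent between buying the bundle or not, will switch to non-purchase. More formally,

$$
\frac{\partial x_j}{\partial p_k} = \frac{\partial \Pr(j)}{\partial p_k} + \frac{\partial \Pr(jk)}{\partial p_k}
= \int_{\{\varepsilon | u_j = u_k, u_k \geq 0, -\Gamma_{kj} \geq u_j\}} dF(\varepsilon) - \int_{\{\varepsilon | u_j + u_k = -\Gamma_{jk}, u_j \leq 0, u_k \leq 0\}} dF(\varepsilon).
$$

We can see that our test for complementarity is determined by the sign of $\Gamma_{jk}$:

$$
\begin{aligned}
\Gamma_{jk} > 0 &\Rightarrow \frac{\partial x_j}{\partial p_k} < 0 \text{ and } j \text{ and } k \text{ are complements} \\
\Gamma_{jk} = 0 &\Rightarrow \frac{\partial x_j}{\partial p_k} = 0 \text{ and } j \text{ and } k \text{ are independent} \\
\Gamma_{jk} < 0 &\Rightarrow \frac{\partial x_j}{\partial p_k} > 0 \text{ and } j \text{ and } k \text{ are substitutes.}
\end{aligned}
$$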
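The sign test can be checked numerically. The following is a minimal simulation sketch of the two-good model, not code from Gentzkow (2007): the function names, the default parameter values, and the unit-variance normal shocks with correlation `rho` are all illustrative assumptions. It computes the simulated demand for good j at two values of the price of good k, reusing the same random draws, so the sign of the difference can be compared with the sign of the interaction term `gamma`.

```python
import math
import random

def expected_demand(p_j, p_k, gamma, psi_j=0.5, psi_k=0.5, alpha=1.0,
                    rho=0.0, n_draws=20_000, seed=0):
    """Monte Carlo estimate of (x_j, x_k) in the two-good bundle model.

    Assumed for illustration: unit-variance normal shocks with correlation
    rho and made-up default parameter values.
    """
    rng = random.Random(seed)  # fixed seed: common random numbers across calls
    buys_j = buys_k = 0
    for _ in range(n_draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps_j = z1
        eps_k = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        u_j = psi_j - alpha * p_j + eps_j
        u_k = psi_k - alpha * p_k + eps_k
        u_jk = u_j + u_k + gamma         # the bundle adds the interaction once
        best = max(0.0, u_j, u_k, u_jk)  # 0.0 is the outside option
        if best == u_jk:
            buys_j += 1
            buys_k += 1
        elif best == u_j:
            buys_j += 1
        elif best == u_k:
            buys_k += 1
    return buys_j / n_draws, buys_k / n_draws

def cross_price_effect(gamma, p_j=1.0, p_k=1.0, dp=0.2):
    """Finite-difference change in x_j when p_k rises by dp."""
    x_j_low, _ = expected_demand(p_j, p_k, gamma)
    x_j_high, _ = expected_demand(p_j, p_k + dp, gamma)
    return x_j_high - x_j_low

for g in (1.0, 0.0, -1.0):
    print(f"gamma = {g:+.1f}: change in x_j = {cross_price_effect(g):+.4f}")
```

Because both prices are evaluated on identical draws, the comparison is not contaminated by simulation noise. Setting `rho` above zero raises the joint purchase rate even when `gamma` is zero, which is the confound between correlated tastes and complementarity that arises when the covariance of the shocks is restricted.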


Gentzkow (2007) provides a practical discussion of the identification challenges associated with $\Gamma_{jk}$, even for this stylized discrete choice demand system. At first glance, (51) looks like a standard discrete choice model where each of the possible permutations of products has been modeled as a separate choice.$^{26}$ But the correlated error structure in $\varepsilon$ plays an important role in the identification of the complementarity. The key moment for the identification of $\Gamma$ is the incidence of joint purchase of products $j$ and $k$, $\Pr(jk)$. But $\Pr(jk)$ could arise either through a high value of $\Gamma_{jk}$ or a high value of $cov(\varepsilon_j, \varepsilon_k)$. A restricted covariance structure like logit, which sets $cov(\varepsilon_j, \varepsilon_k) = 0$, will be forced to attribute a high $\Pr(jk)$ to complementarity.

An ideal instrument for testing complementarity would be an exclusion restriction. Consider for instance a variable $z_j$ that shifts the mean utility for $j$ but does not affect $\Gamma$ or the mean utility of good $k$. In the CPG context, the access to high-frequency price variation in all the observed products as well as point-of-purchase promotional variables are ideal for this purpose. The identification of $\Gamma$ could then reflect the extent to which changes in $z_j$ affect demand $x_k$.

Panel data can also be exploited to identify $\Gamma_{jk}$ and $cov(\varepsilon_j, \varepsilon_k)$. Following the conventions in the literature allowing for persistent, between-consumer heterogeneity, we could let $\varepsilon$ be persistent, consumer-specific “random effects.” We could then also include i.i.d. shocks that vary across time and product to explain within-consumer switches in behavior. If joint purchase reflects $cov(\varepsilon_j, \varepsilon_k)$, we would expect to see some consumers frequently buying both and other consumers frequently buying neither. But, conditional on a consumer’s average propensity to purchase either good, the cross-time variation in choices should be uncorrelated. However, if joint purchase reflects $\Gamma_{jk}$, we would then expect more correlation over time whereby a consumer would either purchase both goods or neither, but would seldom purchase only one of the two.

26 For instance, Manski and Sherman (1980) and Train et al. (1987) use logit and nested logit specifications that restrict the covariance patterns in $\varepsilon$.

4.2.2 Complementarity between commodity groups (multi-category models)

In the analysis of CPG data, most of the emphasis on complementarity has been between product categories, where products within a commodity group are perceived as substitutes but different commodity groups may be perceived as complements. Typically, such cross-category models have been specified using probabilistic choice models without a microeconomic foundation, and that allow for correlated errors either in the random utility shocks (e.g., Manchanda et al., 1999; Chib et al., 2002) or in the persistent heteroskedastic shocks associated with random coefficients (e.g., Ainslie and Rossi, 1998; Erdem, 1998). For an overview of these models, see the discussion in Seetharaman et al. (2005). The lack of a microfoundation complicates the ability to assign substantive interpretations of model parameters. For instance, the identification discussion in the previous section clarifies the fundamental distinction between correlated tastes (as in these multi-category probabilistic models) and true product complementarity.

At least since Song and Chintagunta (2007), the empirical literature has used microfounded demand systems to accommodate complementarity and substitutability in the analysis of the composition of household shopping baskets spanning many categories during a shopping trip. Conceptually, it is straightforward to extend the general, non-additive frameworks in Section 3 to many commodity groups. For instance, Bhat et al. (2015) introduce potential complementarity into the translated CES specification (see Section 3.1.4) by relaxing additivity and allowing for interaction effects.$^{27}$ In their study of the pro-competitive effects of multiproduct grocery stores, Thomassen et al. (2017) use a quadratic utility model that allows for gross complementarity.$^{28}$ Mehta (2015) uses the indirect translog utility approximation to derive a multi-category model that allows for complementarities.

The direct application of the models in Section 3 and the extensions just discussed is limited by the escalation in parameters and the dimension of numerical integration, both of which grow with the number of products studied. Typically, researchers have either focused their analysis on a small set of product alternatives within a commodity group$^{29}$ or have focused their analysis on aggregate expenditure behavior across categories, collapsing each category into an aggregated composite good.$^{30}$ As we discuss in the next subsection, additional restrictions on preferences have been required to accommodate product-level demand analysis across categories.

Example: Perfect substitutes within a commodity group

Suppose the consumer makes purchase decisions across $m = 1, ..., M$ commodity groups, each containing $j = 1, ..., J_m$ products. The consumer has a weakly separable, multivariate utility function over each of the $M$ commodity groups and an essential numeraire good $M + 1$ with price $p_{M+1} = 1$. Within each category, the consumer has perfect substitutes sub-utility over the products, giving consumer utility

$$
\tilde{U}\left( \sum_{j=1}^{J_1} \psi_{1j} x_{1j}, ..., \sum_{j=1}^{J_M} \psi_{Mj} x_{Mj}, \psi_{M+1} x_{M+1} \right)
\tag{52}
$$

and budget constraint

$$
\sum_{m=1}^{M} \sum_{j=1}^{J_m} p_{mj} x_{mj} + x_{M+1} \leq y.
$$

27 This specification does not ensure that global regularity is satisfied, which could limit the ability to conduct counterfactual predictions with the model.

28 Empirically, they find that positive cross-price elasticities between grocery categories within a store are driven more by shopping costs associated with store choice than by intrinsic complementarities based on substitution patterns between categories.

29 For instance, Wales and Woodland (1983) allow for J = 3 alternatives of meat: beef, lamb, and other meat, and Kim et al. (2002) allow for J = 6 brand alternatives of yogurt.

30 For instance, Kao et al. (2001) look at expenditures across J = 7 food commodity groups, Mehta (2015) looks at trip-level expenditures across J = 4 supermarket categories, and Thomassen et al. (2017) look at trip-level expenditures across J = 8 supermarket product categories.

The utility function is a generalization of the discrete choice specification in Section 3.2 to many commodity groups. At most, one product will be chosen in each of the $M$ commodity groups. As before, $\psi_{mj} \geq 0$ and $\tilde{U}()$ is a continuously differentiable, quasi-concave, and increasing function in each of its arguments. We also assume additional regularity conditions to ensure that an interior quantity of the essential numeraire is always purchased, $x^*_{M+1} > 0$. This approach with perfect substitutes within a category has been used in several studies (e.g., Song and Chintagunta, 2007; Mehta, 2007; Lee and Allenby, 2009; Mehta and Ma, 2012). Most of the differences across studies are based on the assumptions regarding the multivariate utility function $\tilde{U}(x)$.

Lee and Allenby (2009) use a primal approach that specifies a quadratic utility over the commodity groups

$$
\tilde{U}\left( u(x_1; \psi_1), ..., u(x_M; \psi_M), \psi_{M+1} x_{M+1} \right) = \sum_{m=1}^{M+1} \beta_{m0} u(x_m; \psi_m) - \frac{1}{2} \sum_{m=1}^{M+1} \sum_{n=1}^{M+1} \beta_{mn} u(x_m; \psi_m) u(x_n; \psi_n)
\tag{53}
$$

where

$$
u(x_m; \psi_m) = \sum_{j=1}^{J_m} \psi_{mj} x_{mj}
$$

and $\psi_{mj} = \exp(\bar{\psi}_{mj} + \varepsilon_{mj})$ where $\bar{\psi}_{M+1} = 0$, we normalize $\beta_{10}$, and we assume symmetry such that $\beta_{mn} = \beta_{nm}$ for $m, n = 1, ..., M + 1$. The KKT conditions associated with the maximization of the utility function (53) are now as follows:

$$
\begin{aligned}
\tilde{\varepsilon}_{mj} &= h_{mj}(x^*; \psi), & \text{if } x^*_{mj} > 0 \\
\tilde{\varepsilon}_{mj} &\leq h_{mj}(x^*; \psi), & \text{if } x^*_{mj} = 0
\end{aligned}
$$

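where $\tilde{\varepsilon}_{mj} = \varepsilon_{mj} - \varepsilon_{M+1}$ and

$$
h_{mj}(x^*; \psi) = -\ln\left( \frac{\bar{\psi}_{mj}}{p_{mj}} \left[ \beta_{m0} - \sum_{n=1}^{M} \beta_{mn} u(x^*_n; \psi_n) \right] \right).
$$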


Lee and Allenby (2009) do not impose additional parameter restrictions to ensure the utility function is quasi-concave, a sufficient condition for the coherency of the likelihood function. Instead, they set the likelihood deterministically to zero at any support point where either the marginal utilities are negative or the utility function fails quasi-concavity.$^{31}$ While this approach may ensure coherency, it will not ensure that global regularity is satisfied, which could limit the ability to conduct counterfactual predictions with the model.

While products in the same commodity group are assumed to be perfect substitutes, the utility function (53) allows for gross complementarity between a pair of commodity groups, $m$ and $n$, through the sign of the parameter $\beta_{mn}$. In their empirical application, they find gross complementarity between laundry detergents and fabric softeners, which conforms with their intuition. All other pairs of categories studied are found to be substitutes.

Song and Chintagunta (2007), Mehta (2007), and Mehta and Ma (2012) use a dual approach that specifies a translog approximation of the indirect utility function. For simplicity of presentation, we use the homothetic translog specification from Song and Chintagunta (2007)$^{32}$

$$
V(p, y; \theta, \varepsilon) = \ln(y) - \sum_{m=1}^{M+1} \theta_m \ln\left( \frac{p_{mj_m}}{\psi_{mj_m}} \right) + \frac{1}{2} \sum_{m=1}^{M+1} \sum_{n=1}^{M+1} \theta_{mn} \ln\left( \frac{p_{mj_m}}{\psi_{mj_m}} \right) \ln\left( \frac{p_{nj_n}}{\psi_{nj_n}} \right) + \sum_{m=1}^{M+1} \varepsilon_{j_m} \ln\left( \frac{p_{mj_m}}{\psi_{mj_m}} \right)
\tag{54}
$$

where for each commodity group $m$, product $j_m$ satisfies $\frac{\psi_{mj_m}}{p_{mj_m}} \geq \frac{\psi_{mj}}{p_{mj}}, \forall j \neq j_m$. To ensure the coherency of the model, the following parameter restrictions are imposed:

$$
\sum_{m} \theta_m = 1, \qquad \theta_{mn} = \theta_{nm}, \ \forall m, n, \qquad \sum_{m=1}^{M+1} \theta_{mn} = 0, \ \forall n.
$$

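Applying Roy’s identity, we derive the following conditional expenditure shares

$$
s_{mj_m}(p, y; \theta, \varepsilon) = \theta_m - \sum_{n=1}^{M+1} \theta_{mn} \ln\left( \frac{p_{nj_n}}{\psi_{nj_n}} \right) - \varepsilon_{j_m}.
\tag{55}
$$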

31 The presence of indicator functions in the likelihood creates discontinuities that could be problematic for maximum likelihood estimation. The authors avoid this problem by using a Bayesian estimator that does not rely on the score of the likelihood.

32 Mehta and Ma (2012) use a non-homothetic translog approximation which generates a more complicated expenditure share system, but which allows for more flexible income effects.

Since the homothetic translog approximation in (54) eliminates income effects from the expenditure shares, a test for complementarity between a pair of categories $m$ and $n$ amounts to testing the sign of $\theta_{mn}$. In particular, conditional on the chosen products in categories $m$ and $n$, $j_m$ and $j_n$, respectively, complementarity is identified off the changes in $s_{mj_m}$ due to the quality-adjusted price, $p_{nj_n} / \psi_{nj_n}$. Hence, several factors can potentially serve as instruments to test complementarity. Changes in the price $p_{nj_n}$ are an obvious source. In addition, if the perceived quality $\psi_{nj_n}$ is projected onto observable characteristics of product $j_n$, then independent characteristic variation in product $j_n$ can also be used to identify the complementarity. The switching conditions will be important to account for variation in the identity of the optimal product $j_n$. A limitation of this specification is that any complementarity only affects the intensive quantity margin and does not affect the extensive brand choice and purchase incidence margins. Song and Chintagunta (2007) do not detect evidence of complementarities in their empirical application, which may be an artifact of the restricted way in which complementarity enters the model. Mehta and Ma (2012) use a non-homothetic translog approximation that allows for complementarity in purchase incidence as well as the expenditure shares. In their empirical application, they find strong complementarities between the pasta and pasta sauces categories. These findings suggest that the retailer should be coordinating prices across the two categories and synchronizing the timing of promotional discounts.
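To illustrate how the share system responds to a quality-adjusted price change, the sketch below evaluates the homothetic translog shares with the random terms set to zero. All parameter values are hypothetical and chosen only so that the coherency restrictions hold (the own-category coefficients sum to one, and the cross-category matrix is symmetric with columns summing to zero); the function and variable names are likewise illustrative.

```python
import math

def translog_shares(theta, theta_cross, log_q):
    """Shares s_m = theta_m - sum_n theta_mn * ln(p_n / psi_n), shocks set to zero."""
    return [t_m - sum(t_mn * lq for t_mn, lq in zip(row, log_q))
            for t_m, row in zip(theta, theta_cross)]

# Hypothetical parameters: two categories plus the numeraire (last entry).
theta = [0.2, 0.3, 0.5]                    # sums to one
theta_cross = [[0.10, -0.04, -0.06],
               [-0.04, 0.09, -0.05],
               [-0.06, -0.05, 0.11]]       # symmetric, columns sum to zero

# ln(p / psi) for the chosen product in each category; zero for the numeraire.
log_q = [math.log(2.0 / 1.0), math.log(3.0 / 1.5), 0.0]
base = translog_shares(theta, theta_cross, log_q)

# Raise the quality-adjusted price in the second category by 10 percent.
bumped_q = list(log_q)
bumped_q[1] += math.log(1.10)
bumped = translog_shares(theta, theta_cross, bumped_q)

print("baseline shares:", base)
print("change in first-category share:", bumped[0] - base[0])
```

Under these assumptions the shares add up to one by construction, and the response of one category's share to another category's quality-adjusted log price equals minus the corresponding cross coefficient, which is why the sign of that coefficient carries the test.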

4.3 Discrete package sizes and non-linear pricing

In many consumer goods product categories, product quantities are restricted to the available package sizes. For instance, a customer must choose between specific pre-packaged quantities of liquid laundry detergent (e.g., 32 oz, 64 oz, or 128 oz) and cannot purchase an arbitrary, continuous quantity. Early empirical work focused on brand choices, either narrowing the choice set to a specific pack size or collapsing all the pack sizes into a composite brand choice alternative. However, these models ignore the intensive quantity margin and limit the scope of their applicability to decision-making on the supply side. Firms typically offer an array of pre-packaged sizes as a form of commodity bundling, or “non-linear pricing.” In practice, we expect to see quantity discounts whereby the consumer pays a lower price-per-unit when she buys the larger pack size, consistent with standard second-degree price discrimination (e.g., Varian, 1989; Dolan, 1987). However, several studies have documented cases where firms use quantity-surcharging by raising the price-per-unit on larger pack sizes (e.g., Joo, 2018).

The presence of non-linear pricing introduces several challenges into our neoclassical models of demand (e.g., Howell et al., 2016; Reiss and White, 2001). First, any kinks in the pricing schedule will invalidate the use of the Kuhn-Tucker conditions.$^{33}$ Second, the dual approach that derives demand using Roy’s identity is invalidated by non-linear pricing because Roy’s identity only holds under a constant marginal price for any given product. An exception is the case of piecewise-linear budget sets (e.g., Hausman, 1985; Howell et al., 2016). Third, the price paid per unit of a good depends on a consumer’s endogenous quantity choice, creating a potential self-selection problem in addition to the usual non-negativity problem. To see this potential source of bias, note that the consumer’s budget constraint is $\sum_j p_j(x_j) x_j \leq y$, so the price paid is endogenous as it will depend on unobservable (to the researcher) aspects of the quantity demanded by the consumer.

33 See Lambrecht et al. (2007) and Yao et al. (2012) for the analysis of demand under three-part mobile tariffs.

4.3.1 Expand the choice set

One simple and popular modeling approach expands the choice set to include all available combinations of brands and pack sizes (e.g., Guadagni and Little, 1983). A separate random utility shock is then added to each choice alternative. Suppose the consumer makes choices over the $j = 1, ..., J$ products in a commodity group where each product is available in a finite number of pre-packaged sizes, $a \in A_j$. If the consumer has additive preferences and the $j = 1, ..., J$ products are perfect substitutes, $U(x, x_{J+1}) = u_1\left( \sum_{j=1}^{J} \psi_j x_j \right) + u_2(x_{J+1})$, her choice-specific indirect utilities are

$$
\begin{aligned}
v_{ja} &= u_1(\psi_j x_{ja}) + u_2(y - p_{ja}) + \varepsilon_{ja}, \quad j = 1, ..., J, \ a \in A_j \\
v_{J+1} &= u_2(y) + \varepsilon_{J+1}
\end{aligned}
\tag{56}
$$

where $\varepsilon_{ja} \sim$ i.i.d. $EV(0, 1)$, which allows for random perceived utility over the pack size variants of a given product. The probability of choosing pack size $a$ for product $k$ is then

$$
\Pr(ka; \theta) = \frac{\exp\left( u_1(\psi_k x_{ka}) + u_2(y - p_{ka}) \right)}{\exp(u_2(y)) + \sum_{j = 1, ..., J | a \in A_j} \exp\left( u_1(\psi_j x_{ja}) + u_2(y - p_{ja}) \right)}.
\tag{57}
$$

To see how one might implement this model in practice, assume the consumer has Cobb-Douglas utility. In this case, $u_1(\psi_k x_{ka}) = \alpha_1 \ln \bar{\psi}_k + \alpha_1 \ln(x_{ka})$ where $\alpha_1$ is the satiation rate over the commodity group. Conceptually, this model could be expanded even further to allow the consumer to purchase bundles of the products to configure all possible quantities that are feasible within the budget constraint.

An important limitation of this specification is that it assigns independent random utility to each pack size of the otherwise same product. This assumption would make sense if, for instance, a large 64-oz plastic bottle of soda is fundamentally different from a small, 12-oz aluminum can of the same soda. In other settings, we might expect a high correlation in the random utility between two pack-size variants of an otherwise identical product (e.g., 6 packs versus 12 packs of aluminum cans of soda). Specifications that allow for such correlation within-brand, such as nested logit, generalized extreme value or even multinomial probit could work. But, in a setting with many product alternatives, it may not be possible to estimate the full covariance structure between each of the product and size combinations.

In some settings, the temporal separation between choices can be used to simplify the problem. For instance, Goettler and Clay (2011) and Narayanan et al. (2007) study consumers’ discrete-continuous choices on pricing plan and usage. Providers of mobile data services and voice services typically offer consumers choices between pricing plans that differ in their convexity. In practice, we might not expect the consumer to derive marginal utility from the convexity of the pricing plan, seemingly rendering the pricing plan choice deterministic. But, if the consumer makes a discrete choice between pricing plans in expectation of future usage choices, the expectation errors can be used as econometric uncertainty in the discrete choice between plans.
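The expanded choice set logit is simple to evaluate once functional forms are chosen. The sketch below is one possible implementation under assumptions that go beyond the text: the numeraire sub-utility is taken to be logarithmic with a weight `alpha2`, the function name and data layout are hypothetical, and the numbers in the usage lines are made up.

```python
import math

def expanded_choice_probs(psi, sizes, prices, y, alpha1, alpha2):
    """Logit probabilities over the outside option and every (product, size) pair.

    psi[j] is the quality of product j; sizes[j][a] and prices[j][a] describe
    pack a of product j. Returns (prob_outside, {(j, a): prob}).
    Requires every price to be below the budget y.
    """
    utils = {}
    for j, (sizes_j, prices_j) in enumerate(zip(sizes, prices)):
        for a, (x, p) in enumerate(zip(sizes_j, prices_j)):
            utils[(j, a)] = alpha1 * math.log(psi[j] * x) + alpha2 * math.log(y - p)
    v_out = alpha2 * math.log(y)             # spend the whole budget on the numeraire
    v_max = max(v_out, *utils.values())      # subtract the max for numerical stability
    denom = math.exp(v_out - v_max) + sum(math.exp(v - v_max) for v in utils.values())
    probs = {key: math.exp(v - v_max) / denom for key, v in utils.items()}
    return math.exp(v_out - v_max) / denom, probs

# Hypothetical example: two brands, sizes in ounces, prices in dollars.
p_out, probs = expanded_choice_probs(
    psi=[1.0, 1.2],
    sizes=[[32, 64], [32, 64, 128]],
    prices=[[3.0, 5.0], [3.5, 6.0, 10.0]],
    y=100.0, alpha1=0.5, alpha2=10.0)
print("outside option:", round(p_out, 4))
for key in sorted(probs):
    print(key, round(probs[key], 4))
```

Because each brand-size pair receives its own independent shock, the ratio of probabilities for two pack sizes of the same brand does not depend on any other alternative, which is the restriction that nested logit or probit alternatives relax.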

4.3.2 Models of pack size choice

Allenby et al. (2004) use the following Cobb-Douglas utility specification

$$
U(x, x_{J+1}) = \sum_{j=1}^{J} \sum_{a \in A_j} \left[ \alpha_1 \ln(\psi_j x_{ja}) + \alpha_2 \ln(y - p_{ja}) \right]
$$

where $\psi_j = \exp(\bar{\psi}_j + \varepsilon_j)$ and $\varepsilon_j \sim$ i.i.d. $EV(0, 1)$. In this specification, the utilities of each of the pack sizes for a given product are perfectly correlated. The corresponding optimal pack size choice for a given product $j$ is deterministic:

$$
a^*_j = \arg\max_{a \in A_j} \left\{ \alpha_1 \ln(x_{ja}) + \alpha_2 \ln(y - p_{ja}) \right\}
$$

and does not depend on $\psi_j$. The consumer’s product choice problem is then the usual maximization across the random utilities of each of the $j = 1, ..., J$ products corresponding to their respective optimal pack size choices. The probability of observing the choice of product $k$ is then

$$
\Pr(k; \theta, a^*) = \frac{\exp\left( \alpha_1 \bar{\psi}_k + \alpha_1 \ln x_{a^*_k} + \alpha_2 \ln(y - p_{a^*_k}) \right)}{\sum_{j=1}^{J} \exp\left( \alpha_1 \bar{\psi}_j + \alpha_1 \ln x_{a^*_j} + \alpha_2 \ln(y - p_{a^*_j}) \right)}
\tag{58}
$$

where $a^*_k$ is the observed pack size chosen for brand $k$. One limitation of the pack size demand specification (58) is that the corresponding likelihood will not have full support. In particular, variation between pack sizes of a given brand, all else equal, will reject the model. In a panel data version of the model with consumer-specific parameters, within-consumer switching between pack sizes of the same brand over time, all else equal, would reject the model. Goettler and Clay (2011) and Narayanan et al. (2007) propose a potential solution to this issue, albeit in a different setting. The inclusion of consumer uncertainty over future quantity needs allows for random variation in pack sizes.
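The two-stage structure of this model, a deterministic pack size choice followed by a brand logit, can be sketched as follows. The function names, data layout, and numbers are hypothetical; the code only mirrors the structure of (58) under the Cobb-Douglas form given above.

```python
import math

def optimal_pack(sizes_j, prices_j, y, alpha1, alpha2):
    """Index of the pack size maximizing alpha1*ln(x) + alpha2*ln(y - p)."""
    values = [alpha1 * math.log(x) + alpha2 * math.log(y - p)
              for x, p in zip(sizes_j, prices_j)]
    return max(range(len(values)), key=values.__getitem__)

def brand_choice_probs(psi_bar, sizes, prices, y, alpha1, alpha2):
    """Brand logit evaluated at each brand's deterministic optimal pack size."""
    a_star = [optimal_pack(s, p, y, alpha1, alpha2) for s, p in zip(sizes, prices)]
    v = [alpha1 * psi_bar[j] + alpha1 * math.log(sizes[j][a])
         + alpha2 * math.log(y - prices[j][a]) for j, a in enumerate(a_star)]
    v_max = max(v)                      # subtract the max for numerical stability
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return a_star, [e / total for e in expv]

# Hypothetical example: two brands, each sold in three sizes.
sizes = [[32, 64, 128], [32, 64, 128]]
prices = [[3.0, 5.0, 9.0], [3.5, 5.5, 8.0]]
a_star, probs = brand_choice_probs([0.0, 0.5], sizes, prices,
                                   y=50.0, alpha1=1.0, alpha2=20.0)
print("optimal pack index per brand:", a_star)
print("brand choice probabilities:", probs)
```

The pack index returned for each brand never depends on the brand intercepts, so any observed purchase of a different size of the same brand receives zero likelihood. This is the full-support problem described above.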


5 Moving beyond the basic neoclassical framework

5.1 Stock-piling, purchase incidence, and dynamic behavior

The models discussed so far have treated the timing of purchase as a static consumer decision. According to these models, a consumer allocates her entire budget to the essential numeraire (or outside good), when all of the products’ prices exceed their corresponding reservation utilities, as in Eq. (5) above. Indeed, the literature on price promotions has routinely reported a large increase in sales during promotion weeks (see for instance the literature survey by Blattberg and Neslin (1989) and the empirical generalizations in Blattberg et al. (1995)). However, in the case of storable products, consumers may accumulate an inventory and time their purchases strategically based on their expectations about future price changes. An empirical literature has found that price promotions affect both the quantity sold and the timing of purchases through purchase acceleration (e.g., Blattberg et al., 1981; Neslin et al., 1985; Gupta, 1991; Bell et al., 1999). This work estimates that purchase acceleration accounts for between 14 and 50 percent of the promotion effect on quantities sold. Purchase acceleration could simply reflect an increase in consumption. However, more recent work finds that the purchase acceleration could reflect strategic timing based on price expectations. Pesendorfer (2002) finds that while the quantity of ketchup sold is generally higher during periods of low prices, the level depends on past prices. Hendel and Nevo (2006b) find that the magnitude of the total sales response to a price discount in laundry detergent is moderated by the time since the last price discount. The quantity sold increases by a factor of 4.7 if there was not a sale in the previous week, but only by a factor of 2.0 if there was a sale in the previous week. Using household panel data, Hendel and Nevo (2003) also detect a post-promotion dip in sales levels.
Looking across 24 CPG categories, Hendel and Nevo (2006b) find that households pay 12.7% less than if they paid the average posted prices. Collectively, these findings suggest that households may be timing their purchases strategically to coincide with temporary price discounts. In this case, a static model of demand may over-estimate the own-price response. The potential bias on cross-price elasticities is not as clear. In the remainder of this section, we discuss structural approaches to estimate demand with stock-piling and strategic purchase timing based, in part, on price expectations. These models can be used to measure short-term and long-term price response through counterfactuals.

To the best of our knowledge, Blattberg et al. (1978) were the first to propose a formal economic model of consumer stock-piling based on future price expectations. In the spirit of Becker’s (1965) household production theory, they treat the household as a production unit that maintains a stock of market goods to meet its consumption needs. While the estimation of such a model exceeded the computing power available at that time, Blattberg et al. (1978) find that observable household resources, such as home ownership, and shopping costs, such as car ownership and dual-income status, are strongly associated with deal-proneness. In the macroeconomics literature, Aguiar and Hurst (2007) extend the model to account for the time allocation between “shopping” and “household production” to explain why older consumers tend to pay lower prices (i.e., find the discount periods). In the following sub-sections, we discuss more recent research that has estimated the underlying structure of a model of stock-piling.

5.1.1 Stock-piling and exogenous consumption

Erdem et al. (2003) build on the discrete choice model formulation as in Section 3.3. Let $t = 1, ..., T$ index time periods. At the start of each time period, a consumer has inventory $i_t$ of a commodity and observes the prices $p_t$. The consumer can endogenously increase her inventory by purchasing quantities $x_{jkt}$ of each of the $j$ products, where $k \in \{1, ..., K\}$ indexes the discrete set of available pack sizes. Denote the non-purchase decision as $x_{0t} = 0$. Assume the consumer incurs a shopping cost if she chooses to purchase at least one of the products: $F(x_t; \tau) = \tau I_{\{\sum_{j=1}^{J} x_{jt} > 0\}}$. Her total post-purchase inventory in period $t$ is: $i'_t = i_t + \sum_{j=1}^{J} x_{jt}$.

After making her purchase decision, the consumer draws an exogenous consumption need that is unobserved to the analyst, $\omega_t \sim F_\omega(\omega)$, and that she consumes from her inventory, $i'_t$.$^{34}$ Her total consumption during period $t$ is therefore $c_t = \min(\omega_t, i'_t)$, which is consumed at a constant rate throughout the period.$^{35}$ Assume also that the consumer is indifferent between the brands in her inventory when she consumes the commodity and that she consumes each of them in constant proportion: $c_{jt} = \frac{c_t}{i'_t} i'_{jt}$.

If the consumer runs out of inventory before the end of the period, $\omega_t > c_t$, she incurs a stock-out cost $SC(\omega_t, c_t; \lambda) = \lambda_0 + \lambda_1 (\omega_t - c_t)$. The consumer also incurs an inventory carrying cost each period based on the total average inventory held during the period. Her average inventory in period $t$ is

$$
\bar{i}_t = \begin{cases}
i'_t - \frac{\omega_t}{2}, & \omega_t \leq i'_t \\
\frac{i'_t}{\omega_t} \frac{i'_t}{2}, & \omega_t > i'_t.
\end{cases}
$$

Her total inventory carrying cost is given by $IC(i'_t, \omega_t; \delta) = \delta_0 \bar{i}_t + \delta_1 \bar{i}_t^2$. Assume the consumer has the following perfect substitutes consumption utility function each period:

$$
\begin{aligned}
U(i'_t, \tilde{i}'_t, p_t, \omega_t; \theta) &= \sum_{j=1}^{J} \psi_j c_{jt} + \psi_{J+1} \left( y - \sum_{j,k} p_{jkt} x_{jkt} - F(x_t; \tau) - SC(\omega_t, c_t; \lambda) - IC(i'_t, \omega_t; \delta) \right) \\
&= \frac{c(i'_t, \omega_t)}{i'_t} \tilde{i}'_t + \psi_{J+1} \left( y - \sum_{j,k} p_{jkt} x_{jkt} - F(x_t; \tau) - SC(\omega_t, c(i'_t, \omega_t); \lambda) - IC(i'_t, \omega_t; \delta) \right)
\end{aligned}
\tag{59}
$$

34 The consumer therefore makes purchase decisions in anticipation of her future expected consumption needs, as in Dubé (2004).

35 Sun et al. (2003) use a data-based approach that measures the exogenous consumption need as a constant consumption rate, based on the household’s observed average quantity purchased.
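The within-period accounting for consumption, average inventory, and the two cost terms can be written as a short function. This is a minimal sketch that follows the definitions above; the function name and the numerical values in the usage lines are hypothetical.

```python
def period_costs(i_post, omega, lam0, lam1, delta0, delta1):
    """Consumption, average inventory, stock-out cost, and carrying cost.

    i_post is post-purchase inventory and omega is the consumption need.
    Inventory is drawn down at a constant rate within the period.
    """
    c = min(omega, i_post)                        # cannot consume more than is held
    if omega <= i_post:
        i_bar = i_post - omega / 2.0              # linear drawdown, no stock-out
        stockout = 0.0
    else:
        i_bar = (i_post / omega) * (i_post / 2.0)  # empty for part of the period
        stockout = lam0 + lam1 * (omega - c)
    carrying = delta0 * i_bar + delta1 * i_bar ** 2
    return c, i_bar, stockout, carrying

# Hypothetical cost parameters.
params = dict(lam0=1.0, lam1=0.5, delta0=0.1, delta1=0.01)
print(period_costs(10.0, 4.0, **params))   # enough inventory to cover the need
print(period_costs(2.0, 4.0, **params))    # stock-out part-way through the period
```

In the second call the consumer holds two units against a need of four, so inventory lasts for half the period and averages one unit while it lasts, giving an average of one half over the whole period.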

where $\tilde{i}'_t \equiv \sum_{j=1}^{J} \psi_j i'_{jt}$ is the post-purchase quality-adjusted inventory, and where the shopping cost, inventory carrying cost, and stock-out cost have all been subsumed into the budget constraint. The vector $\theta = (\psi_1, ..., \psi_{J+1}, \lambda_0, \lambda_1, \delta_0, \delta_1, \tau)$ contains all the model’s parameters.

The three state variables are summarized by $s_t = (i_t, \tilde{i}_t, p_t)$. The inventory state variables evolve as follows:

$$
\begin{aligned}
i_t &= \left( i_{t-1} + \sum_{j=1}^{J} x_{jt} \right) \left( 1 - \frac{c_t}{i'_t} \right) \\
\tilde{i}_t &= \left( \tilde{i}_{t-1} + \sum_{j=1}^{J} \psi_j x_{jt} \right) \left( 1 - \frac{c_t}{i'_t} \right).
\end{aligned}
$$

Assume in addition that consumers’ price beliefs are known to the analyst and evolve according to the Markov transition density$^{36}$ $p_{t+1} \sim f_p(p_{t+1} | p_t)$.$^{37}$ Therefore, the state vector also follows a Markov process, which we denote by the transition density $f_s(s' | s, x_{jk})$.

The consumer’s purchase problem is dynamic since she can control her future inventory states with her current purchase decision. Assuming the consumer discounts future utility at a rate $\beta \in (0, 1)$, the value function associated with her purchase decision problem in state $s_t$ is

$$
v(s_t, \varepsilon_t) = \max_{j,k} \left\{ v_{jk}(s_t; \theta) + \varepsilon_{jkt} \right\}
\tag{60}
$$

where $\varepsilon_{jkt} \sim$ i.i.d. $EV(0, 1)$ is a stochastic term known to the household at time $t$ but not to the analyst. $v_{jk}(s)$ is the choice-specific value function associated with choosing product $j$ and pack size $k$ in state $s$

$$
v_{jk}(s; \theta) = \int U(s, \omega; \theta) f_\omega(\omega) d\omega + \beta \int v(s', \varepsilon) f_s(s' | s, x_{jk}) d(s', \varepsilon).
\tag{61}
$$

When the taste parameters $\theta$ are known, the value functions in (60) and (61) can be solved numerically (see for instance Erdem et al. (2003) for technical details).
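To make the numerical solution concrete, the sketch below solves a deliberately stripped-down stock-piling problem by value function iteration. It is not the Erdem et al. (2003) model: it assumes a single product, a deterministic consumption need of one unit per period, integer inventory capped at `i_max`, a linear carrying cost `delta`, a fixed stock-out penalty `lam`, and a finite-state Markov chain for the unit price. All names and parameter values are hypothetical. With extreme value shocks, the expectation of the maximum over choices has the closed form of a log-sum plus Euler's constant, which is what the update uses.

```python
import math

def solve_stockpiling(prices, trans, sizes, i_max, psi_money, delta, lam, beta,
                      tol=1e-9, max_iter=10_000):
    """Value function iteration for a toy stock-piling model with logit shocks.

    State: (inventory i, price index p). Choice: pack size x in `sizes`
    (x = 0 is non-purchase). Returns the integrated value function ev[i][p]
    and the choice probabilities probs[i][p][k].
    """
    euler = 0.5772156649015329
    n_i, n_p = i_max + 1, len(prices)
    ev = [[0.0] * n_p for _ in range(n_i)]

    def choice_values(i, p):
        vals = []
        for x in sizes:
            i_post = min(i + x, i_max)       # storage capacity caps inventory
            c = min(1, i_post)               # need exactly one unit per period
            flow = -psi_money * prices[p] * x - delta * i_post - lam * (1 - c)
            cont = sum(trans[p][q] * ev[i_post - c][q] for q in range(n_p))
            vals.append(flow + beta * cont)
        return vals

    for _ in range(max_iter):
        new_ev = [[0.0] * n_p for _ in range(n_i)]
        for i in range(n_i):
            for p in range(n_p):
                vals = choice_values(i, p)
                v_max = max(vals)
                new_ev[i][p] = euler + v_max + math.log(
                    sum(math.exp(v - v_max) for v in vals))
        gap = max(abs(new_ev[i][p] - ev[i][p])
                  for i in range(n_i) for p in range(n_p))
        ev = new_ev
        if gap < tol:
            break

    probs = []
    for i in range(n_i):
        row = []
        for p in range(n_p):
            vals = choice_values(i, p)
            v_max = max(vals)
            expv = [math.exp(v - v_max) for v in vals]
            total = sum(expv)
            row.append([e / total for e in expv])
        probs.append(row)
    return ev, probs

# Hypothetical example: low and high unit prices drawn i.i.d. each period.
ev, probs = solve_stockpiling(prices=[1.0, 2.0], trans=[[0.3, 0.7], [0.3, 0.7]],
                              sizes=[0, 1, 2], i_max=4, psi_money=1.0,
                              delta=0.05, lam=3.0, beta=0.95)
print("P(no purchase | empty inventory, low price): ", round(probs[0][0][0], 4))
print("P(no purchase | empty inventory, high price):", round(probs[0][1][0], 4))
```

The iteration converges because the update is a contraction with modulus equal to the discount factor. In an estimation routine, a solver of this kind would be called inside the likelihood for every trial parameter vector, which is the main source of the computational burden.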

36 Typically, a rational expectations assumption is made and the price process, $F_p(p_{t+1} | p_t)$, is estimated in a first stage using the observed price series. An interesting exception is Erdem et al. (2005) who elicit consumers’ subjective price beliefs through a consumer survey.

37 In four CPG case studies, Liu and Balachander (2014) find that a proportional hazard model for the price process fits the price data better and leads to a better fit of the demand model when used to capture consumers’ price expectations.

Suppose the researcher observes a consumer’s sequence of choices, $\hat{x} = (\hat{x}_1, ..., \hat{x}_T)$. Conditional on the state, $s$, the probability that the consumer’s optimal choice is product $j$ and pack size $k$ has the usual multinomial logit demand form:

$$
\Pr(x_{jk} | s; \theta) = \frac{\exp\left( v_{jk}(s; \theta) \right)}{\exp\left( v_0(s; \theta) \right) + \sum_{l=1}^{J} \sum_{m=1}^{K} \exp\left( v_{lm}(s; \theta) \right)}, \quad x_{jk} \in \{x_{11}, ..., x_{JK}\}.
\tag{62}
$$

To accommodate the fact that the two inventory state variables, $i$ and $\tilde{i}$, are not observed, we partition the state as follows: $s = (p, \tilde{s})$ where $\tilde{s} = (i, \tilde{i})$. Since we do not observe the initial values of $\tilde{s}_0$, we have a classic initial conditions problem (Heckman, 1981). We resolve the initial conditions problem by assuming there is a true initial state, $\tilde{s}_0$, with density $f_s(\tilde{s}_0; \theta)$. We can now derive the density associated with the consumer’s observed sequence of purchase decisions, $\hat{x}$:

$$
f(\hat{x}; \theta) = \int \left( \prod_{t=1}^{T} \int \prod_{j,k} \Pr\left( x_{jk} | p_t, \tilde{s}_t, \omega, \tilde{s}_0; \theta \right)^{I_{\{x_{jk} = \hat{x}_t\}}} f_\omega(\omega) d\omega \right) f_s(\tilde{s}_0; \theta) d\tilde{s}_0.
\tag{63}
$$

Consistent estimates of the parameters $\theta$ can then be obtained via simulated maximum likelihood.

5.1.2 Stock-piling and endogenous consumption

The model in the previous section assumed an exogenous consumption rate, which implies that any purchase acceleration during a price discount period reflects stock-piling. Sun (2005), Hendel and Nevo (2006a), and Liu and Balachander (2014) allow for endogenous consumption. This important extension allows for two types of response to a temporary price cut. In addition to stock-piling, consumers can potentially increase their consumption of the discounted product.

We focus on the formulation in Hendel and Nevo (2006a), which reduces part of the computational burden of Erdem et al. (2003) by splitting the likelihood into a static and a dynamic component. A key assumption is that consumers only value the brand at the time of purchase so that the optimal consumption decisions are independent of specific brands and depend only on the quantities purchased. During period $t$, the consumer derives the following consumption utility$^{38}$

$$
U(c_t + \omega_t, i_t, p_t; \theta) = u(c_t + \omega_t; \gamma_c) + \sum_{j=1}^{J} \sum_{k=1}^{K} I_{\{x_{jkt} > 0\}} \left( \psi_{J+1}(y - p_{jkt}) + \psi_{jk} \right) - C(i_{t+1}; \lambda_c)
\tag{64}
$$

where, as before, we index the products by $j = 1, ..., J$ and the discrete pack sizes available by $k \in \{1, ..., K\}$. $u(c + \omega; \gamma_c)$ is the consumption utility with taste parameters $\gamma_c$, and $\omega \sim F_\omega(\omega)$ is a random “consumption need” shock. The start-of-period inventory is $i_t = i_{t-1} + \sum_{j,k} x_{jk,t-1} - c_{t-1}$. $C(i_{t+1}; \lambda_c)$ is the inventory carrying cost with cost-related parameters $\lambda_c$. As before, $x$ denotes a purchase quantity (as opposed to consumption), and $\psi_{J+1}$ captures the marginal utility of the numeraire good.

The three state variables are summarized by $s_t = (i_t, p_t, \omega_t)$. In addition, consumers form Markovian price expectations: $p_{t+1} \sim F_p(p_{t+1} | p_t)$. Unlike Erdem et al. (2003), the consumption need $\omega \sim F_\omega(\omega)$ is a state variable. The state variables, $s_t$, follow a Markov process with transition density $f_s(s' | s, x_{jk})$.

The value function associated with the consumer’s purchase decision problem during period $t$ is

$$
v(s_t, \varepsilon_t) = \max_{j,k} \left\{ v_{jk}(s_t) + \varepsilon_{jkt} \right\}
\tag{65}
$$

38 Hendel and Nevo (2006a) also allow for point-of-purchase promotional advertising, like feature ads and displays, to shift utility in their specification.

where $s_t$ is the state in period $t$, $\varepsilon_{jkt} \sim$ i.i.d. $EV(0, 1)$ is a stochastic term known to the household at time $t$ but not to the analyst, and $v_{jk}(s)$ is the choice-specific value function associated with choosing product $j$ and pack size $k$ in state $s$:

$$ v_{jk}(s_t) = \psi_{J+1}(y - p_{jkt}) + \psi_{jk} + M(s_t, x_{jk}; \theta_c) \tag{66} $$

where

$$ M(s_t, x_{jk}; \theta_c) = \max_c \left\{ u(c + \omega_t; \gamma_c) - C(i_{t+1}; \lambda_c) + \beta \int v(s', \varepsilon) f_s(s' | s, x_{jk}, c) f_\varepsilon(\varepsilon) \, d(s', \varepsilon) \right\} $$

and $\theta_c$ are the consumption-related parameters.

Hendel and Nevo (2006a) propose a simplified three-step approach to estimating the model parameters. The value function (65) can be simplified by studying Eq. (66), which indicates that consumption is only affected by the quantity purchased, not the specific brand chosen. In a first step, it is straightforward to show that consistent estimates of the brand taste parameters, $\psi = (\psi_1, ..., \psi_{J+1})$, can be obtained from the following standard multinomial logit model of brand choice across all brands available in the pack size $k$:

$$ \Pr(j | s_t, k; \psi) = \frac{\exp\left( \psi_{J+1}(y - p_{jkt}) + \psi_{jk} \right)}{\sum_i \exp\left( \psi_{J+1}(y - p_{ikt}) + \psi_{ik} \right)}. $$


In a second step, define the expected value of the optimal brand choice, conditional on pack size, as follows:

$$ \eta_{kt} = \ln \left\{ \sum_j \exp\left( \psi_{J+1}(y - p_{jkt}) + \psi_{jk} \right) \right\}. \tag{67} $$
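The first two steps can be illustrated with a short sketch: the conditional logit brand probabilities within each pack size, and the size-specific inclusive values of Eq. (67). The function name, the array layout (brands in rows, pack sizes in columns), and the numbers in the usage note are assumptions made for illustration only; they are not taken from Hendel and Nevo (2006a).

```python
import numpy as np

def brand_choice_given_size(psi_brand, psi_num, y, prices):
    """Brand-choice probabilities within each pack size and inclusive values.

    psi_brand : (J, K) array of brand-size tastes psi_jk
    psi_num   : scalar marginal utility of the numeraire, psi_{J+1}
    y         : scalar budget
    prices    : (J, K) array of prices p_jkt for one period t
    Returns (probs, eta): probs is (J, K) and each column sums to one;
    eta is the length-K vector of inclusive values from Eq. (67).
    """
    v = psi_num * (y - prices) + psi_brand
    # log-sum-exp over brands, stabilised by subtracting the column maximum
    v_max = v.max(axis=0, keepdims=True)
    eta = (v_max + np.log(np.exp(v - v_max).sum(axis=0, keepdims=True))).ravel()
    probs = np.exp(v - eta[None, :])
    return probs, eta
```

For example, calling the function with a 2-by-2 price array returns one probability vector per pack size and one inclusive value per pack size; the inclusive values are what the later steps carry forward in place of the full price vector.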

Using an idea from Melnikov (2013), assume that $\eta_{t-1}$ is a sufficient statistic for $\eta_t$ so that $F(\eta_t | s_{t-1})$ can be summarized by $F(\eta_t | \eta_{t-1})$. The size-specific inclusive values can then be computed with the brand taste parameters, $\psi$, and Eq. (67) and then used to estimate the distribution $F(\eta_t | \eta_{t-1})$.

In a third step, the quantity choice state can then be defined as $\tilde{s}_t = (i_t, \eta_t, \omega_t)$, which reduces dimensionality by eliminating any brand-specific state variables. The value function associated with the consumer’s quantity decision problem can now be written in terms of these size-specific “inclusive value” terms:

$$ v(\tilde{s}_t, \varepsilon_t) = \max_{c,k} \left\{ u(c + \omega_t; \gamma_c) - C(i_{t+1}; \lambda_c) + \eta_{kt} + \beta \int v(\tilde{s}', \varepsilon) f_s(\tilde{s}' | \tilde{s}, x_k, c) f_\varepsilon(\varepsilon) \, d(\tilde{s}', \varepsilon) \right\}. \tag{68} $$

Similarly, the pack-size choice-specific value functions can also be written in terms of these size-specific “inclusive value” terms:

$$ v_k(\tilde{s}_t) = \eta_{kt} + M_k(\tilde{s}_t; \theta_c) \tag{69} $$

where

$$ M_k(\tilde{s}_t; \theta_c) = \max_c \left\{ u(c + \omega_t; \gamma_c) - C(i_{t+1}; \lambda_c) + \beta \int v(\tilde{s}', \varepsilon) f_s(\tilde{s}' | \tilde{s}, x_k, c) f_\varepsilon(\varepsilon) \, d(\tilde{s}', \varepsilon) \right\}. $$

The corresponding optimal pack size choice probabilities are then:

$$ \Pr(k | \tilde{s}_t; \theta_c) = \frac{\exp\left( \eta_{kt} + M_k(\tilde{s}_t; \theta_c) \right)}{\sum_i \exp\left( \eta_{it} + M_i(\tilde{s}_t; \theta_c) \right)}. $$

The density associated with the consumer’s observed sequence of pack size decisions, $\hat{x}$, is:39

$$ f(\hat{x}; \theta_c) = \int \prod_{t=1}^{T} \Pr(\hat{x}_t | \tilde{s}_t, \tilde{s}_0; \theta_c) \, f_\omega(\omega) \, f_s(\tilde{s}_0; \theta_c) \, d(\omega, \tilde{s}_0). \tag{70} $$

Consistent estimates of the parameters $\theta_c$ can then be obtained via simulated maximum likelihood. A limitation of this three-step approach is that it does not allow for persistent, unobserved heterogeneity in tastes.

39 The initial conditions can be resolved in a similar manner as in Erdem et al. (2003).
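To make the third step concrete, the following is a minimal value-iteration sketch for a heavily stripped-down version of Eqs. (68) and (69). Everything specific in it is an assumption made for illustration rather than the authors' specification: inventory lives on an integer grid, the inclusive values are held constant over time (so there is no price uncertainty), there is no consumption-need shock, consumption utility is `gamma * log(1 + c)`, and the carrying cost is linear in end-of-period inventory.

```python
import numpy as np

def solve_quantity_problem(eta, sizes, beta=0.95, gamma=1.0, lam=0.05,
                           i_max=10, tol=1e-10, max_iter=5000):
    """Value iteration for a simplified pack-size choice problem.

    eta   : inclusive value of each pack-size option (held fixed over time)
    sizes : units added to inventory by each option (0 = no purchase)
    Returns (v_bar, probs): the integrated value function on the inventory
    grid 0..i_max and the logit pack-size probabilities in each state.
    """
    eta = np.asarray(eta, dtype=float)
    n_i, n_k = i_max + 1, len(sizes)
    v_bar = np.zeros(n_i)
    for _ in range(max_iter):
        M = np.empty((n_i, n_k))
        for i in range(n_i):
            for k, x in enumerate(sizes):
                avail = i + x                        # units on hand after purchase
                c = np.arange(avail + 1)             # feasible consumption levels
                i_next = np.minimum(avail - c, i_max)  # units above i_max are discarded
                M[i, k] = np.max(gamma * np.log1p(c) - lam * i_next
                                 + beta * v_bar[i_next])
        v_k = eta[None, :] + M                       # analogue of Eq. (69)
        m = v_k.max(axis=1, keepdims=True)
        v_new = (m + np.log(np.exp(v_k - m).sum(axis=1, keepdims=True))).ravel()
        converged = np.max(np.abs(v_new - v_bar)) < tol
        v_bar = v_new
        if converged:
            break
    probs = np.exp(v_k - v_bar[:, None])             # logit choice probabilities
    return v_bar, probs
```

A call such as `solve_quantity_problem(eta=[0.0, -1.0, -1.5], sizes=[0, 1, 2])` returns one row of pack-size probabilities per inventory level. In an estimation routine these probabilities would enter the likelihood (70), with the inner problem re-solved at each trial value of the parameters.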


5.1.3 Empirical findings with stock-piling models

In a case study of household purchases of ketchup, Erdem et al. (2003) find that the dynamic stock-piling model described above fits the data well in-sample, in particular the timing between purchases. Using simulations based on the estimates, they find that a product’s sales-response to a temporary price cut mostly reflects purchase acceleration and category expansion, as opposed to brand switching. This finding is diametrically opposite to the conventional wisdom that “brand switchers account for a significant portion of the immediate increase volume due to sales promotion” (Blattberg and Neslin, 1989, p. 82). The cross-price elasticities between brands are found to be quite small compared to those from static choice models, although the exact magnitude is sensitive to the specification of the price process representing consumers’ expectations. They find much larger cross-price elasticities in response to permanent price changes and conclude that long-run price elasticities are likely more relevant to policy analysts who want to measure the intensity of competition between brands. In case studies of packaged tuna and yogurt, Sun (2005) finds that consumption increases with the level of inventory and decreases in the level of promotional uncertainty. While promotions do lead to brand-switching, she also finds that they increase consumption. A model that assumes exogenous consumption over-estimates the extent of brand switching. In a case study of laundry detergents, Hendel and Nevo (2006a) focus on the long-run price elasticities by measuring the effects of permanent price changes. They find that a static model generates 30% larger price elasticities than the dynamic model. They also find that the static model underestimates cross-price elasticities. Some of the cross-price elasticities in the dynamic model are more than 20 times larger than those from the static model.
Finally, the static model overestimates the degree of substitution to the outside good by 200%. Seiler (2013) builds on Hendel and Nevo’s (2006a) specification by allowing consumers with imperfect price information to search each period before making a purchase decision. In a case study of laundry detergent purchases, Seiler’s (2013) parameter estimates imply that 70% of consumers do not search each period. This finding highlights the importance of merchandizing efforts, such as in-store displays, to help consumers discover low prices. In addition, by using deeper price discounts, a firm can induce consumers to engage in more price search which can increase total category sales. This increase in search offsets traditional concerns about inter-temporal cannibalization due to strategic purchase timing.

5.2 The endogeneity of marketing variables

The frameworks discussed thus far focus entirely on the demand side of the market. However, many of the most critical demand-shifting variables at the point of purchase consist of marketing mix variables such as prices and promotions, including merchandizing activities like temporary discounts, in-aisle displays, and feature advertising. If these marketing variables are set strategically by firms with more consumer information than the researcher, any resulting correlation with unobserved components of demand could impact the consistency of the likelihood-based estimates discussed thus far. In fact, one of the dominant themes in the empirical literature on aggregate demand estimation consists of the resolution of potential endogeneity of supply-side variables (e.g., Berry, 1994; Berry et al., 1995). While most of the literature has focused on obtaining consistent demand estimates in the presence of endogenous prices, bias could also arise from the endogeneity of advertising, promotions, and other marketing variables.40 Surprisingly little attention has been paid to the potential endogeneity of marketing variables in the estimation of individual consumer level demand.

In the remainder of this section, we focus on the endogeneity of prices even though many of the key themes would readily extend to other endogenous demand-shifting variables.41 Suppose a sample of $i = 1, ..., N$ consumers each makes a discrete choice among $j = 1, ..., J$ product alternatives and a $J + 1$ “no purchase” alternative. Each consumer is assumed to obtain the following conditional indirect utility from choice $j$:

$$ V_{ij} = \psi_j - \alpha p_{ij} + \varepsilon_{ij}, \qquad V_{iJ+1} = \varepsilon_{i,J+1} $$

where $\varepsilon_i \sim$ i.i.d. $F(\varepsilon)$ and $p_{ij}$ is the price charged to consumer $i$ for alternative $j$. Demand estimation is typically carried out by maximizing the corresponding likelihood function:

$$ L(\theta | y) = \prod_i \prod_j \Pr(j; \theta)^{y_{ij}} \tag{71} $$

where $\Pr(j; \theta) \equiv \Pr(V_{ij} \geq V_{ik}, \forall k \neq j)$ and $y = (y_{i1}, ..., y_{iJ+1})$ indicates which of the $j = 1, ..., J + 1$ products was chosen by consumer $i$. If $cov(p_{ij}, \varepsilon_{ij}) \neq 0$ then the maximum likelihood estimator $\theta^{MLE}$ based on (71) may be inconsistent since the likelihood omits information about $\varepsilon$. In general, endogeneity can arise in three ways (see Wooldridge, 2002, for example):

1. Simultaneity: Firms observe and condition on $\varepsilon_i$ when they set their prices.
2. Self-Selection: Certain types of consumers systematically find the lowest prices.
3. Measurement Error: The researcher observes a noisy estimate of true prices, $\tilde{p}_{ij}$: $\tilde{p}_{ij} = p_{ij} + \eta_{ij}$.

40 For instance, Manchanda et al. (2004) address endogenous detailing levels across physicians.
41 In the empirical consumption literature, the focus has been more on the endogeneity of household incomes than the endogeneity of prices (see for instance Blundell et al., 1993). Since the analysis is typically at the broad commodity group level (e.g., food), the concern is that household budget shares are determined simultaneously with consumption quantities.

Most of the emphasis has been on simultaneity bias whereby endogeneity arises because of the strategic pricing decisions by the firms. Measurement error is not typically discussed in the demand estimation literature. However, many databases contain


time-aggregated average prices rather than the actual point-of-purchase price, which could lead to classical measurement error. To the best of our knowledge, a satisfactory solution has yet to be developed for demand estimation with this type of measurement error. In many marketing settings, endogeneity bias could also arise from the self-selection of consumers into specific marketing conditions based on unobserved (to the researcher) aspects of their tastes. For instance, unobserved marketing promotions like coupons could introduce both measurement error and selection bias if certain types of consumers are systematically more likely to find/have a coupon and use it (Erdem et al., 1999). Similarly, Howell et al. (2016) propose an approach to resolve the price self-selection bias associated with consumers choosing between non-linear pricing contracts based on observable (to the researcher) aspects of their total consumption needs. If consumers face incomplete information about the choice set, then selection could arise from price search and the formation of consumers’ consideration sets (e.g., Honka, 2014). The topics of consumer search and the formation of consideration sets are discussed in more detail in chapters in this volume on branding and on search. Finally, the potential self-selection of consumers into discount and regular prices based on their unobserved (to the researcher) potential stock-piling behavior during promotional periods in anticipation of future price increases could also bias preference estimates (e.g., Erdem et al., 2003; Hendel and Nevo, 2006a). For the remainder of this discussion, we will focus on price endogeneity associated with the simultaneity bias. Suppose that j = 1, ..., J consumer goods in a product category are sold in t = 1, ..., T static, spot markets by single-product firms playing a Bertrand-Nash pricing game. 
Typically, a market is a store-week since stores tend to set their prices at a weekly frequency and most categories in the store are “captured markets” in the sense that consumers likely do not base their store choices on each of the tens of thousands of prices charged across the products carried in a typical supermarket. On the demand side, consumers make choices in each market $t$ to maximize their choice-specific utility

$$ V_{ijt} = v_j(w_t, p_t; \theta) + \xi_{jt} + \varepsilon_{ijt}, \qquad V_{iJ+1t} = \varepsilon_{i,J+1t} \tag{72} $$

where we distinguish between the exogenous point-of-purchase utility shifters, $w_t$, and the prices, $p_t$. In addition, we now specify a composite error term consisting of the idiosyncratic utility shock, $\varepsilon_{ijt} \sim$ i.i.d. $EV(0, 1)$, and the common shock, $\xi_{jt} \sim$ i.i.d. $F_\xi(\xi)$, to control for potential product-$j$ specific characteristics that are observed to the firms when they set prices, but not to the researcher (Berry, 1994). Consumers have corresponding choice probabilities, $\Pr(j; \theta | w_t, p_t, \xi_t)$, for each of the $j$ alternatives including the $J + 1$ no-purchase alternative. Price endogeneity arises when the firms condition on $\xi$ when setting their prices and $cov(p_t, \xi_t) \neq 0$.


A consistent and efficient estimator can be constructed by maximizing the following likelihood

$$ L(\theta | y, p) = \prod_t \int \cdots \int \prod_i \prod_j \Pr(j; \theta | w_t, p_t, \xi)^{y_{ijt}} f_p(p_t | \xi) f_\xi(\xi) \, d\xi_1 ... d\xi_J. $$

In practice, the form of the likelihood of prices may not be known and ad hoc assumptions about $f_p(p | \xi)$ could lead to additional specification error concerns. We now discuss the trade-offs between full-information and limited-information approaches.

5.2.1 Incorporating the supply side: A structural approach

An efficient “full-information” solution to the price endogeneity bias consists of modeling the data-generating process for prices and deriving the density $f_p(p | \xi)$ structurally. Since consumer goods are typically sold in a competitive environment, this approach requires specifying the structural form of the pricing game played by the various suppliers. The joint density of prices is then induced by the equilibrium in the game (e.g., Yang et al., 2003; Draganska and Jain, 2004; Villas-Boas and Zhao, 2005). On the supply side of the model in Eq. (72), assume the $J$ firms play a static, Bertrand-Nash game for which the prices each period satisfy the following necessary conditions for profit maximization:

$$ \Pr(j; \theta | w_t, p_t, \xi_t) + (p_{jt} - c_{jt}) \frac{\partial \Pr(j; \theta | w_t, p_t, \xi_t)}{\partial p_{jt}} = 0 \tag{73} $$

where $c_{jt} = b_{jt}\gamma + \eta_{jt}$ is firm $j$’s marginal cost in market $t$, $b_{jt}$ are observable cost-shifters, like factor prices, $\gamma$ are the factor weights, and $\eta_t \sim$ i.i.d. $F(\eta)$ is a vector of cost shocks that are unobserved to the researcher. We use the static Nash concept as an example. Alternative modes of conduct (including non-optimal behavior) could easily be accommodated instead. In general, these first-order conditions (73) will create covariance between prices and demand shocks, $cov(p_t, \xi_t) \neq 0$. As long as the system of first-order conditions, (73), generates a unique vector of equilibrium prices, we can then derive the density of prices $f_p(p | \xi_t, c_t, w_t) = f(\eta_t | \xi_t) |J_{\eta \to p}|$ where $J_{\eta \to p}$ is the Jacobian of the transformation from $\eta$ to prices. A consistent and efficient estimate of the parameters $\Theta = (\theta, \gamma)$ can then be obtained by maximizing the likelihood function:42

$$ L(\Theta | y, p) = \prod_t \int \int \prod_i \prod_j \Pr(j; \theta | w_t, p_t, \xi_t)^{y_{ijt}} f_p(p_t | \xi, c_t, w_t) f(\xi) \, d\eta \, d\xi. \tag{74} $$

42 Yang et al. (2003) propose an alternative Bayesian MCMC estimator.


Two key concerns with this approach are as follows. First, in many settings, pricing conduct may be more sophisticated than the single-product, static Bertrand-Nash setting characterized by (73). Mis-specification of the pricing conduct would lead to a mis-specification of the density $f_p(p | \xi_t)$, which could lead to bias in the parameter estimates. Yang et al. (2003) resolve this problem by testing between several different forms of static pricing conduct. An advantage of their Bayesian estimator is the ability to re-cast the conduct test as a Bayesian decision theory problem of model selection. Villas-Boas and Zhao (2005) incorporate conduct parameters into the system of first-order necessary conditions (73), where specific values of the conduct parameter nest various well-known pricing games. In addition to the conduct specification, even if we can assume existence of a price equilibrium, for our simple static Bertrand-Nash pricing game it is difficult to prove uniqueness of the solution to the system of first-order necessary conditions (73) for most demand specifications, $\Pr(j; \theta | w_t, p_t, \xi_t)$. This non-uniqueness problem translates into a coherency problem for the maximum likelihood estimator based on (74). The multiplicity problem would likely be exacerbated in more sophisticated pricing games involving dynamic conduct, multi-product firms, and channel interactions. Berry et al. (1995) avoid this problem by using a less efficient GMM estimation approach that does not require computing the Jacobian term $J_{\eta \to p}$. Another potential direction for future research might be to recast (71) as an incomplete model and to use partial identification for inference on the supply and demand parameters (e.g., Tamer, 2010). A more practical concern is the availability of exogenous variables, $b_{jt}$, that shift prices but are plausibly excluded from demand. Factor prices and other cost-related factors from the supply side may be available.
In the absence of any exclusion restrictions, identification of the demand parameters will then rely on the assumed structure of $f_p(p | \xi) f(\xi)$. The full-information approaches have thus far produced mixed evidence on the endogeneity bias in the demand parameters in a small set of empirical case studies. Draganska and Jain (2004) and Villas-Boas and Zhao (2005) find substantial bias, especially in the price coefficient $\alpha$. However, in a case study of light beer purchases, Yang et al. (2003) find that the endogeneity bias may be an artifact of omitted heterogeneity in the demand specification. Once they allow for unobserved demand heterogeneity, they obtain comparable demand estimates regardless of whether they incorporate supply-side information into the likelihood. Interestingly, in a study of targeted detailing to physicians, Manchanda et al. (2004) find that incorporating the supply side not only resolves asymptotic bias in the estimates of demand parameters but also yields a substantial improvement in efficiency.43

43 Manchanda et al. (2004) address a much more sophisticated form of endogeneity bias whereby the detailing levels are coordinated with the firm’s posterior beliefs about a physician’s response coefficients, as opposed to an additive error component as in the cases discussed above.
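The way the first-order conditions (73) induce correlation between prices and demand shocks can be seen in a small numerical sketch. It assumes a homogeneous logit demand with linear utility $\psi_j - \alpha p_j + \xi_j$ and single-product firms, for which (73) rearranges to $p_j = c_j + 1/(\alpha(1 - s_j))$; the function names, the parameter values in the usage note, and the simple fixed-point iteration are illustrative assumptions, and the iteration is not guaranteed to converge for every parameterization.

```python
import numpy as np

def logit_shares(p, psi, xi, alpha):
    """Logit choice probabilities with an outside option normalised to zero."""
    e = np.exp(psi - alpha * p + xi)
    return e / (1.0 + e.sum())

def bertrand_nash_prices(psi, xi, cost, alpha, tol=1e-12, max_iter=10000):
    """Solve the single-product logit first-order conditions by iterating on
    p_j = c_j + 1 / (alpha * (1 - s_j(p)))."""
    cost = np.asarray(cost, dtype=float)
    p = cost + 1.0 / alpha                      # starting guess
    for _ in range(max_iter):
        s = logit_shares(p, psi, xi, alpha)
        p_new = cost + 1.0 / (alpha * (1.0 - s))
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    raise RuntimeError("fixed-point iteration did not converge")
```

Solving the game once with `xi = (0, 0)` and once with `xi = (1, 0)`, holding costs fixed, raises the equilibrium price of the first product: the firm responds to a favorable demand shock that the researcher does not observe, which is the source of the simultaneity bias.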


5.2.2 Incorporating the supply side: A reduced-form approach

As explained in the previous section, potential specification error and a potential multiplicity of equilibria are serious disadvantages of full-information approaches. In the literature, several studies have proposed less efficient limited-information approaches that are more agnostic about the exact data-generation process on the supply side. Villas-Boas and Winer (1999) use a more agnostic approach that is reminiscent of two-stage least squares estimators in the linear models setting. Rather than specify the structural form of the pricing game on the supply side, they instead model the reduced form of the equilibrium prices

$$ p_{jt} = W_j(w_t, b_t; \lambda) + \zeta_{jt} \tag{75} $$

where $b_t$ are again exogenous price-shifters that are excluded from the demand side, and $\zeta_t$ is a random price shock such that $(\xi_t, \zeta_t) \sim F(\xi_t, \zeta_t)$ and $b_t$ are independent of $(\xi_t, \zeta_t)$. It is straightforward to derive $f(p | \xi_t, b_t, w_t) = f(\zeta_t | \xi_t)$ since the linearity obviates the need to compute a Jacobian. A consistent “limited information” estimate of the parameters $\Theta = (\theta, \lambda)$ can then be obtained by substituting this density into the likelihood function (74). While this approach does not require specifying pricing conduct, unlike a two-stage least squares estimator, linearity is not an innocuous assumption. Any specification error in the ad hoc “reduced form” will potentially bias the demand estimates. For instance, the first-order necessary conditions characterizing equilibrium prices in (73) would not likely reduce to a specification in which the endogenous component of prices is an additive, Gaussian shock. Conley et al. (2008) resolve this problem by using a semi-parametric, mixture-of-Normals approximation of the density over $(\xi_t, \zeta_t)$.

A separate stream of work has developed instrumental variables methods to handle the endogeneity of prices. Chintagunta et al. (2005) conduct a case study of product categories in which, each store-week, a large number of purchases are observed for each product alternative. On the demand side, they can then directly estimate the weekly mean utilities as “fixed effects”

$$ V_{ijt} = v_j(w_t, p_t; \theta) + \xi_{jt} + \varepsilon_{ijt}, \qquad V_{iJ+1t} = \varepsilon_{i,J+1t} $$

without needing to model the supply side. Using standard maximum likelihood estimation techniques, they estimate the full set of brand-week effects $\{\psi_{jt}\}_{j,t}$ in a first stage.44 Following Nevo’s (2001) approach for aggregate data, the mean responses to marketing variables are obtained in a second stage minimum distance procedure that projects the brand-week effects onto the product attributes, $x_t$ and $p_t$

$$ \hat{\psi}_{jt} = v_j(w_t, p_t; \theta) + \xi_{jt} \tag{76} $$

44 Their estimator allows for unobserved heterogeneity in consumers’ responses to marketing variables.


using instrumental variables, $(w_t, b_t)$, to correct for the potential endogeneity of prices. Unlike Villas-Boas and Winer (1999), the linearity in (75) does not affect the consistency of the demand estimates. Even after controlling for persistent, unobserved consumer taste heterogeneity, Chintagunta et al. (2005) find strong evidence of endogeneity bias in both the levels of the response parameters and in the degree of heterogeneity. A limitation of this approach is that any small sample bias in the brand-week effects will potentially lead to inconsistent estimates.

In related work, Goolsbee and Petrin (2004) and Chintagunta and Dubé (2005) use an alternative approach that obtains exact estimates of the mean brand-week utilities by combining the individual purchase data with store-level data on aggregate sales. Following Berry et al. (1995) (BLP), the weekly, mean brand-week utilities are inverted out of the observed weekly market share data, $s_t$

$$ \psi_t = \Pr{}^{-1}(s_t) \tag{77} $$

where $\Pr^{-1}(s_t)$ is the inverse of the system of predicted market shares corresponding to the demand model.45 These mean utilities are then substituted into the first stage for demand estimation.46 In a second stage, the mean response parameters are again obtained using the projection (76) and instrumental variables to correct for the endogeneity of prices.

When aggregate market share data are unavailable, Petrin and Train (2010) propose an alternative “control function” approach. On the supply side, prices are again specified in reduced form as in (75). On the demand side, consumers make choices in each market $t$ to maximize their choice-specific utility

$$ V_{ijt} = v_j(w_t, p_t; \theta) + \varepsilon_{ijt}, \qquad V_{iJ+1t} = \varepsilon_{i,J+1t} $$

where the utility shocks to the $j = 1, ..., J$ products can be decomposed as follows:

$$ \varepsilon_{ijt} = \varepsilon^1_{ijt} + \varepsilon^2_{ijt} $$

where $(\varepsilon^1_{ijt}, \zeta_{ijt}) \sim N(0, \Sigma)$ and $\varepsilon^2_{ijt} \sim$ i.i.d. $F(\varepsilon)$. We can then re-write the choice-specific utility as:

$$ V_{ijt} = v_j(w_t, p_t; \theta) + \lambda \zeta_{ijt} + \sigma \eta_{jt} + \varepsilon^2_{ijt}, \quad j = 1, ..., J \tag{78} $$

where $\eta_{jt} \sim N(0, 1)$. Estimation is then conducted in two steps. The first stage consists of the price regression based on Eq. (75). The second stage consists of estimating the choice probabilities corresponding to (78) using the control function, $\lambda \zeta$, for alternatives $j = 1, ..., J$ with parameter $\lambda$ to be estimated. In an application to household choices between satellite and cable television content suppliers, Petrin and Train (2010) find that the control function in (78) generates comparable demand estimates to those obtained using the more computationally and data-intensive BLP approach based on (77).

45 See Berry (1994) and Berry et al. (2013) for the necessary and sufficient conditions required for the demand system to be invertible.
46 Chintagunta and Dubé (2005) estimate the parameters characterizing unobserved heterogeneity in this first stage.
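The share inversion in (77) followed by the instrumental-variables projection in (76) can be illustrated in the simplest possible case: one inside good, a homogeneous logit, and a linear mean utility, where the inversion has the closed form $\ln s_t - \ln(1 - s_t)$. The data-generating process below (the cost shifter `b`, the coefficient values, the reduced-form price equation in the spirit of (75)) is entirely simulated and assumed for illustration; it does not reproduce any of the cited applications.

```python
import numpy as np

rng = np.random.default_rng(0)
T, psi, alpha = 20000, 1.0, 2.0          # markets, intercept, true price coefficient

b = rng.uniform(0.0, 1.0, T)             # excluded cost shifter (the instrument)
xi = rng.normal(0.0, 1.0, T)             # demand shock seen by the firm, not the researcher
p = 2.0 + b + 0.5 * xi + 0.1 * rng.normal(size=T)   # prices respond to xi

v = psi - alpha * p + xi                 # mean utility of the inside good
share = np.exp(v) / (1.0 + np.exp(v))    # market share (outside good normalised to zero)

# Step 1: invert shares to recover mean utilities (closed form for the simple logit)
delta = np.log(share) - np.log(1.0 - share)

# Step 2: project mean utilities on price, without and with the instrument
X = np.column_stack([np.ones(T), p])
Z = np.column_stack([np.ones(T), b])
beta_ols = np.linalg.lstsq(X, delta, rcond=None)[0]
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ delta)     # just-identified IV estimator

alpha_ols, alpha_iv = -beta_ols[1], -beta_iv[1]
print(f"OLS price coefficient: {alpha_ols:.3f}")
print(f"IV  price coefficient: {alpha_iv:.3f}")
```

Because prices load positively on the demand shock in this simulation, the least-squares estimate of the price coefficient is pulled toward zero, while the instrumented estimate should land close to the value used to generate the data.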

5.3 Behavioral economics

The literature on behavioral economics has created an emerging area for microeconometric models of demand. This research typically starts with surprising or puzzling moments in the data that would be difficult to fit using the standard neoclassical models. In this section, we look at two specific topics: the fungibility of income and social preferences. For a broader discussion of structural models of behavioral economics, see DellaVigna (2017). The pursuit of ways to incorporate more findings from the behavioral economics literature into quantitative models of demand seems like a fertile area for future research.47

5.3.1 The fungibility of income

Building on the discussion of income effects from Section 4.1, the mental accounting literature offers a more nuanced theory of income effects whereby individuals bracket different sources of income into mental accounts out of which they have different marginal propensities to consume (Thaler, 1985, 1999). Recent field studies have also found evidence of bracketing. In an in-store coupon field experiment involving an unanticipated coupon for a planned purchase, Heilman et al. (2002) find that coupons cause more unplanned purchases of products that are related to the couponed item.48 Milkman and Beshears (2009) find that the incremental online consumer grocery purchases due to coupons are for non-typical items. Similarly, Hastings and Shapiro (2013) observe a much smaller cross-sectional correlation between household income and gasoline quality choice than the inter-temporal correlation between the gasoline price level and gasoline quality choice. In related work, Hastings and Shapiro (2018) find that the income-elasticity of SNAP49-eligible food demand is much higher with respect to SNAP benefits than with respect to cash. Each of these examples is consistent with consumers perceiving money budgeted for a product category differently from “cash.”

47 The empirical consumption literature has a long tradition of testing the extent to which consumer demand conforms with rationality by testing the integrability constraints associated with utility maximization (e.g., Lewbel, 2001; Hoderlein, 2011).
48 Lab evidence has also confirmed that consumers are much more likely to spend store gift card money on products associated with the brand of the card than unbranded gift card money (e.g. American Express), suggesting that store gift card money is not fungible with cash (Reinholtz et al., 2015).
49 SNAP refers to the Supplemental Nutrition Assistance Program, or “food stamps.”

Hastings and Shapiro (2018) test the non-fungibility of income more formally using a demand model with income effects. Consider the bivariate utility over a commodity group, with $J$ perfect-substitute products, and a $J + 1$ essential numeraire, with quadratic utility50

$$ U(x) = \sum_{j=1}^{J} \psi_j x_j + \psi_{J+1,1} x_{J+1} - \frac{1}{2} \psi_{J+1,2} x_{J+1}^2 $$

where $\psi_j = \exp(\bar{\psi}_j + \varepsilon_j)$. In the application, the goods consist of different quality grades of gasoline. In this model, income effects only arise through the allocation of the budget between the gasoline commodity group and the essential numeraire. WLOG, if product $k$ is the preferred good and, hence, $\frac{p_k}{\psi_k} = \min \left\{ \frac{p_j}{\psi_j} \right\}_{j=1}^{J}$, then the KKT conditions are

$$ \psi_k - \psi_{J+1,1} p_k + \psi_{J+1,2} (y - x_k p_k) p_k \leq 0. \tag{79} $$

Estimation of this model follows from Section 3.2. A simple test of fungibility consists of re-writing the KKT conditions with a different marginal utility on budget income and commodity expenditure

$$ \frac{\psi_k}{p_k} - \psi_{J+1,1} + \psi_{J+1,y} \, y - \psi_{J+1,x} \, x_k p_k \leq 0 \tag{80} $$

and testing the hypothesis H0 : ψJ +1,y = ψJ +1,x . The identification of this test relies on variation in both observed consumer income, y, and in prices, p. What is missing in this line of research is a set of primitive assumptions in the microeconomic model that leads to this categorization of different sources of purchasing power. An interesting direction for future research will consist of digging deeper into the underlying sources of the mental accounting and how/whether it changes our basic microeconomic model. For instance, perhaps categorization creates a multiplicity of budget constraints in the basic model, both financial and perceptual.
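As a worked illustration of what the test based on (80) implies, suppose the KKT condition binds, so that the consumer buys a positive quantity of the preferred good $k$. Writing $e_k = x_k p_k$ for category expenditure (a symbol introduced here only for this sketch) and assuming $\psi_{J+1,x} \neq 0$, condition (80) can be solved for expenditure:

```latex
\frac{\psi_k}{p_k} - \psi_{J+1,1} + \psi_{J+1,y}\, y - \psi_{J+1,x}\, e_k = 0
\quad\Longrightarrow\quad
e_k = \frac{\psi_k / p_k - \psi_{J+1,1}}{\psi_{J+1,x}}
      + \frac{\psi_{J+1,y}}{\psi_{J+1,x}}\, y .
```

Under $H_0$ the coefficient on income equals one, which is also what the fungible model (79) delivers at an interior solution, since both $\psi_{J+1,y}$ and $\psi_{J+1,x}$ then equal $\psi_{J+1,2}$. A ratio $\psi_{J+1,y} / \psi_{J+1,x}$ different from one would mean that a dollar of income and a dollar of category expenditure shift the marginal utility of the numeraire by different amounts. The sketch abstracts from the random component of $\psi_k$ and from corner solutions.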

50 Hastings and Shapiro (2013) treat quantities as exogenous and instead focus on the multinomial discrete choice problem between different goods, which are qualities of gasoline.

5.3.2 Social preferences

As discussed in the survey by DellaVigna (2017), there is a large literature that has estimated social preferences in lab experiments. We focus herein specifically on the role of consumers’ social preferences and their responses to cause marketing campaigns involving charitable giving. A dominant theme of this literature has consisted of testing whether consumer response to charitable giving campaigns reflects genuine altruistic preferences versus alternative impure altruism and/or self-interest. In a pioneering study, DellaVigna et al. (2012) conducted a door-to-door fundraising campaign to test the extent to which charitable giving is driven by a genuine preference to give (altruism or warm glow) versus a disutility from declining to give due, for instance, to social pressure. The field data are then used to estimate a structural model of individuals’ utility from giving that separates altruism and social pressure. Formally, total charitable giving, $x$, consists of the sum of dollars donated to the charitable campaign either directly to the door-to-door solicitor, $x_1$, or, if the donor is not home at the time of the visit, she can instead make a private donation, $x_2$, by mail at an additional cost $(1 - \theta) x_2 \geq 0$ for postage, envelope, etc. All remaining wealth is spent on an essential numeraire, $x_3$, to capture all other private consumption. Prospective donors have a quasi-linear, bivariate utility over other consumption and charitable giving

$$ U(x_1, x_2) = y - x_1 - x_2 + \tilde{U}(x_1 + \theta x_2) - s(x_1). \tag{81} $$

To ensure the sub-utility over giving, $\tilde{U}(x)$ (or “altruism” utility), satisfies the usual monotonicity and concavity conditions, we assume $\tilde{U}(x) = \psi \log(\Gamma + x)$ where $\psi$ is an altruism parameter and $\Gamma > 0$ influences the degree of concavity.51 By allowing $\psi$ to vary freely, the model captures the possibility of a donor who dislikes the charity. The third term in (81), $s(x)$, represents the social cost of declining to donate or giving a small donation to the solicitor. We assume $s(x) = \max(0, s(g - x))$ to capture the notion that the donor only incurs social pressure from donation amounts to the solicitor of less than $g$.

To identify the social preferences, DellaVigna et al. (2012) randomize subjects into several groups. In the first group, the solicitor shows up unannounced at the prospective donor’s door. In this case, if the donor is home (with exogenous probability $h_0 \in (0, 1)$), she always prefers to give directly to the solicitor to avoid the additional cost $(1 - \theta)$ of donating by mail. The total amount given depends on the relative magnitudes of $\psi$ and the social cost $s$. If the donor is not home, the only reason for her to donate via mail is due to altruism. In the second group, the prospective donor is notified in advance of the solicitor’s visit with a flyer left on the door. In this case, the donor can opt out by adjusting her probability of being home according to a cost $c(h - h_0) = \frac{(h - h_0)^2}{2\eta}$. The opt-out decision reflects the donor’s trade-off between the utility of donating to the solicitor, subject to social pressure costs, and donating by mail, subject to mailing costs and the cost of leaving home. In a third group, subjects are given a costless option to “opt out” by checking a “do not disturb” box on the flyer, effectively setting $c(0) = 0$.
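The interaction between altruism and social pressure in (81) can be explored numerically for a donor who is at home and gives directly to the solicitor. The sketch below assumes the logarithmic altruism utility and the piecewise-linear social cost described above; the function name, the name `Gamma` for the concavity parameter, the grid search, and the parameter values in the usage note are illustrative assumptions, not the estimates or the estimation method of DellaVigna et al. (2012).

```python
import numpy as np

def optimal_gift(psi, Gamma, s, g, x_max=100.0, n_grid=10001):
    """Gift to the solicitor that maximises
    -x + psi * log(Gamma + x) - max(0, s * (g - x)) over a grid on [0, x_max].
    Income y enters (81) additively and is dropped because it does not
    affect the maximiser."""
    x = np.linspace(0.0, x_max, n_grid)
    utility = -x + psi * np.log(Gamma + x) - np.maximum(0.0, s * (g - x))
    return x[np.argmax(utility)]

no_pressure = optimal_gift(psi=15.0, Gamma=10.0, s=0.0, g=10.0)
with_pressure = optimal_gift(psi=15.0, Gamma=10.0, s=0.5, g=10.0)
print(no_pressure, with_pressure)
```

With these illustrative values the gift without social pressure is governed by the first-order condition $\psi / (\Gamma + x) = 1$, while a positive social cost pushes the donor up to the threshold $g$. This is the sense in which observed gifts mix altruism with the cost of saying no.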
The authors estimate the model with a minimum distance estimator based on specific empirical moments from various experimental cells, although a maximum likelihood procedure might also have been used by including an additional random utility term into the model. While DellaVigna’s (2017) estimates indicate that donations are driven by both social costs and altruism, the social cost estimates are surprisingly large. Almost half of the sample is found to prefer not to have a solicitation, either because they prefer not to donate or to donate a small amount. The results suggest a substantial welfare loss to donors from door-to-door solicitations. Moreover, the results indicate that the observed levels of charitable giving may not reflect altruism per se.

51 Note that the marginal utility of giving is $\frac{d\tilde{U}}{dx} = \frac{\psi}{\Gamma + x}$ so that a high $\Gamma$ implies a slow satiation on giving.

Kang et al. (2016) build on DellaVigna et al. (2012) by modeling the use of advertising creative to moderate the potential crowding-out effects of donations by others. They estimate a modified version of (81) for a prospective donor

$$ U(x; G, \theta) = \theta_1 \ln(x + 1) + \theta_2 \ln(G) + \theta_3 \ln(y - x + 1) \tag{82} $$

where $G = x + x_{-i}$ measures total giving to the cause and $x_{-i}$ represents the total stock of past donations from other donors. The authors can then test pure altruism, $\theta_1 = 0$ and $\theta_2 > 0$, against a combination of altruism and warm glow, $\theta_1 > 0$ and $\theta_2 > 0$ (Andreoni, 1989). The authors allow the relative role of warm glow to altruism, $\theta_1 / \theta_2$, to vary with several marketing message variables. Since the preferences in (82) follow the Stone-Geary functional form, demand estimation follows the approach in Section 3.1.3 above. The authors conduct a charitable giving experiment in which subjects were randomly assigned to cells that varied the emotional appeal of the advertising message and the reported amount of money donated by others. As in earlier work, the authors find that higher donations by others crowd out a prospective donor’s contribution (Andreoni, 1989). The authors also find that recipient-focused advertising messages with higher arousal trigger the impure altruism appeal, which increases the level of donations.

Dubé et al. (2017b) test an alternative self-signaling theory of crowding-out effects in charitable giving based on consumers’ self-perception of altruistic preferences (Bodner and Prelec, 2002; Benabou and Tirole, 2006). Consumers make a binary purchase decision $x \in \{0, 1\}$ for a product with a price, $p$, and a pro-social characteristic $a \geq 0$ that measures the portion of the price that will be donated to a specific charity. Consumers obtain consumption utility from buying the product, $(\theta_0 + \theta_1 a + \theta_2 p)$, where $\theta_1$ is the consumer’s social preference, or marginal utility for the donation. Consumers make the purchase in a private setting (e.g., online or on a mobile phone) with no peer influence (e.g., from a salesperson or solicitor).
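To make the crowding-out logic of the Kang et al. (2016) specification in (82) concrete, the following minimal sketch maximizes that utility over the donation by grid search. It is an illustration under stated assumptions rather than the authors' estimation procedure: the function names, the budget `y`, the parameter values, and the use of a grid search are all hypothetical choices, and the stock of other donations is assumed to be strictly positive so that the logarithm of total giving is defined.

```python
import math


def utility(x, y, x_others, theta1, theta2, theta3):
    """Utility in the form of Eq. (82), with total giving G = x + x_others."""
    total_giving = x + x_others
    return (theta1 * math.log(x + 1.0)
            + theta2 * math.log(total_giving)
            + theta3 * math.log(y - x + 1.0))


def optimal_donation(y, x_others, theta1, theta2, theta3, steps=20000):
    """Grid search for the utility-maximizing donation on [0, y].
    ASSUMPTION: x_others > 0, so log(G) is defined at x = 0."""
    best_x, best_u = 0.0, -math.inf
    for k in range(steps + 1):
        x = y * k / steps
        u = utility(x, y, x_others, theta1, theta2, theta3)
        if u > best_u:
            best_x, best_u = x, u
    return best_x


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    for x_others in (10.0, 50.0):
        mixed = optimal_donation(100.0, x_others, theta1=1.0, theta2=1.0, theta3=2.0)
        warm_glow_only = optimal_donation(100.0, x_others, theta1=1.0, theta2=0.0, theta3=2.0)
        print(x_others, mixed, warm_glow_only)
```

With $\theta_2 > 0$, a larger stock of donations by others lowers the marginal utility of the donor's own gift through $\ln(G)$, so the optimal donation falls. With $\theta_2 = 0$ (warm glow only), the optimal donation does not depend on what others give.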
In addition to the usual consumption utility, the consumer is uncertain about her own altruism and derives additional ego utility from the inference she makes about herself based on her purchase decision: $\theta_3 E(\theta_1 \mid a, p, x)$, where $\theta_3$ measures the consumer’s ego utility.52 The consumer chooses to buy if the combination of her consumption utility and ego utility exceeds the ego utility derived from not purchasing:

$$(\theta_0 + \theta_1 a + \theta_2 p + \varepsilon_1) + \theta_3 E(\theta_1 \mid a, p, 1) > \varepsilon_0 + \theta_3 E(\theta_1 \mid a, p, 0) \tag{83}$$

where $\varepsilon$ are choice-specific random utility shocks and $\theta_3$ is the marginal ego utility associated with the consumer’s self-belief about her own altruism, $\theta_1$. In this self-signaling model, the consumer’s decision is driven not only by the maximization of consumption utility, but also by the equilibrium signal the consumer derives from her own action. Purchase and non-purchase have differential influences on the consumer’s derived inference about her own ego utility, $E(\theta_1 \mid a, p, 0)$.53 If $\tilde{\varepsilon} = \varepsilon_1 - \varepsilon_0 \sim N(0, \sigma^2)$, then consumer choice follows the standard random coefficients probit model of demand with purchase probability conditional on receiving the offer $(a, p)$

$$\Pr(x = 1 \mid a, p) = \int \Phi\!\left(\frac{\theta_0 + \theta_1 a + \theta_2 p + \theta_3 \left[E(\theta_1 \mid a, p, 1) - E(\theta_1 \mid a, p, 0)\right]}{\sigma}\right) dF(\theta) \tag{84}$$

where $\Phi$ denotes the standard normal CDF and $F(\theta)$ represents the consumer’s beliefs about her own preferences prior to receiving the ticket offer. Note that low prices can dampen the consumer’s self-perception of being altruistic, $E(\theta_1 \mid a, p, 0)$, and reduce ego utility. If ego utility overwhelms consumption utility, consumer demand could exhibit backward-bending regions that would be inconsistent with the standard neoclassical framework.

52 In a social setting, this ego utility could instead reflect the value a consumer derives from conveying a “favorable impression” (i.e., signal) to her peers based on her observed action.

Dubé et al. (2017b) test the self-signaling theory through a cause marketing field experiment in partnership with a large telecom company and a movie theater. Subjects received text messages with randomly assigned actual discount offers for movie tickets. In addition, some subjects were informed that a randomized portion of the ticket price would be donated to a charity. In the absence of a donation, demand is decreasing in the net price. In the absence of a discount, demand is increasing in the donation amount. However, when the firm uses both a discount and a donation, the observed demand exhibits regions of non-monotonicity where the purchase rate declines at larger discount levels. These non-standard moments are used to fit the self-signaling model in Eq. (84). The authors find that consumer response to the cause marketing campaign is driven more by ego utility, $\theta_3$, than by standard consumption utility.
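The purchase probability of the self-signaling probit can be approximated by simulation. The sketch below is a simplified illustration, not the estimator used by Dubé et al. (2017b). It averages the probit probability over draws of the preference vector, and it treats the signal gap $E(\theta_1 \mid a, p, 1) - E(\theta_1 \mid a, p, 0)$ as a number supplied by the user, whereas in the model it is an equilibrium object. The function names, the distribution of the draws, and all numerical values are hypothetical.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def purchase_probability(a, p, theta_draws, signal_gap, sigma=1.0):
    """Monte Carlo average of the probit purchase probability.

    theta_draws: iterable of (theta0, theta1, theta2, theta3) draws.
    signal_gap:  ASSUMED given; stands in for
                 E(theta1 | a, p, 1) - E(theta1 | a, p, 0).
    """
    draws = list(theta_draws)
    total = 0.0
    for theta0, theta1, theta2, theta3 in draws:
        index = theta0 + theta1 * a + theta2 * p + theta3 * signal_gap
        total += std_normal_cdf(index / sigma)
    return total / len(draws)


def simulate_theta(n, seed=0):
    """Hypothetical preference draws: independent normals, for illustration only."""
    rng = random.Random(seed)
    return [(rng.gauss(0.5, 0.3),          # theta0: baseline utility
             rng.gauss(0.2, 0.1),          # theta1: taste for the donation
             rng.gauss(-0.1, 0.05),        # theta2: price sensitivity
             abs(rng.gauss(1.0, 0.2)))     # theta3: ego utility, kept positive
            for _ in range(n)]


if __name__ == "__main__":
    draws = simulate_theta(5000)
    for gap in (-0.5, 0.0, 0.5):
        print(gap, purchase_probability(a=2.0, p=8.0, theta_draws=draws, signal_gap=gap))
```

In this sketch a larger signal gap raises the purchase probability whenever the ego-utility weight is positive. Non-monotone demand of the kind described above would arise only if the signal gap itself varied with the price and donation levels, which the sketch does not model.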

53 As in Benabou and Tirole (2006), Dubé et al. (2017b) also include $E(\theta_2 \mid a, p, x)$ in the ego utility to moderate the posterior belief by the consumer’s self-perception of being sensitive to money.

6 Conclusions

Historically, the computational complexity of microeconometric models has limited their application to consumer-level transaction data. Most of the literature has focused on models of discrete brand choice, ignoring the more complicated aspects of demand for variety and purchase quantity decisions. Recent advances in computing power have mostly eliminated these computational challenges. While much of the foundational work on microeconometric models of demand was based on the dual approach, the recent literature has seen considerable innovation on direct models of utility. The dual approach is limiting for marketing applications because it abstracts from the actual form of “preferences” and requires strong assumptions, like differentiability, to apply Roy’s Identity. That said, many of the model specifications discussed herein require strong restrictions on preferences for analytic tractability, especially in the handling of corner solutions. These restrictions often rule out interesting and important aspects of consumer behavior such as income effects, product complementarity, and indivisibility. We view the development of models to accommodate these richer behaviors as important directions for future research.

We also believe that the incorporation of ideas from behavioral economics and psychology into consumer models of demand will be a fruitful area for future research. Several recent papers have incorporated social preferences into traditional models of demand (e.g., DellaVigna et al., 2012; Kang et al., 2016; Dubé et al., 2017b). For a broader discussion of structural models of behavioral economics, see DellaVigna (2017).

Finally, the digital era has expanded the scope of consumer-level data available. These new databases introduce a new layer of complexity as the set of observable consumer features grows, sometimes into the thousands. Machine learning (ML) and regularization techniques offer potential opportunities for accommodating large numbers of potential variables in microeconometric models of demand. For instance, these methods may provide practical tools for analyzing heterogeneity in consumer tastes and behavior, and for detecting segments. Future work may benefit from developing approaches to incorporate ML into the already computationally intensive empirical demand models with corners. Devising approaches to conduct inference on structural models that utilize machine learning techniques will also likely offer an interesting opportunity for new research (e.g., Shiller, 2015; Dubé and Misra, 2019). This growing complexity due to indivisibilities, non-standard consumer behavior from the behavioral economics literature, and the size and scope of so-called “Big Data” raises some concerns about the continued practicality of the neoclassical framework for future research.

References

Aguiar, M., Hurst, E., 2007. Life-cycle prices and production. American Economic Review 97 (5), 1533–1559. Ainslie, A., Rossi, P.E., 1998. Similarities in choice behavior across product categories. Marketing Science 17 (2), 91–106. Allcott, H., Diamond, R., Dube, J., Handbury, J., Rahkovsky, I., Schnell, M., 2018. Food Deserts and the Causes of Nutritional Inequality. Working Paper. Allenby, G., Garratt, M.J., Rossi, P.E., 2010. A model for trade-up and change in considered brands. Marketing Science 29 (1), 40–56. Allenby, G.M., Rossi, P.E., 1991. Quality perceptions and asymmetric switching between brands. Marketing Science 10 (3), 185–204.

Allenby, G.M., Shively, T.S., Yang, S., Garratt, M.J., 2004. A choice model for packaged goods: dealing with discrete quantities and quantity discounts. Marketing Science 23 (1), 95–108. Anderson, S.P., de Palma, A., 1992. The logit as a model of product differentiation. Oxford Economic Papers 44, 51–67. Anderson, S.P., de Palma, A., Thisse, J.-F., 1992. Discrete Choice Theory of Product Differentiation. The MIT Press. Andreoni, J., 1989. Giving with impure altruism: applications to charity and Ricardian equivalence. Journal of Political Economy 97, 1447–1458. Arora, N.A., Allenby, G., Ginter, J.L., 1998. A hierarchical Bayes model of primary and secondary demand. Marketing Science 17, 29–44. Becker, G.S., 1965. A theory of the allocation of time. The Economic Journal 75 (299), 493–517. Bell, D.R., Chiang, J., Padmanabhan, V., 1999. The decomposition of promotional response: an empirical generalization. Marketing Science 18 (4), 504–526. Benabou, R., Tirole, J., 2006. Incentives and prosocial behavior. American Economic Review 96, 1652–1678. Berry, S., Gandhi, A., Haile, P., 2013. Connected substitutes and invertibility of demand. Econometrica 81 (5), 2087–2111. Berry, S., Levinsohn, J., Pakes, A., 1995. Automobile prices in market equilibrium. Econometrica 63 (4), 841–890. Berry, S.T., 1994. Estimating discrete-choice models of product differentiation. Rand Journal of Economics 25 (2), 242–262. Besanko, D., Perry, M.K., Spady, R.H., 1990. The logit model of monopolistic competition: brand diversity. Journal of Industrial Economics 38 (4), 397–415. Bhat, C.R., 2005. A multiple discrete-continuous extreme value model: formulation and application to discretionary time-use decisions. Transportation Research, Part B 39, 679–707. Bhat, C.R., 2008. The multiple discrete-continuous extreme value (MDCEV) model: role of utility function parameters, identification considerations, and model extensions. Transportation Research, Part B 42, 274–303. 
Bhat, C.R., Castro, M., Pinjari, A.R., 2015. Allowing for complementarity and rich substitution patterns in multiple discrete-continuous models. Transportation Research, Part B 81, 59–77. Blattberg, R., Buesing, T., Peacock, P., Sen, S., 1978. Identifying the deal prone segment. Journal of Marketing Research 15 (3), 369–377. Blattberg, R.C., Briesch, R., Fox, E.J., 1995. How promotions work. Marketing Science 14 (3), G122–G132. Blattberg, R.C., Eppen, G.D., Lieberman, J., 1981. A theoretical and empirical evaluation of price deals for consumer nondurables. Journal of Marketing 45, 116–129. Blattberg, R.C., Neslin, S.A., 1989. Sales promotion: the long and short of it. Marketing Letters 1 (1), 81–97. Blattberg, R.C., Wisniewski, K.J., 1989. Price-induced patterns of competition. Marketing Science 8, 291–309. Blundell, R., Pashardes, P., Weber, G., 1993. What do we learn about consumer demand patterns from micro data? American Economic Review 83 (3), 570–597. Bodner, R., Prelec, D., 2002. Self-signaling and diagnostic utility in everyday decision making. In: Collected Essays in Psychology and Economics. Oxford University Press. Bronnenberg, B.J., Dhar, S.K., Dubé, J.-P., 2005. Market structure and the geographic distribution of brand shares in consumer package goods industries. Manuscript. Bronnenberg, B.J., Dubé, J.-P., 2017. The formation of consumer brand preferences. Annual Review of Economics 9, 353–382. Cardell, N.S., 1997. Variance components structures for the extreme-value and logistic distributions with applications to models of heterogeneity. Econometric Theory 13 (2), 185–213. Chambers, C.P., Echenique, F., 2009. Supermodularity and preferences. Journal of Economic Theory 144, 1004–1014.

Chambers, C.P., Echenique, F., Shmaya, E., 2010. On behavioral complementarity and its implications. Journal of Economic Theory 145 (6), 2332–2355. Chiang, J., 1991. A simultaneous approach to the whether, what and how much to buy questions. Marketing Science 10, 297–315. Chiang, J., Lee, L.-F., 1992. Discrete/continuous models of consumer demand with binding nonnegativity constraints. Journal of Econometrics 54, 79–93. Chib, S., Seetharaman, P., Strijnev, A., 2002. Analysis of multi-category purchase incidence decisions using IRI market basket data. In: Advances in Econometrics: Econometric Models in Marketing. JAI Press, pp. 57–92. Chintagunta, P., 1993. Investigating purchase incidence, brand choice and purchase quantity decisions of households. Marketing Science 12, 184–208. Chintagunta, P., Dubé, J.-P., 2005. Estimating a stockkeeping-unit-level brand choice model that combines household panel data and store data. Journal of Marketing Research XLII (August), 368–379. Chintagunta, P., Dubé, J.-P., Goh, K.Y., 2005. Beyond the endogeneity bias: the effect of unmeasured brand characteristics on household-level brand choice models. Management Science 51, 832–849. Conley, T.G., Hansen, C.B., McCulloch, R.E., Rossi, P.E., 2008. A semi-parametric Bayesian approach to the instrumental variable problem. Journal of Econometrics 144 (1), 276–305. Deaton, A., Muellbauer, J., 1980a. An almost ideal demand system. American Economic Review 70 (3), 312–326. Deaton, A., Muellbauer, J., 1980b. Economics and Consumer Behavior. Cambridge University Press. DellaVigna, S., 2017. Structural behavioral economics. In: Handbook of Behavioral Economics. NorthHolland. Forthcoming. DellaVigna, S., List, J., Malmendier, U., 2012. Testing for altruism and social pressure in charitable giving. Quarterly Journal of Economics 127 (1), 1–56. Dolan, R.J., 1987. Quantity discounts: managerial issues and research opportunities. Marketing Science 6 (1), 1–22. 
Dotson, J.P., Howell, J.R., Brazell, J.D., Otter, T., Lenk, P.J., MacEachern, S., Allenby, G., 2018. A probit model with structured covariance for similarity effects and source of volume calculations. Journal of Marketing Research 55, 35–47. Draganska, M., Jain, D.C., 2004. A likelihood approach to estimating market equilibrium models. Management Science 50 (5), 605–616. Du, R.Y., Kamakura, W.A., 2008. Where did all that money go? Understanding how consumers allocate their consumption budget. Journal of Marketing 72 (November), 109–131. Dubé, J.-P., 2004. Multiple discreteness and product differentiation: demand for carbonated soft drinks. Marketing Science 23 (1), 66–81. Dubé, J.-P., Hitsch, G., Rossi, P., 2017a. Income and wealth effects on private label demand: evidence from the great recession. Marketing Science. Forthcoming. Dubé, J.-P., Luo, X., Fang, Z., 2017b. Self-signaling and pro-social behavior: a cause marketing mobile field experiment. Marketing Science 36 (2), 161–186. Dubé, J.-P., Misra, S., 2019. Personalized Pricing and Customer Welfare. Chicago Booth School of Business Working Paper. Dubois, P., Griffith, R., Nevo, A., 2014. Do prices and attributes explain international differences in food purchases? American Economic Review 2014 (3), 832–867. Einav, L., Leibtag, E., Nevo, A., 2010. Recording discrepancies in Nielsen Homescan data: are they present and do they matter? Quantitative Marketing and Economics 8 (2), 207–239. Engel, E., 1857. Die Productions- und Consumtionsver-haltnisse des Konigreichs Sachsen. Zeitschrift des Statistischen Bureaus des Koniglich Sachsischen Ministeriums des Innern 8, 1–54. Erdem, T., 1998. An empirical analysis of umbrella branding. Journal of Marketing Research 35 (3), 339–351. Erdem, T., Imai, S., Keane, M.P., 2003. Brand and quantity choice dynamics under price uncertainty. Quantitative Marketing and Economics 1, 5–64. Erdem, T., Keane, M.P., Öncü, T.S., Strebel, J., 2005. 
Learning about computers: an analysis of information search and technology choice. Quantitative Marketing and Economics 3, 207–246.

Erdem, T., Keane, M.P., Sun, B.-H., 1999. Missing price and coupon availability data in scanner panels: correcting for the self-selection bias in choice model parameters. Journal of Econometrics 89, 177–196. Gentzkow, M., 2007. Valuing new goods in a model with complementarity: online newspapers. American Economic Review 97 (3), 713–744. Gicheva, D., Hastings, J., Villas-Boas, S.B., 2010. Investigating income effects in scanner data: do gasoline prices affect grocery purchases? American Economic Review: Papers and Proceedings 100, 480–484. Goettler, R.L., Clay, K., 2011. Tariff choice with consumer learning and switching costs. Journal of Marketing Research XLVIII (August), 633–652. Goolsbee, A., Petrin, A., 2004. The consumer gains from direct broadcast satellites and the competition with cable TV. Econometrica 72 (2), 351–381. Guadagni, P.M., Little, J.D., 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 203–238. Gupta, S., 1991. Stochastic models of interpurchase time with time-dependent covariates. Journal of Marketing Research 28, 1–15. Gupta, S., Chintagunta, P., Kaul, A., Wittink, D.R., 1996. Do household scanner data provide representative inferences from brand choices: a comparison with store data. Journal of Marketing Research 33 (4), 383–398. Hanemann, W.M., 1984. Discrete/continuous models of consumer demand. Econometrica 52 (3), 541–561. Hartmann, W.R., Nair, H.S., 2010. Retail competition and the dynamics of demand for tied goods. Marketing Science 29 (2), 366–386. Hastings, J., Shapiro, J.M., 2013. Fungibility and consumer choice: evidence from commodity price shocks. Quarterly Journal of Economics 128 (4), 1449–1498. Hastings, J., Shapiro, J.M., 2018. How are SNAP benefits spent? Evidence from a retail panel. American Economic Review 108 (12), 3493–3540. Hausman, J.A., 1985. The econometrics of nonlinear budget sets. Econometrica 53 (6), 1255–1282. Heckman, J.J., 1978. 
Dummy endogenous variables in a simultaneous equation system. Econometrica 46 (4), 931–959. Heckman, J.J., 1981. The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process and some Monte Carlo evidence. In: Manski, C., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications. MIT Press, pp. 179–195 (Chap. 4). Heilman, C.M., Nakamoto, K., Rao, A.G., 2002. Pleasant surprises: consumer response to unexpected in-store coupons. Journal of Marketing Research 39 (2), 242–252. Hendel, I., 1999. Estimating multiple-discrete choice models: an application to computerization returns. Review of Economic Studies 66, 423–446. Hendel, I., Nevo, A., 2003. The post-promotion dip puzzle: what do the data have to say? Quantitative Marketing and Economics 1, 409–424. Hendel, I., Nevo, A., 2006a. Measuring the implications of sales and consumer inventory behavior. Econometrica 74 (6), 1637–1673. Hendel, I., Nevo, A., 2006b. Sales and consumer inventory. Rand Journal of Economics 37 (3), 543–561. Hicks, J., Allen, R., 1934. A reconsideration of the theory of value. Part I. Economica 1 (1), 52–76. Hoderlein, S., 2011. How many consumers are rational? Journal of Econometrics 164, 294–309. Honka, E., 2014. Quantifying search and switching costs in the U.S. auto insurance industry. Rand Journal of Economics 45, 847–884. Houthaker, H., 1953. La forme des courbes. Cahiers du Seminaire d’Econometrie 2, 59–66. Houthakker, H., 1961. The present state of consumption theory. Econometrica 29 (4), 704–740. Howell, J., Allenby, G., 2017. Choice Models with Fixed Costs. Working Paper. Howell, J., Lee, S., Allenby, G., 2016. Price promotions in choice models. Marketing Science 35 (2), 319–334. Joo, J., 2018. Quantity Surcharged Larger Package Sales as Rationally Inattentive Consumers’ Choice. University of Texas at Dallas Working Paper.

Kang, M.Y., Park, B., Lee, S., Kim, J., Allenby, G., 2016. Economic analysis of charitable donations. Journal of Marketing and Consumer Behaviour in Emerging Markets 2 (4), 40–57. Kao, C., fei Lee, L., Pitt, M.M., 2001. Simulated maximum likelihood estimation of the linear expenditure system with binding non-negativity constraints. Annals of Economics and Finance 2, 215–235. Kim, J., Allenby, G.M., Rossi, P.E., 2002. Modeling consumer demand for variety. Marketing Science 21 (3), 229–250. Kim, J., Allenby, G.M., Rossi, P.E., 2007. Product attributes and models of multiple discreteness. Journal of Econometrics 138, 208–230. Lambrecht, A., Seim, K., Skiera, B., 2007. Does uncertainty matter? Consumer behavior under three-part tariffs. Marketing Science 26 (5), 698–710. Lee, J., Allenby, G., 2009. A Direct Utility Model for Market Basket Data. OSU Working Paper. Lee, L.-F., Pitt, M.M., 1986. Microeconometric demand systems with binding nonnegativity constraints: the dual approach. Econometrica 5, 123–1242. Lee, R.S., 2013. Vertical integration and exclusivity in platform and two-sided markets. American Economic Review 103 (7), 2960–3000. Lee, S., Allenby, G., 2014. Modeling indivisible demand. Marketing Science 33 (3), 364–381. Lee, S., Kim, J., Allenby, G.M., 2013. A direct utility model for asymmetric complements. Marketing Science 32 (3), 454–470. Lewbel, A.A., 2001. Demand systems with and without errors. American Economic Review 91 (3), 611–618. Liu, Y., Balachander, S., 2014. How long has it been since the last deal? Consumer promotion timing expectations and promotional response. Quantitative Marketing and Economics 12, 85–126. Luce, R.D., 1977. The choice axiom after twenty years. Journal of Mathematical Psychology 15, 215–233. Ma, Y., Ailawadi, K.L., Gauri, D.K., Grewal, D., 2011. An empirical investigation of the impact of gasoline prices on grocery shopping behavior. Journal of Marketing 75 (2), 18–35. Manchanda, P., Ansari, A., Gupta, S., 1999. 
The “shopping basket”: a model for multicategory purchase incidence decisions. Marketing Science 18, 95–114. Manchanda, P., Rossi, P.E., Chintagunta, P., 2004. Response modeling with nonrandom marketing-mix variables. Journal of Marketing Research 41 (4), 467–478. Manski, C.F., Sherman, L., 1980. An empirical analysis of household choice among motor vehicles. Transportation Research 14 (A), 349–366. Mas-Collel, A., Whinston, M.D., Green, J.R., 1995. Microeconomic Theory. Oxford University Press. Matejka, F., McKay, A., 2015. Rational inattention to discrete choices: a new foundation for the multinomial logit model. American Economic Review 105 (1), 272–298. McCulloch, R., Rossi, P.E., 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64, 207–240. McFadden, D.L., 1981. In: Structural Analysis of Discrete Data and Econometric Applications. The MIT Press, pp. 198–272 (Chap. 5). Mehta, N., 2007. Investigating consumers’ purchase incidence and brand choice decisions across multiple product categories: a theoretical and empirical analysis. Marketing Science 26 (2), 196–217. Mehta, N., 2015. A flexible yet globally regular multigood demand system. Marketing Science 34 (6), 843–863. Mehta, N., Chen, X.J., Narasimhan, O., 2010. Examining demand elasticities in Hanemann’s framework: a theoretical and empirical analysis. Marketing Science 29, 422–437. Mehta, N., Ma, Y., 2012. A multicategory model of consumers’ purchase incidence, quantity, and brand choice decisions: methodological issues and implications on promotional decisions. Journal of Marketing Research XLIX (August), 435–451. Melnikov, O., 2013. Demand for differentiated durable products: the case of the U.S. computer printer market. Economic Inquiry 51 (2), 1277–1298. Milkman, K.L., Beshears, J., 2009. Mental accounting and small windfalls: evidence from an online grocer. Journal of Economic Behavior & Organization 71, 384–394. Millimet, D.L., Tchernis, R., 2008. 
Estimating high-dimensional demand systems in the presence of many binding non-negativity constraints. Journal of Econometrics 147, 384–395.

Misra, S., 2005. Generalized reverse discrete choice models. Quantitative Marketing and Economics 3, 175–200. Muellbauer, J., 1974. Household composition, Engel curves and welfare comparisons between households. European Economic Review 10, 103–122. Nair, H.S., Chintagunta, P., 2011. Discrete-choice models of consumer demand in marketing. Marketing Science 30 (6), 977–996. Nair, H.S., Chintagunta, P., Dubé, J.-P., 2004. Empirical analysis of indirect network effects in the market for personal digital assistants. Quantitative Marketing and Economics 2, 23–58. Narayanan, S., Chintagunta, P., Miravete, E.J., 2007. The role of self selection, usage uncertainty and learning in the demand for local telephone service. Quantitative Marketing and Economics 5, 1–34. Neary, J., Roberts, K., 1980. The theory of household behavior under rationing. European Economic Review 13, 25–42. Neslin, S.A., Henderson, C., Quelch, J., 1985. Consumer promotions and the acceleration of product purchases. Marketing Science 4 (2), 147–165. Nevo, A., 2001. Measuring market power in the ready-to-eat cereal industry. Econometrica 69 (2), 307–342. Nevo, A., 2011. Empirical models of consumer behavior. Annual Review of Economics 3, 51–75. Ogaki, M., 1990. The indirect and direct substitution effects. The American Economic Review 80 (5), 1271–1275. Ohashi, H., 2003. The role of network effects in the US VCR market, 1978–1986. Journal of Economics and Management Strategy 12 (4), 447–494. Pakes, A., 2014. Behavioral and descriptive forms of choice models. International Economic Review 55 (3), 603–624. Pauwels, K., Srinivasan, S., Franses, P.H., 2007. When do price thresholds matter in retail categories? Marketing Science 26 (1), 83–100. Pesendorfer, M., 2002. Retail sales: a study of pricing behavior in supermarkets. Journal of Business 75 (1), 33–66. Petrin, A., Train, K.E., 2010. A control function approach to endogeneity in consumer choice models. Journal of Marketing Research 47 (1), 3–13. 
Phaneuf, D., Smith, V., 2005. Recreation demand models. In: Handbook of Environmental Economics. North-Holland, pp. 671–762. Pollak, R.A., Wales, T.J., 1992. Demand System Specification and Estimation. Oxford University Press. Ransom, M.R., 1987. A comment on consumer demand systems with binding non-negativity constraints. Journal of Econometrics 34, 355–359. Reinholtz, N., Bartels, D., Parker, J.R., 2015. On the mental accounting of restricted-use funds: how gift cards change what people purchase. Journal of Consumer Research 42, 596–614. Reiss, P.C., White, M.W., 2001. Household Electricity Demand Revisited. NBER Working Paper 8687. Samuelson, P.A., 1974. Complementarity: an essay on the 40th anniversary of the Hicks-Allen revolution in demand theory. Journal of Economic Literature 12 (4), 1255–1289. Seetharaman, P.B., Chib, S., Ainslie, A., Boatright, P., Chan, T.Y., Gupta, S., Mehta, N., Rao, V.R., Strijnev, A., 2005. Models of multi-category choice behavior. Marketing Letters 16 (3), 239–254. Seiler, S., 2013. The impact of search costs on consumer behavior: a dynamic approach. Quantitative Marketing and Economics. Shiller, B.R., 2015. First-Degree Price Discrimination Using Big Data. Working Paper. Song, I., Chintagunta, P., 2007. A discrete-continuous model for multicategory purchase behavior of households. Journal of Marketing Research 44 (November), 595–612. Stone, R., 1954. Linear expenditure systems and demand analysis: an application to the pattern of British demand. The Economic Journal 255 (64), 511–527. Sun, B., 2005. Promotion effect on endogenous consumption. Marketing Science 24 (3), 430–443. Sun, B., Neslin, S.A., Srinivasan, K., 2003. Measuring the impact of promotions on brand switching when consumers are forward looking. Journal of Marketing Research 40 (4), 389–405. Tamer, E., 2003. Incomplete simultaneous discrete response model with multiple equilibria. Review of Economic Studies 70, 147–165.

Tamer, E., 2010. Partial identification in econometrics. Annual Review of Economics 2 (1), 167–195. Thaler, R., 1985. Mental accounting and consumer choice. Marketing Science 4 (3), 199–214. Thaler, R.H., 1999. Anomalies: saving, fungibility, and mental accounts. Journal of Economic Perspectives 4 (1), 193–205. Thomassen, O., Seiler, S., Smith, H., Schiraldi, P., 2017. Multi-category competition and market power: a model of supermarket pricing. American Economic Review 107 (8), 2308–2351. Train, K.E., McFadden, D.L., Ben-Akiva, M., 1987. The demand for local telephone service: a fully discrete model of residential calling patterns and service choices. Rand Journal of Economics 18 (1), 109–123. van Soest, A., Kapteyn, A., Kooreman, P., 1993. Coherency and regularity of demand systems with equality and inequality constraints. Journal of Econometrics 57, 161–188. van Soest, A., Kooreman, P., 1990. Coherency of the indirect translog demand system with binding nonnegativity constraints. Journal of Econometrics 44, 391–400. Varian, H.R., 1989. In: Price Discrimination. Elsevier Science Publishers, pp. 597–654 (Chap. 10). Villas-Boas, J.M., Winer, R.S., 1999. Endogeneity in Brand choice models. Management Science 45 (10), 1324–1338. Villas-Boas, J.M., Zhao, Y., 2005. Retailer, manufacturers, and individual consumers: modeling the supply side in the ketchup marketplace. Journal of Marketing Research XLII (February), 83–95. Wales, T., Woodland, A., 1983. Estimation of consumer demand systems with binding non-negativity constraints. Journal of Econometrics 21, 263–285. Walsh, J.W., 1995. Flexibility in consumer purchasing for uncertain future tastes. Marketing Science 14 (2), 148–165. Wooldridge, J.M., 2002. Econometric Analysis of Cross Section and Panel Data. MIT Press. Yang, S., Chen, Y., Allenby, G.M., 2003. Bayesian analysis of simultaneous demand and supply. Quantitative Marketing and Economics 1, 251–275. Yao, S., Mela, C.F., Chiang, J., Chen, Y., 2012. 
Determining consumers’ discount rates with field studies. Journal of Marketing Research 49, 822–841.

CHAPTER

Inference for marketing decisions

2

Greg M. Allenbya , Peter E. Rossib,∗ a Fisher b Anderson

School of Business, Ohio State University, Columbus, OH, United States School of Management, University of California Los Angeles, Los Angeles, CA, United States ∗ Corresponding author: e-mail address: [email protected]

Contents
1 Introduction
2 Frameworks for inference
  2.1 A brief review of statistical properties of estimators
  2.2 Distributional assumptions
  2.3 Likelihood and the MLE
  2.4 Bayesian approaches
    2.4.1 The prior
    2.4.2 Bayesian computation
  2.5 Inference based on stochastic search vs. gradient-based optimization
  2.6 Decision theory
    2.6.1 Firms profits as a loss function
    2.6.2 Valuation of information sets
  2.7 Non-likelihood-based approaches
    2.7.1 Method of moments approaches
    2.7.2 Ad hoc approaches
  2.8 Evaluating models
3 Heterogeneity
  3.1 Fixed and random effects
  3.2 Bayesian approach and hierarchical models
    3.2.1 A generic hierarchical approach
    3.2.2 Adaptive shrinkage
    3.2.3 MCMC schemes
    3.2.4 Fixed vs. random effects
    3.2.5 First stage priors
    3.2.6 Dirichlet process priors
    3.2.7 Discrete first stage priors
    3.2.8 Conclusions
  3.3 Big data and hierarchical models
  3.4 ML and hierarchical models
4 Causal inference and experimentation
  4.1 The problem of observational data
  4.2 The fundamental problem of causal inference
  4.3 Randomized experimentation
  4.4 Further limitations of randomized experiments
    4.4.1 Compliance in marketing applications of RCTs
    4.4.2 The Behrens-Fisher problem
  4.5 Other control methods
    4.5.1 Propensity scores
    4.5.2 Panel data and selection on unobservables
    4.5.3 Geographically based controls
  4.6 Regression discontinuity designs
  4.7 Randomized experimentation vs. control strategies
  4.8 Moving beyond average effects
5 Instruments and endogeneity
  5.1 The omitted variables interpretation of “endogeneity” bias
  5.2 Endogeneity and omitted variable bias
  5.3 IV methods
    5.3.1 The linear case
    5.3.2 Method of moments and 2SLS
  5.4 Control functions as a general approach
  5.5 Sampling distributions
  5.6 Instrument validity
  5.7 The weak instruments problem
    5.7.1 Linear models
    5.7.2 Choice models
  5.8 Conclusions regarding the statistical properties of IV estimators
  5.9 Endogeneity in models of consumer demand
    5.9.1 Price endogeneity
    5.9.2 Conclusions regarding price endogeneity
  5.10 Advertising, promotion, and other non-price variables
  5.11 Model evaluation
6 Conclusions
References

Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.007 Copyright © 2019 Elsevier B.V. All rights reserved.

CHAPTER 2 Inference for marketing decisions

1 Introduction

Much has been written on the virtues of various inference frameworks or paradigms. The ultimate judgment regarding the usefulness of a given inference framework depends on the nature of the inference challenges presented by a field of application. In this chapter, we discuss important challenges for inference presented by both the nature of the problems dealt with in marketing and the nature of the data available. Given that much of the current work in quantitative marketing is influenced by economics, we will also contrast the prevalent view in economics regarding inference with what we have found useful in marketing applications.

One important goal of quantitative marketing is to devise marketing policies which will help firms to optimize their choice of marketing actions. For example, a firm might seek to improve profitability by better measurement of the demand function for its products. Ultimately, we would like to maximize profitability over the space of


policy functions which determine the levels and combinations of marketing actions. This goal imposes a high bar for inference and modeling, requiring a measurement of the entire surface which relates marketing actions to sales consequences, not just a derivative at a point or an average derivative. In addition to these problems in response surface estimation, some marketing decisions take on a discrete nature, such as which person or sub-group to target for advertising exposure or which ad creative is best. For these actions, the problem is how to evaluate a large number of discrete combinations of marketing actions.

To help solve the demanding problem of optimizing firm actions, researchers in marketing have access to an unprecedented amount of highly detailed data. Increasingly, it is possible to observe highly disaggregate data on a growing number of consumer attributes, purchase history, search behavior, and interaction with firms. Aggregation occurs over consumers, time, and products. At its most granular level, marketing data involves observing individual consumers through the process of consideration and purchase of specific products. In any given time period, most consumers purchase only a tiny fraction of the products available to them. Thus, the most common observation in purchase data is a “0.” In addition, products are only available in discrete quantities, with the most commonly observed quantity being “1.” This puts a premium on models of demand which generate corner solutions as well as econometric models of discrete or limited dependent variables (see Chapter 1 for discussion of demand models which admit corner solutions). Consumer panel data features not only a very large number of variables which characterize consumer history but also a very large number of consumers observed over a relatively short period of time.

In the past, marketing researchers used only highly aggregate data where the aggregation is made over consumers and products. Typically, information about the consideration of products or product search was not available. Today, in the digital area at least, we observe search behavior from browsing history. This allows for the possibility that we can make inferences about consumer preferences directly, before the point of purchase. In the past, only demographic or geo-demographic¹ consumer characteristics were observed. Now we observe self-generated and other social media content which can help the researcher infer preferences. We also observe the social network of many if not most potential customers, opening up new possibilities for targeted marketing activities.

This explosion of data holds out great promise for improved “data-based” marketing decisions, while at the same time posing substantial challenges to traditional estimation methods. For example, in pure predictive tasks we are faced with a huge number (more than 1 billion in some cases) of potential explanatory variables. Firms have been quick to use new sources of consumer data as the basis for marketing actions. The principal way this new data has been used is to target messages

1 Demographics inferred from the location of the residence of the consumer. Here the assumption is that consumers in a given geographic area are similar in demographic characteristics.


and advertisements in a much more customized and, hopefully, effective way. If, for example, which ad is displayed to a customer is a function of the customer’s preferences, this creates a new set of challenges for statistical methods which are based on the assumption that explanatory variables are chosen exogenously or as though they are a result of a process independent of the outcome variable. Highly customized and targeted marketing activities make these assumptions untenable and put a premium on finding and exploiting sources of exogenous (random-like) variation.

Some would argue that the only true solution to the “endogeneity” problem created by targeted actions is true random variation of the sort that randomized experimentation is thought to deliver. Many in economics have come to the view that randomized experiments are one of the few ways to obtain a valid estimate of the effect of an economic policy. However, our goal in marketing is not just to estimate the effect of a specific marketing action (such as exposure to a given ad) but to find a policy which can help firms optimize marketing actions. It is not at all clear that optimization purely via randomized experimentation is feasible in the marketing context. Conventional randomized experiments can only be used to evaluate (without a model) discrete actions. With more than one marketing variable and many possibilities for each variable, the number of possible experiments required in a purely experimental approach becomes prohibitively large.²

In Section 2, we consider various frameworks for inference and their suitability given the desiderata of marketing applications. Given the historic emphasis on predictive validation in marketing and the renewed emphasis spurred by adoption of Machine Learning methods, it is important to review methods for evaluating models and inference procedures. Pervasive heterogeneity in marketing applications has spurred a number of important methodological developments which we review in Section 3. In Section 4, we discuss the role of causal inference in marketing applications, including the advantages and disadvantages of experimental and non-experimental methods. Finally, we consider the endogeneity problem and various IV approaches in Section 5.

2 Frameworks for inference

Researchers in marketing have been remarkably open to many different points of view in statistical inference, taking a decidedly practical view that new statistical methods and procedures might be useful in marketing applications. Currently, marketing researchers are busy investigating the potential for Machine Learning techniques to be useful in marketing problems. Prior to Machine Learning, Bayesian methods made considerable inroads into both industry and academic practice. Again, Bayesian methods were welcomed into marketing as long as they proved to be worthwhile in

2 Notwithstanding developments that use approximate multi-armed bandit solutions to reduce the cost of experimentation with many alternatives (Scott, 2014).


the sense of compensating for the costs of using the methods with improved inferences and predictions. In recent years, a number of researchers in marketing as well as allied areas of economics have brought perspectives from their training in economics to bear on marketing problems. In this section, we will review the various paradigms for inference and provide our perspective on the relative advantages and disadvantages of each approach.

To provide a concrete example for discussion, consider a simple marketing response model which links sales to marketing inputs,

$$y = f(x \mid \theta)$$

Here $y$ is sales and $x$ is a vector of inputs such as prices or promotional/advertising variables. The goal of this modeling exercise is to estimate the response surface for the purpose of making predictions regarding the level of sales expected over the input space. These predictions can then be used to maximize profits and provide guidance to improve firm policies regarding the setting of these inputs. In the case of price inputs, the demand theory discussed in Chapter 1 of this volume can be used to select the functional form for $f()$. However, most researchers would want to explicitly represent the fact that sales data is not a deterministic function of the input variables, and to represent deviations of $y$ from the value predicted by $f()$ as draws from error distributions. It is common to introduce an additive error term into this model.

$$y = f(x \mid \theta) + \varepsilon \tag{1}$$

There are several ways to view $\varepsilon$. One way is to view the error term as arising from functional form approximation error. In this view, the model parameters can be estimated via pure projection methods such as non-linear least squares. Since the estimator is a projection, the error term is, by construction, orthogonal to $f$. Another interpretation of the error term is as arising from omitted variables (such as variables which describe the environment in which the product is sold but are not observed or included in $x$). These could also include unobservable demand shocks. In random utility models of demand, the error terms are introduced out of convenience to “rationalize” or allow for the fact that when markets or individuals are faced with the same value of $x$, they don’t always demand the same quantity.

For these situations, some further assumptions are required in order to perform inference. If we assume that the error terms are independent of $x$,³ then we can interpret the projection estimator as arising from the moment condition

$$E_{x,\varepsilon}\left[\nabla f(x \mid \theta)\,\varepsilon\right] = 0$$

Here $\nabla f$ is the gradient of the response surface with respect to $\theta$. This moment condition can be used to rationalize the non-linear least squares projection in the

3 In linear models, this assumption is usually given as $\varepsilon$ is mean independent of $x$, $E[\varepsilon \mid x] = 0$. Since we are allowing for a general functional form in $f$, we must make a stronger assumption of full independence.


sense that we are choosing parameter values, $\theta$, so that the sample moment is held as close as possible to the population moment, and the sample moment condition is the first order condition for non-linear least squares. The interpretation of least squares as a method of moments estimator based on assumptions about the independence of the error term and the gradient of the response surface provides a “distribution-free” basis for estimation, in the sense that we do not have to specify a particular distribution for the error term. The independence assumption requires that the $x$ variables are chosen independently of the error term or shocks to aggregate demand (conditional on $x$). In settings where the firm chooses $x$ with some partial knowledge of the demand shocks, the independence assumption is violated and we must resort to other methods of estimation. In Section 5.3 below, we consider these methods.
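The link between non-linear least squares and the sample moment condition can be checked numerically. The following is a minimal sketch, not a method taken from the chapter: the constant-elasticity form `f(x | a, b) = a * x**b`, the parameter values, the noise level, and the Gauss-Newton routine with step halving are all illustrative assumptions. It fits the model by least squares on simulated data in which x is drawn independently of the error, and then evaluates the sample analogue of the moment condition at the estimate.

```python
import math
import random

random.seed(0)

# Hypothetical constant-elasticity response surface (an assumption for
# illustration): f(x | a, b) = a * x**b
def f(x, a, b):
    return a * x ** b

def grad_f(x, a, b):
    """Gradient of f with respect to the parameters (a, b)."""
    xb = x ** b
    return xb, a * xb * math.log(x)

# Simulated data: x is drawn independently of the additive error term.
N = 500
TRUE_A, TRUE_B = 10.0, -1.5
xs = [random.uniform(1.0, 3.0) for _ in range(N)]
ys = [f(x, TRUE_A, TRUE_B) + random.gauss(0.0, 0.5) for x in xs]

def sse(a, b):
    """Sum of squared errors, the non-linear least squares criterion."""
    return sum((y - f(x, a, b)) ** 2 for x, y in zip(xs, ys))

def sample_moments(a, b):
    """Sample analogue of E[grad f(x|theta) * error]."""
    m0 = m1 = 0.0
    for x, y in zip(xs, ys):
        g0, g1 = grad_f(x, a, b)
        e = y - f(x, a, b)
        m0 += g0 * e
        m1 += g1 * e
    return m0 / N, m1 / N

def nls(a, b, max_iter=50):
    """Gauss-Newton iterations with step halving to minimize sse."""
    for _ in range(max_iter):
        s00 = s01 = s11 = r0 = r1 = 0.0
        for x, y in zip(xs, ys):
            g0, g1 = grad_f(x, a, b)
            e = y - f(x, a, b)
            s00 += g0 * g0
            s01 += g0 * g1
            s11 += g1 * g1
            r0 += g0 * e
            r1 += g1 * e
        det = s00 * s11 - s01 * s01
        da = (s11 * r0 - s01 * r1) / det   # solve the 2x2 normal equations
        db = (s00 * r1 - s01 * r0) / det
        step, current = 1.0, sse(a, b)
        while step > 1e-10 and sse(a + step * da, b + step * db) >= current:
            step /= 2.0                    # halve until the criterion improves
        if step <= 1e-10:
            break                          # no further improvement possible
        a, b = a + step * da, b + step * db
    return a, b

a_hat, b_hat = nls(5.0, -1.0)
m0, m1 = sample_moments(a_hat, b_hat)
print("estimates:", a_hat, b_hat)
print("sample moments at the estimate:", m0, m1)
print("sample moments at the true values:", sample_moments(TRUE_A, TRUE_B))
```

At the least squares estimate the sample moments are zero up to numerical tolerance, because they are the first order conditions of the criterion. At the true parameter values they are small but nonzero, reflecting sampling variation only.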

2.1 A brief review of statistical properties of estimators

This chapter is not designed to be a reference on inference methods in general, but, instead, to discuss how features of marketing applications make particular demands of inference and what methods have shown promise in the marketing literature. However, to fix notation and to facilitate discussion, we provide a brief review of statistical properties of estimation procedures.

Prior to obtaining or analyzing a dataset, it is entirely reasonable to choose an estimation procedure for model parameters on the basis of the general properties of the procedure. Statistical properties for an estimation procedure are deduced by regarding the procedure as specifying a function of the data and studying the sampling properties of the estimator by considering the distribution of the estimator over repeated samples from a specific data generation mechanism. That is, we have an estimator, $\hat{\theta} = g(D)$, where $D$ represents the data. The estimator is specified by the function $g()$. Viewed in this fashion, the estimator is a random variable whose distribution comes from the distribution of the data via the summary function, $g$.

The performance of the estimator must be gauged by specifying a loss function and examining the distribution of loss. A common loss function is the squared error loss function, $\ell(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^t A (\hat{\theta} - \theta)$, where $A$ is a positive-definite weighting matrix. We can evaluate alternative estimation procedures by comparing their distributions of loss. Clearly, we would prefer estimators with loss massed near zero. For convenience, many use $\mathrm{MSE} \equiv E_D[\ell(\hat{\theta}(D), \theta)]$ as a measure of the distribution of squared error and look for estimators that offer the lowest possible value of MSE. Unfortunately, there is no general solution to the problem of finding the minimal MSE procedure for all possible values of $\theta$, even conditional on a specific family of data generating mechanisms/distributions.

This problem is well illustrated by shrinkage estimation methods. Shrinkage methods modify an existing estimation procedure by “shrinking” the point estimates toward some point in the parameter space (typically 0). Thus, shrinkage methods will have lower MSE than the base estimation procedure for values of $\theta$ near the shrinkage point but higher MSE far from the point.
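The dependence of the MSE comparison on the location of $\theta$ can be seen in a small simulation. This is only a sketch under assumed settings: normal data with unit variance, a sample size of 10, and a fixed, hypothetical shrinkage factor `C = 0.8` applied to the sample mean. None of these choices come from the chapter.

```python
import random

random.seed(1)

N = 10        # observations per simulated sample
C = 0.8       # hypothetical shrinkage factor pulling the estimate toward 0
REPS = 5000   # number of simulated samples per value of theta

def simulate_mse(theta):
    """Monte Carlo MSE of the sample mean and of the shrunken mean C * mean."""
    se_base = se_shrink = 0.0
    for _ in range(REPS):
        xbar = sum(random.gauss(theta, 1.0) for _ in range(N)) / N
        se_base += (xbar - theta) ** 2
        se_shrink += (C * xbar - theta) ** 2
    return se_base / REPS, se_shrink / REPS

def analytic_mse(theta):
    """Exact MSE under these assumptions: 1/N for the sample mean and
    C^2 / N + (1 - C)^2 * theta^2 for the shrunken mean."""
    return 1.0 / N, C ** 2 / N + (1.0 - C) ** 2 * theta ** 2

near_base, near_shrink = simulate_mse(0.0)   # theta at the shrinkage point
far_base, far_shrink = simulate_mse(3.0)     # theta far from the shrinkage point

print("theta = 0:", near_base, near_shrink, analytic_mse(0.0))
print("theta = 3:", far_base, far_shrink, analytic_mse(3.0))
```

With these settings the shrunken estimator has the lower MSE at $\theta = 0$ and the higher MSE at $\theta = 3$, so neither procedure is better for every value of $\theta$.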


As a practical matter, we do not expect arbitrarily large values of $\theta$ and this gives shrinkage methods an advantage in most applications. But the point is that shrinkage methods do not dominate the base method, because they improve upon it for only certain parts of the parameter space at the expense of inferior performance elsewhere. The best we can say is that we would like to use estimators in the admissible class of estimators, that is, estimators that can’t be improved upon everywhere in the parameter space.

Another way to understand the problem is to observe that MSE can be re-expressed in terms of bias and sampling variance. For a scalar estimator,

$$\mathrm{MSE} = \left(E[\hat{\theta}] - \theta\right)^2 + V(\hat{\theta}) = \mathrm{Bias}^2 + \mathrm{Variance}$$

Clearly, if one can find an estimation procedure that reduces both bias and sampling variance, this procedure will improve MSE. However, at some point in the pursuit of efficient estimators, there may well be a trade-off between these two terms. That is, we can reduce overall MSE by a favorable trade-off of somewhat larger bias for an even larger reduction in variance. Many of the modern shrinkage and variable selection procedures exploit this trade-off by finding a favorable point on the bias-variance trade-off frontier.

A further complication in the evaluation of estimation procedures is that the sampling distribution (and MSE) can be extremely difficult to derive and there may be no closed-form expression for MSE. This has led many statisticians and econometricians to resort to large sample or asymptotic approximations.⁴ A large sample approximation is the result of an imaginary sampling experiment in which the sample size is allowed to grow indefinitely. While we may not be able to derive the sampling distribution of an estimator for a fixed $N$, we may be able to approximate its distribution for arbitrarily large $N$ or in a limiting sense. This approach consists of two parts: (1) a demonstration that the distribution of an estimator is massed arbitrarily close to the true value for large enough $N$ and (2) the use of some variant of the Central Limit Theorem to provide a normal large sample or asymptotic approximation to the sampling distribution.

In a large sample framework, we consider infinite increases in the sample size. Clearly, any reasonable procedure (with enough independence in the data) should “learn” about the true parameter value with access to an unlimited amount of data. A procedure that does not learn at all from larger and larger datasets is clearly a fundamentally broken method. Thus, a very minimal requirement of all estimation procedures is that as $N$ grows to infinity, we should see the sampling distribution massed closer and closer to the true value of $\theta$ with smaller and smaller MSE. This property is called consistency and is usually defined by what is called a probability limit or plim. The idea of a plim is very simple: if the mass of the sampling distribution becomes concentrated closer and closer to the true value then we should be able

4 We note that, contrary to popular belief, the bootstrap does not provide finite sample inference and can only be justified by appeal to large sample arguments.


to find a sample size sufficiently large that for any sample of that size or larger we can make the probability that the estimator lies near the true value as close to one as desired. It should be emphasized that the consistency property is a very minimal property. Among the set of consistent estimation procedures, there can be some procedures with lower MSE than others. As a simple example, consider estimation of the mean. The Law of Large Numbers tells us that, under very minimal assumptions, the sample mean converges to the true mean. This will be true whether we use all observations in our sample or only every 10th observation. However, if we estimate the mean using only 1/10 of the data, this procedure will be consistent but inefficient relative to the standard procedure.

If an estimation procedure has a finite sample bias, this bias will be reduced to zero in the asymptotic experiment or else the estimator would be inconsistent. For this reason, consistent estimators are sometimes referred to as “asymptotically unbiased.” In other words, in large samples MSE converges to just sampling variance. Thus, asymptotic evaluation of estimators is entirely about comparison of sampling variance. Statistical efficiency in large samples is measured by sampling variance.
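The every-10th-observation example can be simulated directly. The sketch below assumes normal data with mean `MU = 2` and unit variance (illustrative choices, not from the chapter) and compares the sampling variance of the mean computed from all observations with that of the mean computed from every 10th observation.

```python
import random

random.seed(2)

MU, N, REPS = 2.0, 500, 1000   # assumed true mean, sample size, replications

full_means, tenth_means = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, 1.0) for _ in range(N)]
    full_means.append(sum(sample) / N)
    subsample = sample[::10]                      # keep every 10th observation
    tenth_means.append(sum(subsample) / len(subsample))

def mean(values):
    return sum(values) / len(values)

def variance(values):
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / len(values)

var_full = variance(full_means)
var_tenth = variance(tenth_means)
efficiency_ratio = var_tenth / var_full

print("average estimate, all data:", mean(full_means))
print("average estimate, every 10th:", mean(tenth_means))
print("ratio of sampling variances:", efficiency_ratio)
```

Both estimators center on the true mean, which is what consistency requires, but the estimator that discards nine tenths of the data has a sampling variance roughly ten times larger.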

2.2 Distributional assumptions

To complete the econometric specification of the sales response model in (1), some would argue that we have to make an assumption regarding the joint distribution of the vector of error terms. Typically, we assume that the error terms are independent, which is not really a restrictive assumption in a cross-section of markets. There is a prominent school of thought in econometrics that one should not make any more distributional assumptions than minimally necessary to identify the model parameters. This view is associated with the Generalized Method of Moments (see Hansen, 1982) and applications in macro/finance. In these applications, the moment conditions are suggested by economic theory and, therefore, are well-motivated. In our setting, the moment restriction is motivated by assumptions regarding the independence of the error term from the response variables, which is not motivated by appeal to any particular economic theory of firm behavior. For these reasons, we believe that the argument for not making additional distributional assumptions is less forceful in marketing applications.

Another related argument from the econometrics literature is what some call the “consistency-efficiency” trade-off. Namely, we might be willing to forgo the improved statistical efficiency afforded by methods which employ more modeling assumptions in exchange for a more “robust” estimator which gives up some efficiency but provides consistent estimates over a wider range of possible model specifications. In the case of a simple marketing mix response model and additive error terms with a continuous sales response, changes in the distribution model for the errors will typically not result in inconsistent parameter estimators. However, various estimators of the sampling variance of the estimators may not be consistent in the presence of heteroskedastic error terms. For this reason, various authors (see, for example, Angrist and Pischke, 2009) endorse the use of Eicker-White style heteroskedasticity-consistent estimators of the variance. If a linear approximation to an


underlying non-linear response function is used, then we might expect to see heteroskedastic errors as the error term would include functional form approximation error. In marketing applications, the use of a linear regression function can be much less appealing than in economics. We seek to optimize firm behavior and linear approximations to sales response models will not permit computation of global optima. For these reasons, we believe it is far less controversial, in marketing as opposed to economics, to complete the econometric specification with a specific distributional assumption for basic sales response and demand models. Moreover, in demand models which permit discrete outcomes as discussed in Chapter 1, the specification of the error term is really part of the model and few are interested in model inferences that are free from a specific distributional choice for the model error terms. For example, the pervasive use of logit and other closely related models is based on the use of extreme value error terms which can be interpreted as marginal utility errors. While one might want to change the assumed distribution of error terms to create a different model specification, we do not believe that “distribution-free” inference has much value in models with discrete outcomes.
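The sense in which the extreme value assumption is part of the logit model can be illustrated by simulation. The sketch below assumes a simple random utility setup with three alternatives, hypothetical deterministic utilities `V`, and independent standard Gumbel (type I extreme value) errors added to each utility. It compares the simulated choice shares with the closed-form logit probabilities $\exp(v_j) / \sum_k \exp(v_k)$.

```python
import math
import random

random.seed(3)

V = [1.0, 0.5, 0.0]   # hypothetical deterministic utilities for three products
DRAWS = 50000         # number of simulated choice occasions

def gumbel():
    """Draw from the standard Gumbel (type I extreme value) distribution."""
    u = random.random()
    while u == 0.0:           # avoid log(0); random() returns values in [0, 1)
        u = random.random()
    return -math.log(-math.log(u))

counts = [0] * len(V)
for _ in range(DRAWS):
    utilities = [v + gumbel() for v in V]
    counts[utilities.index(max(utilities))] += 1   # choose the maximum utility

simulated_shares = [c / DRAWS for c in counts]

denom = sum(math.exp(v) for v in V)
logit_probs = [math.exp(v) / denom for v in V]

print("simulated shares:   ", simulated_shares)
print("logit probabilities:", logit_probs)
```

Replacing `gumbel()` with a draw from another distribution (for example `random.gauss(0, 1)`) produces shares that no longer match the logit formula, which is the sense in which a change of error distribution is a change of model.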

2.3 Likelihood and the MLE

Invariably, we will want to make inferences either for the generic response surface models as in (1) or in demand models which are derived from an assumed direct utility function along with a distribution of marginal utility errors. The generic demand model can be written as

$$y = g(p, E, \varepsilon \mid \theta) \tag{2}$$

where $y$ is a vector of quantities demanded, $p$ is a vector of prices, $E$ is expenditure allocated to the group of products in the demand group, and $\varepsilon$ is a vector of marginal utility errors. In either the case of (1) or (2), if we postulate a distribution of the error terms, this will induce a distribution on the response variable in (1) or the vector of demanded quantities given above. As is well known, this will allow us to compute the likelihood of the observed data. In the case of the generic sales response model, the likelihood is the joint distribution of the observed data.

$$p(y, x \mid \theta, \psi) = p(y \mid x, \theta)\, p(x \mid \psi) \tag{3}$$

Here $p(y \mid x, \theta)$, the conditional distribution of $y$ given $x$, is derived from (1) along with an assumed distribution of the error term. With additive error terms, the Jacobian from $\varepsilon$ to $y$ is 1. In the demand model application, $x$ consists of $(p, E)$ and the Jacobian may be more complicated (see Chapter 1 for many examples). In the case where the error terms are independent of the right hand side variables, the marginal distribution of the right hand side variables, e.g. $p(x \mid \psi)$ above, will not depend on the parameters that govern the dependence of the left hand side variable on the right hand side variables. Under these assumptions, the likelihood for $\theta$ is proportional to the conditional


distribution.

$$\ell(\theta) \propto p(y \mid x, \theta) \tag{4}$$

A very powerful principle is the Likelihood Principle, which states that all sample-based information is reflected in the likelihood function. Or, put another way, the likelihood is sufficient for the data. Two datasets with the same likelihood function are informationally equivalent with respect to inferences about $\theta$ even though the datasets do not have to have identical observations.⁵ Another consequence of the likelihood principle is that approaches which are not based on the likelihood function are based on potentially inferior information and, therefore, may not be efficient.

Some statisticians believe that one must base inference procedures on more than just the likelihood and do not endorse the likelihood principle. Typically, this is argued via special and somewhat pathological examples where strict adherence to the likelihood principle produces nonsensical estimators. We are not aware of any practical example in which it is necessary to take into account more than the likelihood to make sensible inferences.

Exactly how the likelihood is used to create estimators does not follow directly from the likelihood principle. The method of maximum likelihood creates an estimator using the maximum of the likelihood function.

$$\hat{\theta}_{MLE} = \arg\max_{\theta}\, \ell(\theta) = f(y, x) \tag{5}$$
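The likelihood principle and the definition in (5) can be illustrated with the classic coin-toss example, in which the same data could have come from two different sampling schemes. The sketch below uses hypothetical data of `m = 3` heads in `n = 12` tosses and a simple grid search for the maximum; both are illustrative assumptions. It compares the binomial likelihood (toss n times) with the negative binomial likelihood (toss until m heads appear).

```python
import math

m, n = 3, 12   # hypothetical data: 3 heads observed in 12 tosses

def binomial_lik(theta):
    """Likelihood when the number of tosses n is fixed in advance."""
    return math.comb(n, m) * theta ** m * (1 - theta) ** (n - m)

def neg_binomial_lik(theta):
    """Likelihood when tossing continues until the m-th head appears."""
    return math.comb(n - 1, m - 1) * theta ** m * (1 - theta) ** (n - m)

# Evaluate both likelihoods on a grid of values for the head probability.
grid = [i / 1000 for i in range(1, 1000)]
ratios = [binomial_lik(t) / neg_binomial_lik(t) for t in grid]

# Grid-search MLE under each sampling scheme.
mle_binomial = max(grid, key=binomial_lik)
mle_neg_binomial = max(grid, key=neg_binomial_lik)

print("min and max likelihood ratio:", min(ratios), max(ratios))
print("MLE (binomial):", mle_binomial)
print("MLE (negative binomial):", mle_neg_binomial)
```

The ratio of the two likelihoods does not vary with $\theta$, so the likelihoods are proportional and any procedure that depends only on the likelihood, including the MLE, gives the same answer under both sampling schemes.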

The MLE appears to obey the likelihood principle in the sense that the MLE depends only on the likelihood function. However, the MLE only uses one feature of the likelihood function (the max). The analysis of the MLE depends only on the local behavior of the likelihood function in the vicinity of the MLE. The properties of the MLE can only be deduced via large sample analysis. Under moderately weak conditions, the MLE is consistent and asymptotically normal. The most attractive aspect of the MLE is that it attains the Cramér-Rao lower bound for sampling variance in large samples. In other words, there is no other consistent estimator which can have a smaller sampling variance in large samples. However, the finite sample properties of the MLE are not especially compelling. There are examples where the MLE is inadmissible. The analysis of the MLE strongly suggests that if an estimator is to be considered useful it should be asymptotically equivalent to the MLE. However, in finite samples, an estimator may differ appreciably from the MLE and have superior sampling properties.

From a practical point of view, to use the MLE the likelihood must be evaluated at low cost and maximization must be feasible. In order for asymptotic inference to be conducted, the likelihood must be differentiable at least in a neighborhood of the

5 In cases where the sampling mechanism does not depend on $\theta$, the likelihood principle states that inference should ignore the sampling mechanism. The classic example is the coin toss experiment. It does not matter whether the observed data of $m$ heads in $n$ coin tosses was acquired by tossing a coin $n$ times or tossing a coin until $m$ heads appear. This binomial versus negative binomial example has spurred a great deal of debate.


maximum. Typically, researchers used non-linear programming techniques to maximize the likelihood and most of these methods are gradient-based. There are other methods such as simulated annealing and simplex methods which do not require the gradient vector; however, these methods are slow and often impractical with a large number of parameters. In addition, a maximum of the likelihood is not useful without a method for conducting inference which requires computation of gradients and or second derivatives. In many settings, we assume that economic agents act with a larger information set than is available to the researcher. In these situations, the likelihood function for the observed data must be evaluated by integrating over the distribution of unobservable variables. In many cases, this integration must be accomplished by numerical means. Integration by simulation-based methods creates a potentially non-smooth likelihood and can create problems for those who use gradient-based non-linear programming algorithms to maximize the likelihood. Inference for the MLE is accomplished by reference to the standard asymptotic result: √ N θˆMLE − θ ∼ ˙ N 0, I−1 2 ln where I is the information matrix, I = −E ∂∂θ∂θ t . For any but the most trivial models, the expected information matrix must be approximated, typically by using an estimate of the Hessian evaluated at the MLE. Most optimizers provide such an estimate on convergence. The quality of Hessian estimates particularly where numerical gradients are used can be very low and it is advisable to expend additional computational power at the optimum to obtain the highest quality Hessian estimate possible. 
Typically, numerical Hessians should be symmetrized.6 In some cases, numerical Hessians can be close to singularity (ill-conditioned) and cannot be used to estimate the asymptotic variance-covariance matrix of the MLE without further “regularization.”7 The average outer product of the gradient of the log-likelihood function can also be used to approximate the information matrix.
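As a minimal sketch of the Hessian-based inference just described, the following standard-library Python code computes a central-difference Hessian at the MLE, symmetrizes it, adds a small amount to the diagonal, and inverts the result to obtain asymptotic standard errors. The normal model with parameters (mu, log sigma), the step size `h`, and the `ridge` constant are illustrative assumptions, not values taken from the text.

```python
import math
import random

def log_lik(theta, y):
    """Normal log-likelihood with theta = (mu, log_sigma)."""
    mu, log_sigma = theta
    n = len(y)
    ss = sum((v - mu) ** 2 for v in y)
    return (-0.5 * n * math.log(2.0 * math.pi) - n * log_sigma
            - 0.5 * ss * math.exp(-2.0 * log_sigma))

def numerical_hessian(f, theta, h=1e-4):
    """Central-difference approximation to the Hessian of f at theta."""
    k = len(theta)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def f_shift(di, dj):
                t = list(theta)
                t[i] += di
                t[j] += dj
                return f(t)
            hess[i][j] = (f_shift(h, h) - f_shift(h, -h)
                          - f_shift(-h, h) + f_shift(-h, -h)) / (4.0 * h * h)
    return hess

def invert_2x2(m):
    """Closed-form inverse of a 2 x 2 matrix stored as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

random.seed(0)
y = [random.gauss(2.0, 1.5) for _ in range(500)]   # simulated data (assumption)
n = len(y)

# The MLE is available in closed form for this toy model.
mu_hat = sum(y) / n
sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in y) / n)
theta_hat = [mu_hat, math.log(sigma_hat)]

hess = numerical_hessian(lambda t: log_lik(t, y), theta_hat)
# Symmetrize: A* = .5A + .5A^t
hess_sym = [[0.5 * (hess[i][j] + hess[j][i]) for j in range(2)] for i in range(2)]
# Regularize: add a small amount to each diagonal element of the information estimate.
ridge = 1e-8
info = [[-hess_sym[i][j] + (ridge if i == j else 0.0) for j in range(2)]
        for i in range(2)]
cov = invert_2x2(info)
se_mu = math.sqrt(cov[0][0])
se_log_sigma = math.sqrt(cov[1][1])
print(se_mu, sigma_hat / math.sqrt(n))
```

In this toy model the analytical standard error of the mean is sigma_hat / sqrt(n), which gives a direct check on the numerical Hessian.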

2.4 Bayesian approaches

As mentioned above, there is a view by some econometricians that any likelihood-based approach requires additional assumptions that may not be supported by economic theory and that can be viewed as somewhat arbitrary. For this reason, non-likelihood approaches which make minimal assumptions are viewed favorably. In our view, the fact that one has to specify a complete model is a benefit rather than a cost of a likelihood-based approach to inference. That is to say, it is easy to see that some data has zero likelihood even though non-likelihood based methods might be used to

6 $A^{*} = .5A + .5A^{t}$.
7 Typically achieved by adding a small amount to each of the diagonal elements of the Hessian estimate.


CHAPTER 2 Inference for marketing decisions

make inferences. For example, if we postulate a discrete choice model, we must eliminate any data that shows consumers purchasing more than one product in the demand group at a single time. This would violate the mutually exclusive assumption of the choice model. However, we could use moment conditions to estimate a choice model which has zero likelihood for a given dataset. To guard against mis-specification, our view is that a more fruitful approach is to develop models specifically designed to accommodate the important features of marketing datasets and to be able to easily change the specifications to perform a sensitivity analysis. Given that the MLE has only desirable large sample properties, there is a need for an inference framework which adheres to the likelihood principle but offers better finite sample properties. It is well known that Bayesian procedures are asymptotically similar to the MLE but have desirable finite sample properties. Bayes estimators can be shown to be admissible and it can also be shown that all admissible estimators are Bayes. Bayesian procedures are particularly useful in very high dimensional settings and in highly structured multi-level models. Bayesian approaches have many favorable properties including shrinkage and adaptive shrinkage (see Section 3.2 for discussion of adaptive shrinkage) in which the shrinkage adapts to the information in the data. In the machine learning literature, these properties are called “regularization” which simply means reducing sampling error by avoiding outlandish or absurd estimates and by various forms of shrinkage. Many popular regularization methods such as the Lasso can be given a Bayesian interpretation (Park and Casella, 2008). There are many treatments of the Bayesian approach (see, for example, Gelman et al., 2004 and Rossi et al., 2005). We briefly review the Bayesian approach.
Bayesians take the point of view that any unknown quantity (including model parameters, but also including predictions and which model governs the data) should be described by a probability distribution which represents our current state of information. Given the limited information available to any researcher, no quantity that cannot be directly measured is known with certainty. Therefore, it is natural to use the machinery of probability theory to characterize the degree of uncertainty regarding any unknown quantity. Information arises from two sources: (1) prior information and (2) the data via the likelihood. Prior information can come from other datasets, from economic theory (such as monotonicity of demand), or from “structure,” such as the belief that there is a super-population from which a given dataset is drawn. Bayes theorem provides the way in which prior and sample information are brought together to make “after the data” or a posteriori inferences. In the case of our simple response model example, Bayes theorem states

$$p(\theta|y,X) = \frac{p(y,\theta|X)}{p(y|X)} = \frac{p(y|X,\theta)\,p(\theta)}{p(y|X)} \tag{6}$$

Thus, given a prior distribution on the response parameters, Bayes theorem tells us how to combine this prior with the likelihood to obtain an expression for the posterior distribution. The practical value of Eq. (6) is that the posterior distribution of the model parameters is proportional to the likelihood times the prior density.

$$p(\theta|y,X) \propto \ell(\theta|y,X)\,p(\theta) \tag{7}$$


All features of the posterior distribution are inherited from the likelihood and the prior. Of course, this equation does not define an estimator without some further assumptions. Under squared error loss, the posterior mean is the Bayes estimator, $\hat{\theta}_{Bayes} = \int \theta\, p(\theta|y,X)\, d\theta$. However, Bayesians do not think of the Bayesian apparatus as simply a way to obtain an estimator but rather as a complete, and different, method of inference. The posterior distribution provides information about what we have learned from the data and the prior and expresses this information as a probability distribution. The degree of uncertainty can be expressed by the posterior probability of various subsets of the parameter space (for example, the posterior probability that a price elasticity parameter is less than −1.0). The marginal distributions of individual parameters in the θ vector are often used to characterize uncertainty through the computation of the Bayesian analogues of the familiar standard errors and confidence intervals. The posterior standard deviation for an element of the θ vector is analogous to the standard error and the quantiles of the posterior distribution can be used to construct a Bayesian “credibility” interval (the analogue of the confidence interval). Of course, the joint posterior offers much more information than the marginals; unfortunately, this information is rarely explored or provided. It is also important to note that Bayes estimators use information from both the prior and the likelihood. While the MLE is based only on the likelihood, an informative prior serves to modify the location of the posterior and influences the Bayes estimators. All Bayes estimators with informative priors can be interpreted as a form of shrinkage estimator. The likelihood is centered at the MLE which (within the confines of the particular parametric model) is a “greedy” estimator which tries to fit the data according to the likelihood criterion.
The prior serves to “shrink” or pull the Bayes estimator into a sort of compromise between the prior and the likelihood. In simplified conjugate cases, the posterior mean can be written as an average of the prior mean and the MLE where the weights in the average depend on the informativeness of the prior relative to the likelihood. Often the prior mean is set to zero and the Bayes estimator will shrink the MLE toward zero which improves sampling properties by exploiting the bias-variance tradeoff. However, as the sample size increases, the weight accorded the likelihood increases relative to the prior, reducing shrinkage and allowing the Bayes estimator to achieve consistency. Most Bayesian statisticians are quick to point out that there are very fundamental differences between the Bayesian measure of uncertainty and the sampling theoretic ones. Bayesian inference procedures condition on the observed data, which differs dramatically from sampling theoretic approaches that consider imaginary experiments in which new datasets are generated from the same model. Bayesians argue very convincingly that sampling properties can be useful to select a method of inference but that the applied researcher is interested in the information content of a given dataset. Moreover, Bayesian procedures do not depend on asymptotic experiments which are of even more questionable relevance for a researcher who wishes to summarize the information in one finite sample. The Bayesian approach is appealing due to superior sampling properties coupled with the appropriate inference statements that condition on a specific dataset. The


problem is that the Bayesian approach appears to exact higher costs than a standard MLE or method of moments approach. The cost is twofold: (1) a prior distribution must be provided and (2) some practical method must be provided to compute the many integrals that are used to summarize the posterior distribution.
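The shrinkage behavior discussed above can be illustrated with the simplest conjugate case. The sketch below assumes a normal mean with known sampling variance and a normal prior; the function name and all numerical values are hypothetical. The posterior mean is a precision-weighted average of the prior mean and the MLE (here, the sample mean), and the weight on the MLE approaches one as the sample size grows.

```python
def posterior_normal_mean(ybar, n, sigma2, prior_mean, prior_var):
    """Posterior for a normal mean with known variance sigma2 and a N(prior_mean, prior_var) prior.

    Returns (posterior mean, posterior variance, weight placed on the MLE ybar).
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / sigma2
    weight_on_mle = data_precision / (prior_precision + data_precision)
    post_mean = weight_on_mle * ybar + (1.0 - weight_on_mle) * prior_mean
    post_var = 1.0 / (prior_precision + data_precision)
    return post_mean, post_var, weight_on_mle

# Same sample mean, two sample sizes: shrinkage toward the prior mean of zero fades as n grows.
mean_small, var_small, w_small = posterior_normal_mean(2.0, 5, 4.0, 0.0, 1.0)
mean_large, var_large, w_large = posterior_normal_mean(2.0, 500, 4.0, 0.0, 1.0)
print(mean_small, w_small)
print(mean_large, w_large)
```

With five observations the estimate is pulled a little under halfway toward zero; with five hundred observations it is nearly equal to the MLE.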

2.4.1 The prior

Recognizing that the prior is an additional “cost” which many busy researchers might not be interested in providing, the Bayesian statistics literature pursued the development of various “reference” priors.8 The idea is that the “reference” priors might be agreed upon by researchers as providing modest or minimal prior information and that the “reference” priors can be assessed at low cost. It is our view that the literature on reference priors is largely a failure in the sense that there can’t really be one prior or form of prior that is satisfactory in all situations. Our view is that informative priors are useful and that even a modest amount of prior information can be exceptionally useful in the sense of eliminating absurd or highly improbable parameter estimates. Priors and prior structure become progressively more important as the parameterization of models becomes increasingly complex and high dimensional. Bayesian methods have become almost universally adopted in the analysis of conjoint survey data9 and in the fitting of marketing mix models. These successes in adoption of Bayesian methods come from the regularization and shrinkage properties afforded by Bayesian estimators, particularly in what is called the hierarchical setting. In Section 3, we explore Bayesian approaches to the analysis of panel data which provide an element of what is termed adaptive shrinkage, namely, a multilevel prior structure which can be inferred partly on the basis of data from other units in the panel. This notion, called “borrowing strength,” is a key attribute of Bayesian procedures with highly structured and informative priors. In summary, the assessment of an informative prior is a requirement of the Bayesian approach beyond the likelihood.
In low dimensional settings, such as a linear regression with a small number of potential regressors and only one cross-section or time series, any of a range of moderately informative priors will produce similar results and, therefore, the prior is not important or particularly burdensome. However, in high dimensional settings such as flexible non-parametric models or in the case of panel data where there are many units and relatively few observations, the Bayesian approach provides a practical procedure where the prior is important and confers important sampling benefits. Even the simple regression example becomes a convincing argument for the Bayesian approach when the number of potential regressors becomes extremely large. Many of the variable selection Machine Learning techniques can be interpreted as Bayesian procedures. In the background, there is an informative prior which is assessed indirectly, typically through cross-validation.

8 See, for example, Bernardo and Smith (1994), Sections 5.4 and 5.62.
9 For example, Sawtooth Software implements a Bayesian hierarchical model for choice-based conjoint. Sawtooth is the market share leader. SAS has several procedures for fitting systems of regression equations using Bayesian approaches and these are widely applied in the analysis of aggregate market data.


Thus, Bayesian methods have become useful even in the simple linear regression problem. For example, Lasso and Ridge regression are Bayesian methods with particular prior forms (see, for example, Park and Casella, 2008).
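To make the Ridge case concrete, the following sketch assumes a one-regressor model without intercept, a known error variance, and a zero-mean normal prior on the coefficient with variance sigma^2 / lambda; these choices and all numbers are illustrative assumptions. The posterior mean is computed by brute-force numerical integration of likelihood times prior on a grid and compared with the closed-form Ridge estimate. (The Lasso corresponds instead to a posterior mode under a Laplace prior, which is not shown here.)

```python
import math
import random

random.seed(1)
sigma = 1.0     # known error standard deviation (assumption)
lam = 5.0       # ridge penalty; implies the prior beta ~ N(0, sigma^2 / lam)
x = [random.gauss(0.0, 1.0) for _ in range(20)]
y = [0.5 * xi + random.gauss(0.0, sigma) for xi in x]

sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
beta_ols = sxy / sxx              # least squares (the MLE)
beta_ridge = sxy / (sxx + lam)    # ridge estimate

def log_posterior(beta):
    """Unnormalized log posterior: normal log-likelihood plus normal log prior."""
    log_lik = -0.5 * sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / sigma ** 2
    log_prior = -0.5 * lam * beta * beta / sigma ** 2
    return log_lik + log_prior

# Posterior mean by numerical integration on an evenly spaced grid over [-3, 3].
grid = [-3.0 + 0.001 * i for i in range(6001)]
log_vals = [log_posterior(b) for b in grid]
max_log = max(log_vals)
weights = [math.exp(v - max_log) for v in log_vals]
posterior_mean = sum(b * w for b, w in zip(grid, weights)) / sum(weights)

print(beta_ols, beta_ridge, posterior_mean)
```

The grid integration only needs the posterior up to a normalizing constant, which is why subtracting `max_log` before exponentiating is harmless.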

2.4.2 Bayesian computation

Historically, Bayesian methods were not used much in applications due to the problems with summarizing the posterior via computation of various integrals. For example, the minimal requirement of many investigators is to provide a point estimate and a measure of uncertainty. For the Bayesian, this involves computing the posterior mean and posterior standard deviation, both of which are integrals involving the marginal posterior distribution of a given parameter. That is, inference regarding $\theta_i$ requires computation of the marginal posterior distribution,

$$p_i(\theta_i|y,X) = \int_{\theta_{-i}} p(\theta|y,X)\, d\theta_{-i}$$

Here $\theta_{-i}$ is all elements of the θ vector except the ith element. In addition, we must compute the normalizing constant for the posterior since the likelihood times the prior is only proportional to the posterior. Clearly, these integrals are available as closed-form solutions for only very special cases. Numerical integration methods such as quadrature-based methods are only effective for very low dimensional integrals. The current popularity of Bayesian methods has been made possible by various simulation-based approaches to posterior computation. Obviously, if we could simulate from the posterior at low computational cost and only require knowledge of the posterior up to a normalizing constant, this would provide a practical solution. While iid samplers from arbitrary multivariate distributions are not practical, various Markov Chain methods have been devised that can effectively simulate from the posterior at low cost. These MCMC (Markov Chain Monte Carlo) methods (see Robert and Casella (2004) for a complete treatment and Rossi et al. (2005) for many applications to models of interest to marketing) create a continuous state space Markov Chain whose invariant distribution is the posterior. The accuracy of this method is determined by the ability to simulate a large number of draws from the Markov Chain at low cost as well as our ability to construct Markov Chains with limited dependence. MCMC-based Bayes procedures use simulations or draws from the Markov Chain to compute the posterior expectation of any arbitrary function of the model parameters. For example, if we have R draws from the chain, then a simulation-based estimate of the posterior expectation of any function can be obtained by simply forming an average over the function evaluated at each of these R draws. We typically use a very large number of draws (often greater than 10,000) to ensure that the error in the simulation approximation is small.
Note that the number of draws used is under the control of the investigator (in contrast to the fixed sample size of the data).

$$\hat{E}_{\theta|y,X}\left[g(\theta)\right] = \frac{1}{R}\sum_{r=1}^{R} g(\theta_r) \tag{8}$$
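The averaging in (8) can be sketched with a random walk Metropolis sampler, one of the simplest MCMC methods. The example below is only an illustration under stated assumptions: a normal mean with known variance and a normal prior (so that the exact posterior mean is available for comparison), a hypothetical step size of 0.7, and a burn-in of 1,000 draws. The sampler needs the posterior only up to a normalizing constant.

```python
import math
import random

def make_log_posterior(y, sigma2, prior_mean, prior_var):
    """Unnormalized log posterior: log-likelihood plus log prior, constants dropped."""
    def log_post(theta):
        log_lik = -0.5 * sum((v - theta) ** 2 for v in y) / sigma2
        log_prior = -0.5 * (theta - prior_mean) ** 2 / prior_var
        return log_lik + log_prior
    return log_post

def rw_metropolis(log_target, start, step, n_draws, rng):
    """Random walk Metropolis: propose current + N(0, step^2), accept with prob min(1, ratio)."""
    draws = []
    current, current_lp = start, log_target(start)
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)   # a rejected proposal repeats the current value
    return draws

def posterior_expectation(g, draws):
    """Eq. (8): average of g over the R retained draws."""
    return sum(g(th) for th in draws) / len(draws)

rng = random.Random(42)
y = [rng.gauss(1.0, 1.0) for _ in range(10)]   # simulated data (assumption)
log_post = make_log_posterior(y, sigma2=1.0, prior_mean=0.0, prior_var=10.0)
draws = rw_metropolis(log_post, start=0.0, step=0.7, n_draws=20000, rng=rng)
kept = draws[1000:]                            # discard burn-in draws

post_mean = posterior_expectation(lambda th: th, kept)
prob_below_one = posterior_expectation(lambda th: 1.0 if th < 1.0 else 0.0, kept)

# Exact conjugate posterior mean for comparison (prior mean is zero).
exact_mean = sum(y) / (len(y) + 1.0 / 10.0)
print(post_mean, exact_mean, prob_below_one)
```

Choosing g as an indicator function turns the same average into a posterior probability of a subset of the parameter space.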


Thus, our simulation-based estimates are averages of draws from a Markov Chain constructed with an invariant distribution equal to the posterior. While there is a very general theory that assures ergodicity (convergence of these ensemble averages to posterior expectations), in practice, the draws from the Markov Chain can be highly correlated, requiring a very large number of draws. For the past 25 years, Bayesian statisticians and econometricians have enlarged the class of models which can be treated successfully by MCMC methods. Advancement in computation also has allowed applications to data sets and models that were previously thought to be impossible to analyze. We routinely analyze highly nonlinear choice models with thousands of cross-sectional units and tens of thousands of observations. Multivariate mixtures of normals in high dimensions and with a large number of components can be implemented on laptop computing equipment using an MCMC approach. Given the availability of MCMC methods (particularly the Gibbs Sampler), statisticians have realized that many models whose likelihoods involve difficult integrals can be analyzed with Bayesian methods using the principle of data augmentation. For example, models with latent random variables that must be integrated out to form the likelihood can be analyzed in a Bayesian approach by augmenting the parameter space with these latent variables and defining a Markov Chain on this “augmented” state space. Marginalizing (integrating) out the latent variables can be achieved trivially by simply discarding draws of the latent variables. These ideas have been applied very successfully to random coefficient models as well as models like the multinomial and multivariate Probit. In summary, the Bayesian approach was previously thought to be impractical given lack of a computational strategy for computing various integrals of the non-normalized posterior.
MCMC methods have not only eliminated this drawback but, with the advent of data augmentation, have made Bayesian procedures the only practical methods for models with a form of likelihood that involves integration. The current challenge for Bayesian methods is to apply them to truly enormous data sets generated by millions of consumers and a vast number of potential explanatory variables. As currently implemented, MCMC methods are fundamentally sequential in nature (the rth simulate of the Markov Chain depends on the value of the (r − 1)st simulate). The vast computing power currently available is obtained not by the speed of any one given processor but by the ability to break the computing task into pieces and farm these out to a large array of processors. Sequential computations do not naturally lend themselves to anything other than very tightly coupled computer architectures. This is a current area of research which awaits innovation in our methods as well as possible changes in the computing environment. In Section 3, we review some of the approaches to “scaling” MCMC methods to truly huge panel data sets.
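The data augmentation idea described above can be sketched for a binary Probit model with a single coefficient. This is a stylized illustration, not the implementation used in the cited references: the one-regressor specification, the N(0, 100) prior, the number of draws, and the burn-in are all assumptions. The latent utilities z are drawn given the coefficient, the coefficient is drawn given z, and the z draws are simply discarded.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0, via the inverse CDF."""
    p_nonpos = STD_NORMAL.cdf(-mean)              # P(z <= 0)
    u = rng.random()
    p = p_nonpos + u * (1.0 - p_nonpos) if positive else u * p_nonpos
    # Clamp so inv_cdf stays inside (0, 1); adequate for the moderate means used here.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(p)

def probit_gibbs(x, y, prior_var, n_draws, rng):
    """Gibbs sampler on the augmented space (beta, z) for y_i = 1{beta * x_i + e_i > 0}."""
    sxx = sum(xi * xi for xi in x)
    post_var = 1.0 / (sxx + 1.0 / prior_var)      # conditional variance of beta given z
    beta, draws = 0.0, []
    for _ in range(n_draws):
        # Step 1: latent utilities given beta and the observed choices.
        z = [draw_truncated_normal(beta * xi, yi == 1, rng) for xi, yi in zip(x, y)]
        # Step 2: beta given z is a normal linear regression update (prior mean zero).
        post_mean = post_var * sum(xi * zi for xi, zi in zip(x, z))
        beta = rng.gauss(post_mean, math.sqrt(post_var))
        draws.append(beta)                        # z is discarded, which marginalizes it out
    return draws

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if 1.0 * xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]   # true beta = 1 (assumption)
draws = probit_gibbs(x, y, prior_var=100.0, n_draws=1500, rng=rng)
kept = draws[300:]
beta_hat = sum(kept) / len(kept)
print(beta_hat)
```

No integral over the latent utilities is ever evaluated; each conditional draw is from a standard distribution.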

2.5 Inference based on stochastic search vs. gradient-based optimization

Up to this point, we have viewed MCMC methods as a method for indirectly sampling from the posterior distribution. The theory of MCMC methods says that, if the Markov


Chain is allowed to run long enough, the procedure will visit any “appreciable” set with frequency proportional to the posterior probability of that set. Clearly, then, an MCMC sampler will visit high probability areas of the posterior much more often than low probability areas. In some cases, the set of possible parameter values is so large that, as a practical matter, the MCMC will only visit a part of the parameter space. For example, consider the application of Bayesian methods to the problem of selecting regression models from the space of all possible regression models (if there are k possible regressors, there are $2^k$ possible models). For k > 30, any of the MCMC methods (see, for example, George and McCulloch, 1997) will visit only a subset of the possible models and one would not necessarily want to use the frequency of model visits as an estimate of the posterior probability of a model. Thus, one way of looking at the MCMC method is as a method of stochastic search. All MCMC methods are designed to draw points from the parameter space with some degree of randomness. Standard methods such as the Gibbs Sampler or random walk MCMC methods do not rely on any gradient information regarding the posterior (note: there are variational and stochastic gradient methods that do use gradient information). This means that the parameter space does not have to be continuous (as in the case of variable selection) nor does the likelihood have to be smooth. There are some problems which give rise to a likelihood function with discrete jumps (see, for example, Gilbride and Allenby, 2004). Gradient-based MLE methods simply cannot be applied in such situations. However, an MCMC method does not require even continuity of the likelihood function. In other situations, the likelihood function requires an integral. The method of simulated maximum likelihood simply replaces that integral with a simulation-based estimate of the integral.
For example, the integral might be taken over a normal distribution of consumer heterogeneity. Given the simplicity and low cost of normal draws, a simulated MLE seems to be a natural choice to evaluate the likelihood function numerically. However, given that only a finite number of draws are used to approximate the integral, the likelihood function is now non-differentiable and gradient-based maximization methods can easily fail. A Bayesian has a choice of whether to use a random walk or similar MCMC method directly on the likelihood function evaluated by simulation-based integration or to augment the problem with latent variables. In either case, the Bayesian using stochastic search methods is not dependent on any smoothness in the likelihood function.

2.6 Decision theory

Most of the recent Bayesian literature in marketing emphasizes the value of the Bayesian approach to inference, particularly in situations with limited information. Bayesian inference is only a special case of the more general Bayesian decision-theoretic approach. Bayesian Decision Theory has two critical and separate components: (1) a loss function and (2) the posterior distribution. The loss function associates a loss with a state of nature and an action, $L(a, \theta)$, where $a$ is the action and $\theta$ is the state of nature. The optimal decision maker chooses the action so as


to minimize expected loss where the expectation is taken with respect to the posterior distribution.

$$\min_{a} \bar{L}(a) = \int L(a,\theta)\, p(\theta|\text{Data})\, d\theta$$

Inference about θ can be viewed as a special case of decision theory where the “action” is to choose an estimate based on the data. Model choice can also be thought of as a special case of decision theory. If the loss function associated with model choice takes on the value of 0 if the model is correct and 1 if not, then the solution which minimizes expected loss is to select the model (from a set of models) with highest posterior probability (for examples and further details see Chapter 6 of Rossi et al., 2005).
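A small numerical sketch of these special cases follows. It approximates posterior expected loss by an average over draws and minimizes over a grid of candidate actions. The gamma-distributed draws are only a stand-in for MCMC output, and the grid and the model probabilities are hypothetical. Under squared error loss the minimizing action is (up to grid resolution) the posterior mean, under absolute error loss it is the posterior median, and under 0/1 loss for model choice it is the model with highest posterior probability.

```python
import random
import statistics

def expected_loss(action, draws, loss):
    """Posterior expected loss of an action, approximated by an average over draws."""
    return sum(loss(action, th) for th in draws) / len(draws)

def squared_error(a, th):
    return (a - th) ** 2

def absolute_error(a, th):
    return abs(a - th)

rng = random.Random(3)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(1000)]   # stand-in for posterior draws

actions = [0.02 * i for i in range(301)]                    # candidate estimates on [0, 6]
best_sq = min(actions, key=lambda a: expected_loss(a, draws, squared_error))
best_abs = min(actions, key=lambda a: expected_loss(a, draws, absolute_error))
print(best_sq, statistics.mean(draws))
print(best_abs, statistics.median(draws))

# Model choice: loss is 0 if the chosen model is correct and 1 otherwise,
# so the expected loss of choosing model m is 1 - P(m | Data).
model_probs = {"M1": 0.2, "M2": 0.5, "M3": 0.3}             # hypothetical posterior probabilities
best_model = min(model_probs, key=lambda m: 1.0 - model_probs[m])
print(best_model)
```

The same `expected_loss` function works for any loss, which is the sense in which the posterior and the loss function are separate components.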

2.6.1 Firm profits as a loss function

In the Bayesian statistical literature, decision theory has languished as there are few compelling loss functions, only those chosen for mathematical convenience. The loss function must come from the subject area of application and is independent of the model. That is to say, a principal message of decision theory is that we use the posterior distribution to summarize the information in the data (via the likelihood) and the prior, and that decisions are made as a function of that information and a loss function. In marketing, we have a natural loss function, namely, the profit function of the firm. Strictly speaking, the profit function is not a loss function which we seek to minimize. We maximize profits or minimize the negative of profits. To take the simple sales response model presented here, the profit function would be

$$\pi(x|\theta) = E\left[y|x,\theta\right]\left(p - c(x)\right) = f(x|\theta)\left(p - c(x)\right) \tag{9}$$

where $y = f(x|\theta) + \varepsilon$ and $c(x)$ is the cost of providing the vector of marketing inputs.10 Optimal decision theory prescribes that we should make decisions so as to maximize the posterior expectation of the profit function in (9).

$$x^{*} = \arg\max_{x} \bar{\pi}(x), \qquad \bar{\pi}(x) = \int \pi(x|\theta)\, p(\theta|\text{Data})\, d\theta$$

The important message is that we act to optimize profits based on the posterior expectation of profits rather than inserting our “best guess” of the response parameters (the plug-in approach) and proceeding as though this estimate is the truth. The “plug-in” approach can be thought of as expressing overconfidence in the parameter estimates. If the profit function is non-linear in θ then the plug-in and full decision theoretic

10 It is a simple matter to include covariates out of the control of the firm in the sales response surface. Optimal decision could either be done conditional on this vector of covariates or the covariates could be integrated out according to some predictive distribution.


approaches will yield different solutions and the plug-in approach will typically overstate potential profits. In commercial applications of marketing research, many firms offer what are termed marketing mix models. These models are built to help advise firms how to allocate their budgets over many possible marketing activities including pricing, trade promotions, and advertising of many kinds including TV, print, and various forms of digital advertising. The options in digital advertising have exploded and now include sponsored search, web-site banner ads, product placement ads, and social advertising. The marketing mix model is designed to attack the daunting task of estimating the return on each of these activities and making predictions regarding the consequence of possible reallocation of resources on firm profits. As indicated above, the preferred choice for estimation in marketing mix applications is Bayesian methods applied to sets of regression models. However, in making recommendations to clients, the marketing mix modeler simply “plugs in” the Bayes estimates and is guilty of overconfidence. The problem with this approach is that, if taken literally, the conclusion is often to put all advertising resources in only one “bucket” or type of advertising. The full decision theoretic approach avoids these problems created by overconfidence in parameter estimates.
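The contrast between the plug-in and full decision theoretic approaches can be sketched numerically. The example assumes a hypothetical saturating response f(x|θ) = 1 − exp(−θx), a price of 10, a linear cost c(x) = x, and normal draws standing in for the posterior of θ; none of these come from the text. Because this profit function is concave in θ, the profit claimed by the plug-in approach is at least as large as the posterior expected profit of the same decision, while the full approach by construction attains at least as much posterior expected profit as the plug-in decision.

```python
import math
import random

PRICE, UNIT_COST = 10.0, 1.0   # hypothetical values

def profit(x, theta):
    """pi(x|theta) = f(x|theta) * (p - c(x)), with f = 1 - exp(-theta * x) and c(x) = UNIT_COST * x."""
    return (1.0 - math.exp(-theta * x)) * (PRICE - UNIT_COST * x)

def expected_profit(x, draws):
    """Posterior expected profit, approximated by an average over posterior draws."""
    return sum(profit(x, th) for th in draws) / len(draws)

rng = random.Random(11)
draws = [rng.gauss(0.5, 0.3) for _ in range(1000)]   # stand-in for posterior draws of theta
theta_bar = sum(draws) / len(draws)                  # posterior mean ("best guess")

grid = [0.02 * i for i in range(1, 500)]             # candidate decisions x in (0, 10)
x_plugin = max(grid, key=lambda x: profit(x, theta_bar))
x_full = max(grid, key=lambda x: expected_profit(x, draws))

plugin_claim = profit(x_plugin, theta_bar)           # profit the plug-in analysis reports
plugin_actual = expected_profit(x_plugin, draws)     # its posterior expected profit
full_value = expected_profit(x_full, draws)          # posterior expected profit at the optimum

print(x_plugin, plugin_claim, plugin_actual)
print(x_full, full_value)
```

The gap between `plugin_claim` and `plugin_actual` is one way to quantify the overconfidence of the plug-in approach in this stylized setting.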

2.6.2 Valuation of information sets

An important problem in modern marketing is the valuation of information. Firms have an increasingly extensive array of possible information sets on which to base decisions. Moreover, acquisition of information is considered a major part of strategic decisions. For example, Amazon’s recent acquisition of Whole Foods, as well as its opening of brick and mortar stores, has been thought to be motivated by the rich set of offline information which can be accumulated by observing Amazon customers in these store environments. In China, retailing giants Alibaba and JD.com have built what some term “data ecosystems” that can link customers across many different activities including web-browsing, social media, and on-line retail. On-line ad networks and programmatic ad platforms offer unprecedented targeting opportunities based on information regarding consumer preferences and behavior. The assumption behind all of these developments is that new sources of information are extremely valuable. The valuation of information is clearly an important part of marketing. One way of valuing information is in the ability of this new information to provide improved estimates of consumer preferences and, therefore, more precise predictions of consumer response to various marketing activities. However, statistically motivated estimation and prediction criteria such as mean squared error do not place a direct monetary valuation on information. This can only be obtained in a specific decision context and a valid loss function such as firm profits. To make this clear, consider two information sets, A and B, regarding sales response parameters. We can value these information sets by solving the decision theoretic problem and comparing the attainable expected


profits for the two information sets. That is, we can compute

$$\Pi_k = \max_{x} \int \pi(x|\theta)\, p_k(\theta)\, d\theta, \qquad k = A, B$$

where $p_k(\theta)$ is the posterior based on information set k. In situations where decisions can be made at the consumer level rather than at the aggregate level, information set valuation can be achieved within a hierarchical model via different predictive distributions of consumer preferences based on alternative information sets (in Section 3, we will provide a full development of this idea).
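The comparison of attainable expected profits for k = A, B can be sketched as follows. The profit function, price, cost, and the two posteriors are all hypothetical: both posteriors are normal with the same mean, with B tighter than A, and each is represented by equally spaced normal quantiles as a deterministic stand-in for posterior draws. The names `value_a` and `value_b` are introduced here for illustration only.

```python
import math
from statistics import NormalDist

PRICE, UNIT_COST = 10.0, 1.0   # hypothetical values

def profit(x, theta):
    """Hypothetical profit: saturating response times margin, f(x|theta) * (p - c(x))."""
    return (1.0 - math.exp(-theta * x)) * (PRICE - UNIT_COST * x)

def normal_quantile_draws(mean, sd, n):
    """Deterministic stand-in for posterior draws: n equally spaced normal quantiles."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((r + 0.5) / n) for r in range(n)]

def attainable_expected_profit(draws, grid):
    """Maximum over decisions x of posterior expected profit (expectation as an average)."""
    return max(sum(profit(x, th) for th in draws) / len(draws) for x in grid)

grid = [0.05 * i for i in range(1, 200)]                 # candidate decisions x in (0, 10)
posterior_a = normal_quantile_draws(0.5, 0.30, 400)      # information set A: diffuse posterior
posterior_b = normal_quantile_draws(0.5, 0.05, 400)      # information set B: tight posterior

value_a = attainable_expected_profit(posterior_a, grid)
value_b = attainable_expected_profit(posterior_b, grid)
print(value_a, value_b, value_b - value_a)
```

In this stylized setup the difference `value_b - value_a` is positive because the assumed profit function is concave in θ and the two posteriors share a mean; this ordering is a property of the example, not a general result.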

2.7 Non-likelihood-based approaches

2.7.1 Method of moments approaches

As indicated above, the original appeal of the Generalized Method of Moments (GMM) methods is that they use only a minimal set of assumptions consistent with the predictions of economic theory. However, when applied in marketing and demand applications, the method of moments is often used as a convenient way of estimating the parameters of a demand model. Given a parametric demand model, there is a set of moment conditions which can identify the parameters of the model. These moment conditions can be used to define a method of moments estimator even for a fully parametric model. The method of moments approach is often chosen to avoid deriving the likelihood of the data and associated Jacobians. While this is true, the method of moments approach does not specify which set of moments should be used and there is often an infinite number of possible sets of moment conditions, any one of which is sufficient to identify and estimate model parameters. The problem, of course, is that method of moments provides little guidance as to which set of moments to use. The most efficient procedure for method of moments is to use the score function (the gradient of the log-likelihood) to define the moment conditions. This means, of course, that one can only approach the asymptotic efficiency of the maximum likelihood estimator. In other situations, the method of moments approach is used to estimate a model which is only partially specified. That is, most parts of the model have a specific parametric form and specific distributional assumptions, but investigators purport to be reluctant to fully specify other parts of the model. We do not understand why it is defensible to make full parametric assumptions about part of a model but not others when there is no economic theory underpinning any of the parametric assumptions made.
GMM advocates would argue that fewer assumptions is always better than more assumptions. The problem, then, is which parts of the model are designated for specific parametric assumptions and which parts of the model are not? The utility of the GMM approach must be judged relative to the arguments made by the investigator in defense of a particular choice of which part of the model is left unspecified. Frequently, we see no arguments of this sort and, therefore, we conclude that the method of moments procedure was chosen primarily for reasons of convenience.


The aggregate share model of Berry et al. (1995) provides a good example of this approach. The starting point for the BLP approach is to devise a model for aggregate share data that is consistent with valid demand models postulated at the individual level. For example, it is possible to take the standard multinomial logit model as the model governing consumer choice. In a market with a very large number of consumers, the market shares are the expected probabilities of purchase which would be derived by integrating the individual model over the distribution of heterogeneity. The problem is that, with a continuum of consumers, all of the choice model randomness would be averaged out and the market shares would be a deterministic function of the included choice model covariates. To overcome this problem, Berry et al. (1995) introduced an additional error term into consumer level utility which reflects a market-wide unobservable. For their model, the utility of brand j for consumer i and time period t is given by

$$U_{ijt} = X_{jt}\theta_j^i + \eta_{jt} + \varepsilon_{ijt} \tag{10}$$

where $X_{jt}$ is a vector of brand attributes, $\theta_j^i$ is a $k \times 1$ vector of coefficients, $\eta_{jt}$ is an unobservable common to all consumers, and $\varepsilon_{ijt}$ is the standard idiosyncratic shock (i.i.d. extreme value type I). If we normalize the utility of the outside good to zero, then market shares (denoted by $s_{jt}$) are obtained by integrating the multinomial logit model over a distribution of consumer parameters, $f(\theta^i|\delta)$, $\theta^i = (\theta_1^i, \ldots, \theta_J^i)$. $\delta$ is the vector of hyper-parameters which govern the distribution of heterogeneity.

$$s_{jt} = \int \frac{\exp\left(X_{jt}\theta_j^i + \eta_{jt}\right)}{1 + \sum_{k=1}^{J} \exp\left(X_{kt}\theta_k^i + \eta_{kt}\right)}\, f(\theta^i|\delta)\, d\theta^i = \int s_{ijt}(\theta^i|X_t, \eta_t)\, f(\theta^i|\delta)\, d\theta^i$$

While it is not necessary to assume that consumer parameters are normally distributed, most applications assume a normal distribution. In some cases, difficulties in estimating the parameters of the mixing distribution force investigators to further restrict the covariance matrix of the normal distribution to a diagonal matrix (see Jiang et al., 2009). Assume that $\theta^i \sim N(\bar{\theta}, \Sigma)$; then the aggregate shares can be expressed as a function of aggregate shocks and the preference distribution parameters.

$$s_{jt} = \int \frac{\exp\left(X_{jt}\theta_j^i + \eta_{jt}\right)}{1 + \sum_{k=1}^{J} \exp\left(X_{kt}\theta_k^i + \eta_{kt}\right)}\, \phi(\theta^i|\bar{\theta}, \Sigma)\, d\theta^i = h(\eta_t|X_t, \bar{\theta}, \Sigma) \tag{11}$$

where $\eta_t$ is the $J \times 1$ vector of common shocks. If we make an additional distributional assumption regarding the aggregate shock, $\eta_t$, we can derive the likelihood. Given that we have already made specific assumptions regarding the form of the utility function, the distribution of the idiosyncratic choice errors, and the distribution of heterogeneity, this does not seem particularly restrictive. However, the recent literature on GMM methods for aggregate share models

89

90

CHAPTER 2 Inference for marketing decisions

does emphasize the lack of distributional assumptions regarding the aggregate shock. In theory, the GMM estimator should be robust to autocorrelated and heteroskedastic errors of an unknown form. We will assume that the aggregate shock is i.i.d. across both products and time periods and follows a normal distribution, $\eta_{jt} \sim N(0, \tau^2)$. The normal distribution assumption is not critical to the derivation of the likelihood; however, as Bayesians we must make some specific parametric assumptions. Jiang et al. (2009) propose a Bayes estimator based on a normal likelihood and document that this estimator has excellent sampling properties even in the presence of misspecification and, in all cases considered, has better sampling properties than a GMM approach (see Chen and Yang, 2007 and Musalem et al., 2009 for other Bayesian approaches). The joint density of shares at “time” t (in some applications of aggregate share models, shares are observed over time for one market and in others shares are observed for a cross-section of markets; in the latter case, the “t” index would index markets) can be obtained by using standard change of variable arguments.

$$\pi(s_{1t}, \ldots, s_{Jt} \mid X, \bar\theta, \Sigma, \tau^2) = \phi\left(h^{-1}(s_{1t}, \ldots, s_{Jt} \mid X, \bar\theta, \Sigma) \mid 0, \tau^2 I_J\right) J_{(\eta \to s)} = \phi\left(h^{-1}(s_{1t}, \ldots, s_{Jt} \mid X, \bar\theta, \Sigma) \mid 0, \tau^2 I_J\right) J_{(s \to \eta)}^{-1} \tag{12}$$

$\phi(\cdot)$ is the multivariate normal density. The Jacobian is given by

$$J_{(s \to \eta)} = \left\lVert \frac{\partial s_j}{\partial \eta_k} \right\rVert \tag{13}$$

$$\frac{\partial s_j}{\partial \eta_k} = \begin{cases} -\int s_{ij}(\theta^i)\, s_{ik}(\theta^i)\, \phi(\theta^i \mid \bar\theta, \Sigma)\, d\theta^i & k \neq j \\ \int s_{ij}(\theta^i)\left(1 - s_{ik}(\theta^i)\right) \phi(\theta^i \mid \bar\theta, \Sigma)\, d\theta^i & k = j \end{cases} \tag{14}$$

It should be noted that, given the observed shares, the Jacobian is a function of $\Sigma$ only (see Jiang et al., 2009 for details). To evaluate the likelihood function based on (12), we must compute the $h^{-1}$ function and evaluate the Jacobian. The share inversion function can be evaluated using the iterative method of BLP (see Berry et al., 1995). Both the Jacobian and the share inversion require a method for approximation of the integrals required to compute “expected share” as in (11). Typically, this is done by direct simulation; that is, averaging over draws from the normal distribution of consumer level parameters. It has been noted that the GMM methods can be sensitive to simulation error in the evaluation of the integral as well as errors in computing the share inversion. Since the number of integral estimates and share inversions is of the order of magnitude of the number of likelihood or GMM criterion evaluations, it would be desirable, from a strictly numerical point of view, that the inference procedure exhibit little sensitivity to the number of iterations of the share inversion contraction or the number of simulation draws used in the integral estimates. Our experience is that the Bayesian methods that use stochastic search as opposed to optimization are far less sensitive to
these numerical errors. For example, Jiang et al. (2009) show that the sampling properties of Bayes estimates are virtually identical when 50 or 200 simulation draws are used in the approximation of the share integrals; this is not true of GMM estimates.11 In summary, the method of moments approach has inferior sampling properties to a likelihood-based approach and the literature has not fully explored the efficiency losses of using method of moments procedures. The fundamental problem is that the set of moment conditions is arbitrary and there is little guidance as to how to choose the most efficient set of moment conditions.12
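The two numerical steps discussed above, approximating the expected shares in (11) by simulation and inverting shares with the BLP fixed-point iteration, can be sketched as follows. This is only an illustrative sketch under simplifying assumptions that are not in the text: the heterogeneity distribution is a diagonal normal, every brand shares the same coefficient vector for a given consumer, and the function names `simulated_shares` and `invert_shares` are hypothetical.

```python
import math
import random

def simulated_shares(X, eta, theta_bar, sd, draws):
    """Approximate expected logit shares by averaging over heterogeneity draws.

    X: list of J attribute vectors (length k); eta: J common shocks.
    theta_bar, sd: mean and standard deviations of a diagonal normal
    heterogeneity distribution (an assumption made for brevity).
    draws: list of standard normal k-vectors, held fixed across calls.
    """
    J, k = len(X), len(theta_bar)
    shares = [0.0] * J
    for z in draws:
        theta = [theta_bar[m] + sd[m] * z[m] for m in range(k)]
        v = [sum(X[j][m] * theta[m] for m in range(k)) + eta[j] for j in range(J)]
        denom = 1.0 + sum(math.exp(u) for u in v)  # outside good has utility 0
        for j in range(J):
            shares[j] += math.exp(v[j]) / denom
    return [s / len(draws) for s in shares]

def invert_shares(s_obs, X, theta_bar, sd, draws, tol=1e-12, max_iter=5000):
    """Recover eta from observed shares by iterating eta <- eta + log(s_obs) - log(s_hat)."""
    eta = [0.0] * len(s_obs)
    for _ in range(max_iter):
        s_hat = simulated_shares(X, eta, theta_bar, sd, draws)
        step = [math.log(s_obs[j]) - math.log(s_hat[j]) for j in range(len(s_obs))]
        eta = [eta[j] + step[j] for j in range(len(eta))]
        if max(abs(d) for d in step) < tol:
            break
    return eta

# Round trip: simulate shares from known shocks, then invert them.
rng = random.Random(0)
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 1.0]]
draws = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
shares = simulated_shares(X, [0.3, -0.2, 0.1], [-1.0, 0.5], [0.5, 0.5], draws)
eta_hat = invert_shares(shares, X, [-1.0, 0.5], [0.5, 0.5], draws)
print(shares, eta_hat)
```

Holding the simulation draws fixed across calls keeps the simulated share function deterministic, so the inversion recovers the shocks that generated the simulated shares. Changing the number of draws or the tolerance is one way to explore the numerical sensitivity described above.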

2.7.2 Ad hoc approaches

Any function of the data can be proposed as an estimator. However, unless that function is suggested by or derived from a general approach to inference that has established properties, there is no guarantee that the proposed estimator will have favorable properties. For example, any Bayesian estimator with non-dogmatic priors will be consistent, as is true of MLE and method of moments estimators under relatively weak conditions. Estimators outside these classes (Bayes, MLE, and MM) we term “ad hoc” estimators, as there are no general results establishing the validity of the estimation procedure. Therefore, the burden is on the investigator proposing an estimator not based on established principles to demonstrate, at a minimum, that the estimator is consistent. Unfortunately, pure simulation studies cannot establish consistency. It is well known that various biased estimators (for example, Bayes) can have very favorable finite sample properties by exploiting the bias-variance trade-off. However, as the amount of sample information becomes greater in large samples, all statistical procedures should reduce bias. The fact that a procedure can be constructed to do well by some criterion for a few parameter settings does not ensure this. This is the value of theoretical analysis. Unfortunately, there are examples in the marketing literature (see Chapter 3 on conjoint methods) of procedures that have been proposed that are not derived from an inference paradigm that ensures consistency. That is not to say that these procedures are inconsistent, but that consistency should be, and has not been, established. Our view is that consistency is a necessary condition which must be met in order to admit an estimation procedure for further evaluation. It may well be that a procedure offers superior (or inferior) performance to existing methods, but this must wait until this minimal property is established.
Some of the suggestions provided in the marketing literature are based purely on an optimization method without establishing that the criterion for optimization is derived from a valid probability model. Thus, establishing consistency for these procedures is apt to be difficult. In proposing new procedures, investigators should be well aware of the complete class theorem which states that all admissible estimators are Bayes estimators. In other words, it is not possible to dominate a Bayes estimator in finite samples for a specific and well-defined model (likelihood). Thus, procedures which are designed to be competitive with Bayes estimators must be based on robustness considerations (that is, procedures that make fewer distributional assumptions in hopes of remaining consistent across a broader class of models).

11 It is possible to lessen the sensitivity of the method of moments approach to numerical inversion error (Dubé et al., 2012).

12 The GMM literature does have results about the asymptotically optimal moment conditions but, for many models, it is impractical to derive the optimal set of moment conditions.

2.8 Evaluating models

Given the wide variety of models as well as methods for estimation of models, a methodology for comparison of models is essential. One approach is to view model choice as a special case of Bayesian decision theory. This will lead the investigator to compute the posterior probability of a model.

$$p(M_i \mid y) = \frac{p(y \mid M_i)\, p(M_i)}{p(y)} \tag{15}$$

where $M_i$ denotes model i. Thus, Bayesian model selection is based on the “marginal” likelihood, $p(y \mid M_i)$, and the prior model probability. We can simply select models with the highest value of the numerator of (15) or we can average predictions across models using these posterior probabilities. The practical difficulty with the Bayesian approach to model selection is that the marginal likelihood of the data must be computed.

$$p(y \mid M_i) = \int p(y \mid \theta, M_i)\, p(\theta \mid M_i)\, d\theta \tag{16}$$

This integral is difficult to evaluate using only the MCMC draws that are used to perform posterior inferences regarding the model parameters (see, for example, the discussion in Chapter 6 of Rossi et al., 2005). Not only are these integrals difficult to evaluate numerically, but the results are highly sensitive to the choice of prior. Note also that proper priors would be required for unbounded likelihoods in order to obtain convergence of the integral which defines the marginal likelihood. We can view the marginal likelihood as the expected value of the likelihood taken over the prior parameter distribution. If you have a very diffuse (or dispersed) prior, then the expected likelihood can be small. Thus, the relative diffusion of priors for different models must be considered when computing posterior model probabilities. While this can be done, it often involves a considerable amount of effort beyond that required to perform inferences conditional on a model specification. However, when properly done, the Bayesian approach to model selection does afford a natural penalty for over-parameterized models (the asymptotic approximation known as the Schwarz approximation shows the explicit dependence of the posterior probability on model size; however, the Schwarz approximation is notoriously inaccurate and, therefore, not of great practical value).
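The view of the marginal likelihood in (16) as the expected likelihood under the prior, and its sensitivity to prior diffusion, can be illustrated with a toy conjugate example. The set-up is an assumption chosen for illustration only: the data are reduced to a sample mean with $\bar y \mid \theta \sim N(\theta, 1/n)$ and the prior is $\theta \sim N(0, v)$, so the exact marginal density is $N(0, v + 1/n)$. The function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood_exact(ybar, n, prior_var):
    """p(ybar | M) when ybar | theta ~ N(theta, 1/n) and theta ~ N(0, prior_var)."""
    return normal_pdf(ybar, 0.0, prior_var + 1.0 / n)

def marginal_likelihood_mc(ybar, n, prior_var, n_draws=100_000, seed=1):
    """Average the likelihood over draws from the prior, mirroring (16)."""
    rng = random.Random(seed)
    sd = math.sqrt(prior_var)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, sd)
        total += normal_pdf(ybar, theta, 1.0 / n)
    return total / n_draws

# The same data receive a smaller marginal likelihood as the prior is made more diffuse.
for prior_var in (1.0, 100.0, 10_000.0):
    exact = marginal_likelihood_exact(0.5, 20, prior_var)
    approx = marginal_likelihood_mc(0.5, 20, prior_var)
    print(prior_var, exact, approx)
```

With a very diffuse prior, few prior draws land where the likelihood is non-negligible, so the simple Monte Carlo average becomes noisy. This is one way to see why the integral is hard to evaluate and why the relative diffusion of priors matters when comparing models.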
Given difficulties in implementing a true decision theoretic approach to model selection, investigators have searched for other methods which are more easily implemented while offering some of the benefits of the more formal approach. Investigators
are aware that, given the “greedy” algorithms that dominate estimation, in-sample measures of model fit will understate true model error in prediction or classification. For this reason, various predictive validation exercises have become very popular in both marketing and the Machine Learning literatures. The standard predictive validation exercise involves dividing the sample into two data sets (typically by random splits): (1) estimation dataset and (2) validation dataset. Measures of model performance such as MSE are computed by fitting the model to the estimation dataset and predicting out-of-sample on the validation data. This procedure will work to remove the over-fitting biases of in-sample measures of MSE. Estimation procedures which have favorable bias-variance trade-offs will perform well by the criteria of predictive validation. The need to make arbitrary divisions of the data into estimation and validation datasets can be removed by using a k-fold cross-validation procedure. In k-fold cross-validation, the data is divided randomly into k “folds” or subsets. Each fold is reserved for validation and the model is fit on the other k − 1 folds. This is averaged over many draws of the fold classification and can be shown to produce an unbiased estimate of the model prediction error criterion. While these validation procedures are useful in discriminating between various estimation procedures and models, some caution should be exercised in their application to marketing problems. In marketing, our goal is to optimize policies for selection of marketing variables and we must consider models that are policy invariant. If our models are not policy invariant, then we may find models that perform very well in pure predictive validation exercises make poor predictions for optimal policy determination. In Section 4, we will consider the problem of true causal inference. The causal function linking market variables to outcomes such as sales can be policy invariant. 
The need for causal inference may also motivate us to consider other estimation procedures and these are explored in Section 5.
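The k-fold cross-validation procedure described in this section can be sketched in a few lines. The sketch below is illustrative only: it uses a single random assignment of observations to folds (the text describes averaging over many such assignments, which could be done by varying `seed`), a one-regressor least-squares model as the estimation procedure, and hypothetical function names.

```python
import random

def k_fold_mse(x, y, k, fit, predict, seed=0):
    """Average out-of-sample squared error over k random folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint subsets covering all rows
    sq_err, count = 0.0, 0
    for f in range(k):
        held_out = set(folds[f])
        train = [i for i in idx if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in folds[f]:
            sq_err += (y[i] - predict(model, x[i])) ** 2
            count += 1
    return sq_err / count

def fit_line(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def predict_line(model, x_new):
    """Prediction from a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * x_new

rng = random.Random(42)
x = [float(i) for i in range(50)]
y = [1.0 + 2.0 * v + rng.gauss(0.0, 1.0) for v in x]
print(k_fold_mse(x, y, 5, fit_line, predict_line))
```

Because every observation is held out exactly once, no arbitrary single split into estimation and validation data is needed. The caution in the text still applies: a low cross-validated MSE does not by itself indicate that a model is policy invariant.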

3 Heterogeneity

A fundamental premise of marketing is that customers differ both in preferences for product features as well as their sensitivities to marketing variables. Observable characteristics such as psycho-demographics can only be expected to explain a limited portion of the variation in tastes and responsiveness. Disaggregate data is required in order to measure customer heterogeneity.13 Typically, disaggregate data are obtained for a relatively large number of cross-sectional units but with a relatively short history of activity. In the consumer packaged goods industry, store level panel data are common, especially for retailers. There is also increased availability of customer level purchase data from specialized panels of consumers or from detailed purchase histories assembled from firm records. As the level of aggregation decreases, discrete features of sales data become magnified. The short time span of panel data coupled with the comparatively sparse information in discrete data means that we are unlikely to have a great deal of sample information about any one cross-sectional unit. If inference about unit-level parameters is important, then Bayesian approaches will be important. Moreover, the prior will matter and there must be reasonable procedures for assessing informative priors.

Increasingly firms want to make decentralized marketing decisions that exploit more detailed disaggregate information. Examples include store or zone level pricing, targeted electronic couponing, and sales force activities in the pharmaceutical industry. All of these examples involve allocation of marketing resources across consumers or local markets and the creation of possibly customized marketing treatments for each unit. In digital advertising, the ability to target an advertising message at a very specific group of consumers, defined by both observable and behavioral measures, makes the modeling of heterogeneity even more important. The demands of marketing applications contrast markedly with applications in micro-economics where the average response to a variable is often deemed more important. However, even the evaluation of policies which are uniform across some set of consumers will require information about the distribution of preferences in order to evaluate the effect on social welfare. In this section, we will review approaches to modeling heterogeneity, particularly the Bayesian hierarchical modeling approach. We will also discuss some of the challenges that truly huge panel datasets offer for Bayesian approaches.

13 Some argue that, with specific functional forms, the heterogeneity distribution can be determined from aggregate data. Fundamentally, the functional forms of response models and the distribution of heterogeneity are confounded in aggregate data.

3.1 Fixed and random effects

A generic formulation of the panel data inference problem is that we observe a cross-section of H units over “time.” The panel does not necessarily have to be balanced; namely, each unit can have a different number of observations. For some units, the number of observations may be very small. In many marketing contexts, a new “unit” may be “born” with 0 observations, but we still have to make predictions for this unit. For example, a pharmaceutical company has a very expensive direct sales force that calls on many key “accounts,” which in this industry are defined as prescribing physicians. There may be some physicians with a long history of interaction with the company and others who are new “accounts” with no history of interaction. Any firm that acquires new customers over time faces the same problem. Many standard econometric methods simply have no answer to this problem. Let $p(y_h \mid \theta_h)$ be the model we postulate at the unit level. If the units are consumers or households, this likelihood could be a basic demand model, for example. Our goal is to make inferences regarding the collection $\{\theta_h\}$. The “brute force” solution would be to conduct separate likelihood-based analyses for each cross-sectional unit (either Bayes or non-Bayes). The problem is that many units may have so little information (think singular X matrix or choice histories in which a unit did not purchase all of the choice alternatives) that the unit-level likelihood does not have a maximum. For this reason, separate analyses are not practical. Instead, the set of coefficients allowed to be unit-specific is limited. The classic example is what is often called a “Fixed Effects” estimator, which has its origins in a linear regression model with unit-specific intercepts and common coefficients.

$$y_{ht} = \alpha_h + \beta' x_{ht} + \varepsilon_{ht} \tag{17}$$

Here the assumption is that there are common “effects” or coefficients on the x variables but unit-specific intercept parameters. A common interpretation of this set-up is that there is some sort of unobservable variable(s) which influences the outcome, y, and which varies across the units, h. With panel data, we can just label the effect of these unobservables as time-invariant intercepts and estimate the intercepts with panel data. Sometimes econometricians will characterize this as solving the “selection on unobservables” problem via the introduction of fixed effects. Advocates for this approach will explain that no assumptions are made regarding the distribution of these unobservables across units nor are the unobservables required to be independent of the included x variables. For a linear model, estimation of (17) is a simple matter of concentrating the $\{\alpha_h\}$ out of the likelihood function for the panel data set. This can be done by either subtracting the unit-level means of all variables or by differencing over time.

$$y_{ht} - \bar y_{h\cdot} = \beta'\left(x_{ht} - \bar x_{h\cdot}\right) + \varepsilon_{ht} - \bar\varepsilon_{h\cdot} \tag{18}$$
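The demeaning transformation in (18) can be sketched for the case of a single x variable. The function name `within_estimator` and the toy data are hypothetical and serve only to show that subtracting unit-level means removes the intercepts without estimating them.

```python
def within_estimator(units, x, y):
    """Slope from regressing unit-demeaned y on unit-demeaned x (scalar x)."""
    sums = {}
    for h, xv, yv in zip(units, x, y):
        sx, sy, n = sums.get(h, (0.0, 0.0, 0))
        sums[h] = (sx + xv, sy + yv, n + 1)
    sxy = sxx = 0.0
    for h, xv, yv in zip(units, x, y):
        sx, sy, n = sums[h]
        dx, dy = xv - sx / n, yv - sy / n  # deviations from unit-level means
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx

# Three units with very different intercepts and a common slope of 2.
units = [0, 0, 0, 1, 1, 1, 2, 2]
x = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0, 2.0, 7.0]
alpha = {0: 5.0, 1: -3.0, 2: 10.0}
y = [alpha[h] + 2.0 * xv for h, xv in zip(units, x)]
print(within_estimator(units, x, y))
```

A unit with a single observation has zero deviations from its own mean and so contributes nothing to either sum, which is a small-scale version of the point that the approach has little to say about units with very little data.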

In most cases, there is no direct estimation of the intercepts but, instead, the demeaning operation removes a part of the variation of both the independent and dependent variables from the estimation of the β terms. It is very common for applied econometricians to use hundreds if not thousands of fixed effect terms in estimation of linear panel data models. The goal is to isolate or control for unobservables that might compromise clean estimation of the common β coefficients. Typically, a type of sensitivity analysis is done where groups of fixed effect terms are entered or removed from the specification and changes in the β estimates are noted. Of course, this approach to heterogeneity does not help if the goal is to make predictions regarding a new unit with no data or a unit with so little data that even the fixed effect intercept estimator is not defined. The reason is that there is no common structure assumed for the intercept terms. Each panel unit is unique and there is no source of commonality or similarity across units. This problem does not normally trouble econometricians who are concerned more with “effect estimation” or estimating β rather than prediction for new units or units with insufficient information. In marketing applications, we have no choice – we must make predictions for all units in the analysis. The problems with the fixed effects approach to heterogeneity do not stop with prediction for units with insufficient data. The basic idea that unobservables have additive effects and simply change the intercept, together with the assumption of a linear mean function, allows the econometrician to finesse the problem of estimating the fixed
effects by concentrating them out of the likelihood function. That is, there is a transformation of the data so that the likelihood function can be factored into two terms – one term does not involve β and the other term only involves the data through the transformation. Thus, we lose no information by “demeaning” or differencing the data. This idea does not extend to non-linear models such as discrete choice or nonlinear demand models. This problem is so acute that many applied econometricians fit “linear probability” models to discrete or binary data in order to use the convenience of the additive intercept fixed effects approach even though they know that a linear probability model is very unlikely to fit their data well and must only be regarded as an approximation to the true conditional mean function. This problem with the fixed effects approach does not apply to the random coefficients model. In the random coefficient model, we typically do not distinguish between the intercept and slope parameters and simply consider all model coefficients to be random (iid) draws from some “super-population” or distribution. That is, we assume the following two part model:

$$\begin{aligned} y_h &\sim p(y_h \mid \theta_h) \\ \theta_h &\sim p(\theta_h \mid \tau) \end{aligned} \tag{19}$$

Here the second equation is the random coefficient model. Almost without exception, the random coefficient model is taken to be a multivariate normal model, $\theta_h \sim N(\bar\theta, \Sigma)$. It is also possible to parameterize the mean of the random coefficient model by observable characteristics of each cross-sectional unit, i.e. $\bar\theta = \Delta z$ where z is a vector of unit characteristics. However, there is still the possibility that there are unobservable unit characteristics that influence the x explanatory variables in each unit response model. The random coefficient model assumes independence between the random effects and the levels of unit x variables conditional on z. Many regard this assumption as a drawback to the random coefficient model and point out that a fixed effects specification does not require this assumption. Manchanda et al. (2004) explicitly model the joint distribution of random effects and unit level x variables as one possible approach to relaxing the conditional independence assumption used in standard random coefficient models. If we start with a linear model, then the random coefficient model can be expressed as a linear regression model with a special structured covariance matrix as in

$$y_{ht} = \bar\theta' x_{ht} + \varepsilon_{ht} + v_h' x_{ht} \tag{20}$$

Here $\theta_h = \bar\theta + v_h$, $v_h \sim N(0, \Sigma)$. Thus, we can regard the regression model as estimating the mean of the random coefficient distribution and the covariance matrix is inferred from the likelihood of the correlated and heteroskedastic error terms. This error term structure motivates the use of “cluster” covariance matrix estimators. Of course, what we are doing by substituting the random coefficient model into the unit level regression is integrating out the $\{\theta_h\}$ parameters. In the more general non-linear setting, those who insist upon doing maximum likelihood would regard the random
coefficient model as part of the model and integrate or “marginalize” out the unit-level parameters.

$$\ell(\tau) = \prod_{h=1}^{H} \int p(y_h \mid \theta_h)\, p(\theta_h \mid \tau)\, d\theta_h \tag{21}$$

Some call random coefficient models “mixture” models since the likelihood is a mixture of the unit level distribution over an assumed random coefficient distribution. Maximum likelihood estimation of random coefficient models requires an approximation to the integral in the likelihood function for τ.14 Investigators find that they must restrict the dimension of this integral (by assuming that only parts of the coefficient vector are random) in order to obtain reasonable results. As we will see below, Bayesian procedures finesse this problem via data augmentation. The parameter space is “augmented” with the set of random coefficients, exploiting the fact that, given the random coefficients, inference regarding τ is often easily accomplished.

Mixed logit models

Econometricians are keenly aware of the limitations of the multinomial logit model to represent the demand for a set of products. The multinomial logit model has only one price coefficient and thus the entire matrix of cross-price elasticities must be generated with that one parameter. This problem is often known as the IIA property of logit models. As is shown in Chapter 1, one way of interpreting the logit model with linear prices is as corresponding to demand derived from linear utility with extreme value random utility errors. Linear utility assumes that all products are perfect substitutes. The addition of the random utility error means that choice alternatives are no longer exact perfect substitutes but the usual iid extreme value assumption means that all products have substitutability differences that can be expressed as a function of market share (or choice probability) alone. However, applied choice modelers are quick to point out that if aggregate demand is formed as the integral of logits over a normal distribution of preferences, then this aggregate demand function no longer has the IIA property. Our own experience is that, while this is certainly true as a mathematical statement, aggregate preferences often exhibit elasticity structures which are very close to those implied by IIA. Our experience is that high correlations in the mixing distribution are required to obtain large deviations from IIA in the aggregate demand system. Many of the claims regarding the ability of mixed logit to approximate arbitrary aggregate demand systems stem from a misreading of McFadden and Train (2000). A superficial reading of this article might imply that mixed logits can approximate any demand structure but this is only true if explanatory variables such as price are allowed to enter the choice probabilities in arbitrary non-linear ways. In some sense, it must always be true that any demand model can be approximated by arbitrary functions of price. One should not conclude that mixtures of logits with linear price terms can be used to approximate arbitrary demand structures.

14 This situation seems to be a clear case where simulated MLE might be used. The integral in (21) is approximated by a set of R draws from the normal distribution.
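The mathematical statement that mixing logits over a distribution of preferences breaks IIA at the aggregate level can be checked numerically. The sketch below is a constructed illustration, not a claim about typical data: it assumes three products, two of which share an attribute, with a single random coefficient on that attribute drawn from a normal distribution with a deliberately large standard deviation. All names and values are hypothetical.

```python
import math
import random

def logit_shares(v):
    """Multinomial logit probabilities for utilities v (no outside good)."""
    m = max(v)
    e = [math.exp(u - m) for u in v]
    total = sum(e)
    return [a / total for a in e]

def mixed_logit_shares(x, beta_draws):
    """Aggregate shares: logit probabilities averaged over draws of beta."""
    J = len(x)
    shares = [0.0] * J
    for b in beta_draws:
        p = logit_shares([b * xj for xj in x])
        for j in range(J):
            shares[j] += p[j] / len(beta_draws)
    return shares

rng = random.Random(0)
beta_draws = [rng.gauss(0.0, 2.0) for _ in range(20_000)]
x_full = [0.0, 1.0, 1.0]  # products 2 and 3 share an attribute
x_drop = [0.0, 1.0]       # product 3 removed from the choice set

full = mixed_logit_shares(x_full, beta_draws)
drop = mixed_logit_shares(x_drop, beta_draws)
# Under IIA the ratio of shares of products 1 and 2 would not depend on product 3.
print(full[0] / full[1], drop[0] / drop[1])
```

For a plain logit the two ratios coincide exactly. Here they differ because consumers who value the shared attribute substitute disproportionately between products 2 and 3. The size of the gap depends heavily on the dispersion of the mixing distribution, which is consistent with the caution expressed above about how far mixed logits move away from IIA in practice.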

3.2 Bayesian approach and hierarchical models

3.2.1 A generic hierarchical approach

Consider a cross-section of H units, each with a likelihood, $p(y_h \mid \theta_h)$, $h = 1, \ldots, H$. $y_h$ generically represents the data on the hth unit and $\theta_h$ is a $k \times 1$ vector of unit-level parameters. While there is no restriction on the model for each unit, common examples include a multinomial logit or standard regression model at the unit level. The parameter space can be very large and consists of the collection of unit level parameters, $\{\theta_h, h = 1, \ldots, H\}$. Our goal will be to conduct a posterior analysis of this joint set of parameters. It is common to assume that units are independent conditional on $\theta_h$. More generally, if the units are exchangeable (see Bernardo and Smith, 1994), then we require a prior distribution which is the same no matter what the ordering of the units is. In this case, we can write down the posterior for the panel data as

$$p(\theta_1, \ldots, \theta_H \mid y_1, \ldots, y_H) \propto \left[\prod_{h=1}^{H} p(y_h \mid \theta_h)\right] p(\theta_1, \ldots, \theta_H \mid \tau) \tag{22}$$

τ is a vector of prior parameters. The prior assessment problem posed by this model is daunting as it requires specifying a potentially very high dimensional joint distribution. One simplification would be to assume that the unit-level parameters are independent and identically distributed, a priori. In this case, the posterior factors and inference can be conducted independently for each of the H units.

$$p(\theta_1, \ldots, \theta_H \mid y_1, \ldots, y_H) \propto \prod_{h=1}^{H} p(y_h \mid \theta_h)\, p(\theta_h \mid \tau) \tag{23}$$

Given τ, the posterior in (23) is the Bayesian analogue of the classical fixed effects estimation approach. However, there are still advantages to the Bayesian approach in that an informative prior can be used. The informative prior will impart important shrinkage properties to Bayes estimators. In situations in which the unit-level likelihood may not be identified, a proper prior will regularize the problem and produce sensible inferences. The real problem is a practical one in that some guidance must be provided for assessing the prior parameters, τ. The specification of the conditionally independent prior can be very important due to the scarcity of data for many of the cross-sectional units. Both the form of the prior and the values of hyper-parameters are important and can have pronounced effects on the unit-level inferences. For example, consider a normal prior, $\theta_h \sim N(\bar\theta, V_\theta)$. Just the use of a normal prior distribution is highly informative regardless of the value of hyper-parameters. The thin tails of the prior distribution will reduce the influence of the likelihood when the likelihood is centered far away from the prior. For this
reason, the choice of the normal prior is far from innocuous. For many applications, the shrinkage of outliers is a desirable feature of the normal prior. The prior results in very stable estimates but at the same time this prior might mask or attenuate differences in consumers. It will, therefore, be important to consider more flexible priors. In other situations, the normal prior may be inappropriate. Consider the problem of random coefficient distribution of price coefficients in a demand model. Here we expect that the population distribution puts mass only on negative values and that the distribution would be highly skewed and possibly with a fat left tail. The normal random coefficient distribution would not be appropriate. It is a simple matter to reparameterize the price coefficient as in βp = − exp(βp∗ ) where we assume βp∗ is normal. But the general point that the normal distribution is restrictive is important to note. That is why we have enlarged these models to consider a finite or even infinite mixture of normals which can flexibly approximate any continuous distribution. If we accept the normal form of the prior as reasonable, a method for assessing the prior hyper-parameters is required (Allenby and Rossi, 1999). It may be desirable to adapt the shrinkage induced by use of an informative prior to the characteristics of both the data for any particular cross-sectional unit as well as the differences between units. Both the location and spread of the prior should be influenced by both the data and our prior beliefs. For example, consider a cross-sectional unit with little information available. For this unit, the posterior should shrink toward some kind of “average” or representative unit. The amount of shrinkage should be influenced both by the amount of information available for this unit as well as the amount of variation across units. A hierarchical model achieves this result by putting a prior on the common parameter, τ . 
The hierarchical approach is a model specified by a sequence of conditional distributions, starting with the likelihood and proceeding to a two-stage prior.

$$\begin{aligned} &p(y_h \mid \theta_h) \\ &p(\theta_h \mid \tau) \\ &p(\tau \mid a) \end{aligned} \tag{24}$$

The prior distribution on $\theta_h \mid \tau$ is sometimes called the first stage prior. In non-Bayesian applications, this is often called a random effect or random coefficient model and is regarded as part of the likelihood. The prior on τ completes the specification of a joint prior distribution on all model parameters and is often called the “second-stage” prior. Here a is a vector of prior hyper-parameters which must be assessed or chosen by the investigator.

$$p(\theta_1, \ldots, \theta_H, \tau \mid a) = p(\theta_1, \ldots, \theta_H \mid \tau)\, p(\tau \mid a) = \left[\prod_{h=1}^{H} p(\theta_h \mid \tau)\right] p(\tau \mid a) \tag{25}$$

One way of regarding the hierarchical model is just as a device to induce a joint prior on the unit-level parameters; that is, we can integrate out τ to inspect the implied prior.

$$p(\theta_1, \ldots, \theta_H \mid a) = \int \prod_{h=1}^{H} p(\theta_h \mid \tau)\, p(\tau \mid a)\, d\tau \tag{26}$$

It should be noted that, while $\{\theta_h\}$ are independent conditional on τ, the implied joint prior can be highly dependent, particularly if the prior on τ is diffuse (note: it is sufficient that the prior on τ should be proper in order for the hierarchical model to specify a valid joint distribution). To illustrate this, consider a linear model, $\theta_h = \tau + v_h$. τ acts as a common variance component and the correlation between any two θs is

$$\mathrm{Corr}(\theta_h, \theta_k) = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma_v^2}$$

As the diffusion of the distribution of τ relative to v increases, this correlation tends toward one.
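The implied prior correlation can be checked by simulation. The sketch below assumes scalar normal distributions for τ and $v_h$ (the text only specifies their variances) and uses a hypothetical function name. It draws a shared τ with unit-specific $v_h$ for two units and compares the sample correlation with $\sigma_\tau^2 / (\sigma_\tau^2 + \sigma_v^2)$.

```python
import math
import random

def implied_prior_correlation(var_tau, var_v, n_draws=50_000, seed=0):
    """Simulate theta_h = tau + v_h for two units and return their sample correlation."""
    rng = random.Random(seed)
    a, b = [], []
    for _ in range(n_draws):
        tau = rng.gauss(0.0, math.sqrt(var_tau))  # shared across units
        a.append(tau + rng.gauss(0.0, math.sqrt(var_v)))
        b.append(tau + rng.gauss(0.0, math.sqrt(var_v)))
    ma, mb = sum(a) / n_draws, sum(b) / n_draws
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

for var_tau in (1.0, 4.0, 100.0):
    print(var_tau, implied_prior_correlation(var_tau, 1.0), var_tau / (var_tau + 1.0))
```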

3.2.2 Adaptive shrinkage

The popularity of the hierarchical model stems from the improved parameter estimates that are made possible by the two stage prior distribution. To understand why this is the case, let’s consider a simplified situation in which each unit level likelihood is approximately normal with mean $\hat\theta_h$ (the unit-level MLE) and covariance matrix $I_h^{-1}$ (here we are abstracting from issues involving existence of the MLE). If we have a normal prior, $\theta_h \sim N(\bar\theta, V_\theta)$, then conditional on the normal prior parameters the approximate posterior mean is given by

$$\tilde\theta_h = \left(I_h + V_\theta^{-1}\right)^{-1}\left(I_h \hat\theta_h + V_\theta^{-1}\bar\theta\right) \tag{27}$$

This equation demonstrates the principle of shrinkage. The Bayes estimator is a compromise between the MLE (where the unit h likelihood is centered) and the prior mean. The weights in the average depend on the information content of the unit-level likelihood and the variance of the prior. As we have discussed above, shrinkage is what gives Bayes estimators such excellent sampling properties. The question becomes where we should shrink toward and by how much. In other words, how do we assess the normal mean and variance-covariance matrix? The two-part prior in the hierarchical model means that the data will be used (in part) to assess these parameter values. The “mean” of the prior will be something like the mean of the θh parameters over units and the variance will summarize the dispersion or extent of heterogeneity. This means that if we think that all units are very similar (Vθ is small) then we will shrink a lot. In this sense the hierarchical Bayes procedure has “adaptive shrinkage.” Note also that, for any fixed amount of heterogeneity, units with a great deal of information regarding θh will not be shrunk much.
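The shrinkage formula in (27) is easy to evaluate directly. The sketch below is illustrative only: the function name and the numerical values are assumptions chosen to show how the weight on the MLE grows with the unit-level information matrix.

```python
import numpy as np

def shrinkage_posterior_mean(theta_hat, info, theta_bar, V_theta):
    """Approximate posterior mean from (27):
    (I_h + V^-1)^-1 (I_h theta_hat + V^-1 theta_bar)."""
    V_inv = np.linalg.inv(V_theta)
    # Solve the linear system instead of explicitly inverting (I_h + V^-1).
    return np.linalg.solve(info + V_inv, info @ theta_hat + V_inv @ theta_bar)

theta_hat = np.array([2.0, -1.0])   # unit-level MLE (illustrative values)
theta_bar = np.zeros(2)             # prior mean
V_theta = np.eye(2)                 # prior covariance

# Little unit-level information: the estimate is pulled halfway to the prior mean.
weak = shrinkage_posterior_mean(theta_hat, 1.0 * np.eye(2), theta_bar, V_theta)
# A great deal of unit-level information: the estimate stays close to the MLE.
strong = shrinkage_posterior_mean(theta_hat, 99.0 * np.eye(2), theta_bar, V_theta)
```

In the hierarchical model, `theta_bar` and `V_theta` are not fixed as they are here; they are themselves inferred from the data, which is what makes the shrinkage adaptive.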


3.2.3 MCMC schemes

Given the independence of the units conditional on θh , all MCMC algorithms for hierarchical models will contain two basic groups of conditional distributions.

$$
\begin{aligned}
& p(\theta_h \mid y_h, \tau), \quad h = 1, \ldots, H \\
& p(\tau \mid \{\theta_h\}, a)
\end{aligned} \tag{28}
$$

As is well-known, the second part of this scheme exploits the conditional independence of $y_h$ and τ given $\theta_h$. The first part of (28) is dependent on the form of the unit-level likelihood, while the second part depends on the form of the first-stage prior. Typically, the priors in the first and second stages are chosen to exploit some sort of conjugacy and the {θh } are treated as “data” with respect to the second stage.
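The two-block structure in (28) can be sketched for the simplest conjugate case. The example below is a simplification, not the chapter's general algorithm: it assumes a scalar normal unit-level likelihood with known variance `sigma2`, a first-stage prior $\theta_h \sim N(\tau, V_\theta)$ with known `V_theta`, and a second-stage prior $\tau \sim N(m_0, s_0^2)$. All names and settings are hypothetical.

```python
import numpy as np

def gibbs_normal_hierarchy(y, sigma2=1.0, V_theta=1.0, m0=0.0, s02=100.0,
                           n_iter=2000, seed=0):
    """Two-block Gibbs sampler; y is a list of 1-D arrays, one per unit."""
    rng = np.random.default_rng(seed)
    H = len(y)
    n = np.array([len(yh) for yh in y])
    ybar = np.array([np.mean(yh) for yh in y])
    tau = 0.0
    theta_draws = np.empty((n_iter, H))
    tau_draws = np.empty(n_iter)
    for r in range(n_iter):
        # Block 1: theta_h | y_h, tau, drawn independently across units.
        prec = n / sigma2 + 1.0 / V_theta
        mean = (n * ybar / sigma2 + tau / V_theta) / prec
        theta = mean + rng.standard_normal(H) / np.sqrt(prec)
        # Block 2: tau | {theta_h}; the theta_h act as "data", y is not needed.
        prec_tau = H / V_theta + 1.0 / s02
        mean_tau = (theta.sum() / V_theta + m0 / s02) / prec_tau
        tau = mean_tau + rng.standard_normal() / np.sqrt(prec_tau)
        theta_draws[r] = theta
        tau_draws[r] = tau
    return theta_draws, tau_draws

# Simulated panel: 200 units, 5 observations each, true tau = 3.
rng = np.random.default_rng(1)
true_theta = rng.normal(3.0, 1.0, size=200)
y = [rng.normal(t, 1.0, size=5) for t in true_theta]
theta_draws, tau_draws = gibbs_normal_hierarchy(y)
tau_hat = tau_draws[500:].mean()   # posterior mean after discarding burn-in
```

Note that the second block never touches `y`: once the unit-level parameters are drawn, they are sufficient for the common parameter, which is the conditional independence the text refers to.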

3.2.4 Fixed vs. random effects

In classical approaches, a distinction is made between a “fixed effects” specification, in which there are different parameters for every cross-sectional unit, and random effects models, in which the cross-sectional unit parameters are assumed to be draws from a super-population. Advocates of the fixed effects approach explain that the approach does not make any assumption regarding the form of the distribution or the independence of random effects from included covariates in the unit-level likelihood. The Bayesian analogue of the fixed effects classical model is an independence prior with no second-stage prior on the random effects parameters as in (23). The Bayesian hierarchical model is the Bayesian analogue of a random effects model. The hierarchical model assumes that the cross-sectional units are exchangeable (possibly conditional on some observable variables).

This means that a key distinction between models (Bayesian or classical) is what sort of predictions could be made for a new cross-sectional unit. In either the classical or Bayesian “fixed effects” approach, no predictions can be made about a new member of the cross-section as there is no model linking units. Under the random effects view, all units are exchangeable and the predictive distribution for the parameters of a new unit is given by

$$
p(\theta_{h^*} \mid y_1, \ldots, y_H) = \int p(\theta_{h^*} \mid \tau)\, p(\tau \mid y_1, \ldots, y_H)\, d\tau \tag{29}
$$
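The integral in (29) is typically approximated by composition: for each posterior draw of τ, draw a new unit's parameter from the first-stage prior. The sketch below assumes a scalar normal first-stage prior $N(\tau, V_\theta)$ with known `V_theta`, and uses stand-in draws in place of real MCMC output; both are assumptions made for illustration.

```python
import numpy as np

def predictive_theta_draws(tau_draws, V_theta, seed=0):
    """One draw of theta for a new unit per posterior draw of tau."""
    rng = np.random.default_rng(seed)
    tau_draws = np.asarray(tau_draws, dtype=float)
    return tau_draws + np.sqrt(V_theta) * rng.standard_normal(tau_draws.shape)

# Stand-in for posterior draws of tau (in practice these come from the sampler).
tau_draws = np.random.default_rng(1).normal(3.0, 0.1, size=20_000)
theta_new = predictive_theta_draws(tau_draws, V_theta=1.0)
```

The predictive variance combines heterogeneity across units (`V_theta`) with posterior uncertainty about τ, so it is somewhat larger than `V_theta` alone. No analogous calculation is available under a fixed effects specification, since nothing links a new unit to the observed ones.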

3.2.5 First stage priors

Normal prior

A straightforward model to implement is a normal first-stage prior with possible covariates.

$$
\theta_h = \Delta' z_h + v_h, \quad v_h \sim N(0, V_\theta) \tag{30}
$$

where $z_h$ is a d × 1 vector of observable characteristics of the cross-sectional unit and Δ is a d × k matrix of coefficients. The specification in (30) allows the mean of each of the elements of θh to depend on the z vector. For ease of interpretation, we find it useful to subtract the mean and use an intercept.

$$
z_h = (1, x_h - \bar{x})
$$

In this formulation, the first row of Δ can be interpreted as the mean of θh . (30) specifies a multivariate regression model and it is convenient, therefore, to use the conjugate prior for the multivariate regression model.

$$
V_\theta \sim IW(V, \nu), \qquad \delta = \operatorname{vec}(\Delta) \mid V_\theta \sim N\!\left(\bar{\delta}, V_\theta \otimes A^{-1}\right) \tag{31}
$$

A is a d × d precision matrix, vec(·) stacks the columns of a matrix, and ⊗ denotes the Kronecker product. This prior specification allows for direct one-for-one draws of the common parameters, δ and Vθ .
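The role of demeaning in (30) can be seen in a small simulation. This is an illustrative sketch with made-up values for Δ and $V_\theta$ (d = 2, k = 2); it simulates from the first-stage prior only and does not implement the posterior draws of δ and $V_\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
H, k = 5000, 2

x = rng.normal(10.0, 2.0, size=(H, 1))                  # one observable characteristic
Z = np.column_stack([np.ones(H), x - x.mean(axis=0)])   # z_h = (1, x_h - xbar)

Delta = np.array([[1.0, -2.0],    # first row: mean of theta_h
                  [0.5,  0.3]])   # second row: effect of x on each element
V_theta = np.array([[1.0, 0.3],
                    [0.3, 0.5]])

V = rng.multivariate_normal(np.zeros(k), V_theta, size=H)
Theta = Z @ Delta + V             # row h holds theta_h' = z_h' Delta + v_h'

theta_mean = Theta.mean(axis=0)   # close to the first row of Delta
```

Because the covariate column has mean zero, the average of the simulated θh matches the first row of `Delta` up to sampling noise. Without demeaning, that row would instead be the mean of θh for a unit with x equal to zero.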

Mixture of normals prior

While the normal distribution is flexible, there is no particular reason to assume a normal first-stage prior. For example, if the observed outcomes are choices among products, some of the coefficients might be brand-specific intercepts. Heterogeneity in tastes for a product might be more likely to assume the form of clustering by brand. That is, we might find “clusters” of consumers who prefer specific brands over other brands. The distribution of tastes across consumers might then be multi-modal. We might want to shrink different groups of consumers in different ways or shrink to different group means. A multi-modal distribution will achieve this goal. For other coefficients such as a price sensitivity coefficient, we might expect a skewed distribution centered over negative values. Mixtures of multivariate normals are one way of achieving a great deal of flexibility (see, for example, Griffin et al., 2010 and the references therein). Multi-modal, thick-tailed, and skewed distributions are easily achieved from mixtures of a small number of normal components. For larger numbers of components, virtually any joint continuous distribution can be approximated. The mixture of normals model for the first-stage prior is given by

$$
\begin{aligned}
\theta_h &= \Delta' z_h + v_h \\
v_h &\sim N\!\left(\mu_{ind}, \Sigma_{ind}\right) \\
ind &\sim MN(\pi)
\end{aligned} \tag{32}
$$

π is a K × 1 vector of multinomial probabilities. This is a latent version of a mixture of K normals model in which a multinomial mixture variable, denoted here by ind, is used. In the mixture of normals specification, we remove the intercept term from $z_h$ and allow $v_h$ to have a non-zero mean. This allows the normal mixture components to mix on the means as well as on scale, introducing more flexibility. As before, it is convenient to demean the variables in z. A standard set of conjugate priors can be used for the mixture probabilities and component parameters, coupled with a standard conjugate prior on the Δ matrix.

$$
\begin{aligned}
\delta = \operatorname{vec}(\Delta) &\sim N\!\left(\bar{\delta}, A_\delta^{-1}\right) \\
\pi &\sim D(\alpha) \\
\mu_k &\sim N\!\left(\bar{\mu}, \Sigma_k \otimes a_\mu^{-1}\right) \\
\Sigma_k &\sim IW(V, \nu)
\end{aligned} \tag{33}
$$

Assessment of these conjugate priors is relatively straightforward for diffuse settings. Given that the θ vector can be of moderately large dimension (>5) and the θh parameters are not directly observed, some care must be exercised in the assessment of prior parameters. In particular, it is customary to assess the Dirichlet portion of the prior by using the interpretation that the K × 1 hyper-parameter vector, α, is an observed classification of a sample of size $\sum_k \alpha_k$ into the K components. Typically, all components in α are assessed equal. When a large number of components are used, the elements of α should be scaled down in order to avoid inadvertently specifying an informative prior with equal prior probabilities on a large number of components. We suggest a setting of $\alpha_k = 0.5/K$ (see Rossi, 2014a, Chapter 1 for further discussion).

As in the single component normal model, we can exploit the fact that, given the H × k matrix Θ, whose rows are the θh values, and the standard conditionally conjugate priors in (33), the mixture of normals model in (32) is easily handled by a standard unconstrained Gibbs sampler which includes augmentation to include the latent vector of component indicators (see Rossi et al., 2005, Section 5.5.1). The latent draws can be used for clustering as discussed below. We should note that any label-invariant quantity such as a density estimate or clustering is not affected by the “label-switching” identification problem (see Frühwirth-Schnatter, 2006 for a discussion). In fact, the unconstrained Gibbs sampler is superior to various constrained approaches in terms of mixing.

A tremendous advantage of Bayesian methods when applied to mixtures of normals is that, with proper priors, Bayesian procedures do not overfit the data and provide reasonable and smooth density estimates. In order for a component to obtain appreciable posterior mass, there must be enough structure in the “data” to favor the component in terms of a Bayes factor.
As is standard in Bayesian procedures, the existence of a prior puts an implicit penalty on models with a larger number of components. It should also be noted that the prior for the mixture of normals puts positive probability on models with fewer than K components. In other words, this is really a prior on models of different dimensions. In practice, it is common for the posterior mass to be concentrated on a set of components of much smaller size than K. The posterior distribution of any ordinate of the joint (or marginal) densities of the mixture of normals can be constructed from the posterior draws of component parameters and mixing probabilities. In particular, a Bayes estimate of a density ordinate can be constructed.

$$
\hat{d}(\theta) = \frac{1}{R} \sum_{r=1}^{R} \sum_{k=1}^{K} \pi_k^r\, \phi\!\left(\theta \mid \mu_k^r, \Sigma_k^r\right) \tag{34}
$$

Here the superscript r refers to an MCMC posterior draw and φ(·) denotes the k-variate multivariate normal density. If marginals of sub-vectors of θ are required, then we simply compute the required parameters from the draws of the joint parameters.
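The density estimate in (34) is an average of mixture densities over posterior draws. The following sketch assumes the draws are stored as NumPy arrays with the shapes given in the docstring; the function names and storage layout are hypothetical.

```python
import numpy as np

def mvn_pdf(theta, mu, Sigma):
    """Multivariate normal density phi(theta | mu, Sigma)."""
    theta, mu = np.atleast_1d(theta), np.atleast_1d(mu)
    Sigma = np.atleast_2d(Sigma)
    k = mu.shape[0]
    dev = theta - mu
    quad = dev @ np.linalg.solve(Sigma, dev)       # (theta-mu)' Sigma^-1 (theta-mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return float(np.exp(-0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)))

def density_ordinate(theta, pi_draws, mu_draws, Sigma_draws):
    """Bayes estimate of the density at theta, as in (34).
    Shapes: pi_draws (R, K); mu_draws (R, K, k); Sigma_draws (R, K, k, k)."""
    R, K = pi_draws.shape
    total = 0.0
    for r in range(R):                 # average over posterior draws
        for j in range(K):             # sum over mixture components
            total += pi_draws[r, j] * mvn_pdf(theta, mu_draws[r, j], Sigma_draws[r, j])
    return total / R

# Two identical "draws" of a two-component univariate mixture, evaluated at zero.
pi_draws = np.full((2, 2), 0.5)
mu_draws = np.array([[[-1.0], [1.0]], [[-1.0], [1.0]]])
Sigma_draws = np.ones((2, 2, 1, 1))
d_hat = density_ordinate(0.0, pi_draws, mu_draws, Sigma_draws)
```

Because the estimate sums over components within each draw, it is invariant to relabeling of the components, so label switching in the sampler does not affect it.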

3.2.6 Dirichlet process priors

While it can be argued that a finite mixture of normals is a very flexible prior, it is true that the number of components must be pre-specified by the investigator. Given that Bayes methods are being used, a practical approach would be to assume a very large number of components and allow the proper priors and natural parsimony of Bayes inference to produce reasonable density estimates. For large samples, it might be reasonable to increase the number of components in order to accommodate greater flexibility. The Dirichlet Process (DP) approach can, in principle, allow the number of mixture components to be as large as the sample size and potentially increase with the sample size. This allows for a claim that a DP prior can facilitate general non-parametric density estimation. Griffin et al. (2010) provide a discussion of the DP approach to density estimation. We review only that portion of this method necessary to fix notation for use within a hierarchical setting.

Consider a general setting in which each θh is drawn from a possibly different multivariate normal distribution, $\theta_h \sim N(\mu_h, \Sigma_h)$. The DP prior is a hierarchical prior on the joint distribution of $\{(\mu_1, \Sigma_1), \ldots, (\mu_H, \Sigma_H)\}$. The DP prior has the effect of grouping together cross-section units with the same value of (μ, Σ) and specifying a prior distribution for these possible “atoms.” The DP prior is denoted $G(\alpha, G_0(\lambda))$. G(·) specifies a distribution over distributions that is centered on the base distribution, $G_0$, with tightness parameter, α. Under the DP prior, $G_0$ is the marginal prior distribution for the parameters for any one cross-sectional unit. α specifies the prior distribution on the clustering of units to a smaller number of unique (μ, Σ) values. Given the normal base distribution for the cross-sectional parameters, it is convenient to use a natural conjugate base prior.
$$
G_0(\lambda): \quad \mu_h \mid \Sigma_h \sim N\!\left(\bar{\mu}, \tfrac{1}{a} \times \Sigma_h\right), \quad \Sigma_h \sim IW(V, \nu) \tag{35}
$$

λ is the set of prior parameters in (35): $(\bar{\mu}, a, \nu, V)$. In our approach to a DP model, we also put priors on the DP parameters, α and λ. The Polya Urn representation of the DP model can be used to motivate the choice of prior distributions on these parameters. α influences the number of unique values of (μ, Σ) or the probability that a new set of parameter values will be “proposed” from the base distribution, $G_0$. λ governs the distribution of proposed values. For example, if we set λ to put high prior probability on small values of Σ, then the DP prior will attempt to approximate the density of parameters with normal components with small variance. It is also important that the prior on μ put support on a wide enough range of values to locate normal components at wide enough spacing to capture the structure of the distribution of parameters. On the other hand, if we set very diffuse values of λ then this will reduce the probability of the “birth” of a new component via the usual Bayes factor argument. α induces a distribution on the number of distinct values of (μ, Σ) as shown in Antoniak (1974).

$$
\Pr(I^* = k) = S_n^{(k)}\, \alpha^k\, \frac{\Gamma(\alpha)}{\Gamma(n + \alpha)} \tag{36}
$$

$S_n^{(k)}$ are (unsigned) Stirling numbers of the first kind. $I^*$ is the number of clusters or unique values of the parameters in the joint distribution of $(\mu_h, \Sigma_h)$, h = 1, . . . , H. It is common in the literature to set a Gamma prior on α. Our approach is to propose a simple and interpretable distribution for α.

$$
p(\alpha) \propto \left( 1 - \frac{\alpha - \alpha_l}{\alpha_u - \alpha_l} \right)^{\phi}, \quad \alpha \in (\alpha_l, \alpha_u) \tag{37}
$$

We assess the support of α by setting the expected minimum and maximum number of components, $I^*_{\min}$ and $I^*_{\max}$. We then invert to obtain the bounds of support for α. Rossi (2014b), Chapter 2, provides further details including the assessment of the φ hyper-parameter. It should be noted that this device does not restrict the support of the number of components but merely assesses an informative prior that puts most of the mass of the distribution of α on values which are consistent with the specified range in the number of unique components. A draw from the posterior distribution of α can easily be accomplished as $I^*$ is sufficient and we can use a griddy Gibbs sampler as this is simply a univariate draw.

Priors on λ in (35) can also be implemented by setting $\bar{\mu} = 0$ and letting $V = \nu v I_k$. If $\Sigma \sim IW(\nu v I_k, \nu)$, then this implies $\operatorname{mode}(\Sigma) = \frac{\nu}{\nu + 2} v I_k$. This parameterization helps separate the choice of a location for the Σ matrix (governed by v) from the choice of the tightness on the prior for Σ (governed by ν). In this parameterization, there are three scalar parameters that govern the base distribution, (a, v, ν). We take them to be a priori independent with the following distributions.

$$
\begin{aligned}
p(a, v, \nu) &= p(a)\, p(v)\, p(\nu) \\
a &\sim U(a_l, a_u) \\
v &\sim U(v_l, v_u) \\
\nu &= \dim(\theta_h) - 1 + \exp(z), \quad z \sim U(z_l, z_u), \quad z_l > 0
\end{aligned} \tag{38}
$$


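It is a simple matter to write down the conditional posterior given that the unique set of (μ, Σ) are sufficient. The set of $I^*$ unique parameter values is denoted $\Theta^* = \{(\mu_j^*, \Sigma_j^*),\ j = 1, \ldots, I^*\}$. The conditional posterior is given by

$$
p(a, v, \nu \mid \Theta^*) \propto \left[ \prod_{i=1}^{I^*} \left| a^{-1} \Sigma_i^* \right|^{-1/2} \exp\!\left( -\frac{a}{2}\, \mu_i^{*\prime} \left(\Sigma_i^*\right)^{-1} \mu_i^* \right) \left| \nu v I_k \right|^{\nu/2} \left| \Sigma_i^* \right|^{-(\nu + k + 1)/2} \operatorname{etr}\!\left( -\frac{1}{2}\, \nu v \left(\Sigma_i^*\right)^{-1} \right) \right] p(a, v, \nu) \tag{39}
$$

We note that the conditional posterior factors and that, conditional on $\Theta^*$, a and (ν, v) are independent.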
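The distribution in (36) of the number of unique clusters, and the idea of inverting it to obtain bounds on α, can be sketched numerically. The code below is an illustration only: the function names are hypothetical, and matching the expected number of clusters by bisection is an assumption about how the inversion might be done, not necessarily the procedure in Rossi (2014b).

```python
import math

def unsigned_stirling_first(n):
    """Row n of the unsigned Stirling numbers of the first kind, c(n, 0..n),
    built with the recurrence c(m+1, k) = m * c(m, k) + c(m, k-1)."""
    row = [1]                                   # c(0, 0) = 1
    for m in range(n):
        new = [0] * (m + 2)
        for k in range(m + 2):
            stay = m * row[k] if k <= m else 0
            birth = row[k - 1] if k >= 1 else 0
            new[k] = stay + birth
        row = new
    return row

def cluster_count_pmf(n, alpha):
    """Pr(I* = k) for k = 0..n, as in (36), computed on the log scale."""
    c = unsigned_stirling_first(n)
    log_norm = math.lgamma(alpha) - math.lgamma(n + alpha)
    return [0.0 if c[k] == 0
            else math.exp(math.log(c[k]) + k * math.log(alpha) + log_norm)
            for k in range(n + 1)]

def expected_clusters(n, alpha):
    """E[I*] = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))

def alpha_for_expected_clusters(n, target, lo=1e-8, hi=1e6):
    """Bisect (on the log scale) for the alpha whose expected cluster count is target."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if expected_clusters(n, mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

pmf = cluster_count_pmf(50, 2.0)
mean_from_pmf = sum(k * p for k, p in enumerate(pmf))
alpha_l = alpha_for_expected_clusters(100, 5.0)    # e.g. at least about 5 clusters
alpha_u = alpha_for_expected_clusters(100, 20.0)   # e.g. at most about 20 clusters
```

The expected number of clusters is increasing in α, so a range for the number of components maps to an interval $(\alpha_l, \alpha_u)$ on which a prior such as (37) can be placed.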

3.2.7 Discrete first stage priors

Both the economics (Heckman and Singer, 1984) and marketing literatures (see, for example, references in Allenby and Rossi, 1999) have considered the use of discrete random coefficient or mixing distributions. In this approach, the first-stage prior is a discrete distribution which puts mass on only a set of M unknown mass points. Typically, some sort of model selection criterion (such as BIC or AIC) is used to select the number of mass points. As discussed in Allenby and Rossi (1999), discrete mixtures are poor approximations to continuous mixtures. We do not believe that consumer preferences consist of only a small number of types but, rather, a continuum of preferences which can be represented by a flexible continuous distribution. One can see the discrete approach as a degenerate special case of the mixture of normals approach. Given that it is now feasible to use not only mixtures of normals but mixtures with a potentially infinite number of components, the usefulness of the discrete approximation has declined. In situations where the model likelihood is extremely costly to evaluate (such as some dynamic models of consumer behavior and some models of search), the discrete approach retains some appeal from pure computational convenience.

3.2.8 Conclusions

In summary, hierarchical models provide a very appealing approach to modeling heterogeneity across units. Today almost all conjoint modeling (for details see Chapter 3) is accomplished using hierarchical Bayesian procedures applied to a unit-level multinomial logit model. With more aggregate data, Bayesian hierarchical models are frequently employed to ensure that high-dimensional systems of sales response equations produce reasonable coefficient estimates. Given that there are a very large number of cross-sectional units in marketing panel data, there is an opportunity to move considerably beyond the standard normal distribution of heterogeneity. The normal distribution might not be expected to approximate the distribution of preferences across consumers. For example, brand preference parameters might be expected to be multi-modal while marketing mix sensitivity parameters such as a price elasticity or advertising responsiveness may have highly skewed and sign-constrained distributions.15 For this reason, mixture-of-normals priors can be very useful (see, for example, Dube et al., 2010).

3.3 Big data and hierarchical models

It is not uncommon for firms to assemble panel data on millions of customers. Extremely large panels pose problems for the estimation and use of hierarchical models. The basic Gibbs sampler strategy in (28) means alternating between drawing unit-level parameters (θh ) and the common second-stage prior parameters (τ ). Clearly, the unit-level parameters can be drawn in parallel, which exploits a modern distributed computing environment with many loosely coupled processors. Moreover, the amount of data which has to be sent to each processor undertaking unit-level computations is small. However, the draws of the common parameters require assembling all unit-level parameters, {θh }. The communications overhead of assembling unit-level parameters may be prohibitive.

One way out of this computational bottleneck, while retaining the current distributed computer architecture, is to perform common parameter inferences on a subset (but probably a very large subset) of the data. The processor farm could be used to draw unit-level parameters conditional on draws of the common parameters which have already been accomplished and reserved for this purpose. Thus, an initial stage of computation would be to implement the MCMC strategy on a large subset of units. Draws of the common parameters from this analysis would be reserved. In a second stage, common parameter draws would be sent down along with unit data to a potentially huge group of processors that would undertake parallel unit-level computations. This remains an important area for computational research.
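The second stage of this two-stage strategy can be sketched as follows. This is a toy illustration under strong assumptions: a scalar normal unit-level likelihood with known variance, a normal first-stage prior with known variance, and a thread pool standing in for a distributed processor farm. The function names and the stand-in "reserved" draws are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def draw_unit(job):
    """Draw theta_h | y_h, tau for every reserved draw of tau (one unit)."""
    y_h, tau_draws, sigma2, V_theta, seed = job
    rng = np.random.default_rng(seed)          # separate random stream per unit
    prec = len(y_h) / sigma2 + 1.0 / V_theta
    mean = (np.sum(y_h) / sigma2 + tau_draws / V_theta) / prec
    return mean + rng.standard_normal(tau_draws.shape) / np.sqrt(prec)

def stage_two(y, tau_draws, sigma2=1.0, V_theta=1.0, workers=4):
    """Unit-level draws computed independently given reserved common-parameter draws.
    Only (y_h, tau_draws) is sent to each worker; no unit needs another unit's data."""
    jobs = [(y_h, tau_draws, sigma2, V_theta, h) for h, y_h in enumerate(y)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.vstack(list(pool.map(draw_unit, jobs)))   # H x R array of draws

# Stand-in for common-parameter draws reserved from a first-stage MCMC run on a subset.
reserved_tau = np.zeros(1000)
y = [np.full(4, 2.0), np.full(4, -2.0)]
unit_draws = stage_two(y, reserved_tau)
```

Because the second stage never feeds unit-level draws back into the common parameters, the communication bottleneck described above is avoided, at the cost of basing inference about τ on the first-stage subset only.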

3.4 ML and hierarchical models

The Machine Learning (ML) literature has emphasized flexible models which are evaluated primarily on predictive performance. While the theory of estimation does not discriminate between two equally flexible approaches to estimating non-linear and unknown regression functions, as a practical matter investigators in ML have found that certain functional forms or sets of basis functions appear to do very well in approximating arbitrary regression functions. One could regard the hierarchical model approach entirely from a predictive point of view. To predict some unit-level outcome variable, we should be able to use any function of the other observations on this unit. That is, to predict $y_{ht_0}$, we can use any other data on unit h except for $y_{ht_0}$. The hierarchical approach also suggests that summaries of the unit data (such as means and variances of unit parameters) might also be helpful in predicting $y_{ht_0}$. This suggests that we might consider ways of training flexible ML methods to imitate or approximate the predictions that arise from a hierarchical approach and use these as approximate solutions to fitting hierarchical models when it is impractical to implement the full-blown MCMC apparatus. Again, this might be a fruitful avenue for future research.

15 For example, the price coefficient could be reparameterized as $\beta_p = -e^{\delta}$ and the first-stage prior could be placed on δ. This will require care in assessment of priors as the δ parameter is on a log-scale while other parameters will be on a normal scale.

4 Causal inference and experimentation

As we have indicated, one fundamental goal of marketing research is to inform decisions which firms make about the deployment of marketing resources. At the core, all firm decisions regarding marketing involve counterfactual reasoning. For example, we must estimate what a potential customer would do had they not been exposed to a paid search ad in order to “attribute” the correct sales response estimate to this action. Marketing mix models pose a much more difficult problem of valid counterfactual estimates of what would happen to sales and profits if marketing resources were re-allocated in a different manner than observed in the past. The importance of counterfactual reasoning in any problem related to optimization of resources raises the ante for any model of customer behavior. Not only must this model match the co-variation of key variables in the historical data, but the model must provide accurate and valid forecasts of sales in a new regime with a different set of actions. This means that we must identify the causal relationship between marketing variables and firm sales/profits and this causal relationship must be valid over a wide range of possible actions, including actions outside of the support of historical data.

The problem of causal inference has received a great deal of attention in the bio-statistics and economics literatures, but relatively little attention in the marketing literature. Given that marketing is, by its very nature, a decision-theoretic field, this is somewhat surprising. The problems in the bio-statistics and economics applications usually involve evaluating the causal effect of a “treatment” such as a new drug or a job-training program. Typically, the models used in these literatures are simple linear models. Often the goal is to estimate a “local” treatment effect, that is, a treatment effect for those induced by an experiment or other incentives to become treated.
A classic example from this literature is the Angrist and Krueger (1991) paper which starts with the goal of estimating the returns to an additional year of schooling but ends up only estimating (with a great deal of uncertainty) the effect of additional schooling for those induced to complete the 10th grade (instead of leaving school in mid-year). To make any policy decisions regarding investment in education, we would need to know the entire causal function (or at least more than one point) for the relationship between years of education and wages. The analogy in marketing analytics is to estimate the causal relationship between exposures to advertising and sales. In order to optimize the level of advertising, we require the whole function, not just a derivative at a point.

Much of the highly influential work of Heckman and Vytlacil (2007) has focused on the problem of evaluating job training programs where the decision to enroll in the program is voluntary. This means that those people who are most likely to benefit from the job training program or who have the least opportunity cost of enrolling (such as the recently unemployed) are more likely to be treated. This raises a host of thorny inference problems. The analogy in marketing analytics is to evaluate the effect of highly targeted advertising.

Randomized experimentation offers at least a partial solution to the problems of causal inference. Randomization in assignment to treatment conditions can be exploited as the basis of estimators for causal effects. Both academic researchers and marketing practitioners have long advocated the use of randomized experiments. In the direct marketing and credit card contexts, randomized field experiments have been conducted for decades to optimize direct marketing offers and manage credit card accounts. In advertising, IRI International used randomized experiments implemented through split cable to evaluate TV ad creatives (see Lodish and Abraham, 1995). In the early 1990s, randomized store-level experiments were used to evaluate pricing policies by researchers at the University of Chicago (see Hoch et al., 1994). In economics, the Income-Maintenance experiments of the 1980s stimulated an interest in randomized social experiments. These income maintenance experiments were followed by a host of other social experiments in housing and health care.

4.1 The problem of observational data

In the generic problem of estimating the relationship between sales and marketing inputs, the goal is to make causal inferences so that optimization is possible on the basis of our estimated relationship. The problem is that we often have only observational data on which to base our inferences regarding the causal nexus between marketing variables and sales. There is a general concern that not all of the variation in marketing input variables can be considered exogenous or as if the variation is the result of random experimentation. Concerns that some of the variation in the right-hand-side variables is correlated with the error term or jointly determined with sales mean that observational data may lead to biased or inconsistent causal inferences.

For example, suppose we have aggregate time series data16 on the sales of a product and some measure of advertising exposure.

$$
S_t = f(A_t \mid \theta) + \varepsilon_t
$$

Our goal is to infer the function, f, which can be interpreted as a causal function; that is, we can use this function to make valid predictions of expected sales for a wide range of possible values of advertising. In order to consider optimizing advertising, we require a non-linear function which, at least at some point, exhibits diminishing returns. Given that we wish to identify a non-linear relationship, we will require more extensive variation in A than if we assume a linear approximation.

16 In the general case, assembling even observational data to fit a market response model can be difficult. At least three or possibly four different sources are required: (1) Sales data, (2) Pricing and promotional data, (3) Digital advertising, and (4) Traditional advertising such as TV, Print, and Outdoor. Typically, these various data sources feature data at various levels of temporal, geographic, and product aggregation. For example, advertising is typically not associated with a specific product but with a line of products and may only be available at the monthly or quarterly level.

The question from the point of view of causal inference is whether or not we can use the variation in the observed data to make causal inferences. As discussed in Section 2.3, the statistical theory behind any likelihood-based inference procedure for such a model assumes the observed variation in A is as though obtained via random experimentation. In a likelihood-based approach, we make the assumption that the marginal distribution of A is unrelated to the parameters, θ, which drive the conditional mean function. An implication of this assumption is that the conditional mean function is identified only via the effect of changes in A; the levels of A have no role in inference regarding the parameters that govern the derivative of f(·) with respect to A. In practice, this may not be true. In general, if the firm sets the values of A observed in the data on the basis of the function f(·), then the assumption that the marginal distribution of A is not related to θ is violated. In this situation, we may not be able to obtain valid (consistent) estimates of the sales response function parameters.17 Manchanda et al. (2004) explain how a model in which both inputs are chosen jointly can be used to extract causal information from the levels of an advertising input variable. However, this approach requires additional assumptions about how the firm chooses the levels of advertising input.

Another possibility is that there is some unobservable variable that influences both advertising and sales. For example, suppose there are advertising campaigns for a competing product that is a close substitute and we, as data scientists, are not aware of or cannot observe this activity.
It is possible that, when there is intensive activity from competitive advertising, the firm increases the scale of its advertising to counter or blunt the effects of competitive advertising. This means that we no longer estimate the parameters of the sales response function consistently. In general, anytime the firm sets A with knowledge of some factor that also affects sales and we do not observe this factor, we will have difficulty recovering the sales response function parameters. In some sense, this is a generic and non-falsifiable critique. How do we know that such an unobservable does not exist? We can’t prove it. Typically, the way we might deal with this problem is to include as large a set of covariates as possible in the sales equation as control variables. The problem in sales response model building is that we often do not observe any actions of competing products or we only observe these imperfectly and possibly at a different time frequency. Thus, one very important set of potential control variates is often not available. Of course, this is not the only possible set of variables observable to the firm but not observable to the data scientist. There are three possible ways to deal with this problem of “endogeneity.”

17 An early example is Bass (1969), in which a model of the simultaneous determination of sales and advertising is calibrated using cigarette data. Bass suggested that ad hoc rules which allocate advertising budgets as some percentage of sales create a feedback loop or simultaneity problem.


1. We might consider using data sampled at a much higher frequency than the decisions regarding A are made. For example, if advertising decisions are made only quarterly, we might use weekly data and argue that the lion’s share of variation in our data holds the strategic decisions of the firm constant.18
2. We might attempt to partition the variation in A into that which is “clean,” or unrelated to factors driving sales, and that which is related to those factors. This is the logical extension of the conditioning approach of adding more observables to the model. We would then use an estimation method which uses only the “clean” portion of the variation.
3. We could consider experimentation to break whatever dependence there is between advertising and sales.

Each of these ideas will be discussed in detail below. Before we embark on a more detailed discussion of these methods, we will relate our discussion of simultaneity or endogeneity to the literature on causal inference for treatment effects.
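The omitted-variable concern described in this section can be illustrated with a small simulation. The sketch below is a deliberately simplified setting, not the chapter's model: sales response is taken to be linear (rather than a non-linear f), competitor advertising is the only unobservable, and all parameter values and names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
beta, delta, gamma = 1.0, 2.0, 0.8           # illustrative values only

C = rng.normal(size=T)                       # competitor advertising (unobserved)
A = gamma * C + rng.normal(size=T)           # firm raises A when competitors advertise
S = beta * A - delta * C + rng.normal(size=T)   # sales: own ads help, rival ads hurt

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

naive = ols_slope(A, S)                      # ignores C: inconsistent for beta

# If C were observed, including it as a control would recover beta.
X = np.column_stack([np.ones(T), A, C])
controlled = float(np.linalg.lstsq(X, S, rcond=None)[0][1])

# Large-sample value of the naive slope: beta + Cov(A, -delta*C) / Var(A).
expected_naive = beta - delta * gamma / (gamma**2 + 1.0)
```

In this configuration the naive estimate is close to zero even though the true advertising effect is one, because advertising is high precisely when the unobserved competitive pressure on sales is high.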

4.2 The fundamental problem of causal inference

A growing literature (see, for example, Angrist and Pischke, 2009 and Imbens and Rubin, 2014) emphasizes a particular formulation of the problem of causal inference. Much of this literature re-interprets existing econometric methods in light of this paradigm. The basis for this paradigm of causal inference was originally suggested by Neyman (1990), who conceived of the notion of potential outcomes for a treatment. The notation favored by Imbens and Rubin is as follows. $Y$ represents the outcome random variable. In our case, $Y$ will be sales or some sort of event (like a conversion or click) which is on the way toward a final purchase. We seek to evaluate a treatment, denoted $D$. For now, consider a binary treatment such as exposure to an ad.[19] We conceive of there being two potential outcomes:

• $Y_i(1)$: potential outcome if unit $i$ is exposed to the treatment.
• $Y_i(0)$: potential outcome if unit $i$ is not exposed to the treatment.

We would like to estimate the causal effect of the treatment, which is defined as

$$\Delta_i = Y_i(1) - Y_i(0)$$

The fundamental problem of causal inference is that we only see one of the two potential outcomes for each unit being treated. That is, we only observe $Y_i(1)$ for $D_i = 1$ and $Y_i(0)$ for $D_i = 0$. Without further assumptions or information, this statistical problem is un-identified. Note that we have already simplified the problem greatly by assuming a linear model or restricting our analysis to only one “level” of treatment. Even if we simplify the model by assuming a constant treatment effect, $\Delta_i = \Delta \; \forall i$, the causal effect is still not identified.

[18] Here we are assuming that within-period variation in A is exogenous. For example, if promotions or ad campaigns are designed at the quarterly level, then we are assuming that within-quarter variation is execution-based and unrelated to within-quarter demand shocks. The validity of this assumption would have to be assessed in the same way that any argument for exogeneity is made. However, this exploits institutional arrangements that, it may well be argued, are indeed exogenous.
[19] It is a simple matter to extend the potential outcomes framework to more continuous treatment variables, such as in causal inference with respect to the effect of price on demand.

To see this problem, let’s take the mean difference in $Y$ between those who were treated and those who were not, and express it in terms of potential outcomes.

$$
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] &= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]
\end{aligned}
$$

This equation simply states that what the data identify is the mean difference in the outcome variable between the treated and untreated, and that this can be expressed as the sum of two terms. The first term is the effect on the treated, $E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1]$, and the second term is called the selection bias, $E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$. Selection bias occurs when the potential outcome for those assigned to the treatment differs in a systematic way from that of those who are assigned to the “control” or assigned not to be treated. This selection bias is what inspired much of the work of Heckman, Angrist, and Imbens to obtain further information.

The classic example of this is the so-called “ability” bias argument in the literature on education. We can’t simply compare the wages of college graduates with those who did not graduate from college, because it is likely that college graduates have greater ability even “untreated” with a college education. Those who argue for the “certification” view of higher education are the extreme point of this selection bias: they argue that the only point of education is not those courses in Greek Philosophy but simply the selection bias of finding higher ability individuals.

It is useful to reflect on what sort of situations are likely to have large selection bias in the evaluation of marketing actions.
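The decomposition above can be checked numerically. The following is a minimal sketch, not taken from the chapter: it assumes a constant treatment effect and a hypothetical unobserved `taste` variable that drives both the untreated outcome and the chance of being treated. Because this is a simulation, $Y_i(0)$ is available for treated units, which is never the case with real data.

```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def selection_bias_demo(n=100_000, effect=1.0, seed=0):
    """Simulate targeted (non-random) treatment and decompose the naive gap."""
    rng = random.Random(seed)
    y0_treated, y0_control = [], []
    for _ in range(n):
        taste = rng.gauss(0.0, 1.0)  # hypothetical unobserved preference
        y0 = 10.0 + 2.0 * taste + rng.gauss(0.0, 1.0)  # potential outcome Y_i(0)
        # Assumed targeting rule: higher-taste units are more likely to be treated.
        treated = taste + rng.gauss(0.0, 1.0) > 0.0
        (y0_treated if treated else y0_control).append(y0)
    # Constant effect: Y_i(1) = Y_i(0) + effect for every unit.
    y1_treated = [y + effect for y in y0_treated]
    naive = avg(y1_treated) - avg(y0_control)  # what the data identify
    att = avg(y1_treated) - avg(y0_treated)    # effect on the treated
    bias = avg(y0_treated) - avg(y0_control)   # selection bias term
    return naive, att, bias


naive, att, bias = selection_bias_demo()
print(f"naive={naive:.2f}  effect on treated={att:.2f}  selection bias={bias:.2f}")
```

Under these assumptions the naive difference in means equals the effect on the treated plus the selection bias term, and the bias term is larger than the true effect itself.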
Mass media like TV or print are typically only targeted at a very broad demographic group. For example, advertisers on the Super Bowl are paying a great deal of money to target men aged 25-45. There is year-to-year variation in Super Bowl viewership which in principle would allow us to estimate some sort of regression-based model of the effect of exposure to Super Bowl ads. The question is what is the possible selection bias? It is true that the effectiveness of a beer ad on those who view the Super Bowl versus a random consumer may be very different, but that may not be relevant to the Super Bowl advertiser. The SB advertiser cares more about the effect on the treated, that is, the effect of exposure on those in the target audience who view the SB. Are those who choose not to view the SB in year X different from those who view the SB in year Y? Not necessarily; viewership is probably driven by differences in the popularity of the teams in the SB. Thus, if our interest is the effect on the treated Super Bowl fan, there probably is little selection bias (under the assumption that the demand for beer is similar across the national population of SB fans).

However, selection bias is probably a very serious problem in other situations. Consider a firm like North Face that markets outdoor clothing. This is a highly seasonal industry with two peaks in demand each year: one in the spring as people anticipate summer outdoor activities and another in the late fall as consumers are purchasing holiday gifts. North Face is aware of these peaks in demand and typically schedules much of its promotional and advertising activity to coincide with them. This means we can’t simply compare sales in periods of high advertising activity to sales in periods of low activity, as we are confounding the seasonal demand shift with the effect of marketing.

In the example of highly seasonal demand and coordinated marketing, the marketing instruments are still mass or untargeted for the most part (other than demographic and, possibly, geographic targeting rules). However, the problem of selection bias can also be created by various forms of behavioral targeting. The premier example of this is the paid search advertising products that generate much of Google Inc.’s profits. Here the ad is triggered by the consumer’s search actions. Clearly, we can’t compare the subsequent purchases of someone who uses search keywords related to cars with those of consumers who were not exposed to paid search ads for cars. There is apt to be a huge selection bias, as most of those not exposed to the car keyword search ad are not in the market to purchase a car. Correlational analyses of the impact of paid search ads are apt to show a huge impact that is largely selection bias (see Blake et al., 2015 for an analysis of paid search ads for eBay in which they conclude that the ads have little effect). There is no question that targeting ads based on the preferences of customers as revealed in their behavior is apt to become even more prevalent in the future. This means that, for all the talk of “big data,” we are creating more and more data that is not amenable to analysis with our standard bag of statistical tricks.
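The seasonal confounding described above for an outdoor clothing firm can be illustrated with a small simulation. This is a sketch under assumed numbers only: the seasonal lift, the ad effect, and the scheduling probabilities are hypothetical, and the within-season comparison recovers the ad effect here only because the simulation makes ad timing random once the season is held fixed.

```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def seasonal_confounding_demo(n_weeks=20_000, ad_effect=5.0, season_lift=50.0, seed=1):
    """Compare high- vs low-advertising weeks, overall and within season."""
    rng = random.Random(seed)
    rows = []  # tuples of (peak, heavy_ad, sales)
    for _ in range(n_weeks):
        peak = rng.random() < 0.3  # week falls in a demand peak
        # Assumed scheduling rule: heavy advertising is concentrated in peaks.
        heavy_ad = rng.random() < (0.9 if peak else 0.1)
        sales = 100.0 + season_lift * peak + ad_effect * heavy_ad + rng.gauss(0.0, 10.0)
        rows.append((peak, heavy_ad, sales))

    def gap(subset):
        """Mean sales in heavy-ad weeks minus mean sales in light-ad weeks."""
        heavy = [s for _, a, s in subset if a]
        light = [s for _, a, s in subset if not a]
        return avg(heavy) - avg(light)

    naive = gap(rows)
    within_peak = gap([r for r in rows if r[0]])
    within_off = gap([r for r in rows if not r[0]])
    return naive, within_peak, within_off


naive_gap, peak_gap, off_gap = seasonal_confounding_demo()
print(f"naive={naive_gap:.1f}  within peak={peak_gap:.1f}  within off-peak={off_gap:.1f}")
```

With these assumed values the naive comparison attributes most of the seasonal lift to advertising, while the two within-season comparisons land near the simulated ad effect.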

4.3 Randomized experimentation

The problem with observational data is the potential correlation between “treatment” assignment and the potential outcomes. We have seen that this is likely to be a huge problem for highly targeted forms of marketing activities where the targeting is based on customer preferences. More generally, any situation in which some of the variation in the right hand side variables is correlated with the error term in the sales response equation will make any “regression-style” method inconsistent in estimating the parameters of the causal function. For example, the classical errors-in-variables model results in a correlation between the measured values of the rhs variables and the error term. In a randomized experiment, the key idea is that assignment to the treatment is random and therefore uncorrelated with any other observable or unobservable variable. In particular, assignment to the treatment is uncorrelated with the potential outcomes. This eliminates the selection bias term:

$$E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = 0$$

This means that the difference in means between the treated and untreated populations consistently estimates not only the effect on the treated but also the average effect or the effect on the person chosen at random from the population.


However, it is important to understand that when we say person chosen at random from the “population” we are restricting attention to the population of units eligible for assignment in the experiment. Deaton and Cartwright (2016) call the set of units eligible for assignment to a treatment cell (including the control cell) the trial sample. In many randomized experiments, the trial sample is anything but a random sample of the appropriate population to which we wish to extrapolate the results of the experiment. Most experiments have a very limited domain. For example, if we randomly assign DMAs in the Northeast portion of the US, our population is only that restricted domain. Most of the classic social experiments in economics have very restricted domains or populations to which the results can be extrapolated. Generalizability is the most restrictive aspect of randomized experimentation. Experimentation in marketing applications such as “geo” or DMA based experiments conducted by Google and Facebook start to get at experiments which are generalizable to the relevant population (i.e. all US consumers).

Another key weakness of randomization is that this idea is really a large sample concept. It is of little comfort to the analyst that treatments were randomly assigned if it turns out that randomization “failed” and did not give rise to a random realized sample of treated and untreated units. With a finite N, this is a real possibility. In some sense, all we know is that statements based on randomization only work asymptotically. Deaton and Cartwright (2016) make this point as well and point out that only when all other effects actually balance out between the controls and the treated does randomization achieve the desired aim. This only happens in expectation or in infinite size samples. If there are a large number of factors to be “balanced out,” then this may require very large N.
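The finite-sample point can be made concrete with a short sketch. It assumes a single hypothetical pre-treatment covariate and re-randomizes the same units many times. The covariate gap between treated and control cells is zero on average, but any one realized assignment can be far from balanced when N is small.

```python
import random


def imbalance_sd(n_units, n_draws=500, seed=2):
    """Spread of the treated-minus-control covariate gap over re-randomizations."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]  # fixed pre-treatment covariate
    order = list(range(n_units))
    half = n_units // 2
    gaps = []
    for _ in range(n_draws):
        rng.shuffle(order)  # one complete random assignment
        treated_mean = sum(x[i] for i in order[:half]) / half
        control_mean = sum(x[i] for i in order[half:]) / (n_units - half)
        gaps.append(treated_mean - control_mean)
    mean_gap = sum(gaps) / n_draws
    sd_gap = (sum((g - mean_gap) ** 2 for g in gaps) / (n_draws - 1)) ** 0.5
    return mean_gap, sd_gap


mean_small, sd_small = imbalance_sd(20)
mean_large, sd_large = imbalance_sd(2000)
print(f"N=20:   mean gap={mean_small:.3f}  sd of gap={sd_small:.3f}")
print(f"N=2000: mean gap={mean_large:.3f}  sd of gap={sd_large:.3f}")
```

The spread of the gap shrinks roughly with the square root of N, so balancing many such factors at once can call for a large experiment.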
A practical limitation to experimentation is that there can be situations in which randomization results in samples with low power to resolve causal effects. This can happen when the effects of the variables being tested are small, the sales response model has low explanatory power, and the sales dependent variable is highly variable. A simple case might be where you are doing an analysis of the effect of an ad using individual data and no other covariates in the sales response model. The standard errors of the causal effect (here just the coefficient on the binary treatment variable) of course are decreasing only at rate $\sqrt{N}$ and increasing in the standard deviation of the error term. If the effects are small, then the standard deviation of the error term is about the same as the standard deviation of sales. Simple power calculations in these situations can easily result in experimental designs with thousands or even tens of thousands of subjects, a point made recently by Lewis and Rao (2014). Lewis and Rao neglect to say that if there are other explanatory variables (such as price and promotion) included in the model, then even though sales may be highly variable, we still may be able to design experiments with adequate power even with smallish N. If there are explanatory variables included in the response model (in addition to dummy variables corresponding to treatment assignment), then the variance of the error term can be much lower than the variance of the dependent variable (sales). In these situations, the power calculations that lead to pessimistic views regarding the number of experimental subjects could change dramatically and the conclusions of Lewis and Rao may not apply. It should be emphasized that this is true even though these additional control variables will be (by construction) orthogonal to the treatment regressors.

While Lewis and Rao’s point regarding the difficulties in estimating ad effects is well-taken due to the small size of ad effects, this is not true regarding randomized experimentation on pricing. Marketing researchers have long observed that price and promotional effects are often very large (price elasticities exceeding 3 in absolute value and promotional lifts of over 100 per cent). This means that randomized experiments may succeed in estimating price and promotional effects with far smaller numbers of subjects than for advertising experiments (Dubé and Misra, 2018, is an example of recent attempts to use experimentation to optimize pricing).

While randomization might seem the panacea for estimation of causal effects, it has severe limitations for situations in which a large number or a continuum of causal effects are required. For example, consider the situation of two marketing variables and a possibly non-linear causal function of both. In order to maximize profits through the choice of the two variables, we must estimate not just the gradient of this function at some point but the entire function. Clearly, this would require a continuum of experimental conditions. Even if we discretized the values of the variables used in the experiments, the experimental paradigm clearly suffers from the curse of dimensionality as we add variables to the problem. For example, the typical marketing mix model might include at least five or six marketing variables, resulting in experiments with hundreds of cells.
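Returning to the power argument earlier in this section, a back-of-the-envelope calculation shows how the required number of subjects responds to effect size and to covariates that absorb outcome variance. This sketch uses the textbook normal approximation for a two-cell comparison with equal cell sizes; the effect sizes and the covariate R-squared of 0.75 are hypothetical values chosen for illustration, not figures from Lewis and Rao.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_cell(effect, sd_sales, r2_covariates=0.0, alpha=0.05, power=0.80):
    """Approximate subjects needed in each of two equal cells (two-sided test)."""
    z = NormalDist()
    z_total = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    # Covariates orthogonal to treatment still shrink the error variance.
    residual_var = sd_sales ** 2 * (1.0 - r2_covariates)
    return ceil(2.0 * z_total ** 2 * residual_var / effect ** 2)


# Small ad effect (5% of a standard deviation of sales), no covariates.
ad_only = subjects_per_cell(effect=0.05, sd_sales=1.0)
# Same effect, with controls assumed to explain 75% of the variance in sales.
ad_with_controls = subjects_per_cell(effect=0.05, sd_sales=1.0, r2_covariates=0.75)
# Large price or promotion effect (one standard deviation of sales).
price_test = subjects_per_cell(effect=1.0, sd_sales=1.0)
print(ad_only, ad_with_controls, price_test)
```

The required sample scales with the residual variance and with the inverse square of the effect, which is why both good controls and large price effects change the picture so sharply.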

4.4 Further limitations of randomized experiments

4.4.1 Compliance in marketing applications of RCTs

The history of randomized experiments dates from agricultural experiments in which the “treatment” consists of various agricultural practices such as fertilization. When a plot was assigned to an experimental cell, there were no “compliance” issues: whatever treatment was prescribed was administered. However, in both medical and social experimentation applications, compliance can be an important problem. Much of Heckman’s work centers around the evaluation of various labor market interventions such as job training programs. Even with completely randomized selection for treatment, the government cannot compel US citizens to enroll in job training programs. The best we can do is randomize eligibility for treatment or assignment to treatment. Clearly, there can be selection bias in the acceptance of treatment by those assigned to treatment. Heckman and others have modeled this decision as a rational choice in which people consider the benefits of the job training program as well as their opportunity costs of time. In any event, the “endogeneity” of the actual receipt of treatment means that selection bias would affect the “naive” differences in means (or regression generalizations) approach to effect estimation. There are two ways to tackle this problem: (1) explicit modeling of the decision to accept treatment or (2) use of the treatment assignment as an instrumental variable (see discussion in Angrist and Pischke, 2009, Section 4.4.3). The Instrumental Variables estimator, in this case, simply divides the difference in means by the compliance rate.

In some situations, such as geographically based ad experiments or store-level pricing experiments, compliance is not an issue in marketing.[20] If we randomize exposure to an ad by DMAs, all consumers in the DMA will have the opportunity to be exposed to the ad. Non-compliance would require consumers in the DMA to deliberately avoid exposure to the ad. Note that this applies to any ad delivery mechanism as long as everyone in the geographic area has the opportunity to become exposed. The only threat to the experimental design is “leakage,” in which consumers in nearby DMAs are exposed to the stimulus. Store-level pricing or promotional experiments are another example where compliance is assured.

However, consider what happens when ad experiments are conducted by assigning individuals to an ad exposure treatment. For example, Amazon randomly assigns customers to be exposed to an ad or a “recommendation” while others are assigned not to be exposed. However, not all those assigned to the treatment cell will actually be exposed to the ad. The only way to be exposed to the ad is to visit the Amazon site (or mobile app). Those who are more frequent visitors to the Amazon website will have a higher probability of being exposed to the ad than those who are less frequent or intense users. If the response to the ad is correlated with visitor frequency, then the analysis of the RCT for this ad will only reveal the “intent-to-treat” effect and not the average treatment effect on the treated. One way to avoid this problem is to randomize assignment by session and not by user (see Sahni, 2015 for details). The compliance issue in digital advertising becomes even more complicated due to the “ad campaign” optimization algorithms used by advertising platforms to enhance the effect of ads.
For example, Facebook has established a randomized experimentation platform (see Gordon and Zettelmeyer, 2017 for analysis of some Facebook ad experiments). The idea is to allow Facebook advertisers to easily implement and analyze randomized experiments for ad campaigns. Facebook users will be randomly assigned to a “control” status in which they will not be exposed to ads. Ads are served by a complicated auction mechanism. For controls, if the ad campaign in question wins an auction for space on a Facebook page that a control is viewing, then the “runner-up” will be served. This ensures one-sided compliance: the controls will never have access to the treatment. The “experimental” unit, however, will only have an opportunity to view the ad if they visit Facebook, as we have already pointed out.

The problem of compliance is made worse by the Facebook ad campaign “optimization” feature. Any ad campaign is invoked for some span of time (typically 3-5 weeks on Facebook). Data from exposures to the ad early in the campaign is used to model who should be exposed to the ad in later stages of the campaign, in addition to whatever targeting criteria are embedded in the campaign. Thus, the probability of exposure to the ad can vary for two Facebook users who visit Facebook with the same frequency and intensity. This means that the Facebook experimentation platform can only estimate an intent-to-treat effect of the ad campaign and not the effect on the treated. Johnson et al. (2017) construct a proxy control ad, which they term a “Ghost Ad,” which they claim avoids some of the unintended consequences of the in-campaign optimization methods and can be implemented at lower cost than a more traditional approach in which the control group is exposed to a “dummy” ad or public service announcement. While the “ghost ad” approach appears promising as a way to reduce costs and deal with ad optimization questions, this must be achieved at some power costs which are not yet clear.

[20] There are always problems with implementation of store experiments. That is, the researcher must verify or audit stores to ensure that treatments are implemented, and implemented during the time periods of the experiment. This is not a “compliance” issue, as compliance addresses whether or not experimental subjects can decide or influence exposure to the treatment. Consumers are exposed to the properly executed store experiment stimuli. Similarly, ad experiments do not have an explicit compliance problem. There may be a leakage problem across geographies but this is not a compliance problem.
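The relationship between the intent-to-treat effect, the compliance rate, and the instrumental variables estimate discussed in this subsection can be sketched as follows. The simulation is an illustration under stated assumptions, not a model of any actual platform: assignment is random, controls are never exposed (one-sided non-compliance), exposure among the assigned depends on a hypothetical `visit_rate`, and the ad effect is constant across users.

```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def noncompliance_demo(n=200_000, effect=2.0, seed=3):
    """Randomized assignment with exposure that depends on visit frequency."""
    rng = random.Random(seed)
    y_assigned, y_control, exposed_flag = [], [], []
    y_exposed, y_unexposed = [], []
    for _ in range(n):
        visit_rate = rng.random()      # hypothetical site-visit propensity
        assigned = rng.random() < 0.5  # randomized assignment to the ad cell
        # Controls are never exposed; assigned users are exposed only if they visit.
        exposed = assigned and rng.random() < visit_rate
        y = 10.0 + 5.0 * visit_rate + effect * exposed + rng.gauss(0.0, 1.0)
        if assigned:
            y_assigned.append(y)
            exposed_flag.append(1.0 if exposed else 0.0)
        else:
            y_control.append(y)
        (y_exposed if exposed else y_unexposed).append(y)
    itt = avg(y_assigned) - avg(y_control)          # intent-to-treat effect
    compliance = avg(exposed_flag)                  # share of assigned who were exposed
    wald = itt / compliance                         # assignment used as an instrument
    as_treated = avg(y_exposed) - avg(y_unexposed)  # naive exposed vs unexposed gap
    return itt, compliance, wald, as_treated


itt, compliance, wald, as_treated = noncompliance_demo()
print(f"ITT={itt:.2f}  compliance={compliance:.2f}  IV={wald:.2f}  naive={as_treated:.2f}")
```

In this setup the intent-to-treat effect understates the per-exposure effect, the naive exposed-versus-unexposed comparison overstates it because frequent visitors buy more anyway, and dividing the intent-to-treat effect by the compliance rate comes close to the simulated effect.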

4.4.2 The Behrens-Fisher problem

We have explained that the simplest randomized experiment would consist of only one control and one treatment cell, and the effect of treatment would be estimated by computing the difference in means. Without extensive control covariates, the difference in means is apt to be a very noisy, but consistent (in the number of consumers in the experiment), estimate of the causal effect of treatment. However, Deaton and Cartwright (2016) point out that inference with standard methods faces what statisticians have long called the “Behrens-Fisher” problem. If the variance of the outcome variable is different between control and treatment groups, then the distribution of the difference in means will be a function of the variance ratio (there is no simple t-distribution anymore). Since the distribution of the test statistic is dependent on unknown variance parameters, standard finite sample testing methods cannot be used.

Given that there is random assignment to control and treatment groups, any differences in variability must be due to the treatment effect. In a world with heterogeneous treatment effects, we interpret the difference in means between controls and treated as measuring the average effect of the treatment. Thus, having the heterogeneous treatment effects in the error term will create a variance component not present for the controls. For this reason, we might expect that the variance of the treated cell will be higher than that of the control cell, and we are faced with the Behrens-Fisher inference problem. One could argue that the Behrens-Fisher problem is apt to be minimal in advertising experiments, as the treatment effects are small so that the variance component introduced in the ad exposure treatment cell would be small. However, in experiments related to pricing actions, it is possible that the Behrens-Fisher problem could be very consequential.
Many trained in modern econometrics would claim that the Behrens-Fisher problem could be avoided or “solved” by the use of so-called heteroskedasticity-consistent (White) variance-covariance estimators. This is nothing more than saying that, in large samples, the Behrens-Fisher problem “goes away” in the sense that we can consistently recover the different variances and proceed as though we actually know the variances of the two groups. This can also be seen as a special case of the “cluster” variance problem with only two clusters. Again, heteroskedasticity-consistent estimators have long been advocated as a “solution” to the cluster variance problem. However, it is well known that heteroskedasticity-consistent variance estimators can have very substantial finite sample biases (see Imbens and Kolesar, 2016 for explicit simulation studies for the case considered here of two clusters). There appears to be no way out of the Behrens-Fisher problem without additional information regarding the relative size of the two variances.
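For reference, the quantities involved in the two-cell comparison can be computed directly. The sketch below implements the standard Welch statistic with the Welch-Satterthwaite approximate degrees of freedom, a technique not discussed in the chapter; it is an approximation that plugs in estimated variances and so does not remove the finite-sample difficulty described above. The outcome values are hypothetical.

```python
from statistics import mean, variance


def welch_statistic(treated, control):
    """Unequal-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n0 = len(treated), len(control)
    v1 = variance(treated) / n1  # estimated sampling variance of the treated mean
    v0 = variance(control) / n0  # estimated sampling variance of the control mean
    se = (v1 + v0) ** 0.5
    t_stat = (mean(treated) - mean(control)) / se
    # Approximate degrees of freedom depend on the estimated variance ratio.
    df = (v1 + v0) ** 2 / (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1))
    return t_stat, df, se


# Hypothetical outcomes: the treated cell is visibly more dispersed.
control = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
treated = [8.0, 14.5, 10.2, 12.9, 7.1, 15.3, 11.0, 9.4]
t_stat, df, se = welch_statistic(treated, control)
print(f"t={t_stat:.2f}  approx df={df:.1f}  se={se:.2f}")
```

When the two cells have very different variances, the approximate degrees of freedom fall toward the size of the noisier cell minus one, which is one way to see how much the reference distribution depends on the unknown variance ratio.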

4.5 Other control methods

We have seen that randomization can be used to consistently estimate causal effects (or eliminate selection bias). In Section 5, we will discuss Instrumental Variables approaches. One way of viewing an IV is as a source of “naturally” occurring randomization that can help solve the fundamental problem of causal inference. Another approach is to add additional covariates to the analysis in hopes of achieving independence of the treatment exposure conditional on these sets of covariates. If we can find covariates that are highly correlated with the unobservables and then add these to the sales response model, then the estimate on the treatment or marketing variables of interest can be “cleaner” or less confounded with selection bias.

4.5.1 Propensity scores

If we have individual level data and are considering a binary treatment such as ad exposure, then conditioning on covariates to achieve approximate independence simplifies to the use of propensity scores as a covariate. The propensity score[21] is nothing more than the probability that the individual is exposed to the ad as a function of covariates (typically the fitted probability from a logit/probit model of exposure). For example, suppose we want to measure the effectiveness of a YouTube ad for an electronic device. The ad is shown on a YouTube channel whose theme is electronics. Here the selection bias problem can be severe: those exposed to the ad may be pre-disposed to purchase the product. The propensity score method attempts to adjust for these biases by modeling the probability of exposure to the ad based on covariates such as demographics and various “techno-graphics” such as browser type and previous viewing of electronics YouTube channels. The propensity score estimate of the treatment or ad exposure effect would be from a response model that includes the treatment variable as well as the propensity score. Typically, effect sizes are reduced by inclusion of the propensity score in the case of positive selection bias.

Of course, the propensity score method is only as good as the set of covariates used to form the propensity score. There is no way to test that a propensity score fully adjusts for selection bias other than confirmation via true randomized experimentation. Goodness-of-fit or statistical significance of the propensity score model is reassuring but not conclusive. There is a long tradition of empirical work in marketing that demonstrates that demographic variables are not predictive of brand choice or brand preference.[22] This implies that propensity score models built on standard demographics are apt to be of little use in reducing selection bias and obtaining better causal effect estimates.

Another way of understanding the propensity score method is to think about a “synthetic” control population. That is, for each person who is exposed to the ad, we find a “twin” who is identical (in terms of product preferences and ability to buy) who was not exposed to the ad. The difference in means between the exposed (treatment) group and this synthetic control population should be a cleaner estimate of the causal effect. In terms of propensity scores, those with similar propensity scores are considered “twins.” In this same spirit, there is a large literature on “matching” estimators that attempt to construct synthetic controls (cf. Imbens and Rubin, 2014, Chapters 15 and 18). Again, any matching estimator is only as good as the variables used in implementing “matching.”

[21] See Imbens and Rubin (2014), Chapter 13, for more details on propensity scores.
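The “twins” idea can be sketched with a deliberately simple case. The example assumes a single discrete, fully observed covariate (a hypothetical `interest` level standing in for techno-graphics) that drives both exposure and purchases, so the propensity score can be estimated as the exposure rate within each covariate cell rather than from a logit or probit model. It also compares units within groups sharing the same score instead of adding the score to a regression, which is a simplification of the approach described above.

```python
import random
from collections import defaultdict


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def propensity_demo(n=60_000, effect=1.0, seed=4):
    """Naive vs propensity-stratified estimate of an ad exposure effect."""
    rng = random.Random(seed)
    exposure_prob = {0: 0.1, 1: 0.4, 2: 0.8}  # assumed targeting intensity
    cells = defaultdict(lambda: {1: [], 0: []})
    for _ in range(n):
        interest = rng.choice([0, 1, 2])  # hypothetical observed covariate
        exposed = 1 if rng.random() < exposure_prob[interest] else 0
        y = 2.0 * interest + effect * exposed + rng.gauss(0.0, 1.0)
        cells[interest][exposed].append(y)
    all_exposed = [y for c in cells.values() for y in c[1]]
    all_unexposed = [y for c in cells.values() for y in c[0]]
    naive = avg(all_exposed) - avg(all_unexposed)
    # Estimated propensity score: exposure rate within each covariate cell.
    pscore = {k: len(c[1]) / (len(c[1]) + len(c[0])) for k, c in cells.items()}
    # Compare exposed and unexposed units that share a score, then average the
    # within-score gaps using the number of exposed units as weights.
    total = sum(len(c[1]) * (avg(c[1]) - avg(c[0])) for c in cells.values())
    adjusted = total / len(all_exposed)
    return naive, adjusted, pscore


naive_effect, adjusted_effect, pscore = propensity_demo()
print(f"naive={naive_effect:.2f}  stratified on score={adjusted_effect:.2f}")
```

The adjustment works here only because the covariate that drives selection is observed; if `interest` were unobserved, no propensity model built on other variables would remove the bias, which is the caveat stressed above.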

4.5.2 Panel data and selection on unobservables

The problem of selection bias and the barriers to causal inference with observational data can also be interpreted as the problem of “selection on unobservables.” Suppose our goal is to learn about the income effects for the demand for a class of goods such as private label goods (see, for example, Dubé et al., 2018). We could take a cross-section of households and examine the correlation between income and demand for private label goods. If we are interested in how the business cycle affects demand for private labels, then we want the true causal income effect. It could be that there is some unobservable household trait (such as pursuit of status) that drives both attainment of higher income and lower demand for lower quality private label goods. This unobservable would create a spurious negative correlation between household income and private label demand. Thus, we might be suspicious of cross-sectional results unless we can properly control (by inclusion of the appropriate covariate) for the “unobservables” by using proxies for the unobservable or direct measurement of the unobservable.

If we have panel data and we think that there are unobservables that are time invariant, then we can adopt a “fixed effects” style approach which uses only variation within unit over time to estimate causal effects. The only assumption required here is that the unobservables are time invariant. Given that marketing data sets seldom span more than a few years, this time invariance assumption seems eminently reasonable. It should be noted that if the time span increases, a host of non-stationarities arise, such as the introduction of new products, entry of competitors, etc. In sum, it is not clear that we would want to use a long time series of data without modeling the evolution of the industry we are studying. Of course, as pointed out in Section 3.1 above, the fixed effects approach only works with linear models.

[22] See, for example, Fennell et al. (2003).


Consider the example of estimating the effect of a Super Bowl ad. Aggregate time series data may have insufficient variation in exposure to estimate ad effects. Pure cross-sectional variation confounds regional preferences for products with true useful variation in ad exposure. Panel data, on the other hand, might be very useful to isolate Super Bowl ad effects. Klapper and Hartmann (2018) exploit a short panel of six years of data across about 50 different DMAs to estimate effects of CPG ads. They find that there is a great deal of year-to-year variation in the same DMA in SB viewership. It is hard to believe that preferences for these products vary from year to year in a way that is correlated with the popularity of the SB broadcast. Far more plausible is that this variation depends on the extent to which the SB is judged to be interesting at the DMA level. This could be because a home team is in the SB or it could just be due to the national or regional reputation of the contestants. Klapper and Hartmann estimate linear models with Brand-DMA fixed effects (intercepts) and find a large and statistically significant effect of SB ads by beer and soft drink advertisers. This is quite an achievement given the cynicism in the empirical advertising literature about the ability to have sufficient power to measure advertising effects without experimental variation.

Many, if not most, of the marketing mix models estimated today are estimated on aggregate or regional time series data. The success of Klapper and Hartmann in estimating effects using more disaggregate panel data is an important source of hope for the future of marketing analytics. It is well known that the idea of using fixed effects or unit-specific intercepts does not generalize to non-linear models.
If we want to optimize the selection of marketing variables, then we will have to use more computationally intensive hierarchical modeling approaches that allow response parameters to vary over cross-sectional units. Advocates of the fixed effects approach argue that the use of fixed effects does not require any distributional assumptions nor the assumption that unit parameters are independent of the rhs variables. Given that it is possible to construct hierarchical models with a general distributional form as well as to allow unit characteristics to affect these distributions, it seems the time is ripe to move to hierarchical approaches for marketing analytics with non-linear response models.
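The fixed effects logic in this subsection (use only within-unit variation when a time-invariant unobservable is correlated with the regressor) can be sketched for the linear case. The data-generating process is hypothetical: each unit has a fixed `unit_effect` that raises both its exposure and its outcome, and the true slope is constant across units.

```python
import random


def slope(x, y):
    """Simple least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fixed_effects_demo(n_units=500, n_periods=6, beta=1.5, seed=5):
    """Pooled slope vs within-unit (demeaned) slope on a short panel."""
    rng = random.Random(seed)
    x_pooled, y_pooled, x_within, y_within = [], [], [], []
    for _ in range(n_units):
        unit_effect = rng.gauss(0.0, 2.0)  # time-invariant unobservable
        # Exposure is correlated with the unobservable (selection on unobservables).
        xs = [0.8 * unit_effect + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        ys = [unit_effect + beta * x + rng.gauss(0.0, 1.0) for x in xs]
        mx, my = sum(xs) / n_periods, sum(ys) / n_periods
        x_pooled += xs
        y_pooled += ys
        # Subtracting unit means removes anything constant within the unit.
        x_within += [x - mx for x in xs]
        y_within += [y - my for y in ys]
    return slope(x_pooled, y_pooled), slope(x_within, y_within)


pooled, within = fixed_effects_demo()
print(f"pooled slope={pooled:.2f}  within-unit slope={within:.2f}")
```

Demeaning is equivalent to including unit-specific intercepts in a linear model, which is why this device does not carry over directly to non-linear response models.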

4.5.3 Geographically based controls

In the area of advertising research, some have exploited a control strategy that depends on some of the historical institutional artifacts in the purchase of TV advertising. In the days when local TV stations were limited by the reach of their signal strength, it made sense to purchase local TV advertising on the basis of a definition of media market that includes the boundaries of the TV signal. There are 204 such “Designated Market Areas” in the US. Local TV advertising is purchased by DMA. This means that there are “pairs” of counties on opposite sides of a DMA boundary, one of which receives the ad exposure while the other does not. Geographical proximity also serves as a kind of “control” for other factors influencing ad exposure or ad response. Shapiro (2018) uses this strategy to estimate the effect of direct-to-consumer ads for various anti-depressant drugs. Instead of using all variation in viewership across counties and across time, Shapiro limits variation to a relatively small number of “paired” DMAs. Differences in viewership between these two “bordering” DMAs are used to identify ad effects. Shapiro finds only small differences between ad effects estimated with his “border strategy” and those estimated without it.

Exploiting institutional artifacts in the way advertising is purchased is a general idea which might be applied in other ways, even though the demise of broadcast or even subscription TV in favor of streaming will likely render this particular “border strategy” increasingly irrelevant. The idea of exploiting the discreteness in the allocation or exposure rule used by firms is a case of what is called a regression discontinuity design, discussed below.

4.6 Regression discontinuity designs

Many promotional activities in marketing are conducted via some sort of threshold rule or discretized into various “buckets.” For example, consider the loyalty program of a gambling casino. The coin of the realm in this industry is the expected win for each customer, which is simply a function of the volume of gambling and type of game. The typical loyalty program encourages customers to gamble more and come back to the casino by establishing a set of thresholds. As customers increase their expected win, they “move” from one tier or “bucket” in this program to the next. In the higher tiers, the customer receives various benefits like complimentary rooms or meals. The key is that there is a discrete jump in benefits by design of the loyalty program. On the other hand, it is hard to believe that the response function of the customer to the level of complimentary benefits is non-smooth or discontinuous. Thus, it would seem that we can “select” on the observables to compare those customers whose volume of play is just on either side of each discontinuity in the reward program. As Hartmann et al. (2011) point out, as long as the customer is not aware of the threshold or the benefits from “selecting in” or moving to the next tier are small relative to the cost of greater play, this constitutes a valid Regression Discontinuity (RD) design. Other examples in marketing include direct mail activity (those who receive offers and/or contact are a discontinuous function of past order history) and geographic targeting (it is unlikely people will move to get the better offer). But if consumers are aware that there are large promotions or rebates for a product and they can change their behavior (such as purchase timing), then an RD approach is likely to be invalid. Regression discontinuity analysis has received a great deal of attention in economics as well (see Imbens and Lemieux, 2008).
The key assumption is that the response function is continuous in the neighborhood of the discontinuity in the assignment of the treatment. There are both parametric and non-parametric forms of analysis, reflecting the importance of estimating the response function without bias that would adversely affect the RD estimates. Parametric approaches require a great deal of flexibility which may compromise power, while non-parametric methods rest on the promise to narrow the window of responses used in the vicinity of the threshold(s) as the sample size increases. This is not much comfort to the analyst with one finite sample. Non-parametric RD methods are profligate with data as, ultimately, most of the data is not used in forming treatment effect estimates. RD designs result in only local estimates of the derivative of the response function. For this reason, unless the ultimate treatment is really discrete, RD designs do not offer a solution to the marketing analytics problem of optimization. RD designs may be helpful to corroborate the estimates based on response models fit to the entire dataset (the RD estimate and the derivative of the response function at the threshold should be comparable).
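The local comparison described above can be sketched on simulated data. Everything numeric here is an illustrative assumption rather than a value from the text: the running variable (expected win) is uniform on [0, 2], the tier threshold sits at 1.0, the true jump in response is 0.3, and the bandwidth is 0.2. The sketch fits a separate least squares line on each side of the threshold and compares the two intercepts at the cutoff, which is one simple non-parametric style of RD estimate among several.

```python
import random

def ols_line(xs, ys):
    """Return (intercept, slope) of a least squares line fit of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_estimate(v, y, cutoff, bandwidth):
    """Local linear RD: difference of the two fitted intercepts at the cutoff."""
    # Center the running variable so each intercept is the fitted value at the cutoff.
    left = [(vi - cutoff, yi) for vi, yi in zip(v, y)
            if cutoff - bandwidth <= vi < cutoff]
    right = [(vi - cutoff, yi) for vi, yi in zip(v, y)
             if cutoff <= vi <= cutoff + bandwidth]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left, len(left) + len(right)

# Simulated loyalty-program data (all values are assumptions for illustration).
rng = random.Random(0)
n = 20000
cutoff, true_jump = 1.0, 0.3
v = [rng.uniform(0, 2) for _ in range(n)]
# Smooth (linear) response in v plus a discrete jump in benefits at the cutoff.
y = [1.0 + 0.5 * vi + true_jump * (vi >= cutoff) + rng.gauss(0, 0.5) for vi in v]

jump_hat, n_used = rd_estimate(v, y, cutoff, bandwidth=0.2)
print(f"estimated jump: {jump_hat:.3f} using {n_used} of {n} observations")
```

Only the observations inside the bandwidth contribute to the estimate, which is the sense in which the non-parametric approach discards most of the data.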

4.7 Randomized experimentation vs. control strategies

Recent work in marketing compares inferences based on various control strategies (including propensity scores and various “matching” or synthetic control approaches) with the results of large scale randomized experiments performed at Facebook. Gordon and Zettelmeyer (2017) find that advertising effects estimated from observational methods do not agree very closely with those based on randomized experimentation in the context of ad campaigns evaluated on the Facebook ad platform. If various control strategies are inadequate, then we might expect that ad effects estimated from observational data would be larger than those estimates that are based on randomized experimentation (at least up to sampling variation). Gordon and Zettelmeyer do not find any consistent pattern of this sort. They find estimates based on observational data to be, in some cases, smaller than those based on experimentation, with non-overlapping confidence intervals. This result is difficult to understand and implies that there are important unobservables which are positively related to ad exposure and negatively related to ad effects. However, the jury is still out on the efficacy of observational methods, as Eckles and Bakshy (2017) find that observational methods (propensity scores) produce effect estimates which are close to those obtained from randomized experimentation in a similar context involving estimation of peer effects with Facebook data. It is possible that Facebook ad campaign “optimization” may make the comparison between the observational data-based effect estimates and the randomized trial results less direct than Gordon and Zettelmeyer imply.

4.8 Moving beyond average effects

We live in a world of heterogeneous treatment effects in which each consumer, for example, has a different response to the same ad campaign. In the past, the emphasis in economics has been on estimating some sort of average treatment effect, which is thought to be adequate for policy evaluation. Clearly, the distributional effects of policies are also important and, while the randomized experiment does identify the average treatment effect with minimal assumptions, randomized experimentation does not identify the distribution of effects without imposing additional assumptions. In marketing applications, heterogeneity assumes even greater importance than in economic policy evaluation. This is because the policies in marketing are not applied uniformly to a subset of consumers but, rather, include the possibility of targeting policies based on individual treatment effects. A classic example of this problem is the problem in direct marketing of deciding to whom a catalog or offer should be sent from the very large set of customers whose characteristics are summarized in the “house” file of past order and response behavior. Classically, direct marketers built standard marketing response models in which order response to a catalog or offer is modeled as a function of the huge set of variables that might be constructed using the house data file. This raises two inference problems. First, the model-builder must have a way of selecting from a set of variables that may be even larger than the number of observations. Second, the model-builder should recognize that there may be unobservables that create the classic selection bias problem. The selection bias problem can be particularly severe when the variables used as “controls” are simply summaries of past response behavior, as they must be, by construction, with house file data. How then does randomization help the model-builder? If there is a data-set where exposure to the marketing action is purely random, then there are no selection bias problems and there is nothing wrong with using regression-like methods to estimate or predict response to the new offering (i.e. “optimal targeting”). The problem then becomes more of a standard non-parametric modeling problem of selecting the most efficient summaries of the past behavior to be included as controls in the response model. Hitsch and Misra (2018) compare a number of different methods to estimate heterogeneous treatment effects based on a randomized trial and evaluate various estimators with respect to their potential profitability.
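As a toy illustration of how a randomized mailing supports targeting, the sketch below estimates the response lift separately for two customer segments and mails only where the expected incremental profit covers the mailing cost. The segment names, response rates, margin, and cost are all hypothetical values chosen for the example, and the simple difference in means stands in for the much richer methods compared in the literature.

```python
import random

rng = random.Random(1)

# Hypothetical segments: (baseline response rate, true lift from the mailing).
segments = {"frequent": (0.05, 0.040), "lapsed": (0.05, 0.005)}
margin, mail_cost = 50.0, 1.0    # assumed profit per order and cost per piece
n_per_segment = 20000

def estimate_lift(base, lift):
    """Difference in response rates between randomly mailed and held-out customers."""
    mailed = held = mailed_resp = held_resp = 0
    for _ in range(n_per_segment):
        treated = rng.random() < 0.5           # exposure is purely random
        p = base + lift if treated else base
        responded = rng.random() < p
        if treated:
            mailed += 1
            mailed_resp += responded
        else:
            held += 1
            held_resp += responded
    return mailed_resp / mailed - held_resp / held

lift_hat = {name: estimate_lift(*params) for name, params in segments.items()}

# Mail a segment only if the estimated incremental profit exceeds the mailing cost.
target = {name for name, lift in lift_hat.items() if lift * margin > mail_cost}
print(lift_hat, target)
```

Because exposure is randomized within each segment, the difference in means is free of selection bias; the remaining difficulty is choosing which summaries of past behavior should define the segments.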

5 Instruments and endogeneity23

The problem of causal inference and the potential outcomes framework has recently assumed greater importance in the economics literature, but that is not to say that the problem of causal inference has only recently been addressed. The original concern of the Cowles Commission was to obtain consistent estimates of “structural” parameters using only observational data, together with the recognition that methods that assume right-hand-side variables are exogenous may not be appropriate in many applications. In most applications, the “selection” bias or “selection on unobservables” interpretation is appropriate and econometricians have dubbed this the “endogeneity” problem. One basic approach to dealing with this problem is to find some way of partitioning the variation in the right-hand-side variable so that some of the variation can be viewed “as though random.” This involves selection of an instrument. In this section, we provide a detailed discussion of the instrumental variables approach.

23 This section was adapted in large part from Rossi (2014b).


As we have indicated, Instrumental Variable (IV) methods do not use all of the variation in the data to identify causal effects, but instead partition the variation into that which can be regarded as “clean” or as though generated via experimental methods and that which is “contaminated” and could result in endogeneity bias. “Endogeneity bias” is almost always defined as the asymptotic bias for an estimator which uses all of the variation in the data. IV methods are only asymptotically unbiased if the instruments are valid instruments. Validity is an unverifiable assumption. Even if valid, IV estimators can have poor sampling properties including fat tails, high RMSE, and bias. While most empirical researchers may recall that the validity assumption is important from their econometrics training, the poor sampling properties of IV estimators are not well appreciated. Careful empirical researchers are aware of some of these limitations of IV methods and, therefore, sometimes view the IV method as a form of sensitivity analysis. That is, estimates of causal effects using standard regression methods are compared with estimates based on IV procedures. If the estimates are not appreciably different, then some conclude that endogeneity bias is not a problem. While this procedure is certainly more sensible than abandoning regression methods altogether, it is based on the implicit assumption that the IV method uses valid instruments. If the instruments are not valid, then the differences between standard regression style estimates and IV estimates don’t have any bearing on the existence or extent of endogeneity bias. Closely related to the problem of endogeneity bias is the problem of omitted variables in cross-sectional analyses or pooled analyses of panel data. Many contend that there may exist unobservable variables that a set of control variables, no matter how exhaustive, cannot control for. 
For this reason, researchers often use a Fixed Effects (hereafter FE) approach in which cross-sectional unit specific intercepts are included in the analysis. In a FE approach, the slope coefficients on variables of interest are identified using only “within” variation in the data. Cross-sectional variation is thrown out. Advocates for the FE approach argue that, in contrast to IV methods, the FE approach does not require any further assumptions than those already used by the standard linear regression analysis. The validity of the FE approach depends critically on the assumption of a linear model and the lack of measurement error in the independent variables.24 If there is measurement error in the independent variables, then the FE approach will generally magnify the errors-in-the-variables bias.
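The claim that fixed effects magnify errors-in-the-variables bias can be checked with a small simulation. The panel dimensions and variances below are assumptions made for illustration: most of the variation in the true regressor is cross-sectional (variance 1.0) with a smaller within-unit component (variance 0.25), and the measurement error has variance 0.25. Under these assumed values the pooled estimator is attenuated by a factor of roughly 1.25/1.5, while the within estimator is attenuated by roughly 0.25/0.5.

```python
import random

rng = random.Random(2)
beta, n_units, n_periods = 1.0, 500, 10   # assumed true slope and panel size

def slope(xs, ys):
    """Least squares slope of ys on xs, with an intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

x_obs, y, x_within, y_within = [], [], [], []
for _ in range(n_units):
    unit_level = rng.gauss(0, 1.0)                 # persistent cross-sectional variation
    x_true = [unit_level + rng.gauss(0, 0.5) for _ in range(n_periods)]
    x_i = [x + rng.gauss(0, 0.5) for x in x_true]  # regressor observed with error
    y_i = [beta * x + rng.gauss(0, 1.0) for x in x_true]
    x_obs += x_i
    y += y_i
    # Within transformation: subtracting unit means is equivalent to
    # including a separate intercept for each unit.
    mx, my = sum(x_i) / n_periods, sum(y_i) / n_periods
    x_within += [x - mx for x in x_i]
    y_within += [v - my for v in y_i]

pooled_hat = slope(x_obs, y)
fe_hat = slope(x_within, y_within)
print(f"pooled: {pooled_hat:.2f}, fixed effects: {fe_hat:.2f}, true: {beta}")
```

Discarding the cross-sectional variation removes most of the signal in x but none of the measurement noise, so the noise-to-signal ratio rises and the attenuation gets worse.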

24 If lagged dependent variables are included in the model, then the standard fixed effects approach is invalid; see Narayanan and Nair (2013) and Nickell (1981).

5.1 The omitted variables interpretation of “endogeneity” bias

In marketing applications, the omitted variable interpretation of endogeneity bias provides a very useful intuition. In this section, we will briefly review the standard omitted variables analysis and relate this to endogeneity bias. For those familiar with the omitted variables problem, this section will simply serve to set notation and provide a very brief review (see also the treatments in Section 4.3 of Wooldridge, 2010 or Section 3.2.2 of Angrist and Pischke, 2009). Consider a linear model with one independent variable (note: the intercept is removed for notational simplicity).

$$ y_i = \beta x_i + \varepsilon_i \tag{40} $$

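The least squares estimator from a regression of y on x will consistently estimate the parameters of the conditional expectation of y given x under the restriction that the conditional expectation is linear in x. However, the least squares estimator will converge to β only if $E[\varepsilon \mid x] = 0$ (or $\operatorname{cov}(x, \varepsilon) = 0$).

$$ \operatorname{plim} \frac{x'y}{x'x} = \beta + \operatorname{plim} \frac{x'\varepsilon / N}{x'x / N} = \beta + \left( \operatorname{plim} \frac{x'x}{N} \right)^{-1} \operatorname{plim} \frac{x'\varepsilon}{N} = \beta + Q \times \operatorname{cov}(x, \varepsilon) $$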

Here $Q^{-1} = \operatorname{plim} \frac{x'x}{N}$. Thus, least squares will consistently estimate the “structural” parameter β only if (40) can be considered a valid regression equation (with an error term that has a conditional expectation of zero). If $E[\varepsilon \mid x] \neq 0$, then least squares will not be a consistent estimator of β. This situation can arise if there is an omitted variable in the equation. Suppose there exists another variable, w, which belongs in the equation in the sense that the multiple regression of y on x and w is a valid equation.

$$ y_i = \beta x_i + \gamma w_i + \varepsilon_i, \qquad E[\varepsilon \mid x, w] = 0 $$

The least squares regression of y on x alone will consistently recover the parameters of the conditional expectation of y given x, which will not necessarily be β.

$$ E[y \mid x] = \beta x + E[\gamma w + \varepsilon \mid x] = \beta x + \gamma E[w \mid x] = \beta x + \gamma \pi x = \delta x $$

Here π is the coefficient on x in the conditional expectation of w given x, so that δ = β + γπ. If π ≠ 0 (and γ ≠ 0), then the least squares estimator will not consistently recover β (sometimes called the structural parameter) but instead will recover δ. The intuition is that, in the simple regression of y on x, least squares estimates the effect of x without controlling for w. This estimate confounds two effects: (1) the direct effect of x (β) and (2) the indirect effect of x (γπ). The indirect effect (which is non-zero whenever x and w are correlated) also has a very straightforward interpretation: for each unit change in x, w will change by π units and this will, in turn, change y (on average) by γπ units. In situations where δ ≠ β, there is an omitted variable bias. The solution, which is feasible only if w is observable, is to run the multiple regression of y on x and w. Of course, the multiple regression does not use all of the variation in x to estimate the multiple regression coefficient, only that part of the variation in x which is uncorrelated with w. Thus, we can see that a multiple regression method is more demanding of the data in the sense that only part of the variation of x is used. In a true randomized experiment, there is no omitted variable bias because the values of x are assigned randomly and, therefore, are uncorrelated by definition with any other variable (observable or not). In the case of the randomized experiment, the only motivation for bringing in other covariates is to reduce the size of the residual standard error, which can improve the precision of estimation. However, if the simple regression model produces statistically significant results, there is no reason for adding covariates. The standard recommendation for limiting omitted variable bias is to include as many “control” variables or covariates as possible. For example, suppose that we observe demand for a given product across a cross-section of markets. If we regress quantity demanded on price across these markets, a possible omitted variable bias is that there are some markets where there is a higher demand for the product than others and that price is set higher in those markets with higher demand. This is a form of omitted variable bias where the omitted variable is some sort of indicator of market demand conditions. To avoid omitted variable bias, the careful researcher would add covariates (such as average income or wealth measures) which seek to “control” or proxy for the omitted demand variable and use a multiple regression. There is a concern that these control or proxy variables are only imperfectly related to true underlying demand conditions, which are never perfectly predicted or “observable.”
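The omitted variables algebra of this section can be verified numerically. In the sketch below the parameter values (β = 1, γ = 2, π = 0.5) and the normal distributions are assumptions chosen for illustration; with them the simple regression should converge to δ = β + γπ = 2, while the multiple regression that includes w should recover β.

```python
import random

rng = random.Random(3)
beta, gamma, pi = 1.0, 2.0, 0.5   # assumed structural values for illustration
n = 20000

x = [rng.gauss(0, 1) for _ in range(n)]
w = [pi * xi + rng.gauss(0, 1) for xi in x]    # E[w | x] = pi * x
y = [beta * xi + gamma * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Simple regression of y on x alone (no intercept, matching the text's notation).
delta_hat = dot(x, y) / dot(x, x)

# Multiple regression of y on x and w: solve the 2x2 normal equations directly.
sxx, sxw, sww = dot(x, x), dot(x, w), dot(w, w)
sxy, swy = dot(x, y), dot(w, y)
det = sxx * sww - sxw ** 2
beta_hat = (sww * sxy - sxw * swy) / det
gamma_hat = (sxx * swy - sxw * sxy) / det

print(f"simple regression slope: {delta_hat:.3f} (beta + gamma*pi = {beta + gamma * pi})")
print(f"multiple regression: beta_hat = {beta_hat:.3f}, gamma_hat = {gamma_hat:.3f}")
```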

5.2 Endogeneity and omitted variable bias

Most applied empirical researchers will identify “endogeneity bias” as arising from correlation between independent variables and error terms in a regression. This is to describe a cold by its symptoms. To develop a strong intuitive understanding, it is helpful to give an omitted variables interpretation. Assume that there is an unobservable variable, ν, which is related to both y and x.

$$ y_i = \beta x_i + \alpha_y \nu_i + \varepsilon_{y,i} \tag{41} $$

$$ x_i = \alpha_x \nu_i + \varepsilon_{x,i} \tag{42} $$

Here both $\varepsilon_x$ and $\varepsilon_y$ have zero conditional mean given ν and are assumed to be independent of each other. In our example of demand in a cross-section of markets, ν represents some unknown demand shifter variable that allows some markets to have a higher level of demand for any given price than others. Thus, ν is an omitted variable and has the potential to cause omitted variable bias if ν is correlated with x. The model listed in (42) builds this correlation in by constructing x from ν and another exogenous error term. The idea here is that prices are set partially as a function of this underlying demand characteristic, which is observable to the firm but not observable to the researcher. In the regression of y on x, the error term is now $\alpha_y \nu_i + \varepsilon_{y,i}$, which is correlated with x. This form of omitted variable bias is called endogeneity bias. The term “endogeneity” comes from the notion that x is no longer determined “exogenously” (as if via an experiment) but is jointly determined along with y.

We can easily calculate the endogeneity bias by taking conditional expectations (or linear projections) of y given x.

$$ E[y \mid x] = \beta x + E[\alpha_y \nu + \varepsilon_y \mid x] = \beta x + \alpha_y \frac{\alpha_x \sigma_\nu^2}{\alpha_x^2 \sigma_\nu^2 + \sigma_{\varepsilon_x}^2}\, x \tag{43} $$

The ratio $\frac{\alpha_x \sigma_\nu^2}{\alpha_x^2 \sigma_\nu^2 + \sigma_{\varepsilon_x}^2}$ is simply the regression coefficient from a regression of the unobservable ν on x; multiplied by $\alpha_y$, it is the coefficient from a regression of the composite error term (including the unobservable) on x. The endogeneity bias is thus the second term of the coefficient on x in (43). Whenever the unobservable has variation which comprises a large fraction of the total variation in x, and the unobservable has a large effect on y, the endogeneity bias will be large. If we go back to our example of price endogeneity in a cross-section of markets, this would mean that the demand differences across markets would have to be large relative to other influences that shift price. In addition, the influence of the unobservable demand shifter on demand (y) must be large.
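A short simulation can confirm the bias expression in (43) and show how the bias grows with the share of the variation in x that comes from the unobservable. The parameter values, the unit variance of the structural error, and the normal distributions are assumptions for illustration; the function names are hypothetical.

```python
import random

def endogeneity_bias(alpha_y, alpha_x, var_nu, var_eps_x):
    """Bias term in (43): alpha_y times the coefficient from regressing nu on x."""
    return alpha_y * alpha_x * var_nu / (alpha_x ** 2 * var_nu + var_eps_x)

def simulated_ols_slope(beta, alpha_y, alpha_x, sd_nu, sd_eps_x, n, rng):
    """OLS slope (no intercept) of y on x in data simulated from (41)-(42)."""
    sxy = sxx = 0.0
    for _ in range(n):
        nu = rng.gauss(0, sd_nu)                      # unobservable seen by the firm
        x = alpha_x * nu + rng.gauss(0, sd_eps_x)     # equation (42)
        y = beta * x + alpha_y * nu + rng.gauss(0, 1.0)   # equation (41)
        sxy += x * y
        sxx += x * x
    return sxy / sxx

rng = random.Random(4)
beta = 1.0
results = []
for sd_nu in (0.3, 1.0, 3.0):   # unobservable is a small, moderate, or large part of x
    bias = endogeneity_bias(1.0, 1.0, sd_nu ** 2, 1.0)
    ols = simulated_ols_slope(beta, 1.0, 1.0, sd_nu, 1.0, 20000, rng)
    results.append((sd_nu, bias, ols - beta))
    print(f"sd_nu={sd_nu}: formula bias={bias:.3f}, simulated OLS - beta={ols - beta:.3f}")
```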

5.3 IV methods

As we have seen, the “endogeneity” problem is best understood as arising from an unobservable that is correlated both with the error in the “structural” equation and one or more of the right side variables in this equation. Regression methods were originally designed for experimental data where the x variable was chosen by the investigator as part of the experimental design. For observational data, this is not true and there is always the danger that there exists some unobservable variable which has been omitted from the structural equation. This makes a concern for endogeneity a generic criticism which can always be applied. The ideal solution to the endogeneity problem would be to conduct an experiment in which the x variable is, by construction, uncorrelated via randomization with any unobservable. Short of this ideal, researchers opt to partition the variation in the x variable25 into two parts: (1) variation that is “exogenous” or unrelated to the structural equation error term and (2) variation that might be correlated with the error term. Of course, this partition always exists; the only question is whether or not the partition can be accessed by the use of observable variables. If such an observable variable exists, then it must be correlated with the x variable but it must not enter the structural equation. Such a variable is termed an “instrumental variable.” The idea of an instrument is that this variable moves x around but does not affect y in a direct way, only indirectly via x. Of course, there can be many instrumental variables.

25 For simplicity, I will consider the case of only one right hand side endogenous variable. There is no additional insight gained from the multiple rhs variable case and the great majority of applied work only considers endogeneity in one variable.

5.3.1 The linear case

The case of a linear structural equation and linear instrumental variable model provides the intuition for the general case and also includes many empirical applications of IV methods. However, it should be noted that, due to the widespread use of choice models in marketing applications, there is a much higher incidence of the use of nonlinear models. We consider nonlinear choice models in Section 5.7. (44) and (45) constitute the linear IV model.

$$ y = \beta x + \gamma' w + \varepsilon_y \tag{44} $$

$$ x = \delta' z + \varepsilon_x \tag{45} $$

(44) is the structural equation. The focus is on estimation of the “structural” parameter, β, avoiding endogeneity bias. There is the possibility that there are other variables in the “structural” equation which are exogenous in the sense that we assume that $E[\varepsilon_y \mid w] = 0$. If these variables are comprehensive enough, meaning that almost all of the variation in the unobservable that is at the heart of the endogeneity problem can be explained by w, then the “endogeneity” problem ceases to be an issue. The regression methods will only use the variation in x that is independent of w and, under the assumption that the w controls are complete, there should be no endogeneity problem. For the purpose of this exposition, we will assume that $E[\varepsilon_y \mid x, w] = f(x) \neq 0$, or that we still have an endogeneity problem. The second equation (45) is just a linear projection of x on the set of instrumental variables and is often called the instruments or “first-stage” equation. In a linear model, the statement $E[\varepsilon_y \mid x, w] \neq 0$ is equivalent to $\operatorname{corr}(\varepsilon_x, \varepsilon_y) \neq 0$. In the omitted variable interpretation, this correlation in the equation errors is brought about by a common unobservable. As the correlation between the errors increases, the “endogeneity bias” becomes more severe. The critical assumption in the linear IV model is that the instrumental variables, z, do not enter into the structural equation. This means that the instruments only have an indirect effect on y via movement in x but no direct effect. This restriction is often called the exclusion restriction or sometimes the over-identification restriction. Unfortunately, there is no way to “test” the exclusion restriction because the model in which the z variables enter both equations is not identified.26

26 The so-called “Hausman” test requires at least one instrument for which the investigator must assume the exclusion restriction holds.

5.3.2 Method of moments and 2SLS

There are a number of ways to motivate inference for the linear IV model in (44)-(45). The most popular is the method of moments approach. For the sake of brevity and notational simplicity, consider the linear IV model with only one instrument and no other “exogenous” variables in the structural equation. The method of moments estimator exploits the assumption that z (now just a scalar r.v.) is uncorrelated with or orthogonal to the structural equation error. This is called a moment condition and involves an assumption about the population or data generating model that $E[\varepsilon_y z] = 0$. The method of moments principle defines an estimator by minimizing the discrepancy between the population and sample moments.

$$ \hat\beta_{MM} = \underset{\beta}{\operatorname{argmin}} \left\| E[\varepsilon_y z] - (y - \beta x)'z \right\| = \frac{z'y}{z'x} \tag{46} $$

Here y, x, z are N × 1 vectors of the observations. It is easy to see that this estimator is consistent (because we assume $E[\varepsilon_y z] = 0 = \operatorname{plim} \frac{z'\varepsilon_y}{N}$) and asymptotically normal. If the structural equation errors are uncorrelated and homoskedastic, it can be shown (see, for example, Hayashi, 2000, Section 3.8) that the particular method of moments estimator in (46) is the optimal Generalized Method of Moments Estimator. If the structural equation errors are conditionally heteroskedastic and/or autocorrelated, then the estimator above is no longer optimal and can be improved upon. It should be emphasized that when econometricians say that an estimator is optimal, this only means that the estimator has an asymptotic distribution with variance not exceeding that of any other estimator. This does not mean that, in finite samples, the method of moments estimator has better sampling properties than any other estimator. In particular, even estimators with asymptotic bias such as least squares can have lower mean-squared error than IV estimators. Another way of motivating the IV estimator for the simple linear IV model is the principle of Two Stage Least Squares (2SLS). The idea of Two Stage Least Squares is much the same as how it is possible to perform a multiple regression via a sequence of simple regressions. The “problem” with the least squares estimator is that some of the variation in x is not exogenous and is correlated with the structural equation error. The instrumental variables can be used to purge x of any correlation with the error term. The fitted values from a regression of x on z will be uncorrelated with the structural equation errors. Thus, we can use the fitted values from a “first-stage” regression of x on z and regress y on the fitted values from this first stage (this is the second-stage regression).

$$ x = \hat{x} + e_x = \hat\delta z + e_x \tag{47} $$

$$ y = \hat\beta_{2SLS}\, \hat{x} + e_y \tag{48} $$

This procedure yields the identical estimator as the MM estimator in (46). If there is more than one instrument, more than one rhs endogenous variable, or if we include a matrix of exogenous variables in the structural equation, then both procedures generalize, but the principle remains the same: assume that there exists a valid set of instruments and use only that portion of the variation in the rhs endogenous variables that is accounted for by the instruments.
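The equivalence of the method of moments estimator and 2SLS in the one-instrument case can be checked numerically. The sketch below simulates the linear IV model with a single instrument and no intercept; the parameter values and the common unobservable that links the two errors are assumptions for illustration.

```python
import random

rng = random.Random(5)
beta, delta, n = 1.0, 1.0, 20000   # assumed structural and first-stage coefficients

z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0, 1)
    nu = rng.gauss(0, 1)                       # common unobservable in both errors
    xi = delta * zi + nu + rng.gauss(0, 1)     # first-stage equation
    yi = beta * xi + nu + rng.gauss(0, 1)      # structural equation
    z.append(zi)
    x.append(xi)
    y.append(yi)

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

beta_ols = dot(x, y) / dot(x, x)    # uses all variation in x, so it is biased here
beta_mm = dot(z, y) / dot(z, x)     # method of moments estimator, z'y / z'x

delta_hat = dot(z, x) / dot(z, z)   # first-stage regression of x on z
x_fit = [delta_hat * zi for zi in z]
beta_2sls = dot(x_fit, y) / dot(x_fit, x_fit)   # second stage: y on fitted values

print(f"OLS: {beta_ols:.3f}  MM: {beta_mm:.3f}  2SLS: {beta_2sls:.3f}  true: {beta}")
```

The two IV estimates agree up to floating point rounding, since the second-stage slope reduces algebraically to z'y / z'x, while OLS is pulled away from β by the common unobservable.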

5.4 Control functions as a general approach

A very useful way of viewing the 2SLS estimator is as a special case of the “control function” approach to obtaining an IV estimator. The control function interpretation of 2SLS comes from the fact that the multiple regression coefficient on x is estimated using only that portion of the variation of x which is uncorrelated with the other variables in the equation. If we put a regressor in the structural equation which contains only that part of x which is potentially correlated with $\varepsilon_y$, then the multiple regression estimator would be a valid IV estimator. In fact, the 2SLS estimator can also be obtained by regressing y on x as well as the residual from the first-stage IV regression.

$$ y = \hat\beta_{2SLS}\, x + c\, e_x \tag{49} $$

ex is the residual from (47). Petrin and Train (2010) observe that the same idea can be applied to “control” for or eliminate (at least, asymptotically) endogeneity bias in a demand model with a potentially endogenous variable. For example, the control function approach can work even if the demand model is a nonlinear model such as a choice model. If x is a choice characteristic that might be considered potentially endogenous, then one can construct “control” functions from valid instruments and achieve the effects of an IV estimator simply by adding these control functions to the nonlinear model. Since, in non-linear models, the key assumption is not a zero correlation but conditional independence, it is necessary to not just project x on a linear function of the instruments, but to estimate the conditional mean function, E [x|z] = f (z). The conditional mean function is of unspecified form and this means that we need to choose functions of the instruments that can approximate any smooth function. Typically, polynomials in the instruments of high order should be sufficient. The residual, e = x − fˆ, is created and can be interpreted as that portion of x which is independent of the instruments. The controls required to be included in the nonlinear model must also allow for arbitrary flexibility in the way in which the residual is entered. Again, polynomials in the residual (or any valid set of basis functions) should work, at least for large enough samples, if we allow the polynomial order to increase with the sample size. The control function approach has a lot of appeal for applied workers as all we have to do is a first stage linear regression on polynomials in the instruments and simply add polynomials in the residual from this first stage to the nonlinear model. 
For linear index models like choice models, this simply means that we can do one auxiliary regression and then use any standard method to fit the choice model, but with constructed independent variables. The ease of use of the control function approach makes it convenient for checking to see whether an instrumental variables analysis produces estimates that are much different. However, inference in the control function approach requires additional computations, as the standard errors produced by standard non-linear model software will be incorrect because they don’t take into account that some of the variables are “constructed.” It is not clear that, from an inference point of view, the control function approach offers any advantages over using the general GMM method, which computes valid asymptotic standard errors. The control function approach has a number of assumptions required to show consistency. However, there is some evidence that it will closely approximate the IV solution in the choice model situation.
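For the linear case, the control function representation can be checked directly: regressing y on x and the first-stage residual should give the same coefficient on x as the IV estimator. The simulated model below uses assumed parameter values, a single instrument, and no intercept. It covers only the linear special case, not the nonlinear choice model setting with polynomial controls, and it does not compute standard errors.

```python
import random

rng = random.Random(6)
beta, n = 1.0, 20000   # assumed structural coefficient and sample size

z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0, 1)
    nu = rng.gauss(0, 1)                  # common unobservable creating endogeneity
    xi = zi + nu + rng.gauss(0, 1)
    yi = beta * xi + nu + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

# First stage: project x on z and keep the residual as the control function.
delta_hat = dot(z, x) / dot(z, z)
e_x = [xi - delta_hat * zi for xi, zi in zip(x, z)]

# Regress y on x and e_x by solving the 2x2 normal equations.
sxx, sxe, see = dot(x, x), dot(x, e_x), dot(e_x, e_x)
sxy, sey = dot(x, y), dot(e_x, y)
det = sxx * see - sxe ** 2
beta_cf = (see * sxy - sxe * sey) / det   # coefficient on x
c_hat = (sxx * sey - sxe * sxy) / det     # coefficient on the control function

beta_2sls = dot(z, y) / dot(z, x)         # IV estimator for comparison
print(f"control function: {beta_cf:.4f}, 2SLS: {beta_2sls:.4f}, c_hat: {c_hat:.3f}")
```

A coefficient on the residual that is far from zero indicates that the part of x not explained by the instrument is correlated with the structural error, which is the endogeneity the control function absorbs.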

5.5 Sampling distributions

The OLS estimator (conditional on the X matrix) is a linear estimator with a sampling distribution derived from the distribution of $\hat\beta_{OLS} - \beta = (X'X)^{-1} X'\varepsilon$. If the error terms are homoskedastic and normal, then the finite sample distribution of the OLS sampling error is also normal. However, all IV estimators are fundamentally non-linear functions of the data. For example, the simple Method of Moments estimator (46) is a nonlinear function of the random variables. The proper way of viewing the linear IV problem is that, given a matrix of instruments, Z, the model provides the joint distribution of both y and x. Since x is involved non-linearly, via the term $(z'x)^{-1}$, we cannot provide an analytical expression for the finite sample distribution of the IV estimator even if we make assumptions regarding the distribution of the error terms in the linear IV model (44)-(45). The sampling distribution of the IV estimator is approximated by asymptotic methods. This is done by normalizing by $\sqrt{N}$ and applying a Central Limit Theorem.

$$ \sqrt{N} \left( \hat\beta_{MM} - \beta \right) = \left( \frac{z'x}{N} \right)^{-1} \sqrt{N}\, \frac{z'\varepsilon_y}{N} \tag{50} $$

As N approaches infinity, the denominator of the MM estimator, $z'x / N$, converges to a constant by the Law of Large Numbers. The asymptotic distribution is entirely driven by the numerator, which has been expressed as $\sqrt{N}$ times a weighted average of the error terms in the structural equation. The asymptotic distribution is then derived by applying a Central Limit Theorem to this average. Depending on whether or not the error terms are conditionally heteroskedastic or autocorrelated (in the case of time series data), a different CLT is used. However, the basic asymptotic normality results are derived by assuming that the sample is large enough so that we can simply ignore the contribution of the denominator to the sampling distribution. While asymptotics greatly simplifies the derivation of a sampling distribution, there is very good reason to believe that this standard method of deriving the asymptotic distribution is apt to be highly inaccurate under the conditions in which the IV estimator is often applied. The finite sampling distribution can deviate from the asymptotic approximation in two important respects: (1) there can be substantial bias in the sampling distribution of the IV estimator even if the model assumptions hold and (2) the asymptotic approximation can be very poor and can dramatically understate the true sampling variability in the estimator. The simple Method of Moments estimator is a ratio of a weighted average of y to the weighted average of x.

$$ \hat\beta_{MM} = \frac{z'y / N}{z'x / N} $$
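Both departures from the asymptotic approximation can be seen in a small Monte Carlo experiment. The sketch below repeatedly simulates the one-instrument linear IV model and computes the method of moments estimator with a first-stage coefficient of 1.0 and then of 0.1. The sample size, number of replications, coefficient values, and normal errors are assumptions chosen for illustration, and the function name is hypothetical.

```python
import random
import statistics

def iv_draws(delta, n, reps, rng, beta=1.0):
    """Simulate the method of moments IV estimator z'y / z'x `reps` times."""
    draws = []
    for _ in range(reps):
        szy = szx = 0.0
        for _ in range(n):
            z = rng.gauss(0, 1)
            nu = rng.gauss(0, 1)                  # common unobservable
            x = delta * z + nu + rng.gauss(0, 1)  # first stage with strength delta
            y = beta * x + nu + rng.gauss(0, 1)   # structural equation
            szy += z * y
            szx += z * x
        draws.append(szy / szx)
    return draws

rng = random.Random(7)
beta, n, reps = 1.0, 100, 1000
summary = {}
for label, delta in (("strong", 1.0), ("weak", 0.1)):
    draws = iv_draws(delta, n, reps, rng, beta)
    # Share of replications in which the estimate misses beta by more than 1.
    far = sum(abs(d - beta) > 1.0 for d in draws) / reps
    summary[label] = (statistics.median(draws), far)
    print(f"{label}: median = {summary[label][0]:.2f}, share off by > 1 = {far:.2f}")
```

The median is reported rather than the mean because, with the smaller first-stage coefficient, occasional replications have a denominator close to zero and produce extreme estimates that dominate any sample average.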

The distribution of a ratio of random variables is very different from the distribution of a linear combination of random variables (the distribution of OLS). Even if the error terms in the linear IV model are homoskedastic and normal, then distribution of

131

132

CHAPTER 2 Inference for marketing decisions

FIGURE 1 Distribution of a ratio of normals.

the Method of Moments IV estimator is non-normal. The denominator is the sample covariance between z and x. If this sample covariance is small, then the ratio can assume large positive and negative values. More precisely, if the distribution of the denominator puts appreciable mass near zero, then the distribution of the ratio will have extremely fat tails. The asymptotic distribution uses a normal distribution to approximate a distribution which has much fatter tails than the normal. This means that the normal asymptotic approximation can dramatically understate the true sampling variability.

To illustrate how ratios of normals can fail to have a normal distribution, consider the distribution of the ratio of an N(1, .5) to an N(.1, 1) random variable.27 The distribution is shown by the magenta histogram in Fig. 1 and is revealed to be bimodal, with the positive mode having slightly more mass. This distribution exhibits massive outliers, and the figure only shows the histogram of the data trimmed to remove the top and bottom 1 per cent of the observations. The thick left and right tails are generated by draws from the denominator normal distribution which are close to the origin. The standard asymptotic approximation to the distribution of IV estimators simply ignores the variability of the denominator, which is supposed to converge to a constant. The asymptotic approximation is shown by the green density curve in the figure. Clearly, this is a poor approximation that ignores the other mode and under-estimates variability. The dashed light yellow line in the figure represents a normal approximation based on the actual sample moments of the ratio of normals. The fact that this approximation is so spread-out is another way of emphasizing that the ratio of normals has very fat tails. The only reasonable normal approximation is shown by the medium dark orange curve, which is fit to the observed interquartile range. Even this approximation misses the bimodality of the actual distribution. Of course, the approximation based on the interquartile range is not available via asymptotic calculations.

27 Here the second argument in the normal distribution is the standard deviation.

The degree to which the ratio of normals can be well-approximated by a normal distribution depends on both the location and spread of the denominator's distribution. Obviously, if the denominator is tightly distributed around a non-zero value, then the normal approximation can be highly accurate. The intuition that we have established is that when the denominator has a spread-out distribution and/or places mass near zero, the standard asymptotic approximation will fail for IV estimators. This can happen under two conditions: (1) in small samples and (2) where the instruments are “weak” in the sense that they explain only a small portion of the variation in x. Both cases are really about lack of information. The sampling distributions of IV estimators become very spread out with fat tails when there is little information about the true causal effect in the data. “Information” should be properly measured by the total covariance of the instruments with x. This total covariation can be “small” even in what appear to be “large” samples when instruments have only weak explanatory power. In the next section, we will explore the boundaries of the “weak” instrument problem.
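The ratio-of-normals illustration can be reproduced with a short simulation. The following is a minimal sketch, not the code behind Fig. 1: the number of draws, the seed, and the 5-standard-deviation tail threshold are arbitrary assumptions, and the IQR-based standard deviation uses the standard normal relation IQR ≈ 1.349 sd.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an arbitrary choice
n_draws = 200_000                # number of draws is an assumption

# Ratio of an N(1, sd=.5) to an N(.1, sd=1); the second argument is the std. dev.
num = rng.normal(loc=1.0, scale=0.5, size=n_draws)
den = rng.normal(loc=0.1, scale=1.0, size=n_draws)
ratio = num / den

# Spread implied by the raw sample moments versus the interquartile range
sd_moments = ratio.std()
q25, q75 = np.percentile(ratio, [25, 75])
sd_iqr = (q75 - q25) / 1.349     # for a normal, IQR is about 1.349 std. devs.

# Share of draws more than 5 IQR-based std. devs. away from the median
median = np.median(ratio)
tail_share = np.mean(np.abs(ratio - median) > 5 * sd_iqr)

print(f"sd from sample moments: {sd_moments:.1f}")
print(f"sd implied by the IQR:  {sd_iqr:.2f}")
print(f"share beyond 5 IQR-based sd: {tail_share:.3f}")
```

Under a normal distribution essentially no draws would fall beyond five standard deviations, so a non-trivial tail share is a direct symptom of the fat tails described in the text.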

5.6 Instrument validity

One point that is absent from the econometrics literature is that the sampling distribution of IV estimators is only considered conditional on the validity of the instruments. This is an untestable assumption which certainly is violated in many datasets. This form of mis-specification is much more troubling than other forms of model mis-specification such as non-normality of the error terms, conditional heteroskedasticity, or non-linearity. For each of these mis-specification problems, we have tests for mis-specification and alternative estimators. There are also methods to provide inference (i.e. standard errors and confidence intervals) which are robust to conditionally heteroskedastic, auto-correlated, and non-normal errors. There are no methods which are robust to the use of invalid instruments. To illustrate this point, consider the sampling distribution of an IV estimator based on an invalid instrument. We simulate data from the following model.

\[
\begin{aligned}
y &= -2x - z + \varepsilon_y \\
x &= 2z + \varepsilon_x \\
\begin{pmatrix} \varepsilon_x \\ \varepsilon_y \end{pmatrix} &\sim N\!\left(0, \begin{pmatrix} 1 & .25 \\ .25 & 1 \end{pmatrix}\right), \qquad z_i \sim \mathrm{Unif}(0, 1)
\end{aligned}
\tag{51}
\]
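A Monte Carlo for model (51) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used for Fig. 2: N = 500 is taken from the text, while the number of replications, the seed, and the choice to demean the data (that is, to include an intercept) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)          # seed is an arbitrary choice
n_obs, n_reps = 500, 2000               # N = 500 from the text; n_reps is an assumption
beta_true = -2.0
cov_err = np.array([[1.0, 0.25], [0.25, 1.0]])   # covariance of (eps_x, eps_y)

iv_est = np.empty(n_reps)
ols_est = np.empty(n_reps)
for r in range(n_reps):
    z = rng.uniform(0.0, 1.0, size=n_obs)
    eps = rng.multivariate_normal(np.zeros(2), cov_err, size=n_obs)
    x = 2.0 * z + eps[:, 0]
    y = beta_true * x - z + eps[:, 1]    # z enters y directly: the instrument is invalid
    zc, xc, yc = z - z.mean(), x - x.mean(), y - y.mean()
    iv_est[r] = (zc @ yc) / (zc @ xc)    # method-of-moments IV estimator
    ols_est[r] = (xc @ yc) / (xc @ xc)   # OLS slope

rmse_iv = np.sqrt(np.mean((iv_est - beta_true) ** 2))
rmse_ols = np.sqrt(np.mean((ols_est - beta_true) ** 2))

# Probability limit of IV here: -2 - Var(z)/Cov(z, x) = -2 - (1/12)/(2/12) = -2.5
print(f"median IV estimate:  {np.median(iv_est):.3f}")
print(f"median OLS estimate: {np.median(ols_est):.3f}")
print(f"RMSE ratio IV/OLS:   {rmse_iv / rmse_ols:.1f}")
```

The comment on the probability limit follows from the moment conditions in (51): the direct effect of z on y shifts the IV estimand by the direct effect divided by the first-stage slope, while OLS is only mildly biased in this design.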


FIGURE 2 Sampling distributions of estimators with invalid instruments.

This is a situation with a relatively strong instrument (the population R-squared for the regression of x on z is about .25). Here N = 500, which is a large sample in many cross-sectional contexts. The instrument is invalid but with a smaller direct effect, −1, than indirect effect, −4. Moreover, the structural parameter is also larger than the direct effect. Fig. 2 shows the sampling distribution of the method of moments estimator and the standard OLS estimator. Both estimators are biased and inconsistent. Moreover, the IV estimator has inferior sampling properties, with a root mean-squared-error more than seven times that of the OLS estimator. Since we can’t know if the instruments are valid, the argument that the IV estimator should be preferred because it is consistent conditional on validity is not persuasive.

Conley et al. (2012) consider the problem of validity of instruments and use the term “plausibly exogenous.” That is to say, except for true random variation, it is impossible to prove that an instrument is valid. In most situations, the best that can be said is that the instrument is approximately valid. Conley et al. (2012) define this as an instrument which does not exactly satisfy an exclusion restriction (i.e. the assumption of no direct effect on the response variable) but which has a small direct effect relative to the indirect effect. From both sampling and Bayesian points of view, Conley et al. (2012) argue that a sensitivity analysis with respect to the exclusion restriction can be useful. For example, if minor (i.e. small) violations of the exclusion restriction do not fundamentally change inferences regarding the key effects, then Conley et al. (2012) consider the analysis robust to violations of instrument validity within that range. For marketing applications, this seems to be a very useful framework. We do not expect any instrument to exactly satisfy the exclusion restriction, but we might expect the instrument to be “plausibly exogenous” in the sense of a small violation. The robustness or sensitivity checks developed by Conley et al. (2012) help assess whether the findings are critically sensitive to violations of exogeneity in a plausible range. This provides a useful way of discussing and evaluating the instrument validity issue. This was absent in the econometrics literature and is of great relevance to researchers in marketing, who rarely have instruments for which there are air-tight exogeneity arguments.
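A sensitivity analysis in the spirit of the “plausibly exogenous” idea can be sketched by fixing an assumed direct effect γ of the instrument on y, subtracting γz from y, and re-computing the IV estimate over a grid of γ values. This is a simplified illustration, not the procedure of Conley et al. (2012) itself: the function name `iv_given_direct_effect`, the grid, and the seed are hypothetical, and the data are one draw from model (51).

```python
import numpy as np

rng = np.random.default_rng(3)           # seed is an arbitrary choice
n_obs = 500
cov_err = np.array([[1.0, 0.25], [0.25, 1.0]])

# One dataset from model (51): structural effect -2, direct effect of z on y equal to -1
z = rng.uniform(size=n_obs)
eps = rng.multivariate_normal(np.zeros(2), cov_err, size=n_obs)
x = 2.0 * z + eps[:, 0]
y = -2.0 * x - z + eps[:, 1]

def iv_given_direct_effect(y, x, z, gamma):
    """IV estimate of the effect of x after removing an assumed direct effect gamma * z."""
    zc = z - z.mean()
    return (zc @ (y - gamma * z)) / (zc @ x)

# Hypothetical range of exclusion-restriction violations to examine
gamma_grid = np.linspace(-1.5, 0.5, 9)
beta_path = np.array([iv_given_direct_effect(y, x, z, g) for g in gamma_grid])

for g, b in zip(gamma_grid, beta_path):
    print(f"assumed direct effect {g:+.2f} -> IV estimate {b:.3f}")
```

If the estimates barely move across the range of γ judged plausible, the conclusions are robust to that degree of invalidity; in this design they move by roughly half a unit per unit of γ, so the assumed exclusion restriction matters a great deal.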

5.7 The weak instruments problem

5.7.1 Linear models

Not only are instruments potentially invalid, there is a serious problem when instruments are “weak,” that is, when they explain only a small portion of the variation in the rhs endogenous variable. In situations of low information regarding causal effects (either because of small samples or weak instruments or both), standard asymptotic distribution theory begins to break down. What happens is that asymptotic standard errors are no longer valid and are generally too small. Thus, confidence intervals constructed from asymptotic standard errors typically have much lower coverage rates than their nominal coverage probability. This phenomenon has spawned a large sub-literature in econometrics on the so-called weak or many instruments problem. In marketing applications, we typically do not encounter the “many” instrument problem in the sense that we don’t have more than a handful of potential instruments. There is a view among some applied econometricians that failure of standard asymptotics only occurs for very small values of the first-stage R-squared or when the F-stat for the first stage is less than 10. This view comes from a misreading of the excellent survey of Stock et al. (2002). The condition requiring that the first-stage F-stat be > 10 applies to the problem with only one instrument (in general, the “concentration parameter” or kF should be used). However, the results summarized in Stock et al. (2002) simply state that the “average asymptotic bias” will be less than 15 per cent in the case where kF > 10. This does not mean that confidence intervals constructed using the standard asymptotics will have good actual coverage properties (i.e. actual coverage close to nominal coverage). Nor does this result imply that there aren’t finite sample biases of an even greater magnitude than these asymptotic biases.
The poor sampling properties of the IV estimator28 can easily be shown even in cases where the instruments have a modest but not small first-stage R-squared. We

28 Here I focus on the sampling properties of the estimator rather than the size of a test procedure. Simulations found in Hansen et al. (2008) show that coverage probabilities and bias can be very large even in situations where the concentration ratio is considerably more than 10.


FIGURE 3 “Weak” instruments sampling distribution: p = 1.

simulate from the following system:

\[
\begin{aligned}
y &= -2x + \varepsilon_y \\
x &= Z\delta + \varepsilon_x \\
\begin{pmatrix} \varepsilon_x \\ \varepsilon_y \end{pmatrix} &\sim N\!\left(0, \begin{pmatrix} 1 & .25 \\ .25 & 1 \end{pmatrix}\right)
\end{aligned}
\]

N = 100. Z is an N × p matrix of iid Unif(0, 1) draws. The δ vector is made up of p identical elements, chosen to make the population first-stage R-squared equal to .10 using the formula $\sqrt{12\rho^2 / \left(p(1-\rho^2)\right)}$, where $\rho^2$ is the desired R-squared value. Fig. 3 shows the sampling distribution of the IV and OLS estimators in this situation with p = 1. The method of moments IV estimator has huge tails, causing it to have a much larger RMSE than OLS. OLS is slightly biased but without the fat tails of the IV estimator. Fig. 4 provides the distribution of the first-stage F statistics for 2000 replications. The vertical line in the figure is the median, which lies above 10. This means that more than 50 per cent of these simulated samples had F-stats greater than 10, showing the fallacy of this rule of thumb. Thus, for this case of only a moderately weak (but valid!) instrument,


FIGURE 4 Distribution of first-stage F-statistics.

the IV estimator would require a sample size of approximately four29 times larger than the OLS estimator to deliver the same RMSE level. Lest the reader form the false impression that the IV estimator doesn’t have appreciable bias, consider the case where there are 10 instruments instead of one but where all other parameters are held constant. Fig. 5 shows the sampling distributions in this case. The IV estimator now has both fat tails and finite sample bias. The weak instruments literature seeks to improve on standard asymptotic approximations to the sampling distribution of the IV estimator. The literature focuses exclusively on improving inference which is defined as obtaining testing and confidence interval procedures which have correct size. That is, the weak instruments literature assumes that the researcher has decided to employ an IV method and just wants a test or confidence interval with the proper size. This literature does not propose new estimators with improved sampling properties but merely seeks to develop improved asymptotic approximation methods. This literature is very large and has made considerable progress on obtaining test procedures with actual size very close to nominal size under a variety of assumptions. There are two major variants in this literature. One variant starts from what is called the Conditional Likelihood Ratio statistic and builds a testing theory which is exact under the homoskedastic, normal case (conditional on the error covariance matrix) (see Moreira, 2003 as an example). The other variant uses the GMM approach to define a test statistic which is consistent in the presence of heteroskedasticity and does not rely on normal errors (see Stock and Wright, 2000). The GMM variant will never deliver exact results but is potentially more robust. Both the CLR and GMM methods will work very well when the averages of y and x used in the IV estimator

29 $(.5/.24)^2 \approx 4.3$.


FIGURE 5 “Weak” instruments sampling distribution: p = 10.

(see, for example, (46)) are approximately normal. This happens, of course, when the CLT sets in quickly. The performance of these methods is truly impressive even in small samples in the sense that the nominal and actual coverage of confidence intervals is very close. However, the intervals produced by the improved methods simply expose the weakness of the IV estimators in the first place; that is, the intervals can be very wide (in fact, the intervals can be of infinite length). The fact that proper size intervals are very wide simply says that if you properly measure sampling error, it can be very large for IV estimators. This reflects the fact that an IV approach uses only a portion of the total sample variability or information to estimate the structural parameter.
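The weak-instrument experiment described above (N = 100, first-stage R-squared of .10, 2000 replications) can be sketched as follows. This is a minimal sketch under assumptions: the seed is arbitrary, the data are demeaned (an intercept is included), and the IV estimator is computed as 2SLS, which coincides with the method of moments estimator when p = 1.

```python
import numpy as np

rng = np.random.default_rng(2)           # seed is an arbitrary choice
n_obs, n_reps, p = 100, 2000, 1          # set p = 10 for the many-instrument variant
r2_target, beta_true = 0.10, -2.0

# Var(Unif(0,1)) = 1/12 and Var(eps_x) = 1, so R^2 = (p d^2/12) / (p d^2/12 + 1)
delta = np.sqrt(12.0 * r2_target / (p * (1.0 - r2_target)))
cov_err = np.array([[1.0, 0.25], [0.25, 1.0]])   # covariance of (eps_x, eps_y)

iv_est, ols_est, f_stat = (np.empty(n_reps) for _ in range(3))
for r in range(n_reps):
    Z = rng.uniform(size=(n_obs, p))
    eps = rng.multivariate_normal(np.zeros(2), cov_err, size=n_obs)
    x = Z @ np.full(p, delta) + eps[:, 0]
    y = beta_true * x + eps[:, 1]
    Zc, xc, yc = Z - Z.mean(axis=0), x - x.mean(), y - y.mean()

    # First stage: fitted values and F statistic for the instruments
    coef, *_ = np.linalg.lstsq(Zc, xc, rcond=None)
    x_hat = Zc @ coef
    rss = np.sum((xc - x_hat) ** 2)
    f_stat[r] = (np.sum(x_hat ** 2) / p) / (rss / (n_obs - p - 1))

    iv_est[r] = (x_hat @ yc) / (x_hat @ xc)   # 2SLS estimate
    ols_est[r] = (xc @ yc) / (xc @ xc)        # OLS slope

rmse_iv = np.sqrt(np.mean((iv_est - beta_true) ** 2))
rmse_ols = np.sqrt(np.mean((ols_est - beta_true) ** 2))
share_f_gt_10 = np.mean(f_stat > 10)

print(f"RMSE IV: {rmse_iv:.2f}, RMSE OLS: {rmse_ols:.2f}")
print(f"median first-stage F: {np.median(f_stat):.1f}, share with F > 10: {share_f_gt_10:.2f}")
```

Because the just-identified IV estimator has very fat tails, its simulated RMSE is sensitive to a handful of extreme replications; robust summaries such as the interquartile range are a useful complement.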

5.7.2 Choice models

Much of the applied econometrics done in marketing is done in the context of a logit choice model of demand rather than a linear structural model. Much of the intuition regarding the problems with IV methods in linear models carries over to the nonlinear case. For example, the basic exclusion restriction that underlies the validity of an instrument also applies to a non-linear model. The idea that the instruments partition the variability of the endogenous rhs variable still applies. The major difference is that the GMM estimator is now motivated not just by the assumption that the structural errors are uncorrelated with the instruments but by a more fundamental notion that


the instruments and the structural errors are conditionally independent. Replacing zero conditional correlation with conditional independence means that the moment conditions used to develop the GMM approach can be generated by assuming that the error terms are orthogonal not just to the instruments but also to any function of the instruments. This allows greater flexibility than in the linear IV model. In the linear IV model, we need as many (or more) instruments as there are included rhs endogenous variables to identify the model. However, in a nonlinear model such as the choice model, any function of the instruments is also a valid instrument and can be used to identify model parameters. To make this concrete, consider a very simple homogeneous logit model.

\[
\Pr(j \mid t) = \frac{\exp\left(\alpha^{\top} c_{j,t} + \beta^{\top} m_{j,t} + \xi_{j,t}\right)}{\sum_{j} \exp\left(\alpha^{\top} c_{j,t} + \beta^{\top} m_{j,t} + \xi_{j,t}\right)}
\tag{52}
\]

Here $\xi_{j,t}$ is an unobservable, $m_{j,t}$ are the marketing mix variables for alternative j observed at time t, and $c_{j,t}$ are the characteristics of choice alternative j. The endogeneity problem comes from the possibility that firms set the marketing mix variables with partial knowledge of the unobservable demand shock and, therefore, the marketing mix variables are possibly a function of the $\xi_{j,t}$. Since the choice model is a linear index model, this is the same as suggesting that the unobservables are correlated with the marketing mix variables. The IV estimator would be based on the assumption that there exists a matrix, Z, of observations on valid instruments, which are variables that are conditionally independent of $\xi_{j,t}$.

\[
E\left[\xi_t \, g(z_t)\right] = 0 \quad \text{for any measurable function } g(\cdot)
\]

where $z_t$ is the vector of the p instrumental variables. As a practical matter, this means that one can use as valid instruments any polynomial function of the z variables and interactions between the instruments, greatly expanding the number of instruments. However, the identification achieved by expanding the set of instruments in this manner comes primarily from the model functional form. To illustrate the problem with IV estimators for the standard choice model, we will consider the choice model in (52) along with a standard linear IV “first-stage” equation.

\[
m_t = \delta^{\top} z_t + \nu_t
\]

$\nu_t$ and $\xi_t$ are correlated, giving rise to the classic omitted variable interpretation of the endogeneity problem. To examine the sampling properties of the IV estimator for this situation, we will consider the special case where there is only one endogenous marketing mix variable, there is only one instrument, and the choice model is a binary choice model. To generate the data, we will assume that the unobserved demand shocks are jointly normal with the errors in the IV equation.

\[
\Pr(1) = \frac{\exp(\alpha + \beta m + \xi)}{1 + \exp(\alpha + \beta m + \xi)}
\]

\[
m = \delta^{\top} z + \nu, \qquad
\begin{pmatrix} \xi \\ \nu \end{pmatrix} \sim N\!\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)
\]

FIGURE 6 Sampling distributions for logit models.

Here we have arbitrarily written the probability of choice alternative 1. We used the same parameters as in the simulation of the weak instruments problem for the linear IV with one instrument: N = 100, ρ = .25, and δ is set so that the first-stage R-squared is 0.10. Fig. 6 shows the sampling distribution of the standard ML estimator which ignores the endogeneity (shown by the darker “magenta” histogram). We used a control-function approach to compute the IV estimator for this problem under the assumption of a linear first stage: we regressed the endogenous rhs variable, m, on the instruments and used the residual from this regression to create additional explanatory variables which were included in the logit model. In particular, we used the residual, the residual-squared, the residual-cubed, and exp of the residual as control functions. The sampling distribution of this estimator is shown by the yellow histogram. The sampling performance of the IV estimator is considerably inferior to that of the MLE which ignores the endogeneity problem. The fat tails of the IV estimator contribute to an RMSE of about twice that of the MLE. The IV estimator appears to be approximately unbiased; however, this property disappears quickly as the number of instruments increases, while the RMSE remains high.
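The control-function comparison can be sketched as below. This is a simplified illustration, not the computation behind Fig. 6: it uses only the first-stage residual as a control function (the text also uses its square, cube, and exponential), and it assumes α = 0 and β = −2, a small Newton-Raphson routine (`fit_logit`, a hypothetical helper), 500 replications, and an arbitrary seed.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-8):
    """Logit maximum likelihood by Newton-Raphson; X must contain a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-np.clip(X @ b, -30.0, 30.0)))
        grad = X.T @ (y - prob)
        hess = (X * (prob * (1.0 - prob))[:, None]).T @ X
        # Tiny ridge term guards against a numerically singular Hessian
        step = np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

rng = np.random.default_rng(5)                   # seed is an arbitrary choice
n_obs, n_reps = 100, 500                         # N from the text; n_reps is an assumption
alpha, beta, rho = 0.0, -2.0, 0.25               # alpha and beta are assumed values
delta = np.sqrt(12.0 * 0.10 / (1.0 - 0.10))      # first-stage R-squared of .10 with one instrument
cov_err = np.array([[1.0, rho], [rho, 1.0]])     # covariance of (xi, nu)

mle_est = np.empty(n_reps)
cf_est = np.empty(n_reps)
for r in range(n_reps):
    z = rng.uniform(size=n_obs)
    xi, nu = rng.multivariate_normal(np.zeros(2), cov_err, size=n_obs).T
    m = delta * z + nu
    p1 = 1.0 / (1.0 + np.exp(-(alpha + beta * m + xi)))
    y = (rng.uniform(size=n_obs) < p1).astype(float)

    ones = np.ones(n_obs)
    W = np.column_stack([ones, z])
    resid = m - W @ np.linalg.lstsq(W, m, rcond=None)[0]   # first-stage residual

    mle_est[r] = fit_logit(np.column_stack([ones, m]), y)[1]         # ignores endogeneity
    cf_est[r] = fit_logit(np.column_stack([ones, m, resid]), y)[1]   # control function

def iqr(v):
    q25, q75 = np.percentile(v, [25, 75])
    return q75 - q25

print(f"IQR of MLE estimates: {iqr(mle_est):.2f}")
print(f"IQR of control-function estimates: {iqr(cf_est):.2f}")
```

The control-function coefficient on m is identified only from the part of m explained by the instrument, which is why its sampling spread is much larger than that of the MLE that ignores endogeneity.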


5.8 Conclusions regarding the statistical properties of IV estimators

We have seen that an IV estimator can exhibit substantial finite sample bias and tremendous variability, particularly in the case of small samples, non-normal errors, and weak to moderate instruments. The failure of standard asymptotic theory applies not just to the extreme case of very weak instruments, but also to cases of moderate strength instruments. All of these results assume that the instruments used are valid. If there are even “small” violations of the exclusion restriction (i.e. the instruments have a direct effect on y), then the statistical performance of the IV estimator degrades even further. The emphasis in the recent econometrics literature on instruments is on improved testing and confidence interval construction. This emphasis is motivated by a “theory-testing” mentality. That is, researchers want to test hypotheses regarding whether or not a causal effect exists. The emphasis is not on predicting y conditional on a change in x. This exposes an important difference between economics and marketing applications. In many (but not all) marketing applications, we are more interested in conditional prediction than in testing a hypothesis. If our goal is to help the firm make better decisions, then the first step is to help the firm make better predictions of the effects of changes in marketing mix variables. One may actually prefer estimators which do not attempt to adjust for endogeneity (such as OLS) for this purpose. OLS can have a much lower RMSE than an IV method. In sum, IV methods are costly to apply and prone to specification errors. This serves to underscore the need for caution and the requirement that arguments in support of potential endogeneity bias and validity must be strong.

5.9 Endogeneity in models of consumer demand

Much empirical research in marketing is directed toward calibrating models of product demand (see, for example, the Chintagunta and Nair, 2011 survey and Chapter 1 of this volume). In particular, there has been a great deal of emphasis on discrete choice models of demand for differentiated products (for an overview, see pp. 4178–4204 of Ackerberg et al., 2007). Typically, these are simple logit models which allow marketing mix variables to influence demand and account for heterogeneity. An innovation of Berry et al. (1995) was to include a market wide error term in this logit structure so that the aggregate demand system is not a deterministic function of product characteristics and marketing mix variables.

\[
MS(j \mid t) = \int \frac{\exp\left(\alpha^{\top} c_{j} + \beta^{\top} m_{jt} + \xi_{jt}\right)}{\sum_{j=1}^{J} \exp\left(\alpha^{\top} c_{j} + \beta^{\top} m_{jt} + \xi_{jt}\right)} \, p(\alpha, \beta) \, d\alpha \, d\beta
\tag{53}
\]

There are J products observed either in T time periods or in a cross section of T markets. $c_j$ is a vector of characteristics of the jth product, $m_{jt}$ is a vector of marketing mix variables such as price and promotion for the jth product, and $\xi_{jt}$ represents an error term which is often described as a “demand shock.” The fact that consumers are heterogeneous is reflected by integrating the logit choice probabilities over a distribution of parameters which represents the distribution of preferences in the market. This basic model represents a sort of intersection between marketing and I/O and provides a useful framework to catalog the sorts of instruments used in the literature.
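The integral in (53) is typically approximated by simulation. The sketch below averages logit probabilities over draws of (α, β); the normal heterogeneity distribution, the function name `market_shares`, and all numerical values are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

def market_shares(c, m, xi, mean, cov, n_draws=5000, seed=0):
    """Approximate MS(j|t) in (53) by Monte Carlo over draws of (alpha, beta).

    c: (J, k_c) product characteristics; m: (J, k_m) marketing mix variables;
    xi: (J,) demand shocks; mean, cov: parameters of an assumed normal p(alpha, beta).
    """
    rng = np.random.default_rng(seed)
    k_c = c.shape[1]
    theta = rng.multivariate_normal(mean, cov, size=n_draws)   # rows are (alpha, beta) draws
    alpha, beta = theta[:, :k_c], theta[:, k_c:]
    util = alpha @ c.T + beta @ m.T + xi                       # (n_draws, J) linear index
    util -= util.max(axis=1, keepdims=True)                    # stabilise the exponentials
    prob = np.exp(util)
    prob /= prob.sum(axis=1, keepdims=True)                    # logit probabilities per draw
    return prob.mean(axis=0)                                   # average over heterogeneity

# Three products with identical characteristics and different prices (hypothetical values)
c = np.ones((3, 1))
m = np.array([[1.0], [1.5], [2.0]])
xi = np.zeros(3)
shares = market_shares(c, m, xi, mean=np.array([0.5, -2.0]), cov=np.diag([0.25, 0.25]))
print(np.round(shares, 3))
```

As in (53), there is no outside good in this sketch, so the simulated shares sum to one across the J products.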

5.9.1 Price endogeneity

Eq. (53) provides a natural motivation for concerns regarding endogeneity using an omitted variables interpretation. If we could observe the $\xi_{jt}$ variable, then we would simply include this variable in the model and we would be able to estimate the β parameters which represent the sensitivity to marketing mix variables. However, researchers do not observe $\xi_{jt}$ and it is entirely possible that firms have information regarding $\xi_{jt}$ and set marketing mix variables accordingly. One of the strongest arguments made for endogeneity is the argument of Berry et al. (1995) that if $\xi_{jt}$ represents an unobserved product characteristic (such as some sort of product quality), then we would expect firms to set price as a function of $\xi_{jt}$ as well as of the observed characteristics. This is a very strong argument when applied in marketing applications, as the observed characteristics of many consumer products are often limited to packaging, package size, and a subset of ingredients. For consumer durable goods, the observed characteristics are also limited as it is difficult to quantify design, aesthetic, and performance characteristics. We might expect that price and unobserved quality are positively correlated, giving rise to a classic downward endogeneity bias in price sensitivity. This would result in what appear to be sub-optimal prices. There are many possible interpretations of the $\xi_{jt}$ terms other than the interpretation as an unobserved product characteristic. If demand is observed in a cross-section of markets, we might interpret the $\xi_{jt}$ as unobserved market characteristics that make particular brands more or less attractive in this market. If the t index is time, then others have argued that the $\xi_{jt}$ represent some sort of unobserved promotional or advertising shock. These arguments for endogeneity bias in the price coefficient have led to the search for valid instruments for the price variable.
The obvious place to look for instruments is the supply side which consists of cost and competitive information. The idea here is that costs do not affect demand and therefore serve to push around price (via some sort of mark-up equation) but are uncorrelated with the unobserved demand shock, ξj t . Similarly, the structure of competition should be a driver of price but not of demand. If a particular product lies in a crowded portion of the product characteristics space, then we might expect smaller mark-ups than a product that is more isolated. The problem with cost-based instruments is lack of variability and observability. For some types of consumer products, input costs such as raw material costs may be observable and variable, but other parts of marginal cost may be very difficult to measure. For example, labor costs, measured by the Bureau of Labor Statistics, are based on surveys with a potentially high measurement error. Typically, those costs that are observable do not vary by product so that input costs are not usable as instruments for the highly differentiated product categories studied in marketing applications. If the data represent a panel of markets observed over time, then the suggestion of Hausman (1996) can be useful. The idea here is that the demand shocks are not


correlated across markets but that costs are.30 If this is the case, then the prices of products in other markets would be valid instruments. Hausman introduced this idea to get around the problem that observable input costs don’t vary by product. To evaluate the usefulness and validity of the Hausman approach, one must take a critical view of what the demand shocks represent. If these error terms represent unobservable market level demand characteristics which do not vary over time, then simply including market fixed effects would eliminate the need for instruments. One has to argue that the unobserved demand shocks are varying both by market and by time period in the panel. For this interpretation, authors often point to unobserved promotional efforts such as advertising and coupon drops. If these promotional efforts have a much lower frequency than the sampling frequency of the data (e.g. feature promotions are planned quarterly but we observe weekly demand data), then it is highly unlikely that these unobservables explain much of the variation in demand, and this source of endogeneity concern is weak. For products with few observable characteristics and for cross-sectional data, Berry et al. (1995) make a strong argument for price endogeneity. However, their arguments for the use of characteristics of other products as potential instruments are not persuasive for marketing applications. Their argument is that the characteristics of competing products will influence mark-ups independently of demand shocks. This may be reasonable. However, their assumption that firms’ observed characteristics are exogenous and set independently of the unobservable characteristic is very likely to be incorrect. Firms set the bundle of both observed and unobserved characteristics jointly. Thus, the instruments proposed by Berry et al. (1995) are unlikely to be valid.
With panel data, there is no need to use instruments as simple product specific fixed effects would be sufficient to remove the “endogeneity” bias problem as long as the unobserved product characteristics do not vary across time.
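For a panel of markets, the Hausman (1996) idea of using prices in other markets as instruments can be constructed mechanically as a leave-one-out average. The sketch below assumes a balanced price array indexed by market, period, and product; the function name `hausman_instruments` and the array layout are hypothetical. It only builds the instrument and says nothing about its validity, which depends on the arguments discussed above.

```python
import numpy as np

def hausman_instruments(price):
    """Average price of the same product in the same period across the OTHER markets.

    price: array of shape (n_markets, n_periods, n_products), assumed balanced.
    Returns an array of the same shape.
    """
    n_markets = price.shape[0]
    total = price.sum(axis=0, keepdims=True)     # sum over all markets
    return (total - price) / (n_markets - 1)     # leave the own market out

# Tiny example: 3 markets, 2 periods, 1 product
price = np.arange(6.0).reshape(3, 2, 1)
instr = hausman_instruments(price)
print(instr[:, :, 0])
```

In the tiny example, the instrument for market 0 in period 0 is the average of the period-0 prices in markets 1 and 2.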

5.9.2 Conclusions regarding price endogeneity

Price endogeneity has received a great deal of attention in the recent marketing literature. There is no particular reason to single out price as the one variable in the marketing mix which has potentially the greatest problems of endogeneity bias. In fact, the source of variation in prices in most marketing datasets consists of cost variation (including wholesale price variation) and the ubiquitous practice of temporary price promotions or sales. Within the same market over time, it is hard to imagine what the unobservable demand shocks are that vary so much over time and by brand. Retailers set prices using mark-up rules and other heuristics that do not depend on market wide variables. Cost variables are natural price instruments but lack variation over time and by brand. Wholesale prices, if used as instruments, will confuse long and short run price effects. We are not aware of any economic arguments which can justify the use of lagged prices as instruments. In summary, we believe that, with panel or time-series data, the emphasis on price endogeneity has been misplaced.

30 The fact that, for many products, there are national advertising and promotional campaigns suggests that the Hausman idea will only work if advertising expenditure variables are included in the model.


5.10 Advertising, promotion, and other non-price variables

While the I/O literature has focused heavily on the possibility of price endogeneity, there is no reason to believe, a priori, that the endogeneity problem is confined to price. In packaged goods, demand is stimulated by various “promotional” activities which can include what amount to local forms of advertising from display signage, direct mail, and newspaper inserts. In the pharmaceutical and health care products industry, large and highly compensated sales forces “call on” doctors and other health care professionals to promote products (this is often called “detailing”). In many product categories, there is very little price variation but a tremendous expenditure of effort on promotional activities such as detailing. This means that for many product categories, the advertising/promotional variables are more important than price. An equally compelling argument can be made that these non-price marketing mix variables are subject to the standard “omitted” variable endogeneity bias problem. For example, advertising would seem to be a natural variable that is chosen as a function of demand unobservables. Others have suggested that advertising is determined simultaneously along with sales as firms set advertising expenditures as a function of the level of sales. In fact, the classical article (Bass, 1969) uses linear simultaneous equations models to capture this “feedback” mechanism for advertising. The standard omitted variables arguments apply no less forcefully to non-price marketing mix variables. This motivates a search for valid instruments for advertising and promotion. Other than costs of advertising and promotion, there is no set of instruments that naturally emerges as valid and strong. Even the cost variables are unlikely to be brand or product-specific and may vary only slowly over time, exacerbating the “weak” instruments problem.
We have observed that researchers have argued persuasively that some kinds of panel data can be used to infer causal effects of advertising by using fixed effects to control for concerns that advertising allocations change over time or that specific markets receive allocations that depend on possible responsiveness to the ad or campaign in question (see, for example, Klapper and Hartmann, 2018). In the panel setting, researchers limit the source of variation so as to reduce concerns for endogeneity bias. The IV approach is to affirmatively seek out “clean” or exogenous variation. In the same context of measuring the return to Super Bowl ads, Stephens-Davidowitz et al. (2015) use whether or not the home team is in the Super Bowl as an explicit instrument for exposure to the ad, because viewership changes if the home team is in the Super Bowl, which they argue is genuinely unpredictable or random at the point at which advertisers bid on Super Bowl slots. This is clearly a valid instrument in the same way as Angrist’s Vietnam draft lottery is a valid instrument and no further proof is required. However, such truly random instruments are extremely rare.

5.11 Model evaluation

The purpose of causal inference in marketing applications is to inform firm decisions. As we have argued, in order to optimize actions of the firm, we must consider counterfactual scenarios. This means that the causal model must predict well in conditions that can be different from those observed in the data. The model evaluation exercise must validate the model’s predictions across a wide range of different policy regimes. If we validate the model under a policy regime that is the same or similar to the observational data, then that validation exercise will be uninformative or even misleading. To see this point clearly, consider the problem of making causal inferences regarding a price elasticity. The object of causal inference is the true price elasticity in a simple log-log approximation.

\[
\ln Q_t = \alpha + \eta \ln P_t + \varepsilon_t
\]

Imagine that there is an “endogeneity” problem in the observational data in which the firm has been setting price with partial knowledge of the demand shocks which are in the error term. Suppose further that the firm raises price when it anticipates a positive demand shock. This means that an OLS estimate of the elasticity will be too small in magnitude and we might conclude, erroneously, that the firm should raise its price even if the firm is setting prices optimally. Suppose we reserve a portion of our observational data for out-of-sample validation. That is, we will fit the log-log regression on observations $1, 2, \ldots, T_0$, reserving observations $T_0 + 1, \ldots, T$ for validation. If we were to compare the performance of the inconsistent and biased OLS estimator of the price elasticity with any valid causal estimate using our “validation” data, we would conclude that OLS is superior using anything like the MSE metric. This is because OLS is a projection-based estimator that seeks to minimize mean squared error. The only reason OLS will fare poorly in prediction in this sort of exercise is if the OLS model is highly over-parameterized and the OLS procedure over-fits the data. However, the OLS estimator will yield non-profit-maximizing prices if used in a price optimization exercise because it is inconsistent for the true causal elasticity parameter.
Thus, we must devise a different validation exercise in evaluating causal estimates. We must either find different policy regimes in our observational data or we must conduct a validation experiment.
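The validation argument above can be illustrated with a small simulation. This is a minimal sketch under assumed values: the true elasticity, the shock variances, the pricing rule, and the use of a cost shifter as an instrument are all hypothetical choices made for illustration and do not come from the chapter. It fits OLS and a simple instrumental-variable estimator on the first part of an endogenous sample, compares holdout MSE under the same pricing regime, and then compares MSE under a different regime in which prices are randomized.

```python
import numpy as np

rng = np.random.default_rng(0)
T, T0 = 4000, 3000
alpha, eta = 1.0, -2.0            # hypothetical true intercept and elasticity

# Observational regime: the firm raises price when it anticipates a demand shock.
eps = rng.normal(0.0, 0.5, T)     # demand shock, partly known to the firm
cost = rng.normal(0.0, 0.5, T)    # cost shifter, assumed to be a valid instrument
ln_p = 0.5 * cost + 0.8 * eps + rng.normal(0.0, 0.1, T)
ln_q = alpha + eta * ln_p + eps

def fit_ols(x, y):
    """Intercept and slope from least squares."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_iv(x, y, z):
    """Just-identified IV estimate using a single instrument z."""
    slope = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    return np.array([y.mean() - slope * x.mean(), slope])

def mse(b, x, y):
    return np.mean((y - b[0] - b[1] * x) ** 2)

fit, hold = slice(0, T0), slice(T0, T)
b_ols = fit_ols(ln_p[fit], ln_q[fit])
b_iv = fit_iv(ln_p[fit], ln_q[fit], cost[fit])

# Validation under the SAME regime: holdout observations T0+1, ..., T.
mse_ols_same = mse(b_ols, ln_p[hold], ln_q[hold])
mse_iv_same = mse(b_iv, ln_p[hold], ln_q[hold])

# Validation under a DIFFERENT regime: prices set at random (an experiment).
ln_p_x = rng.normal(0.0, 0.5, T)
ln_q_x = alpha + eta * ln_p_x + rng.normal(0.0, 0.5, T)
mse_ols_new = mse(b_ols, ln_p_x, ln_q_x)
mse_iv_new = mse(b_iv, ln_p_x, ln_q_x)

print(f"elasticity: OLS {b_ols[1]:.2f}, IV {b_iv[1]:.2f}, true {eta:.2f}")
print(f"holdout MSE, same regime: OLS {mse_ols_same:.3f}, IV {mse_iv_same:.3f}")
print(f"MSE, randomized prices:   OLS {mse_ols_new:.3f}, IV {mse_iv_new:.3f}")
```

In this setup OLS should look better on the same-regime holdout while recovering the wrong elasticity, and the ranking should reverse once prices are set independently of the demand shock.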

6 Conclusions

Marketing is an applied field where the emphasis is on providing advice to firms on how to optimize their interaction with customers. The decision-oriented emphasis motivates an interest both in inference paradigms compatible with decision making and in response functions that are nonlinear. Much of the recent emphasis in econometrics is on estimating the effects of a policy variable via linear approximations or even via a “local” average treatment effect (LATE). Unfortunately, local treatment effects are only the beginning of an understanding of policy effects and policy optimization. This means that, in marketing, we will have to impose something like a parametric model (which can be achieved via strong priors and non-parametric approaches) in order to make progress on the problem of optimizing marketing policies.


On the positive side, marketing researchers have access to an unprecedented amount of detailed observational data regarding the reactions of customers to an ever-increasing variety of firm actions (for example, the many types of advertising or marketing communications possible today). Large-scale randomized experimentation holds out the possibility of valid causal inferences regarding the effects of marketing actions. Thus, the amount of both observational and experimental data at the disposal of a marketing researcher is truly impressive and is growing as new sources of spatial data (via phone or mobile device pinging) and other types of data become available. Indeed, some data ecosystems such as Google/Facebook/Amazon in the U.S. and Alibaba/JD in China aim to amass data on purchases, search and product consideration, advertising exposure, and social interaction into one large dataset.

At the same time, firms are becoming increasingly sophisticated in targeting customers based on information regarding preferences and responsiveness to marketing actions. This means that the evaluation of targeted marketing actions may be difficult or even impossible with even the richest observational data and may require experimentation. Nevertheless, we must use observational data to the greatest extent possible, as it is impossible to fully optimize marketing actions on the basis of experimentation alone. The emphasis in the econometrics literature has been on using only a subset of the variation in observational data so as to avoid concerns that invalidate causal inference. We can ill afford a completely purist point of view that we should only use a tiny fraction of the variation in our data to estimate causal effects or optimize marketing policies. Instead, our view is that we should restrict the variation used in observational data only when there are strong prior reasons to expect that use of a given dimension of variation will produce inconsistent estimates of marketing effects.
Experimentation must be combined with observational data to achieve the goals of marketing. It is highly unlikely that randomized experimentation will ever completely replace inferences based on observational data. Many problems in marketing are not characterized by attempts to infer about an average effect size but, rather, to optimize firm actions over a wide range of possibilities. Optimization cannot be achieved in any realistic situation only by experimental means. It is likely, therefore, that experiments should play a role in estimating some of the critical effects but models calibrated on observational data will still be required to make firm policy recommendations. Experiments could also be used to test key assumptions in the model such as functional form or exogeneity assumptions without requiring that policy optimization be the direct result of experimentation.

References

Ackerberg, D., Benkard, C.L., Berry, S., Pakes, A., 2007. Econometric tools for analyzing market outcomes. In: Handbook of Econometrics. Elsevier Science, pp. 4172–4271 (Chap. 63).
Allenby, G.M., Rossi, P.E., 1999. Marketing models of consumer heterogeneity. Journal of Econometrics 89, 57–78.
Angrist, J.D., Krueger, A.B., 1991. Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics 106, 979–1014.

Angrist, J.D., Pischke, J.-S., 2009. Mostly Harmless Econometrics. Princeton University Press, Princeton, NJ, USA.
Antoniak, C.E., 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics 2 (6), 1152–1174.
Bass, F.M., 1969. A simultaneous equation study of advertising and sales of cigarettes. Journal of Marketing Research 6 (3), 291–300.
Bernardo, J.M., Smith, A.F.M., 1994. Bayesian Theory. John Wiley & Sons.
Berry, S., Levinsohn, J., Pakes, A., 1995. Automobile prices in market equilibrium. Econometrica 63 (4), 841–890.
Blake, T., Nosko, C., Tadelis, S., 2015. Consumer heterogeneity and paid search effectiveness: a large scale field experiment. Econometrica 83, 155–174.
Chen, Y., Yang, S., 2007. Estimating disaggregate models using aggregate data through augmentation of individual choice. Journal of Marketing Research 44, 613–621.
Chintagunta, P.K., Nair, H., 2011. Discrete-choice models of consumer demand in marketing. Marketing Science 30 (6), 977–996.
Conley, T.G., Hansen, C.B., Rossi, P.E., 2012. Plausibly exogenous. Review of Economics and Statistics 94 (1), 260–272.
Deaton, A., Cartwright, N., 2016. Understanding and Misunderstanding Randomized Controlled Trials. Discussion Paper 22595. NBER.
Dubé, J.-P., Hitsch, G., Rossi, P.E., 2010. State dependence and alternative explanations for consumer inertia. The Rand Journal of Economics 41 (3), 417–445.
Dubé, J.-P., Fox, J.T., Su, C.-L., 2012. Improving the numerical performance of static and dynamic aggregate discrete choice random coefficients demand estimation. Econometrica 80 (5), 2231–2267.
Dubé, J.-P., Hitsch, G., Rossi, P.E., 2018. Income and wealth effects in the demand for private label goods. Marketing Science 37, 22–53.
Dubé, J.-P., Misra, S., 2018. Scalable Price Targeting. Discussion Paper. Booth School of Business, University of Chicago.
Eckles, D., Bakshy, E., 2017. Bias and High-Dimensional Adjustment in Observational Studies of Peer Effects. Discussion Paper. MIT.
Fennell, G., Allenby, G.M., Yang, S., Edwards, Y., 2003. The effectiveness of demographic and psychographic variables for explaining brand and product use. Quantitative Marketing and Economics 1, 223–244.
Frühwirth-Schnatter, S., 2006. Finite Mixture and Markov Switching Models. Springer.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis. Chapman and Hall.
George, E.I., McCulloch, R.E., 1997. Approaches for Bayesian variable selection. Statistica Sinica 7, 339–373.
Gilbride, T.J., Allenby, G.M., 2004. A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Science 23 (3), 391–406.
Gordon, B.R., Zettelmeyer, F., 2017. A Comparison of Approaches to Advertising Measurement. Discussion Paper. Northwestern University.
Griffin, J., Quintana, F., Steel, M.F.J., 2010. Flexible and nonparametric modelling. In: Geweke, J., Koop, G., Dijk, H.V. (Eds.), Handbook of Bayesian Econometrics. Oxford University Press.
Hansen, C., Hausman, J., Newey, W., 2008. Estimation with many instrumental variables. Journal of Business and Economic Statistics 26 (4), 398–422.
Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50 (4), 1029–1054.
Hartmann, W.R., Nair, H.S., Narayanan, S., 2011. Identifying causal marketing mix effects using a regression discontinuity design. Marketing Science 30, 1079–1097.
Hausman, J., 1996. The valuation of new goods under perfect and imperfect competition. In: Bresnahan, T., Gordon, R. (Eds.), The Economics of New Goods, vol. 58. University of Chicago, pp. 209–237.
Hayashi, F., 2000. Econometrics. Princeton University Press.
Heckman, J., Singer, B., 1984. A method for minimizing the impact of distributional assumptions in econometric models. Econometrica 52 (2), 271–320.

Heckman, J., Vytlacil, E.J., 2007. Econometric evaluation of social programs. In: Heckman, J., Leamer, E. (Eds.), Handbook of Econometrics, vol. 6B. Elsevier, pp. 4779–4874.
Hitsch, G., Misra, S., 2018. Heterogeneous Treatment Effects and Optimal Targeting Policy Evaluation. Discussion Paper. Booth School of Business, University of Chicago.
Hoch, S.J., Dreze, X., Purk, M.E., 1994. EDLP, Hi-Lo, and margin arithmetic. Journal of Marketing 58 (4), 16–27.
Imbens, G.W., Kolesar, M., 2016. Robust standard errors in small samples: some practical advice. Review of Economics and Statistics 98 (4), 701–712.
Imbens, G.W., Lemieux, T., 2008. Regression discontinuity designs: a guide to practice. Journal of Econometrics 142, 807–828.
Imbens, G.W., Rubin, D.B., 2014. Causal Inference. Cambridge University Press.
Jiang, R., Manchanda, P., Rossi, P.E., 2009. Bayesian analysis of random coefficient logit models using aggregate data. Journal of Econometrics 149, 136–148.
Johnson, G.A., Lewis, R.A., Nubbemeyer, E.I., 2017. Ghost ads: improving the economics of measuring online ad effectiveness. Journal of Marketing Research 54, 867–884.
Klapper, D., Hartmann, W.R., 2018. Super bowl ads. Marketing Science 37, 78–96.
Lewis, R.A., Rao, J.M., 2014. The Unfavorable Economics of Measuring the Returns to Advertising. Discussion Paper. NBER.
Lodish, L., Abraham, M., 1995. How T.V. advertising works: a meta-analysis of 389 real world split cable T.V. advertising experiments. Journal of Marketing Research 32 (2), 125–139.
Manchanda, P., Rossi, P.E., Chintagunta, P.K., 2004. Response modeling with nonrandom marketing-mix variables. Journal of Marketing Research 41, 467–478.
McFadden, D.L., Train, K.E., 2000. Mixed MNL models for discrete response. Journal of Applied Econometrics 15, 447–470.
Moreira, M.J., 2003. A conditional likelihood ratio test for structural models. Econometrica 71, 1027–1048.
Musalem, A., Bradlow, E.T., Raju, J.S., 2009. Bayesian estimation of random-coefficients choice models using aggregate data. Journal of Applied Econometrics 24, 490–516.
Narayanan, S., Nair, H., 2013. Estimating causal installed-base effects: a bias-correction approach. Journal of Marketing Research 50 (1), 70–94.
Neyman, J., 1990. On the application of probability theory to agricultural experiments: essay on principles. Statistical Science 5, 465–480.
Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica 49 (6), 1417–1426.
Park, T., Casella, G., 2008. The Bayesian lasso. Journal of the American Statistical Association 103 (482), 681–686.
Petrin, A., Train, K., 2010. Control function corrections for unobserved factors in differentiated product models. Journal of Marketing Research 47 (1), 3–13.
Robert, C.P., Casella, G., 2004. Monte Carlo Statistical Methods, second ed. Springer.
Rossi, P.E., 2014a. Bayesian Non- and Semi-Parametric Methods and Applications. The Econometric and Tinbergen Institutes Lectures. Princeton University Press, Princeton, NJ, USA.
Rossi, P.E., 2014b. Even the rich can make themselves poor: a critical examination of IV methods in marketing applications. Marketing Science 33 (5), 655–672.
Rossi, P.E., Allenby, G.M., McCulloch, R.E., 2005. Bayesian Statistics and Marketing. John Wiley & Sons.
Sahni, N., 2015. Effect of temporal spacing between advertising exposures: evidence from online field experiments. Quantitative Marketing and Economics 13 (3), 203–247.
Scott, S.L., 2014. Multi-Armed Bandit Experiments in the Online Service Economy. Discussion Paper. Google Inc.
Shapiro, B., 2018. Positive spillovers and free riding in advertising of pharmaceuticals: the case of antidepressants. Journal of Political Economy 126 (1).
Stephens-Davidowitz, S.H., Varian, H., Smith, M.D., 2015. Super Returns to Super Bowl Ads? Discussion Paper. Google Inc.
Stock, J.H., Wright, J.H., 2000. GMM with weak identification. Econometrica 68 (5), 1055–1096.

Stock, J.H., Wright, J.H., Yogo, M., 2002. A survey of weak instruments and weak identification in generalized method of moments. Journal of Business and Economic Statistics 20 (4), 518–529.
Wooldridge, J.M., 2010. Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA, USA.


CHAPTER 3

Economic foundations of conjoint analysis

Greg M. Allenby^a, Nino Hardt^a, Peter E. Rossi^b,∗

^a Fisher College of Business, Ohio State University, Columbus, OH, United States
^b Anderson School of Management, University of California at Los Angeles, Los Angeles, CA, United States
∗ Corresponding author: e-mail address: [email protected]

Contents

1 Introduction
2 Conjoint analysis
    2.1 Discrete choices
    2.2 Volumetric choices
    2.3 Computing expected demand
    2.4 Heterogeneity
    2.5 Market-level predictions
    2.6 Indirect utility function
3 Measures of economic value
    3.1 Willingness to pay (WTP)
        3.1.1 WTP for discrete choice
        3.1.2 WTP for volumetric choice
    3.2 Willingness to buy (WTB)
        3.2.1 WTB for discrete choice
        3.2.2 WTB for volumetric choice
    3.3 Economic price premium (EPP)
4 Considerations in conjoint study design
    4.1 Demographic and screening questions
    4.2 Behavioral correlates
    4.3 Establishing representativeness
    4.4 Glossary
    4.5 Choice tasks
    4.6 Timing data
    4.7 Sample size
5 Practices that compromise statistical and economic validity
    5.1 Statistical validity
        5.1.1 Consistency
        5.1.2 Using improper procedures to impose constraints on partworths
    5.2 Economic validity
        5.2.1 Non-economic conjoint specifications
        5.2.2 Self-explicated conjoint
        5.2.3 Comparing raw part-worths across respondents
        5.2.4 Combining conjoint with other data
6 Comparing conjoint and transaction data
    6.1 Preference estimates
    6.2 Marketplace predictions
    6.3 Comparison of willingness-to-pay (WTP)
7 Concluding remarks
Technical appendix: Computing expected demand for volumetric conjoint
References

Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.002
Copyright © 2019 Elsevier B.V. All rights reserved.

1 Introduction

Conjoint analysis is a survey-based method used in marketing and economics to estimate demand in situations where products can be represented by a collection of features and characteristics. It is estimated that 14,000 conjoint studies are conducted yearly by firms with the goal of valuing product features and predicting the effects of changes in formulation, price, advertising, or method of distribution (Orme, 2010). Conjoint analysis is often the only practical solution to the problem of predicting demand for new products or for new features of products that are not present in the marketplace.

The economic foundations of conjoint analysis can be traced to seminal papers on utility measurement in economics (Becker et al., 1964) and mathematical psychology (Luce and Tukey, 1964). Conjoint analysis was introduced to the marketing literature by Green and Rao (1971) as an approach that takes ranked input data and estimates utility part-worths for product attributes and their levels (e.g., amount of computer memory). The use of dummy variables to represent the utility of attribute-levels provided a flexible approach to measuring preferences without imposing unnecessary assumptions on the form of the utility function. Later, nonparametric and semiparametric models of choice were developed to avoid incorrect assumptions about the distribution of the error term (Matzkin, 1991, 1993). This earlier work does not lend itself to models of heterogeneity that employ a random-effect specification across respondents (Allenby and Rossi, 1998), which have become popular in marketing.

Virtually all conjoint studies done today are in the form of a “choice-based” conjoint in which survey respondents are offered choices from sets of products, each represented by a combination of well-defined product attributes or features. For many durable good products, the assumption of mutually exclusive unit demand might be appropriate. In these cases, respondents can only choose one of the product offerings (or the “outside” alternative) with unit demand. In other situations, it may be more reasonable to allow respondents to choose a subset of products and to consume continuous quantities. This form of conjoint analysis is called “volumetric” conjoint and is not practiced very extensively due to a lack of software for the analysis of volumetric conjoint data. As a companion to this chapter, we hope to promote use of volumetric conjoint designs by providing general purpose software in an R package.

Both standard and volumetric conjoint analysis can best be viewed as a simulation of market conditions set up via an experimental design. As such, choice-based conjoint (if executed well) is closer to revealed than to stated preferences. Virtually all researchers in marketing accept the premise that choice-based conjoint studies offer better recovery of consumer preferences than a pure stated preference method in which direct elicitation of preferences is attempted. However, there is remarkably little research in the conjoint literature that attempts to compare preferences estimated from choice-based conjoint with preferences estimated from consumer panel observational data (cf. Louviere and Hensher, 1983).

Even with existing products for which marketplace data is observed, there are many situations in which it is not possible to identify consumer preferences. In many markets, there is very little price variation (at least in the short run where preferences might be assumed to be stable) and the set of existing products represents only a very sparse set of points in the product characteristic space. In these situations, conjoint has a natural appeal. In still other situations, only aggregate demand data is available. The considerable influence of the so-called BLP (Berry et al., 1995) approach notwithstanding, many recognize that it is very difficult to estimate both preferences and the distribution of preferences over consumers from only aggregate data.

Econometricians have focused considerable attention on the problem of “endogeneity” in demand estimation. With unobservable product characteristics or other drivers of demand, not all price variation may be usable for preference inference. In these situations, econometricians advocate a variety of instrumental variable and other techniques which further restrict the usable portion of price variation. In contrast, conjoint data has all product characteristics well-specified and the levels of price and characteristics are chosen by experimental design. Therefore, there is no “endogeneity” problem in conjoint analysis: all variation in both product attributes and price is exogenous and usable in preference estimation.

This is both a great virtue and a limitation of conjoint analysis. Conjoint analysis avoids the problems that plague demand inference with observational data, but at the cost of requiring that all product attributes are well-specified and can be described to the survey respondents in a meaningful way. Conjoint researchers have shown great inventiveness in describing product features but always face the constraint that the analysis is limited to the features included. Respondents are typically instructed to assume that all unspecified features are constant across the choice alternatives presented in the conjoint survey. In some situations, this assumption can seem strained.

It should be emphasized that conjoint analysis can only estimate demand. Conjoint survey data, alone, cannot be used to compute market equilibrium outcomes such as market prices or equilibrium product positioning in characteristics space. Clearly, supply assumptions and cost information must be added to conjoint data in order to compute equilibrium quantities. Conjoint practitioners have chosen the unfortunate term “market simulation” to describe demand predictions based on conjoint analyses. These practices are not simulations of any market outcome, and their limitations must be understood, as we discuss below.

The purpose of this chapter in the Handbook of the Economics of Marketing is to provide an introduction to the economic foundations of modern conjoint analysis. We begin in Section 2 with a discussion of the economic justification for conjoint analysis, examining economic models of choice that can be adapted for conjoint analysis. Section 3 discusses economic measures of value that can be derived from a conjoint study. Section 4 discusses survey requirements for conducting valid conjoint analysis, covering topics such as the use of screening questions and the task of selecting and describing product attributes. Section 5 considers aspects of current practice in conjoint analysis that are not supported by economic theory and should be avoided. Section 6 provides evidence that demand data and conjoint data provide similar estimates of feature value. Section 7 offers concluding remarks.

2 Conjoint analysis

Conducting a valid conjoint analysis requires both a valid procedure for collecting conjoint data and a valid model for analysis. Conjoint data is not the same as stated preference data in that respondents in conjoint surveys are not simply declaring what they know about their utility or, equivalently, their willingness to pay for products and their features. Instead, respondents react to hypothetical purchase scenarios involving products, which may not currently exist in the marketplace, that are specifically described and priced. Respondents provide expected demand quantities across a variety of buying scenarios in which product attributes and prices change. Well-executed conjoint surveys can approximate revealed preference data available from sources like shopper panels by instructing respondents to think about a specific buying context and focus on a subset of product alternatives.

The economic foundation of conjoint analysis rests on using valid economic models for analyzing this data. In this chapter we discuss two models based on direct utility maximization: the discrete choice model based on extreme value errors that leads to the standard logit specification, and a volumetric demand model that allows for both corner and interior solutions as well as continuous demand quantities. In both specifications, coefficients associated with the marginal utility of an offering are parameterized in terms of product characteristics.

2.1 Discrete choices

A simple model of choice involves respondents selecting just one choice alternative. Examples include demand for durable goods, such as a car, where only one good is demanded. Models for the selection of one good are referred to as discrete choice models (Manski et al., 1981) and can be motivated by a linear direct utility function (footnote 1):

$$u(x, z) = \sum_k \psi_k x_k + \psi_z z \tag{1}$$

Footnote 1: The linearity assumption is only an approximation in that we expect preferences to exhibit satiation. However, since demand quantities are restricted to unit demand, satiation is not a consideration.

where $x_k$ denotes the demand quantity for choice alternative $k$, constrained to be either zero or one, $z$ is an outside good, and $\psi_k$ is the marginal utility of consuming the $k$th good. Respondents are assumed to make choices to maximize their utility $u(x, z)$ subject to the budget constraint:

$$\sum_k p_k x_k + z = E \tag{2}$$

where the price of the outside good is set to one dollar. The outside good represents the decision to allocate some or all of the budgetary allotment $E$ outside of the goods in the choice task, including the option to delay a purchase. The budgetary allotment is the respondent’s upper limit of expenditure for the goods under study and is not meant to indicate his or her annual income.

An additive error term is introduced into the model specification for each good to allow for factors affecting choice that are known to the respondent but not observed by the researcher. Assuming distributional support for the error term on the real line allows the model to rationalize any pattern of respondent choices: for any observed choice there is some error realization, however unlikely, under which that choice is utility maximizing. The utility for selecting good $j$ is therefore equal to:

$$u(x_j = 1, z = E - p_j) = \psi_j + \psi_z (E - p_j) + \varepsilon_j \tag{3}$$

and the utility for selecting the ‘no-choice’ option is:

$$u(x = 0, z = E) = \psi_z E + \varepsilon_z \tag{4}$$

Choice probabilities are obtained from the utility expressions by integrating over regions of the error space that coincide with a particular choice having highest utility. Assuming extreme value EV(0,1) errors (footnote 2) leads to the familiar logit choice probability:

$$\begin{aligned}
\Pr(j) &= \Pr\left(\psi_j + \psi_z (E - p_j) + \varepsilon_j > \psi_k + \psi_z (E - p_k) + \varepsilon_k \ \text{ for any } k \neq j\right) \\
&= \frac{\exp\left[\psi_j + \psi_z (E - p_j)\right]}{\exp\left[\psi_z E\right] + \sum_k \exp\left[\psi_k + \psi_z (E - p_k)\right]} \\
&= \frac{\exp\left[\psi_j - \psi_z p_j\right]}{1 + \sum_k \exp\left[\psi_k - \psi_z p_k\right]}
\end{aligned} \tag{5}$$

Footnote 2: The cdf of the EV(0,1) error term is $F(x) = \exp[-\exp(-x)]$, with location parameter equal to zero and scale parameter equal to one.

The discrete choice model is used extensively in conjoint applications because of its computational simplicity. Observed demand is restricted to two points $\{0, 1\}$ for each of the inside goods ($x$), and the constant marginal utility assumption leads to the budgetary allotment dropping out of the expression for the choice probability (footnote 3). The marginal utility for the outside good, $\psi_z$, is interpreted as a price coefficient and measures the disutility of paying a higher price. Choice alternatives with prices that are larger than the budgetary allotment are screened out of the logit choice probability. That is, only goods for which $p_k \le E$ are included in the logit probability expression (see Pachali et al., 2017). Finally, expected demand is equal to the choice probability, which is convenient for making demand predictions.

Conjoint analysis uses the discrete choice model by assuming that marginal utility, $\psi_j$, is a linear function of brand attributes:

$$\psi_j = a_j' \beta \tag{6}$$

where $a_j$ denotes the attributes of good $j$. The attributes are coded either using a dummy variable specification, where one of the attribute-levels is selected as the null level of the attribute, or using effects-coding that constrains the sum of coefficients to equal zero for an attribute (Hardy, 1993). For either specification, conjoint analysis measures the value of changes among the levels of an attribute. Since the valuations of the attribute-levels are jointly determined through Eq. (6), the marginal utilities for a respondent are comparable across the attributes and features included in the analysis. The marginal utility of a product feature can therefore be compared to the utility of changes in the levels of other attributes, including price.
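The logit probability in Eq. (5), with marginal utilities built from attributes as in Eq. (6), can be sketched in a few lines. This is an illustrative sketch only: the function names, the attribute matrix, the part-worths, the price coefficient, and the prices are hypothetical and are not taken from the chapter or its companion R package. The second function checks by simulation that expected unit demand matches the choice probability when all goods are affordable.

```python
import numpy as np

def logit_probabilities(A, beta, psi_z, prices, E):
    """Eqs. (5)-(6): probabilities for each inside good and for the outside option."""
    A = np.asarray(A, dtype=float)
    prices = np.asarray(prices, dtype=float)
    v = A @ np.asarray(beta, dtype=float) - psi_z * prices   # psi_j - psi_z * p_j
    weights = np.where(prices <= E, np.exp(v), 0.0)          # screen out goods with p_j > E
    denom = 1.0 + weights.sum()                              # the "1" is the outside option
    return weights / denom, 1.0 / denom

def simulated_demand(A, beta, psi_z, prices, n_draws, rng):
    """Average unit demand over EV(0,1) error draws (assumes every price is <= E)."""
    v = np.asarray(A, float) @ np.asarray(beta, float) - psi_z * np.asarray(prices, float)
    eps = rng.gumbel(size=(n_draws, len(v) + 1))             # cdf exp(-exp(-x))
    utilities = np.column_stack([v + eps[:, :-1], eps[:, -1]])  # last column: outside option
    choices = utilities.argmax(axis=1)
    return np.bincount(choices, minlength=len(v) + 1)[:-1] / n_draws

# Hypothetical example: three products described by three dummy-coded attributes.
A = np.array([[1, 0, 1], [0, 1, 0], [0, 1, 1]])
beta = np.array([1.0, 0.5, 0.8])      # part-worths (assumed values)
psi_z = 0.1                           # price coefficient (assumed value)
prices = np.array([10.0, 8.0, 30.0])

probs, p_outside = logit_probabilities(A, beta, psi_z, prices, E=20.0)  # third good unaffordable
probs_all, _ = logit_probabilities(A, beta, psi_z, prices, E=50.0)      # all goods affordable
mc = simulated_demand(A, beta, psi_z, prices, 200_000, np.random.default_rng(1))
print(probs, p_outside)
print(probs_all, mc)
```

Note that the budgetary allotment enters only through the screening rule, which reflects the constant marginal utility assumption discussed above.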

2.2 Volumetric choices

A more general model for conjoint analysis is one that introduces non-linearities into the utility function (Allenby et al., 2017):

$$u(x, z) = \sum_k \frac{\psi_k}{\gamma} \ln(\gamma x_k + 1) + \ln(z) \tag{7}$$

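where $\gamma$ is a parameter that governs the rate of satiation of the good. The marginal utilities for the inside and outside goods are:

$$u_j = \frac{\partial u(x, z)}{\partial x_j} = \frac{\psi_j}{\gamma x_j + 1}, \qquad u_z = \frac{\partial u(x, z)}{\partial z} = \frac{1}{z} \tag{8}$$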

The marginal utility of the good is equal to $\psi_j$ when $x_j = 0$ and decreases as demand increases ($x_j > 0$) and as the satiation parameter ($\gamma$) increases. The general solution to maximizing the utility in (7) subject to the budget constraint in (2) is to employ the Kuhn-Tucker (KT) optimality conditions:

$$\begin{aligned}
&\text{if } x_k > 0 \text{ and } x_j > 0 \text{ then } \lambda = \frac{u_k}{p_k} = \frac{u_j}{p_j} \quad \text{for all } k \text{ and } j \\
&\text{if } x_k > 0 \text{ and } x_j = 0 \text{ then } \lambda = \frac{u_k}{p_k} > \frac{u_j}{p_j} \quad \text{for all } k \text{ and } j
\end{aligned} \tag{9}$$

Footnote 3: Constant marginal utility, however, is not required to obtain a discrete choice model as shown by Allenby and Rossi (1991), wherein the budgetary allotment does not drop out.

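Assuming that $\psi_j = \exp[a_j'\beta + \varepsilon_j]$, noting that the outside good is always consumed at a price of one so that $\lambda = u_z = 1/z$ with $z = E - p'x$, and solving for $\varepsilon_j$ leads to the following expression for the KT conditions:

$$\varepsilon_j = g_j \quad \text{if } x_j > 0 \tag{10}$$

$$\varepsilon_j < g_j \quad \text{if } x_j = 0 \tag{11}$$

where

$$g_j = -a_j'\beta + \ln(\gamma x_j + 1) + \ln\left(\frac{p_j}{E - p'x}\right) \tag{12}$$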
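A worked special case may help to see how Eqs. (10)-(12) tie the error term to observed demand. The sketch below assumes a single inside good, for which the KT condition $\psi / (\gamma x + 1) = p / (E - p x)$ can be solved in closed form as $x = (\psi E - p) / (p(\psi + \gamma))$ whenever that value is positive, and $x = 0$ otherwise. The function names and all parameter values are hypothetical. The code then confirms that $g$ evaluated at the optimal quantity reproduces the error draw for an interior solution and exceeds it at a corner.

```python
import math

def single_good_demand(v, eps, price, E, gamma):
    """Optimal quantity of one inside good, with psi = exp(v + eps) and v = a'beta."""
    psi = math.exp(v + eps)
    # Interior solution of psi / (gamma*x + 1) = price / (E - price*x); corner at zero.
    return max(0.0, (psi * E - price) / (price * (psi + gamma)))

def g_value(v, x, price, E, gamma):
    """Eq. (12) for a single inside good."""
    return -v + math.log(gamma * x + 1.0) + math.log(price / (E - price * x))

v, price, E, gamma = -1.0, 2.0, 20.0, 0.5    # assumed values for illustration

# A favorable error draw gives an interior solution, so Eq. (10) holds: eps = g.
eps_high = 0.3
x_interior = single_good_demand(v, eps_high, price, E, gamma)
g_interior = g_value(v, x_interior, price, E, gamma)

# An unfavorable draw gives a corner solution, so Eq. (11) holds: eps < g.
eps_low = -3.0
x_corner = single_good_demand(v, eps_low, price, E, gamma)
g_corner = g_value(v, x_corner, price, E, gamma)

print(x_interior, g_interior, x_corner, g_corner)
```

With several inside goods there is no such simple closed form, because the outside-good quantity $E - p'x$ depends on all chosen quantities at once.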

The assumption of i.i.d. extreme-value errors, i.e., EV(0, σ ),4 results in a closedform expression for the probability that R of N goods are chosen. The error scale (σ ) is identified in this model because price enters the specification without a separate price coefficient. We assume there are N choice alternatives and R items are chosen: x1 , x2 , . . . , xR > 0,

xR+1 , xR+2 , . . . , xN = 0.

The likelihood ℓ(θ) of the model parameters is proportional to the probability of observing n1 chosen goods (n1 = 1, . . . , R) and n2 goods with zero demand (n2 = R + 1, . . . , N). The contribution to the likelihood of the chosen goods is in the form of a probability density because of the equality condition in (10), while the goods not chosen contribute as a probability mass because of the inequality condition in (11). We obtain the likelihood by evaluating the joint density of model errors at gj for the chosen goods and integrating the joint density to gi for the goods that are not chosen:

$$
\begin{aligned}
\ell(\theta) &\propto p(x_{n_1} > 0,\; x_{n_2} = 0 \mid \theta) \\
&= |J_R| \int_{-\infty}^{g_N} \cdots \int_{-\infty}^{g_{R+1}} f(g_1, \ldots, g_R, \varepsilon_{R+1}, \ldots, \varepsilon_N)\, d\varepsilon_{R+1} \cdots d\varepsilon_N \\
&= |J_R| \left\{ \prod_{j=1}^{R} \frac{\exp(-g_j/\sigma)}{\sigma} \exp\left(-e^{-g_j/\sigma}\right) \right\} \left\{ \prod_{i=R+1}^{N} \exp\left(-e^{-g_i/\sigma}\right) \right\} \\
&= |J_R| \left\{ \prod_{j=1}^{R} \frac{\exp(-g_j/\sigma)}{\sigma} \right\} \exp\left\{ -\sum_{i=1}^{N} \exp(-g_i/\sigma) \right\}
\end{aligned}
$$

4 F(x) = exp[−e^(−x/σ)].

where f(·) is the joint density for ε and |JR| is the Jacobian of the transformation from random-utility error (ε) to the likelihood of the observed data (x), i.e., |JR| = |∂εi/∂xj|. For this model, the Jacobian is equal to:

$$
|J_R| = \prod_{k=1}^{R} \left( \frac{\gamma}{\gamma x_k + 1} \right) \left\{ \sum_{k=1}^{R} \frac{\gamma x_k + 1}{\gamma} \cdot \frac{p_k}{E - p'x} + 1 \right\}
$$

The expression for the likelihood of the observed demand vector xt is seen to be the product of R “logit” expressions multiplied by the Jacobian, where the purchased quantity, xj, is part of the value (gj) of the choice alternative. To obtain the standard logit model of discrete choice we set R = 1, set the scale of the error term to one (σ = 1), and allow the expression for gj to include a price coefficient (i.e., gj = −aj′β − ψz pj). The Jacobian equals one for a discrete choice model because demand (x) enters the KT conditions only through the corner solutions, which correspond to mass points. Variation in the specification of the choice model utility function and budget constraint results in different values of gj and the Jacobian |JR|, but does not change the general form of the likelihood, i.e.,

$$
p(x \mid \theta) = |J_R| \left\{ \prod_{j=1}^{R} f(g_j) \right\} \left\{ \prod_{i=R+1}^{N} F(g_i) \right\} \tag{13}
$$

The analysis of conjoint data arising from either discrete or volumetric choices proceeds by relating utility parameters (ψj ) to product attributes (aj ) as in Eq. (6). It is also possible to specify non-linear mappings from attributes to aspects of the utility function as discussed, for example, in Kim et al. (2016).
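To make the structure of Eq. (13) concrete, the following is a minimal sketch of the log-likelihood contribution of one observed demand vector. It assumes the definitions in Eqs. (10)-(12), the EV(0, σ) distribution in footnote 4, and the Jacobian given above; the function and argument names (`kt_log_likelihood`, `a_beta` for aj′β) are illustrative and not taken from the chapter.

```python
import math

def kt_log_likelihood(x, p, a_beta, gamma, E, sigma):
    """Log of Eq. (13) for one observed demand vector x (illustrative sketch).

    a_beta[j] stands for a_j'beta; all names here are assumptions.
    """
    z = E - sum(pj * xj for pj, xj in zip(p, x))  # money left for the outside good
    if z <= 0:
        raise ValueError("spending on inside goods must be below the budget E")
    log_lik = 0.0
    jac_sum = 1.0  # the "+ 1" term inside the Jacobian
    for xj, pj, ab in zip(x, p, a_beta):
        g = -ab + math.log(gamma * xj + 1.0) + math.log(pj / z)  # Eq. (12)
        log_lik += -math.exp(-g / sigma)  # log F(g), present for every good
        if xj > 0:
            # Chosen good: density f(g) = F(g) * exp(-g / sigma) / sigma
            log_lik += -g / sigma - math.log(sigma)
            # Jacobian pieces contributed by chosen goods
            log_lik += math.log(gamma / (gamma * xj + 1.0))
            jac_sum += (gamma * xj + 1.0) / gamma * pj / z
    return log_lik + math.log(jac_sum)
```

When no good is purchased, the Jacobian reduces to one and only the mass terms F(gi) remain, which mirrors the discrete-choice remark above.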

2.3 Computing expected demand

Demand (xht) for subject h at time t is a function of parameters of the demand model (θh), a realization of the vector of error terms (εht), characteristics of the available choice set (At), and prices (pt). We need to obtain expected demand quantities for deriving various measures of economic value. For the discrete choice model, expected demand is expressed as a choice probability, while expected demand for the volumetric demand model does not have a closed-form solution. We therefore introduce D, the function of demand for one realization of model parameters θh and one realization of the error term εht, given the characteristics of the set of alternatives At and corresponding prices pt:

$$
x_{ht} = D(\theta_h, \varepsilon_{ht} \mid A_t, p_t) \tag{14}
$$

Here, θh = {βh, ψh,z = βhp} for the discrete choice model and θh = {βh, γh, Eh, σh} for the volumetric model. Expected demand is obtained by integrating out the error term and posterior distribution of model parameters θh. First, consider integrating over the error term:

$$
E(x_{ht} \mid \theta_h) = \int_{\varepsilon_{ht}} D(\theta_h, \varepsilon_{ht} \mid A_t, p_t)\, p(\varepsilon_{ht})\, d\varepsilon_{ht} \tag{15}
$$

For volumetric choices, there is no closed form solution for integrating out εht and numerical methods are used to obtain expected demand by simulating a large number of realizations of ε and computing D for each. Expected demand is the average of D values. One way to compute D is to use a general-purpose algorithm for maximization, such as constrOptim in R. A more efficient algorithm is provided in the appendix.
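The simulation described above can be sketched as follows. This is not the appendix algorithm, which is not reproduced here; it is one possible solver under the assumption that utility has the form implied by Eqs. (10)-(12), i.e., a chosen good satisfies ψj/(γxj + 1) = pj/z, where z = E − p′x is spending on the outside good. Goods are added in decreasing order of ψj/pj until the KT inequality holds for the next good. The names `demand`, `draw_ev`, and `expected_demand` are hypothetical.

```python
import math
import random

def demand(psi, p, gamma, E):
    """Solve the KT conditions for one realization of psi; returns (x, z)."""
    order = sorted(range(len(psi)), key=lambda j: psi[j] / p[j], reverse=True)
    num, den = gamma * E, gamma  # z = num / den, starting at z = E (nothing bought)
    z = E
    chosen = []
    for j in order:
        if psi[j] * z / p[j] <= 1.0:  # KT inequality holds at x_j = 0: stop
            break
        num += p[j]
        den += psi[j]
        z = num / den  # outside-good spending once good j is also purchased
        chosen.append(j)
    x = [0.0] * len(psi)
    for j in chosen:
        x[j] = (psi[j] * z / p[j] - 1.0) / gamma
    return x, z

def draw_ev(rng, sigma):
    """Draw from F(x) = exp(-exp(-x / sigma)) by inverting the CDF."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -sigma * math.log(-math.log(u))

def expected_demand(a_beta, p, gamma, E, sigma, n_draws=2000, seed=1):
    """Monte Carlo version of Eq. (15): average D over error draws."""
    rng = random.Random(seed)
    total = [0.0] * len(p)
    for _ in range(n_draws):
        psi = [math.exp(ab + draw_ev(rng, sigma)) for ab in a_beta]
        x, _ = demand(psi, p, gamma, E)
        total = [t + xj for t, xj in zip(total, x)]
    return [t / n_draws for t in total]
```

Because each simulated demand vector satisfies the budget constraint exactly, the averaged quantities always imply spending below E.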

2.4 Heterogeneity

Data from a conjoint study can be characterized as being wide and shallow – i.e., wide in the sense that there may be 1000 or more respondents included in the study and shallow in the sense that each respondent provides, at most, about 8-16 responses to the choice task. This type of panel structure is commonly found in marketing applications involving consumer choices. We therefore employ hierarchical models, or a random-effect specification, to deal with the lack of data at the individual respondent-level. The lower level of the hierarchical model applies the direct utility model to a specific respondent’s choice data, and the upper level of the model incorporates heterogeneity in respondent coefficients.

Respondent heterogeneity can be incorporated into conjoint analysis using a variety of random-effect models. The simplest and most widely used is a Normal model for heterogeneity. Denoting all individual-level parameters θh for respondent h we have:

$$
\theta_h \sim \text{Normal}(\bar{\theta}, \Sigma) \tag{16}
$$

where θh = {βh, γh, Eh, σh} for the volumetric model, and θh = {βh, βph} for the discrete choice model. The price coefficient βph is referred to as (ψz) in Eq. (5). Estimation of this model is easily done using modern Bayesian (Markov chain Monte Carlo) methods as discussed in Rossi et al. (2005). More complicated models for heterogeneity involve specifying more flexible distributions. One option is to specify a mixture of Normal distributions:

$$
\theta_h \sim \sum_k \varphi_k\, \text{Normal}(\bar{\theta}_k, \Sigma_k) \tag{17}
$$

(here ϕk are the mixture probabilities) or to add covariates (z) to the mean of the heterogeneity distribution as in a regression model:

$$
\theta_h \sim \text{Normal}(\Gamma z_h, \Sigma) \tag{18}
$$

The parameters θ̄, Γ, and Σ are referred to as hyper-parameters (τ) because they describe the distribution of other parameters. Covariates in the above expression might include demographic variables or other variables collected as part of the conjoint survey, e.g., variables describing reasons to purchase. Alternatively, one could use a non-parametric distribution of heterogeneity as discussed in Rossi (2014). The advantage of employing Bayesian estimators for models of heterogeneity is that they provide access to individual-level parameters θh in addition to the hyper-parameters (τ) that describe their distribution. It should be emphasized that the hyper-parameters τ can be shown to be consistently estimated as the sample size, or number of respondents, in the conjoint survey increases. However, the individual-level estimates of θh cannot be shown to be consistent because the number of conjoint questions for any one respondent is constrained to be small. Therefore inferences for conjoint analysis should always be based on the hyper-parameters and not on the individual-level estimates.

2.5 Market-level predictions

In order to make statements about a population of customers, we need to aggregate the results from the hierarchical model by integrating over the distribution of heterogeneity. These statements involve quantities of interest Z, such as expected demand (xt), or derived quantities based on the demand model such as willingness-to-pay, price elasticities, and associated confidence intervals. We can approximate the posterior distribution of Z by integrating out the hyper-parameters of the distribution of heterogeneity (τ) and model error (εht):

$$
p(Z \mid \text{Data}) = \int_{\tau} \int_{\theta_h} \int_{\varepsilon_{ht}} p(Z \mid \theta_h, \varepsilon_{ht})\, p(\theta_h \mid \tau)\, p(\tau)\, p(\varepsilon_{ht})\, d\varepsilon_{ht}\, d\theta_h\, d\tau \tag{19}
$$

The integration is usually done numerically. Posterior realizations of τ are available when using a Markov chain Monte Carlo estimation algorithm, and can be used to generate individual-level draws of θh . These draws, along with draws of the model error term, can then be used to evaluate posterior estimates of Z. The distribution of evaluations of Z can be viewed as its posterior distribution.
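As a rough illustration of this numerical integration, the sketch below takes Z to be the market share of one alternative under the discrete choice model, so that the error term is integrated out analytically by the logit formula. It assumes, purely for brevity, that each posterior draw of τ is a pair (mean vector, vector of standard deviations), i.e., a diagonal covariance matrix; the last element of θh is treated as the price coefficient. All names are hypothetical.

```python
import math
import random

def logit_share(beta, beta_p, A, p, j):
    """Logit choice probability of alternative j for one respondent."""
    v = [sum(b * a for b, a in zip(beta, row)) - beta_p * pk
         for row, pk in zip(A, p)]
    m = max(v)  # subtract the maximum for numerical stability
    w = [math.exp(vk - m) for vk in v]
    return w[j] / sum(w)

def posterior_share_draws(tau_draws, A, p, j, n_resp=500, seed=1):
    """One evaluation of Z (market share of j) per posterior draw of tau."""
    rng = random.Random(seed)
    draws = []
    for mean, sd in tau_draws:  # assumption: tau = (means, standard deviations)
        total = 0.0
        for _ in range(n_resp):
            # Individual-level draw theta_h | tau; last element is the price coefficient
            theta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
            total += logit_share(theta[:-1], theta[-1], A, p, j)
        draws.append(total / n_resp)
    return draws
```

The returned list can be summarized by its mean and quantiles to describe the posterior distribution of the share. An outside good, if present, would be included as a row of A with zero attributes and zero price.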

2.6 Indirect utility function

The indirect utility function is the value function of the maximum attainable utility of a utility maximization problem. It is useful to define the indirect utility function for the volumetric demand case, as it is the basis for computing economic measures such as willingness-to-pay. Conditional on the characteristics of the choice alternatives A, parameters θh and a realization of the vector of error terms εht, the indirect utility function is defined in terms of optimal demand (x*ht, z*ht):

$$
V(p_t, E \mid A, \theta_h, \varepsilon_{ht}) = u(x^*_{ht}, z^*_{ht} \mid p_t, \theta_h, \varepsilon_{ht}) = u(D(\theta_h, \varepsilon_{ht} \mid A, p_t)) \tag{20}
$$

Eq. (20) can be evaluated by first determining optimal demand and then computing the corresponding value of the direct utility function (Eq. (7)). There is no closed form solution for the value function since there is no closed-form solution to D. Moreover, the error term εht and individual-level parameters (θh) need to be integrated out to obtain an expression for expected indirect utility. We do this by using the structure of the heterogeneity distribution, where uncertainty in the hyper-parameters (τ) induces variation on the distribution of individual-level parameters (θh):

$$
V(p_t, E \mid A_t) = \int_{\tau} \int_{\theta_h} \int_{\varepsilon_{ht}} u(D(\theta_h, \varepsilon_{ht} \mid A_t, p_t))\, p(\theta_h \mid \tau)\, p(\tau)\, p(\varepsilon_{ht})\, d\varepsilon_{ht}\, d\theta_h\, d\tau \tag{21}
$$

3 Measures of economic value

The economic value of a product feature requires an assessment of consumer welfare and its effect on marketplace demand and prices. Measuring the value of a product feature often begins by expressing part-worths (βh) in monetary terms by dividing by the price coefficient, i.e., βh/βhp. This monetization of utility is useful when summarizing results across respondents because utility, by itself, is not comparable across consumers. A monetary conversion of the part-worth of a product feature, however, is not sufficient for measuring economic value because it does not consider the effects of competitors in the market. A criticism of the simple monetization of utility is that consumers never have to pay the maximum they are willing to pay in the marketplace. A billionaire may be willing to pay a large amount of money to attend a sporting event, but does not end up doing so because of the availability of tickets sold by those with lower WTP. Firms in a competitive market are therefore not able to capture all of the economic surplus of consumers. Consumers can switch to other providers and settle for alternatives that are less attractive but still worth buying. Below we investigate three measures of economic value that incorporate the effects of competing products.

In theory, WTP can be the monetary value of an entire option or of a feature change (or improvement) to a given product (Trajtenberg, 1989). As discussed below, WTP for an entire choice option involves the addition of a new model error term for the new product, while a new error term is not added when assessing the effects of a feature change. The perspective in this chapter is focused on the feature change/improvement.

3.1 Willingness to pay (WTP)

WTP is a demand-based estimate of monetary value. The simple monetization of utility to a dollar measure (e.g., βh/βhp) does not correspond to a measure of consumer welfare unless there is only one good in the market and consumers are forced to select it. The presence of alternatives with non-zero choice probabilities means that the maximum attainable utility from a transaction is affected by more than one good. Increasing the number of available choice alternatives increases the expected maximum utility a consumer can derive from a marketplace transaction, and ignoring the effect of competitive products leads to a misstatement of consumer welfare. Any measurement of the economic value of a product feature cannot be made in isolation of the set of available alternatives because it is not known, a priori, which product will be chosen (Lancsar and Savage, 2004). The evaluation of consumer welfare needs to account for private information held by the consumer at the time of choice. This information is represented as the error term in the model, whose value is not realized until the respondent is confronted with a choice. Consumer welfare is determined by the maximum attainable utility of a transaction and cannot be based on the likelihood, or probability, that an alternative is utility maximizing. Choice probabilities do not indicate the value, or utility, arising from a transaction. Welfare should be measured in terms of the expected maximum attainable utility, (E[max(u(.))]), where the maximization operator is taken over the choice alternatives and the expectation operator is taken over error realizations. The effect of competitive offers can be taken into account by considering respondent choices among all the choice alternatives in the conjoint study. Let A be a J × K matrix that defines the set of products in a choice set, where J is the number of choice alternatives and K is the number of product features under study. The rows of the choice set matrix, aj, indicate the features of the j th product in the choice set. Similarly, let A∗ be a choice matrix similar to A except that one of its rows is different, indicating a different set of features for one of the products.
Typically, just one element in the row differs when comparing A to A∗ because the WTP measure typically focuses on what respondents are willing to pay for an enhanced version of one of the attributes. The maximum attainable utility for a given choice set is defined in terms of the indirect utility function:

$$
V(p, E \mid A) = \max_{x}\, u(x, z \mid A) \quad \text{subject to} \quad p'x \le E \tag{22}
$$

WTP is defined as the compensating value required to make the utility derived from the feature-poor set, A, equal to the utility derived from the feature-rich set, A∗:

$$
V(p, E + \text{WTP} \mid A) = V(p, E \mid A^*) \tag{23}
$$

3.1.1 WTP for discrete choice

The expected maximum attainable monetized utility, or consumer welfare, for a logit demand model can be shown to be equal to (Small and Rosen, 1981; Manski et al., 1981; Allenby et al., 2014b):

$$
W(A, p, \theta_h = \{\beta_h, \beta_{hp}\}) = E + \frac{1}{\beta_{hp}} \ln \sum_{j=1}^{J} \exp\left(\beta_h' a_j - \beta_{hp}\, p_j\right) \tag{24}
$$

This expression measures the benefit of spending some portion of the budgetary allotment, E, on one of the choice alternatives. As the attractiveness of the inside goods declines, the consumer is more likely to select the outside good and save his or her money. Thus, the lower bound on the welfare of a consumer confronted with an exchange is the budgetary allotment E. The improvement in welfare provided by a feature enhancement can be obtained as the difference of the maximum attainable utility of the enriched and original set of alternatives.

$$
\text{WTP} = \frac{1}{\beta_{hp}} \ln \sum_{j=1}^{J} \exp\left(\beta_h' a_j^* - \beta_{hp}\, p_j\right) - \frac{1}{\beta_{hp}} \ln \sum_{j=1}^{J} \exp\left(\beta_h' a_j - \beta_{hp}\, p_j\right) \tag{25}
$$
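Eqs. (24) and (25) can be evaluated directly for one respondent. The sketch below is illustrative only: the function names are hypothetical, and an outside good, if present, is assumed to be represented as a row of zeros in the attribute matrix with a price of zero.

```python
import math

def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def welfare(A, p, beta, beta_p, E):
    """Eq. (24): expected maximum attainable monetized utility."""
    v = [sum(b * a for b, a in zip(beta, row)) - beta_p * pj
         for row, pj in zip(A, p)]
    return E + log_sum_exp(v) / beta_p

def wtp_discrete(A, A_star, p, beta, beta_p):
    """Eq. (25): welfare under A* minus welfare under A (E cancels)."""
    return welfare(A_star, p, beta, beta_p, 0.0) - welfare(A, p, beta, beta_p, 0.0)
```

With a single alternative and no outside good, `wtp_discrete` returns the simple ratio of the part-worth change to the price coefficient; once an outside good is added, the value falls below that ratio, which is the point made in Section 3.1.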

Ofek and Srinivasan (2002) define an alternative measure, which they refer to as the market value of an attribute improvement (MVAI) based on the change in price needed to restore the aggregate choice share of a good in a feature-rich (A∗ ) relative to a feature-poor (A) choice set. Their calculation, however, does not monetize the change in consumer welfare, or utility, gained by the feature enhancement because choice probabilities do not provide a measure of either. Moreover, the MVAI measure is not a “market” measure of value as there is no market equilibrium calculation. MVAI also only applies to continuous attributes. The gain in utility is a function of both the observable product features and the unobserved error realization that are jointly maximized in a marketplace transaction, thus requiring a consideration of the alternative choices available to a consumer because, prior to observing the choice, the values of the error realizations are not known. Our proposed WTP measure monetizes the expected improvement in the maximized utility that comes from feature-enhanced choices.

3.1.2 WTP for volumetric choice

For the volumetric choice model, WTP is the additional budgetary allotment amount that is necessary for restoring the indirect utility of the feature-rich set given the feature-poor set. Conditional on realizations of εht and θh, WTP can be obtained by numerically solving Eq. (26) for WTP:

$$
V(p, E + \text{WTP} \mid A, \theta_h, \varepsilon_{ht}) - V(p, E \mid A^*, \theta_h, \varepsilon_{ht}) = 0 \tag{26}
$$

Computation of indirect utility V() has been described in Section 2.6. In the volumetric model, subjects are compensated for the loss in utility in a demand space that extends beyond unit demand. WTP therefore depends on purchase quantities, and is expected to be larger when the affected products have larger purchase quantities.
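One way to solve Eq. (26) for a single realization of εht and θh is bisection on the extra budget. The sketch below assumes the utility form implied by Eqs. (10)-(12), namely u(x, z) = Σj (ψj/γ) ln(γxj + 1) + ln(z), and takes the realized baseline utilities ψj under A and A∗ as inputs. It also assumes the enhancement does not lower utility, so that WTP is non-negative. All function names are hypothetical.

```python
import math

def demand(psi, p, gamma, E):
    """KT solution for one realization of psi; returns (x, z)."""
    order = sorted(range(len(psi)), key=lambda j: psi[j] / p[j], reverse=True)
    num, den, z, chosen = gamma * E, gamma, E, []
    for j in order:
        if psi[j] * z / p[j] <= 1.0:  # next good stays at a corner: stop
            break
        num += p[j]
        den += psi[j]
        z = num / den
        chosen.append(j)
    x = [0.0] * len(psi)
    for j in chosen:
        x[j] = (psi[j] * z / p[j] - 1.0) / gamma
    return x, z

def indirect_utility(psi, p, gamma, E):
    """Eq. (20): direct utility evaluated at optimal demand."""
    x, z = demand(psi, p, gamma, E)
    inside = sum(ps / gamma * math.log(gamma * xj + 1.0) for ps, xj in zip(psi, x))
    return inside + math.log(z)

def wtp_volumetric(psi, psi_star, p, gamma, E, n_iter=100):
    """Solve V(p, E + WTP | A) = V(p, E | A*) for WTP by bisection."""
    target = indirect_utility(psi_star, p, gamma, E)
    if target < indirect_utility(psi, p, gamma, E):
        raise ValueError("sketch assumes the enhanced set is weakly preferred")
    lo, hi = 0.0, 1.0
    while indirect_utility(psi, p, gamma, E + hi) < target:
        hi *= 2.0  # expand the bracket until it contains the root
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if indirect_utility(psi, p, gamma, E + mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Expected WTP would then be obtained by averaging this quantity over draws of the error term and the individual-level parameters, as in Section 2.5.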

3.2 Willingness to buy (WTB)

WTB is an alternative measure of economic value based on the expected increase in demand for an enhanced offering. It is similar to the measure proposed by Ofek and Srinivasan (2002) but does not attempt to monetize the change in share due to a feature enhancement. Instead, economic value is determined by calculating the expected

increase in revenue or profit due to a feature enhancement, using WTB as an input to that calculation. The increase in demand due to the improved feature is calculated for one offering, holding fixed all of the other offerings in the market.

3.2.1 WTB for discrete choice

In a discrete choice model, WTB is defined in terms of the change in market share that can be achieved by moving from a diminished to an improved feature set:

$$
\text{WTB} = MS(j \mid p, A^*) - MS(j \mid p, A) \tag{27}
$$

It is computed as the increase in choice probability for each respondent in the survey, which is then averaged to produce an estimate of the change in aggregate market share. As mentioned above in Section 2.5, statements about market-level behavior require integration over uncertainty in model hyper-parameters (τ), the resulting distribution of individual-level coefficients (θh), and the model error (ε).

3.2.2 WTB for volumetric choice

We can express WTB as a change in absolute sales or as a change in market share. Since there is no closed form for demand, D, we suggest first simulating a set of realizations of εht, then computing demand for the initial feature set A and the changed feature set A∗ conditional on the same set of εht and θh realizations. As shown in Section 2.3, we do this by numerically solving for D and evaluating the change in demand for the two feature sets:

$$
\text{WTB}_{sales} = \sum_{\varepsilon} \left[ D_j(\theta, \varepsilon \mid A^*, p) - D_j(\theta, \varepsilon \mid A, p) \right] \tag{28}
$$

$$
\text{WTB}_{share} = \sum_{\varepsilon} \left[ \frac{D_j(\theta, \varepsilon \mid A^*, p)}{D(\theta, \varepsilon \mid A^*, p)} - \frac{D_j(\theta, \varepsilon \mid A, p)}{D(\theta, \varepsilon \mid A, p)} \right] \tag{29}
$$

3.3 Economic price premium (EPP)

The EPP is a measure of feature value that allows for competitive price reaction to a feature enhancement or introduction. An equilibrium is defined as a set of prices and accompanying market shares which satisfy the conditions specified by a particular concept of equilibrium. In our discussion below, we employ a standard Nash Equilibrium concept for differentiated products using a discrete choice model of demand, not the volumetric version of conjoint analysis. The calculation of an equilibrium price premium requires additional assumptions beyond those employed in a traditional, discrete choice conjoint study:

• The demand specification is a standard heterogeneous logit that is linear in the attributes, including prices.
• Constant marginal cost for the product.
• Single-product firms, i.e., each firm has just one offering.
• Firms cannot enter or exit the market after product enhancement takes place.
• Firms engage in a static Nash price competition.

The first assumption can be easily replaced by any valid demand system, including the volumetric demand model discussed earlier. One can also consider multi-product firms. The economic value of a product feature enhancement to a firm is the incremental profits that it will generate:

$$
\Delta\pi = \pi(p^{eq}, m^{eq} \mid A^*) - \pi(p^{eq}, m^{eq} \mid A) \tag{30}
$$

where π are the profits associated with the equilibrium prices and shares given a set of competing products defined by the attribute matrix A. The EPP is the increase in the profit-maximizing price of an enhanced product given some assumptions about costs and competitive offerings. Each product provider, one at a time, determines their optimal price given the prices of other products and the cost assumptions. This optimization is repeated for each provider until an equilibrium is reached where it is not in anyone’s interest to change prices any further. Equilibrium prices are computed for the offerings with features set at the lower level for all the attributes (A), and then recomputed for the offerings with features set to their higher level (A∗). EPP introduces the concept of price competition in the valuation of product features assuming static Nash price competition. In a discrete choice setting, firm profits are

$$
\pi(p_j \mid p_{-j}) = MS(j \mid p_j, p_{-j}, A)\,(p_j - c_j) \tag{31}
$$

where pj is the price of good j, p−j are the prices of other goods, and cj is the marginal cost of good j. The first-order conditions of the firm are:

$$
\frac{\partial \pi}{\partial p_j} = \left[ \frac{\partial}{\partial p_j} MS(j \mid p_j, p_{-j}, A) \right] (p_j - c_j) + MS(j \mid p_j, p_{-j}, A) \tag{32}
$$

The Nash equilibrium is a root of the system of equations defined by the first-order conditions for all J firms. If we define:

$$
h(p) = \begin{bmatrix} h_1(p) = \dfrac{\partial \pi}{\partial p_1} \\[4pt] h_2(p) = \dfrac{\partial \pi}{\partial p_2} \\ \vdots \\ h_J(p) = \dfrac{\partial \pi}{\partial p_J} \end{bmatrix} \tag{33}
$$

then the equilibrium price vector p∗ is a zero of the function h(p).
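A zero of h(p) can be searched for in several ways. The sketch below uses a damped fixed-point iteration on the first-order conditions rather than the one-firm-at-a-time optimization described above; this is a substitute technique chosen for brevity, and convergence is not guaranteed in general. It assumes single-product firms, an outside good with utility normalized to zero, and respondents described by pairs (quality, price coefficient), where quality[j] stands for βh′aj. All names are hypothetical.

```python
import math

def shares(p, quality, beta_p):
    """Logit choice probabilities for one respondent (outside good utility 0)."""
    w = [math.exp(q - beta_p * pj) for q, pj in zip(quality, p)]
    denom = 1.0 + sum(w)
    return [wj / denom for wj in w]

def foc(p, cost, respondents):
    """h(p) of Eq. (33), plus market shares and their own-price derivatives."""
    n, J = len(respondents), len(p)
    ms, dms = [0.0] * J, [0.0] * J
    for quality, beta_p in respondents:
        s = shares(p, quality, beta_p)
        for j in range(J):
            ms[j] += s[j] / n
            dms[j] += -beta_p * s[j] * (1.0 - s[j]) / n  # logit own-price derivative
    h = [dms[j] * (p[j] - cost[j]) + ms[j] for j in range(J)]  # Eq. (32)
    return h, ms, dms

def nash_prices(cost, respondents, damping=0.5, tol=1e-10, max_iter=10000):
    """Damped iteration p_j <- c_j - MS_j / (dMS_j/dp_j) until h(p) is near 0."""
    p = [c + 1.0 for c in cost]
    for _ in range(max_iter):
        h, ms, dms = foc(p, cost, respondents)
        if max(abs(v) for v in h) < tol:
            return p
        target = [cost[j] - ms[j] / dms[j] for j in range(len(p))]
        p = [(1.0 - damping) * pj + damping * tj for pj, tj in zip(p, target)]
    raise RuntimeError("price iteration did not converge")
```

Under these assumptions, the EPP for firm 0 is the difference between its equilibrium price computed with the enhanced attributes (A∗) and with the original attributes (A), for example `nash_prices(cost, enhanced)[0] - nash_prices(cost, base)[0]`.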

4 Considerations in conjoint study design

Conjoint studies rely on general principles of survey design (Diamond, 2000) in collecting data for analysis. This includes the consideration of an appropriate sampling frame, screening questions, and conjoint design to provide valid data for analysis. Of

critical concern in designing a conjoint survey is to ensure that questions are posed in an unambiguous and easy-to-understand manner. Specifically, the conjoint attributes and their levels should be specified so as not to bias preference measurement and introduce unnecessary measurement error due to confusion or lack of understanding. Typically, conjoint design is informed by qualitative research prior to the drafting of the conjoint survey questionnaires. Questionnaires should also be pre-tested to determine if the survey is too long and if the information intended to be collected in the survey questions is understandable to survey respondents. For example, a survey designed to understand consumer demand for smartphones might include questions about product usage in the form of data and text charges. Not all respondents may have a clear understanding of the term ‘data,’ and this question might need to be phrased in terms of wording used in current advertisements (e.g., ‘talk’ and text). Records should be kept of both the qualitative and pre-testing phases of the survey design. In particular, revisions in the questionnaire should be documented.

Modern conjoint studies are often administered using Internet panels where respondents are recruited to join a survey. Internet panel providers such as Survey Sampling International (https://www.surveysampling.com) or GfK (https://www.gfk.com) maintain a population of potential respondents who are emailed invitations to participate in a survey. Respondents usually do not know the exact nature of the survey (e.g., a survey about laundry detergent), which helps to reduce self-selection bias. Participants are then asked to complete a survey that typically begins with a series of demographic and screening questions to help establish the representativeness of the initial sample and to exclude respondents who do not qualify for inclusion in the survey. As discussed below, the general population is usually not the intended target of the survey because not all people have the interest or ability to answer the survey questions. Next is a series of questions that document attitudes, opinions and behaviors related to the focal product category. A glossary section follows in which attributes and attribute levels of the conjoint study are defined, followed by the conjoint choice tasks. Additional classification variables for the analysis of subpopulations of interest are included at the end of the survey. We consider each of these survey components in detail below.

4.1 Demographic and screening questions

The incidence of product usage varies widely across product categories, and screening questions help ensure that only qualified respondents are surveyed. Respondents in a conjoint study should represent a target market of potential buyers who have preexisting interests in the product category, often identified as people who have recently purchased in the product category and those who report they are considering a purchase in the near future. These individuals are referred to as “prospects” in marketing textbooks, defined as individuals with the willingness and ability to make purchases (Kotler, 2012). The purpose of the screening questions is to remove respondents who either lack the expertise to provide meaningful answers to the survey questions, or who are

not planning on making a purchase in the near future. The presence of screening questions creates some difficulty in assessing the representativeness of the survey sample. Demographic variables are available to profile the general population, but not prospects in a specific product category. Some product categories appeal to younger consumers and others to older consumers. Claiming that the resulting sample is demographically representative therefore relies on obtaining a representative sample prior to screening respondents out of the survey.

There is a recent conjoint literature on incentive-alignment techniques used to induce greater effort among respondents so that their responses provide a ‘truer’ reflection of their actual preferences (Ding et al., 2005; Ding, 2007; Yang et al., 2018). The idea behind these techniques is that respondents will be more likely to provide thoughtful responses when offered an incentive that is derived from their choices, such as being awarded some version of the product under study that is predicted to give them high utility. In this literature, it is common to apply conjoint analysis to student samples in a lab setting. Improvements in predicting hold-out choices of these respondents are offered as evidence of increased external validity. However, a test of external validity needs to relate choice experiments to actual marketplace behavior. Shoppers in an actual marketplace setting may face time constraints and distractions after a busy workday that differ from a lab setting. Motivating lab subjects through incentive alignment does not necessarily lead to more realistic marketplace predictions and inferences.

In industry-grade conjoint studies, respondents are screened so that conjoint surveys are relevant. Some panel providers can screen respondents based on past purchases from a given category to ensure familiarity with the product category.
Respondents thus have the opportunity to help improve products relevant to them, so there is little reason to assume they would receive utility from lying. Moreover, it is often not possible to conduct incentive-aligned studies because Internet panel providers do not allow additional incentives to be offered to panel members. At worst, respondents may respond in a careless manner. Respondents with low in-sample model fit can be screened out of the sample as discussed in Allenby et al. (2014b), Section 6. We show below that conjoint estimates can be closely aligned with marketplace behavior without the use of incentive-alignment by screening for people who actively purchase in the product category.

4.2 Behavioral correlates

Respondents participating in a conjoint study are often queried about their attitudes and opinions related to the product category. A study of smartphones would include questions about current payment plans, levels of activity (voice, text, Internet), and other electronic devices owned by the respondent. Questions might also involve details of their last purchase, competitive products that were considered, and specific product features that the respondent found attractive and/or frustrating to use. The purpose of collecting this data is two-fold: i) it encourages the respondent to think about the product and recall aspects of a recent or intended purchase that

are important to them; and ii) it provides an opportunity to establish the representativeness of the sample using data other than demographics. The behavioral correlates serve to ‘warm up’ the respondent so that they engage in the choice tasks with a specific frame of reference. They also provide information useful in exploring antecedents and consequences of product purchase through their relationship to estimates of the conjoint model. Behavioral covariates are often reported for products by various media, and these reports can be used as benchmarks for assessing sample representativeness. For example, trade associations and government agencies report on the number of products or devices owned by households, and the amount of time people spend engaged in various activities. There are also syndicated suppliers of data that can be used to assess product penetration, product market shares, and other information relevant to demand in the product category under study. Behavioral covariates are likely to be more predictive of product and attribute preferences than simple demographic information and, therefore, may be of great value in establishing representativeness.

4.3 Establishing representativeness In many instances, a survey is done for the purpose of projecting results to a larger target population. In particular, a conjoint survey is often designed to project demand estimates based on the survey sample to the larger population of prospective buyers. In addition, many of the inference methods used in the analysis of survey data are based on the assumption that the sample is representative or, at least, acquired via probability sampling methods. All sampling methods start with a target population, a sample frame (a particular enumeration of the set of possible respondents to be sampled from), and a sampling procedure. One way of achieving representativeness is to use sampling procedures that ensure representativeness by construction. For example, if we want to construct a representative sample of dentists in the US, we would obtain a list of licensed dentists (the sample frame) and use probability sampling methods to obtain our sample. In particular, simple random sampling (or equal probability of selection) would produce a representative sample with high probability. The only way in which random sampling would not work is if the sample size was small. Clearly, this approach is ideal for populations for which there are readily available sample frames. The only barrier to representativeness for random samples is potential nonresponse bias. In the example of sampling dentists (regarding a new dental product, for example), it would be relatively easy to construct a random sample but the survey response rate could be very low (less than 50 per cent). In these situations, there is the possibility of a non-response bias, i.e. that those who respond to the survey have different preferences than those who do not respond. There is no way to assess the magnitude of non-response bias except to field a survey with a higher response rate. 
This puts a premium on well-designed and short surveys and careful design of adequate incentive payments to reduce non-response. However, it is not always possible to employ probability-based sampling methods. Consider the problem of estimating the demand for a mid-level SUV prototype.

Here we use conjoint because this prototype is not yet in the market. Enumerating the target population of prospective customers is very difficult. One approach would be to start with a random sample of the US adult population and then screen this sample to only those who are in the market for a new SUV. The sample prior to screening is sometimes called the “inbound” sample. If this sample is representative, then clearly any resulting screened sample will also be representative, unless there is a high nonresponse rate to the screening questions. Of course, this approach relies on starting with representative sample of the US adult population. There are no sample frames for this population. Instead, modern conjoint practitioners use internet panels maintained by various suppliers. These are not random samples but, instead, represent the outcome of a continuous operation by the supplier to “harvest” the email addresses of those who are willing to take surveys. Internet panel providers often tout the “quality” of their panels but, frequently, this is not a statement about the representativeness of the sample but merely that the provider undertakes precautions to prevent fraud of various types including respondents who are bots or who live outside of the US. Statisticians will recognize that the internet panels offered commercially are what are called “convenience” samples and there is no assurance of representativeness. This means that it is incumbent on the researcher who uses an internet panel to establish representativeness by providing affirmative evidence of the representativeness of their sample. It is our view that, with adequate affirmative evidence, samples that are based on internet panels can be used as representative. Internet panel providers are aware of the problem of establishing representativeness and have adopted a variety of approaches and arguments. The first argument is that their panel may be very large (exceeding 1 million). 
The argument here is that this makes it more difficult for the internet panel to be skewed toward any particular subpopulation. This is not a very strong argument given that there are some 250 million US adults. Internet panels tend to over-represent older adults and under-represent the extremes of the income and education distribution. To adjust for possible non-representativeness, internet panel providers use “click-balancing.” Internet panel members are surveyed at regular intervals regarding their basic demographic (and many other) characteristics. The practice of “click-balancing” is used to ensure that the “inbound” sample is representative by establishing quotas. For example, if census data establishes that the US adult population is 51 per cent female and 49 per cent male, then the internet provider establishes matching quotas of female and male respondents. Once a quota is filled, the internet provider rejects further potential respondents of that type. Typically, click-balancing is only used to impose quotas for age, sex, and region, even though many internet providers have a wealth of other information which can be used to implement click-balancing. Statisticians will recognize this approach as quota sampling. Quota sampling cannot establish representativeness unless the quantities that are measured by the survey are highly correlated with whatever variables are used to balance. If we click-balance only on age and gender, our conjoint demand estimates could be very non-representative unless product and attribute preferences are highly correlated with age and gender. This is unlikely in any real world application. Our view is that one should measure not only a set of demographic variables that are most likely to be related to preference but also a set of behavioral correlates. For example, we might want to include ownership of current make and model cars to establish a rough correspondence between our inbound sample and the overall market share by make or type of car available from sources such as JD Power. We might also look at residence type, family composition, and recreational activities as potential behavioral correlates for the SUV prototype sample. We could use our conjoint survey constrained to only existing products to simulate market shares, which should be similar to the actual market shares for the set of products in the simulation. In short, we recognize that, for some general consumer products, probability samples are difficult to implement and that we must resort to the use of internet panels. However, we do not believe that click-balancing on a handful of demographic variables is sufficient to assert representativeness.
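One way to assemble the kind of affirmative evidence described above is to compare the inbound sample with an external benchmark on a variable that was not used for click-balancing. The following is a minimal sketch in Python. The function name `share_gaps`, the vehicle categories, and all numbers are hypothetical and chosen only for illustration; they do not come from any actual panel or market report.

```python
from collections import Counter

def share_gaps(sample_values, benchmark_shares):
    """Return {category: sample share minus benchmark share} for one variable."""
    counts = Counter(sample_values)
    n = len(sample_values)
    return {cat: counts.get(cat, 0) / n - share
            for cat, share in benchmark_shares.items()}

# Hypothetical inbound sample: type of vehicle currently owned by each respondent.
sample_vehicle = ["sedan"] * 50 + ["suv"] * 30 + ["truck"] * 20

# Hypothetical external benchmark shares (made-up numbers, not real market data).
benchmark = {"sedan": 0.40, "suv": 0.35, "truck": 0.25}

gaps = share_gaps(sample_vehicle, benchmark)
for cat, gap in gaps.items():
    print(f"{cat}: sample share minus benchmark share = {gap:+.2f}")
```

Large gaps on behavioral correlates of this kind would suggest that quotas on age, sex, and region alone have not delivered a sample that resembles the market of interest.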

4.4 Glossary The glossary portion of a survey introduces the respondent to the product features and levels included in the choice tasks. It is important that product attributes are described factually, in simple terms, and not in terms of the benefits that might accrue to a consumer. Describing benefits could educate the respondent about uses of the product that were not previously known and threaten the validity of the study. For example, a conjoint study of digital point-and-shoot cameras might include the attribute “WiFi enabled.” Benefits associated with this attribute include easy downloading and storage of pictures, and the ability to quickly post photos on social media platforms such as Facebook or Instagram. However, not all respondents in a conjoint survey use social media, and some may not make the connection between the attribute “WiFi enabled” and the benefits that accrue from its use. Moreover, there are many potential benefits of posting photos on social media, including telling others that you are safe, that you care about them, or that the photograph represents something you value. Including such a message is problematic for a conjoint study because the attribute description extends beyond a description of the product and introduces an instrumentation bias (Campbell and Stanley, 1963) into the survey instrument. The utility that respondents have for the attributes and features of a product depends on their level of knowledge of the attributes and how they can be useful to them. In some cases, firms may anticipate an advertising campaign to inform and educate consumers about the advantage of a particular attribute, and may want to include information on the benefits that can accrue from its use. However, incorporating such information into a conjoint study is problematic because it assumes that consumer attention and learning in the study is calibrated to levels of attention and learning in the marketplace.
A challenge in constructing an effective glossary is getting respondents to understand differences among the levels of an attribute being investigated. A study of automotive luxury car seats, for example, may include upgrades such as power head restraints, upper-back support, and independent thigh support (Kim et al., 2017). Respondents need to pay careful attention to the glossary to understand exactly how these product features work and avoid substituting their own definitions. An effective method of accomplishing this is to produce a video in which the attributes are defined, and to require respondents to watch the video before proceeding to the choice task. A screenshot explaining upper-back support is provided in Fig. 1.

FIGURE 1 Screenshot of luxury car seat glossary.

4.5 Choice tasks The simplest case of conjoint analysis involves just two product features, brand and price, because every marketplace transaction involves these features. A brand-price analysis displays an array of choice alternatives (e.g., varieties of yogurt, different 12-packs of beer) and prices, and asks respondents to select the alternative that is most preferred. The purpose of a brand-price conjoint study is to understand the strength of brand preferences and their relationship to prices. That is, a brand-price analysis allows for the share prediction of existing offerings at different prices. It should be emphasized that a conjoint design with only brand and price is likely to produce unrealistic results as there are few products that can be characterized by only two features, however important. The inclusion of non-price attributes in a conjoint study allows analysis to expand beyond brands and prices. An example of a choice task to study features of digital cameras is provided in Fig. 2 (Allenby et al., 2014a,b). Product attributes are listed on the left side of the figure, and attribute levels are provided in the cells of the feature grid. Brand and price are present along with the other product features. Also included on the right side of the grid is the ‘no-choice’ option, where respondents can indicate they would purchase none of the products described.

FIGURE 2 Example choice task.

The choice task in Fig. 2 illustrates a number of aspects of conjoint analysis. First, the choice task does not need to include all brands and offerings available in the marketplace. Just four brands of digital cameras are included in the digital camera choice task, with the remaining cameras available for purchase represented by the no-choice option. Second, the choice task also does not need to include all the product features that are present in offerings. The brand name serves as a proxy for the unmentioned attributes of a brand in a conjoint study, and consumer knowledge of these features is what gives the brand name its value. For example, in a brand-price conjoint study, the brand name ‘Budweiser’ stands for a large number of taste attributes that would be difficult to enumerate in a glossary.

Researchers have found that breaking the conjoint response into two parts results in more accurate predictions of market shares and expected demand (Brazell et al., 2006). The two-part, or dual, response is illustrated in Fig. 3. The first part of the response asks the respondent to indicate their preferred choice option and the second part asks if the respondent would really purchase their most preferred option. The advantage of this two-part response is that it slows down the respondent so that they think through the purchase task, and results in a higher likelihood of a respondent selecting the no-choice option. Survey respondents are asked to express their preference across multiple choice tasks in a conjoint study.
Statistical experimental design principles are used to design the choice tasks so that the attribute levels are rotated across the choice options and the data are informative about the part-worths (Box et al., 1978).

FIGURE 3 Dual-response choice task.

The economic foundation of conjoint analysis rests on the assumption that consumers have well-defined preferences for offerings that are recalled when responding to the choice task. That is, consumer utility is not constructed based on the choice set, but is recalled from memory. This assumption is contested in the psychological choice literature (Lichtenstein and Slovic, 2006) where various framing effects have been documented (Roe et al., 2001). However, by screening respondents for inclusion in a conjoint study, the effects of behavioral artifacts of choice are largely reduced. Screened respondents, who are familiar with the product category and understand the product features, are more likely to have well-developed preferences and are less likely to construct their preferences at the time of decision. We discuss this issue again below when discussing the robustness of conjoint results. As discussed earlier, conjoint analysis can be conducted for decisions that involve volumetric purchases using utility functions that allow for the purchase of multiple goods. The responses can be non-zero for multiple choice alternatives corresponding to an interior solution to a constrained maximization problem. Fig. 4 provides an illustration of a volumetric choice task for the brand-price conjoint study discussed by Howell et al. (2015).

FIGURE 4 Volumetric choice task.

4.6 Timing data While samples based on internet panels are not necessarily representative, the internet format of survey research provides measurement capabilities that can be used to determine the validity of the data. In addition to timing the entire survey, researchers can measure the time spent reading and absorbing the glossary and on each of the choice tasks. This information can and should be gathered in the pre-test stage as well as the fielding of the final survey. The questionnaire can be reformulated if there are pervasive problems with attention or “speeding.” Sensitivity analyses with respect to the inclusion of respondents who appear to give the survey little attention are vital to establishing the credibility of conjoint results, since many practitioners have observed that at least some respondents find conjoint surveys tedious. Clearly, there is a limit to the amount of effort any one respondent is willing to devote to even the most well-crafted conjoint survey, and conjoint researchers should bear this in mind before designing unnecessarily complex or difficult conjoint tasks.
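A sensitivity analysis of this kind starts by flagging likely speeders from the timing data. The sketch below is one illustrative way to do so in Python. The rule used here (median task time below 30 per cent of the sample-wide median) is an arbitrary assumption rather than an established standard, and the respondent identifiers and timings are made up.

```python
from statistics import median

def flag_speeders(task_seconds, fraction=0.3):
    """Flag respondents whose median time per choice task falls below
    `fraction` times the sample-wide median (an illustrative rule only)."""
    per_respondent = {rid: median(times) for rid, times in task_seconds.items()}
    cutoff = fraction * median(per_respondent.values())
    return {rid for rid, m in per_respondent.items() if m < cutoff}

# Hypothetical seconds spent on each of four choice tasks.
timings = {
    "r1": [22, 18, 25, 20],
    "r2": [3, 2, 4, 3],      # suspiciously fast on every task
    "r3": [30, 28, 26, 35],
    "r4": [15, 17, 14, 16],
}

speeders = flag_speeders(timings)
attentive = sorted(set(timings) - speeders)
print("flagged:", sorted(speeders))
print("retained:", attentive)
```

The model would then be estimated twice, once on all respondents and once on the retained subset, and the quantities of interest compared across the two runs.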

4.7 Sample size For a simple estimator such as a sample mean or sample proportion, it is a relatively simple matter to undertake sample size computations. That is, with some idea of the variance of observations, sample sizes sufficient to reduce sampling error or posterior uncertainty below a specific margin can be determined. Typically, for estimation of sample proportions, sample sizes of 300-500 are considered adequate. However, a conjoint survey based on a valid economic formulation is designed to allow for inference regarding market demand and various other functions of demand such as equilibrium prices. In these contexts, there are no rules of thumb that can easily be applied to establish adequate sample sizes. In a full Bayesian analysis, the posterior distribution of any summary of the conjoint data, however complicated, can easily be constructed using the draws from the posterior predictive distribution of the respondent-level parameters (Eq. (19)). This posterior distribution can be used to assess the reliability or statistical information in a given conjoint sample. However, analytical expressions for these quantities are typically not available. All we know is that the posterior distribution tightens (at least asymptotically) at the rate of the square root of the number of respondents. This does not help us plan sample sizes, in advance, for any specific function of demand parameters. The only solution to this problem is to perform a pilot study and scale the sample off of this study in such a way as to assure a given margin of error or posterior interval. This can be done by assuming that the posterior standard error will tighten at rate √N. Our experience with equilibrium pricing applications is that sample sizes considerably larger than what is often viewed by conjoint practitioners as adequate are required. Many conjoint practitioners assume that a sample size of 500-1000 with 10-12 conjoint tasks will be adequate. This may be on the low side for more demanding computations. We hasten to add that many conjoint practitioners do not report any measures of statistical reliability for the quantities that they estimate. Given the ease of constructing the posterior predictive distribution of any quantity of interest, there is no excuse for failure to report a measure of uncertainty.
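The pilot-study scaling described above can be written down directly. If the posterior standard error shrinks in proportion to 1/√N, then a pilot with N_pilot respondents and standard error se_pilot implies a required sample of roughly N_pilot × (se_pilot / se_target)². The sketch below is a minimal Python illustration under that assumption; the function name and the numbers are hypothetical.

```python
import math

def required_respondents(n_pilot, se_pilot, se_target):
    """Scale a pilot sample size assuming the posterior standard error
    tightens at rate sqrt(N): se_target = se_pilot * sqrt(n_pilot / N)."""
    if se_target <= 0 or se_pilot <= 0 or n_pilot <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(n_pilot * (se_pilot / se_target) ** 2)

# Hypothetical pilot: 200 respondents gave a posterior standard error of 0.50
# for some demand quantity; we want to halve it to 0.25.
n_needed = required_respondents(n_pilot=200, se_pilot=0.50, se_target=0.25)
print(n_needed)
```

Halving the standard error requires four times as many respondents, which is why demanding computations such as equilibrium prices can call for samples well beyond customary sizes.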

5 Practices that compromise statistical and economic validity While conjoint originated as a method to estimate customer preferences or utility, many practitioners of conjoint have created alternative methodologies which either invalidate statistical inference or compromise the economic interpretation of conjoint results. As long as there are well-formulated and practical alternatives, it is our view that there is no excuse for using a method that is not statistically valid. Whether or not conjoint methods should be selected so that the results can be interpreted as measuring a valid economic model of demand is more controversial. A pure predictivist point of view is that any procedure is valid for predicting demand as long as it predicts well. This point of view opens the range of possible conjoint specifications to any specification that can predict well. Ultimately, conjoint is most useful in predicting or simulating demand in market configurations which differ from that observed in the real world. Here we are expressing our faith that specifications derived from valid utility structures will ultimately prevail in a prediction context where the state of the world is very different from that which currently prevails.

5.1 Statistical validity There are two threats to the statistical validity of conjoint analyses: 1) the use of estimation methods which have not been shown to provide consistent estimators and 2) improper methods for imposing constraints on conjoint part-worths.

5.1.1 Consistency A consistent estimator is an estimator whose sampling distribution collapses on the true parameter values as the sample size grows infinitely large. Clearly, this is a very minimal property. That is, there are consistent but very inefficient estimators. The important point is that when choosing an estimation procedure, we should only select from the set of procedures that are consistent. The only reason one might resort to using a procedure whose consistency has not been established is if there are no practical consistent alternatives. Even then, our view is that any analysis based on a procedure for which consistency cannot be verified must be termed tentative at best. However, in conjoint, we do not have to resort to the use of unverified procedures since all Bayes procedures are consistent unless the prior is dogmatic in the sense of putting zero probability on a non-trivial portion of the parameter space. Moreover, Bayes procedures are admissible, which means that it will be very difficult to find a procedure which dominates Bayes methods in either estimation or prediction (Bernardo and Smith, 2000). In spite of these arguments in favor of only using methods for which consistency can be demonstrated, there has been work in marketing on conjoint estimators (see Toubia et al., 2004) that proposes estimators based on minimization of some criterion function. It is clear that, for a given criterion function, there is a minimum which will be an estimator. However, what is required to establish consistency is a proof that the criterion function used to derive the estimator converges (as the sample size grows to infinity) to a function with a minimum at the true parameter value. This cannot be established by sampling experiments alone as this convergence is required over the entire parameter space.
The fact that an estimator works “well” in a few examples in finite samples does not mean that the estimator has good finite sample or asymptotic properties. We should note that there are two dimensions of conjoint panel data (N, the number of respondents, and T, the number of choice tasks). We are using large N and fixed T asymptotics in the definition of consistency. Estimates of respondent-level part-worths are likely to be unreliable as they are based only on a handful of observations and an informative prior, and cannot be shown to be consistent if T is fixed. A common practice in conjoint analysis is to use respondent-level coefficients to predict the effects of changes to product attributes and price. An advantage of Bayesian methods of estimating conjoint models is their ability to provide individual-level estimates of part-worths (θh) in Eq. (16) in addition to the hyper-parameters that describe the distribution of part-worths (τ). Using the individual-level estimates for predicting the effects of product changes on sales, however, is problematic because of the shallow nature of the data used in conjoint analysis. Respondents provide at most about 16 responses to the choice tasks before becoming fatigued, and while these estimates may be consistent in theory, that is, as T grows, they are not reliable in practice because of data limitations. The effect of using individual-level estimates is to under-state the confidence associated with predicted effects. That is, the confidence intervals are too large when using the individual-level estimates. This is because uncertainty in the individual-level estimates is due to two factors: uncertainty in the hyper-parameters and uncertainty arising from the individual-level data. As the sample size increases in a conjoint study, it is only possible to increase the number of respondents N, not the number of observations per respondent T. As a result, the individual-level estimates will always reflect a large degree of uncertainty, even when the hyper-parameters are accurately estimated. The accurate measurement of uncertainty in predictions from conjoint analysis must be based on model hyper-parameters as shown in Eq. (21).

5.1.2 Using improper procedures to impose constraints on part-worths The data used in conjoint analysis can be characterized as shallow in the sense that there are few observations per respondent. Individual-level parameters (θh) can therefore be imprecisely estimated and can violate standard economic assumptions. Price coefficients, for example, are expected to be negative in that people should want to pay less for a good rather than more, and attribute-levels may correspond to an ordering where consumers should want more of a feature or attribute holding all else constant. There are two approaches for introducing this prior information into the analysis. The first is to reparameterize the likelihood so that algebraic restrictions on coefficients are enforced. For example, the price coefficient in Eq. (5) can be forced to be negative through the transformation ψz = −exp(βp), with βp estimated unrestricted. Alternatively, constraints can be introduced through the prior distribution as discussed by Allenby et al. (1995). Sign constraints as well as monotonicity can be imposed automatically using our R package, bayesm. It is a common practice in conjoint analysis to impose sign restrictions by simply zeroing out the estimates or to use various “tying” schemes for ordinal constraints in which estimates are arbitrarily set equal to other estimates to enforce the ordinal constraints. This is an incoherent practice which violates Bayes’ theorem and, therefore, removes the desirable properties of the Bayes procedures.
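The reparameterization approach can be illustrated with a few lines of Python. The sketch below is not the bayesm implementation; it only shows that mapping an unrestricted parameter through −exp(·) yields a negative price coefficient for every draw, so no after-the-fact zeroing or tying is needed. The draws of βp are simulated stand-ins for MCMC output, and the function name is hypothetical.

```python
import math
import random

def price_coefficient(beta_p):
    """Map an unrestricted parameter to a strictly negative price coefficient."""
    return -math.exp(beta_p)

random.seed(0)
# Hypothetical unrestricted draws of beta_p (stand-ins for posterior draws).
beta_draws = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Inference on the price coefficient uses the transformed draws directly.
psi_draws = [price_coefficient(b) for b in beta_draws]

all_negative = all(psi < 0 for psi in psi_draws)
print("all transformed draws negative:", all_negative)
```

Because the constraint is built into the likelihood, posterior summaries of the transformed draws remain coherent, in contrast with editing estimates after the fact.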

5.2 Economic validity There are four threats to the economic validity of conjoint analyses:

1. Using conjoint specifications contrary to valid utility or indirect utility functions.
2. Various sorts of self-explicated conjoint which violate utility theory.
3. Comparison of part-worths across respondents.
4. Attempts to combine conjoint and rankings data.

5.2.1 Non-economic conjoint specifications We have emphasized that conjoint specifications should be derived from a valid direct utility function (see (1) or (7)). For discrete-choice conjoint, the respondent choice probabilities are a standard logit function of a linear index where prices enter in levels, not in logs. It is common for practitioners to enter prices as a sequence of dummy variables, each for a different level of price used in the conjoint design. The common explanation is that the dummy variable specification is “non-parametric.” This approach is not only difficult to reconcile with economic theory but also exposes the investigator to a host of violations of the reasonable economic assumption that indirect utility is monotone in price (not to mention convex). In general, utility theory imposes a certain discipline on investigators to derive the empirical specification from a valid utility function. In the volumetric context, the problems are even worse as the conjoint specifications are often completely ad hoc. This ad-hocery arises, in part, from the lack of software to implement an economically valid volumetric conjoint, a state of affairs we are trying to remedy with our new R package, echoice (see footnote 5).
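The monotonicity problem with dummy-coded prices is easy to check for in estimated coefficients. The following Python sketch, with a hypothetical function name and made-up price levels and coefficients, lists adjacent price levels at which estimated utility rises as price rises.

```python
def monotonicity_violations(price_levels, coefs):
    """Return adjacent (lower price, higher price) pairs for which the
    estimated utility of the higher price exceeds that of the lower price."""
    pairs = sorted(zip(price_levels, coefs))
    return [(p_lo, p_hi)
            for (p_lo, c_lo), (p_hi, c_hi) in zip(pairs, pairs[1:])
            if c_hi > c_lo]

# Hypothetical price levels and dummy coefficients (lowest price is the
# reference level with coefficient 0). Values are invented for illustration.
prices = [2.99, 3.99, 4.99, 5.99]
coefs = [0.0, -0.4, -0.3, -0.9]

violations = monotonicity_violations(prices, coefs)
print(violations)
```

Here utility is estimated to be higher at 4.99 than at 3.99, a pattern that a specification with price entering linearly and a negative coefficient rules out by construction.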

5.2.2 Self-explicated conjoint Some researchers advocate using self-explicated methods of estimating part-worths (see, for example, Srinivasan and Park, 1997; Netzer and Srinivasan, 2011). In some forms of self-explicated conjoint, respondents are asked to rate the relative importance of a product feature on some sort of integer valued scale, typically 5 or 7 points. In other versions of self-explicated conjoint, separate measures of the relative importance and desirability of product attributes are combined in an arbitrary way to form an estimate of a part-worth. There are many ways in which these procedures violate both economic and statistical principles. Outside of the demand context (as in choice-based or volumetric conjoint), there is no meaning to the “importance” or “desirability” of features. The whole point of a demand experiment is to infer a valid utility function from demand responses. No one knows what either “importance” or “desirability” means, including the respondents. The scales used are only ordinal and therefore cannot be converted to a utility scale which is an interval scale. In short, there is no way to transform or convert relative importance or desirability to a valid part-worth. Finally, as there is no likelihood function (or error term) in these models, it is impossible to analyze the statistical properties of self-explicated procedures.

5.2.3 Comparing raw part-worths across respondents Conjoint part-worths provide a measure of the marginal utility associated with changes in the levels of product attributes. The utility measure obtained from a conjoint analysis allows for the relative assessment of changes in the product attribute-levels, including price. However, utility is not a measure that is comparable across respondents because it is only intended to reflect the preference ordering of a respondent, and a preference ordering can be reflected by any monotonic transformation of the utility scale. That is, the increase or decrease in utility associated with changes in the attribute-levels is person-specific, and cannot be used to make statements that one respondent values changes in the levels of an attribute more than another.

5 Development version available at https://bitbucket.org/ninohardt/echoice/.

Making utility comparisons across respondents requires the monetization of utility to express the value of changes in the levels of an attribute on a scale common to all respondents. While the pain or gain of changes to a product attribute is not comparable across people, the amount they would be willing to pay is comparable across respondents. The WTP, WTB, and EPP measures discussed above provide a coherent metric for summarizing the results of conjoint analysis across respondents.

5.2.4 Combining conjoint with other data One could argue that there are covariates predictive of preferences that can be measured outside the conjoint portion of the survey. The proper way to combine this data with conjoint choice or demand data is to employ a hierarchical model specification in which individual-level parameters are related to each other in the upper-level of the model hierarchy. The upper-level model in conjoint analysis typically is used to describe cross-sectional variation in part-worth estimates using these covariates to model observed heterogeneity. Individual-level coefficients from other models, calibrated on other datasets, could be used as variables to describe the cross-sectional variation of part-worths (Dotson et al., 2018). Combining data in this way automatically weights the datasets and can improve the precision of the part-worth estimates. A disturbing practice is the combination of conjoint choice data with Max-Diff rankings data. Practitioners are well aware that conjoint surveys are not popular with respondents but that a standard Max-Diff exercise is found to be very easy for respondents. Max-Diff is a procedure to rank (by relative importance) any set of attributes or products. The Max-Diff procedure breaks the task of ranking all in the set down into a sequence of smaller and more manageable tasks which consist of picking the most and least “important” from a small set of alternatives. A logit-style model can be used to analyze Max-Diff data to provide a ranking for each respondent. It is vital to understand that rankings are not utility weights and rankings only have ordinal properties. The exact coefficients used to implement the ranking are irrelevant and have no value. That is to say, the ranking of 3 things represented by (1, 3, 2) is the same as (10, 21, 11). There is no meaning to the intervals separating values or to ratios of values. This is true even setting aside the thorny question of what “importance” means. 
There are no trade-offs in Max-Diff analysis, so there is no way to predict choice or demand behavior on the basis of the Max-Diff derived rankings. Unfortunately, some practitioners have taken to scaling Max-Diff ranking coefficients and interpreting these scaled coefficients as part-worths. They are not. There is no way of combining Max-Diff and conjoint data in a coherent fashion. The only way to do so would be to regard “importance” as the same as utility and to use the Max-Diff results as the basis of an informative prior used in analysis of conjoint data.
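The point that ranking coefficients carry only ordinal information can be checked with the example from the text. The short Python sketch below (the helper name `rank_order` is ours) shows that (1, 3, 2) and (10, 21, 11) imply the same ordering of three items, while the ratio of adjacent gaps differs by a factor of ten.

```python
def rank_order(scores):
    """Return item indices sorted from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

a = (1, 3, 2)
b = (10, 21, 11)

same_ranking = rank_order(a) == rank_order(b)

def gap_ratio(scores):
    """Ratio of the top gap to the bottom gap after sorting the scores."""
    lo, mid, hi = sorted(scores)
    return (hi - mid) / (mid - lo)

print("same ranking:", same_ranking)
print("gap ratios:", gap_ratio(a), gap_ratio(b))
```

Any rescaling that preserves the ordering is equally valid for a ranking, so the intervals produced by one particular scaling cannot be read as part-worths.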

6 Comparing conjoint and transaction data The purpose of conducting a conjoint analysis is to predict changes in demand in response to changes in a product’s configuration and its price. Conjoint analysis provides an experimental setting for exploring these changes when revealed preference data from the marketplace lacks sufficient variation for making inferences and predictions. A natural question to ask is the degree to which conjoint analysis provides reliable estimates of changes that would occur. In this chapter we have argued that there are many requirements of conducting a valid conjoint study, beginning with the use of a valid economic model for conducting inference and the use of rigorous methods in data collection to make sure the respondent is qualified to provide answers and understands the choice task. We investigate the consistency of preferences and implications between conjoint and transaction data using a dataset containing marketplace transactions and conjoint responses from the same panelists. Frequent buyers in the frozen pizza product category were recruited to provide responses to a conjoint study in which frozen pizza attribute-levels changed. The fact that participants were frequent purchasers in the product category made them ideal respondents for the conjoint survey in the sense that they are known to be prospects who were well acquainted with the category. Most attempts to reconcile results from stated and revealed preferences data have tended to focus on aggregate predictors of demand and market shares. Examples range from the study of vegetables (Dickie et al., 1987) and grocery items (Burke et al., 1992; Ranjan et al., 2017) that are frequently purchased, to infrequently purchased goods such as automobiles (Brownstone et al., 2000). Lancsar and Swait (2014) provide an overview of studies across different disciplines. We start by investigating the consistency of estimated parameters, including estimates of marginal utility. We then assess the extent to which market demand predictions can be made using conjoint-based estimates. Finally, we show estimates of measures of economic value.

6.1 Preference estimates The conjoint study choice task involved six frozen pizza offerings and included a ‘no-choice’ option. The transaction data comprised 103 unique product offerings and included attributes such as brand name (e.g., DiGiorno, Red Baron, and Tombstone), crust (i.e., thin, traditional, stuffed, rising), and toppings (e.g., pepperoni, cheese, supreme). The volumetric demand model arising from non-linear utility (Eq. (7)) was used to estimate the model parameters because pizza purchases typically involve the purchase of multiple units and varieties. Table 1 provides a list of product attributes. The attributes and attribute-levels describing the 103 UPCs were used to design the conjoint study. Among the 297 households in an initial transaction dataset, 181 households responded to a volumetric conjoint experiment and had more than 5 transactions in the 2-year period. Qualifying respondents in this way ensures that they are knowledgeable about the category and typical attribute levels. In each of the 12 choice tasks, respondents chose how many units of each of the 6 product alternatives they would purchase the next time they are looking to buy frozen pizza. A sample choice task is shown in Fig. 5. Price levels were chosen in collaboration with the sponsoring company to mimic the actual price range.

Table 1 Attributes.

Attribute        Levels
Brand            DiGiorno; Frescetta (Fr); Red Baron (RB); Private Label (Pr); Tombstone (Tm); Tony’s (Tn)
Size for         One; Two (FT)
Crust            Thin; Traditional (TC); Stuffed (SC); Rising (RC)
Topping type     Pepperoni; Cheese (C); Vegetarian (V); Supreme (Sr); PepSauHam (PS); Hawaii (HI)
Topping spread   Moderate; Dense (DT)
Cheese           No claim; Real (RC)

FIGURE 5 Choice task.

181

182

CHAPTER 3 Economic foundations of conjoint analysis

Table 2 Estimated parameters (volumetric model). Mean of random-effects distribution θ̄.

Attribute       Level           Conjoint        Transaction
                β0              −2.91 (0.11)    −4.66 (0.18)
Brand           Frescetta       −0.35 (0.09)    −0.19 (0.11)
                Red Baron       −0.66 (0.10)    −0.38 (0.12)
                Private Label   −0.72 (0.10)    −0.85 (0.15)
                Tombstone       −0.80 (0.11)    −0.63 (0.16)
                Tony’s          −1.29 (0.13)    −2.05 (0.17)
Size            Serves two       0.64 (0.07)     1.22 (0.08)
Crust           Traditional      0.11 (0.07)     0.31 (0.07)
                Stuffed         −0.04 (0.08)    −0.04 (0.15)
                Rising           0.07 (0.08)     0.26 (0.07)
Topping type    Cheese          −0.40 (0.12)    −0.57 (0.09)
                Vegetarian      −1.12 (0.16)    −0.96 (0.14)
                Supreme         −0.30 (0.11)    −0.39 (0.09)
                PepSauHam       −0.13 (0.09)    −0.22 (0.08)
                Hawaii          −1.14 (0.15)    −0.99 (0.21)
Topping         Dense            0.06 (0.05)    −0.02 (0.06)
Cheese          Real             0.10 (0.05)    −0.01 (0.10)
                ln γ            −0.50 (0.08)    −2.07 (0.09)
                ln E             3.57 (0.07)     2.89 (0.06)
                ln σ            −0.57 (0.05)    −0.87 (0.05)

Boldfaced parameters signify that the 95% posterior credible interval of the estimate does not include zero. Standard deviations are printed in parentheses.

We use dummy coding in our model estimation, where the first level of each attribute in Table 1 is the reference level. The vector of ‘part-worths’ β includes a baseline coefficient β0, which represents the value of an inside good relative to the outside good. The remaining elements of β are the dummy coefficients. We use the volumetric demand model described in Section 2.2. Individual-level parameters are given by the vector of ‘part-worths’ β, the rate of satiation of inside goods γ, the allotted budget E, and the scale of the error term σ. The latter three parameters are log-transformed to ensure positivity. A multivariate normal distribution of heterogeneity is assumed with default (diffuse) priors. Parameter estimates for the conjoint and transaction data are provided in Table 2. The left side of the table reports estimates from the conjoint data, and the right side reports estimates from the transaction data. All estimates are based on models with Type 1 extreme value error terms.
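The dummy coding described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the attribute levels are those read from Table 1, while the dictionary keys, the function names `column_names` and `dummy_code`, and the example product are hypothetical and are not taken from the authors' estimation code.

```python
# Attribute levels as read from Table 1. The first level of each attribute
# is treated as the reference level and receives no dummy column.
# Keys and function names below are illustrative assumptions.
ATTRIBUTES = {
    "brand": ["DiGiorno", "Frescetta", "Red Baron", "Private Label",
              "Tombstone", "Tony's"],
    "size": ["For one", "For two"],
    "crust": ["Thin", "Traditional", "Stuffed", "Rising"],
    "topping_type": ["Pepperoni", "Cheese", "Vegetarian", "Supreme",
                     "PepSauHam", "Hawaii"],
    "topping_spread": ["Moderate", "Dense"],
    "cheese": ["No claim", "Real"],
}


def column_names():
    """Names of the design columns: intercept plus one dummy per non-reference level."""
    names = ["intercept"]
    for attr, levels in ATTRIBUTES.items():
        names += [f"{attr}={level}" for level in levels[1:]]
    return names


def dummy_code(product):
    """Return the design row for a product given as {attribute: level}."""
    row = [1]  # the intercept multiplies the baseline coefficient beta_0
    for attr, levels in ATTRIBUTES.items():
        if product[attr] not in levels:
            raise ValueError(f"unknown level {product[attr]!r} for {attr}")
        row += [int(product[attr] == level) for level in levels[1:]]
    return row


# The all-reference product has only the intercept switched on.
reference = {attr: levels[0] for attr, levels in ATTRIBUTES.items()}
# A hypothetical product that differs from the reference on three attributes.
example = dict(reference, brand="Red Baron", size="For two", crust="Rising")

for name, value in zip(column_names(), dummy_code(example)):
    print(f"{name:30s} {value}")
```

With this coding the design row has 17 entries, one per row of part-worths in Table 2 above the log-transformed parameters, and each dummy coefficient is interpreted relative to the reference product.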

FIGURE 6 Comparison of part-worths (volumetric model).

We find general agreement among the estimates except for the inside good intercept (β0), the estimated rate of satiation (γ), the budgetary allotment (E), and the scale of the error terms (σ). That is, the relative values of the brand names and product attribute-levels are approximately the same when estimated with either conjoint or transaction data. Fig. 6 provides a plot of the mean of the random-effects distribution for the two datasets. Average part-worth estimates are plotted close to the 45 degree line, indicating that the estimates are similar.

There are a number of potential reasons for the difference in estimates of the brand intercept (β0), satiation parameter (γ), and the budgetary allotment (E). Respondents in the conjoint task were told to consider their choices the next time they were buying frozen pizza, while the transaction data are conditioned on some purchase in the frozen pizza category. Most respondents in the dataset purchased frozen pizza fewer than 10 times over a two-year period, and including data in the analysis from shopping occasions in which they did not make a frozen pizza purchase implicitly assumes that they were in the market for pizza on each and every occasion. We therefore excluded shopping occasions from the transaction data in which pizza wasn’t purchased so that it would mimic the conjoint data, where respondents were told to consider their next purchase occasion. There is no way of knowing for certain when shoppers in the revealed preference (transaction) data considered making a frozen pizza purchase but did not, and this discrepancy is partly responsible for the difference in the brand intercept (β0) estimates.

Conjoint data cannot account for variation in the context of purchase and consumption, and this limitation may lead to differences in estimates of satiation (γ) and budgetary allotment (E). For example, frozen pizza may occasionally be purchased for social gatherings, which may not be taken into account when providing conjoint responses, resulting in an estimate of the budgetary allotment that is too high for the typical conjoint transaction. The over-estimation of E may also affect the estimate of the rate of satiation. Another consequence of the hypothetical nature of conjoint tasks is that respondents may apply a larger budget when making allocations because they aren’t actually spending their own money. We find that the conjoint data reflect lesser satiation and a greater budgetary allotment than that found in revealed preference data.

We also investigate a discrete choice approximation to the demand data by ‘exploding’ the volumetric data to represent a series of discrete choices. When q units of a good are purchased, it is interpreted as q transactions with a discrete choice for that good. This allows for the estimation of a discrete choice model on the volumetric conjoint and transaction data. However, this practice results in no consumption of an ‘outside good’ in the transaction data because only nonzero quantities are observed. We estimate a hierarchical logit model with a multivariate normal distribution of heterogeneity and relatively diffuse priors. The price coefficient βp is re-parameterized to ensure that it is negative. Estimated coefficients are shown in Table 3. Monetized estimates of the part-worths, obtained by dividing the part-worth by the price coefficient, are compared in Fig. 7, where we use the corresponding means of the random-effects distribution. We find close agreement of part-worth estimates from the transaction and conjoint data.

FIGURE 7 Comparison of monetized part-worths (discrete choice model).
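The ‘exploding’ step described above, in which q purchased units become q discrete choices, can be sketched in a few lines of Python. The function name `explode`, the optional `outside_label` argument, and the example quantities are hypothetical. In particular, coding an all-zero conjoint task as a single choice of the no-choice option is an assumption made here for illustration; the text does not spell out how such tasks were handled.

```python
def explode(quantities, outside_label=None):
    """Convert one volumetric observation into a list of discrete choices.

    quantities    -- units chosen of each alternative, e.g. [2, 0, 0, 1, 0, 0]
    outside_label -- if given, an occasion with no units purchased is coded
                     as one choice of this label (an assumption for conjoint
                     tasks that offer a no-choice option)
    """
    if any(q < 0 for q in quantities):
        raise ValueError("quantities must be non-negative")
    # q units of alternative `alt` become q separate choices of `alt`.
    choices = [alt for alt, q in enumerate(quantities) for _ in range(q)]
    if not choices and outside_label is not None:
        choices = [outside_label]
    return choices


# Hypothetical conjoint task with six alternatives:
# two units of alternative 0 and one unit of alternative 3.
print(explode([2, 0, 0, 1, 0, 0]))

# A task in which nothing is chosen, coded as the no-choice option.
print(explode([0, 0, 0, 0, 0, 0], outside_label="none"))
```

Because the transaction data contain only occasions with nonzero quantities, applying such a function to them never produces an outside-good choice, which is the limitation noted in the text.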

6.2 Marketplace predictions

We next compare marketplace predictions from conjoint and transaction data by aggregating across the 103 UPCs to obtain brand-level demand estimates. Parameter estimates from the volumetric conjoint and transaction data are used to produce two forecasts for each brand, which are displayed in Fig. 8 for the volumetric demand model and in Fig. 9 for the same predictions converted to shares. We find that the demand curves are roughly parallel in Fig. 8, with predictions from the conjoint data consistently higher than those based on the transaction data. The shift is due to differences in the estimate of the brand coefficient (β0), which we attribute to differences in the treatment of the ‘no-choice’ option. That is, the smaller estimated brand intercept in the transaction data results in lower estimates of demand. While the level of demand is estimated to be different in Fig. 8, changes in demand for changes in price and other product attributes are approximately the same because the demand curves are roughly parallel to each other and because the part-worth coefficients enter the Kuhn-Tucker conditions in Eq. (12) linearly. The consistency of the estimates from conjoint and transaction data is more readily observed when we convert the volumetric predictions to shares in Fig. 9.

It is useful to remember that the purpose of conjoint analysis is to predict changes in marketplace demand as a result of changes in the formulation of marketplace offerings. Even though the aggregate demand curves displayed in Figs. 8 and 9 are seen to be vertically translated, predictions of the change in volume and change in share are closer to each other.

Table 3 Estimated parameters (discrete choice model). Mean of random-effects distribution θ̄.

Attribute       Level           Conjoint        Transaction
                β0               3.67 (0.31)
Brand           Frescetta       −0.25 (0.11)    −0.58 (0.23)
                Red Baron       −0.60 (0.11)    −1.61 (0.27)
                Private Label   −0.73 (0.11)    −2.90 (0.38)
                Tombstone       −0.60 (0.11)    −2.22 (0.43)
                Tony’s          −0.80 (0.13)    −5.45 (0.47)
Size            Serves two       0.28 (0.07)     3.63 (0.30)
Crust           Traditional      0.04 (0.08)     0.59 (0.15)
                Stuffed         −0.01 (0.08)     0.45 (0.25)
                Rising          −0.03 (0.08)     0.71 (0.16)
Topping type    Cheese          −0.22 (0.12)    −1.33 (0.21)
                Vegetarian      −0.52 (0.13)    −2.09 (0.35)
                Supreme         −0.12 (0.11)    −0.96 (0.24)
                PepSauHam       −0.06 (0.09)    −0.46 (0.19)
                Hawaii          −0.38 (0.13)    −1.60 (0.24)
Topping         Dense            0.02 (0.06)    −0.03 (0.11)
Cheese          Real             0.09 (0.06)    −0.16 (0.25)
                ln βp           −1.93 (0.20)    −0.33 (0.10)

Boldfaced parameters signify that the 95% posterior credible interval of the estimate does not include zero. Standard deviations are printed in parentheses.

FIGURE 8 Comparison of predictions (volumetric model).

FIGURE 9 Comparison of predictions converted to shares (volumetric model).

6.3 Comparison of willingness-to-pay (WTP)

We compute the consumer willingness-to-pay (WTP) for the family-sized (‘for two’) attribute of a DiGiorno pepperoni pizza with rising crust and a real cheese claim. As discussed above, a true WTP measure includes the effects of alternative products that consumers could consider if a particular attribute is unavailable, and is not simply a monetary rescaling of the ‘for two’ part-worth. Ignoring the effect of competitive offerings will overstate the value of product attributes because it ignores the other options consumers have available to them as they make their purchases.

We compare estimates of pseudo willingness-to-pay (p-WTP) based on part-worth monetization (i.e., βh/βhp) to estimates of willingness-to-pay (WTP) based on compensating valuation calculations (see Eqs. (25) and (26)). Table 4 shows estimates for the logit and volumetric demand models, using parameter estimates based on conjoint and transaction data. The top portion of Table 4 reports results for the logit model with ‘exploded’ data, and the bottom portion of the table pertains to the volumetric demand model. WTP estimates are reported on the left side of the table, and p-WTP estimates are reported on the right side.

We find that the WTP estimates for the logit model are much smaller than the p-WTP estimates because of the large number of choice alternatives present in the marketplace. The absence of a ‘for-two’ DiGiorno pepperoni pizza creates an economic loss that is worth, on average, about three cents using the WTP statistic, as opposed to either 78 cents or $3.31 based on the p-WTP estimate. The loss in utility is much smaller in the WTP calculation because consumers recognize that they can purchase a different option to generate their utility and are not constrained to purchase the same good. Moreover, we find that estimates of WTP for the logit model are about three cents using conjoint estimates of the part-worths, and about two cents using the transaction data estimates. These estimates are not statistically different from each other.

WTP estimates based on the volumetric demand model are less consistent, with estimates based on the conjoint data equal to eleven cents versus one cent for the transaction data. This difference is due to differences in the estimated baseline intercept coefficient (β0). When the transaction data intercept is substituted for the conjoint intercept, the estimated WTP for the conjoint data reduces from eleven cents to two cents. Thus, overall, we find that estimates based on the conjoint data are slightly higher, but not significantly higher, than those based on the transaction data for both models once the difference in the baseline intercept is aligned.

Table 4 Willingness-to-pay estimates (volumetric and logit), ‘for-two’ attribute.

                          WTP                                         p-WTP
              Conjoint    Transaction    With β0^trans     Conjoint    Transaction
Logit
  mean         0.034       0.027                            0.782       3.315
  median       0.029       0.026                            0.776       3.278
  perc25      −0.009       0.021                            0.704       3.185
  perc75       0.074       0.031                            0.859       3.430
Volumetric
  mean         0.113       0.013          0.023a
  median       0.070       0.002          0.009a
  perc25       0.020       0.000          0.001a
  perc75       0.144       0.015          0.026a

a Volumetric WTP estimates based on conjoint data except for brand intercepts, which are substituted from transaction data estimates.
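The gap between WTP and p-WTP in a logit model with many alternatives can be illustrated with a small Python sketch. Eqs. (25) and (26) are not reproduced in this section, so the sketch substitutes the standard log-sum expression for the change in expected maximum utility in a logit model (Small and Rosen, 1981), which may differ from the authors' compensating value calculation. The part-worth (0.28) and ln βp (−1.93) are the conjoint means read from Table 3; taking exp(ln βp) as the magnitude of the price coefficient, the market of 100 equally attractive competitors, and all function names are assumptions. The output is not meant to reproduce Table 4.

```python
import math


def logsum(utilities):
    """Log of the sum of exponentiated utilities (expected maximum utility
    of a logit choice set, up to an additive constant)."""
    m = max(utilities)
    return m + math.log(sum(math.exp(v - m) for v in utilities))


def pseudo_wtp(part_worth, price_coef):
    """p-WTP: the part-worth rescaled by the (positive) price coefficient."""
    return part_worth / price_coef


def logsum_wtp(utilities_with, utilities_without, price_coef):
    """Monetized change in expected maximum utility when the feature is removed."""
    return (logsum(utilities_with) - logsum(utilities_without)) / price_coef


# Conjoint means read from Table 3 (assumed sign convention: the price
# coefficient has magnitude exp(ln beta_p)).
part_worth = 0.28
price_coef = math.exp(-1.93)

# Hypothetical market: the focal product plus 100 alternatives (including
# the outside option) whose utility is normalized to zero.
others = [0.0] * 100
with_feature = [part_worth] + others
without_feature = [0.0] + others

p_wtp = pseudo_wtp(part_worth, price_coef)
wtp = logsum_wtp(with_feature, without_feature, price_coef)
print(f"p-WTP = {p_wtp:.3f}, log-sum WTP = {wtp:.3f}")

# With a single alternative and nothing to switch to, the two measures coincide.
print(logsum_wtp([part_worth], [0.0], price_coef))
```

In this sketch the log-sum WTP is roughly the p-WTP scaled by the focal product's choice probability, which is why it shrinks as the number of competing alternatives grows.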

7 Concluding remarks

Conjoint analysis is an indispensable tool for predicting the effects of changes to marketplace offerings when observational data does not exist to inform an analysis. This occurs when predicting sales of new products and product configurations with new attribute-levels and their combinations, and when existing products have little price variation. Conjoint data reflects simulated choices among a set of competitive products for which respondents are assumed to have well-defined preferences. Analysis is based on a combination of economic and statistical principles that afford inferences and predictions about the expected sales of new offerings.

We argue in this chapter that a valid conjoint analysis requires a valid model and constructs for inference, and valid data. We discuss two economic models for analysis based on random-utility theory for discrete choice and volumetric demand, and discuss alternative measures (WTP, WTB, and EPP) of economic value. Using a conjoint analysis conducted on scanner panelists in the frozen pizza category, we demonstrate that these measures are consistently estimated with either conjoint or transaction data. Part-worth estimates of product features are shown to be approximately the same, and forecasts of the change in demand for changes in attributes such as price are found to be similar.

Some model parameters, however, are not consistently estimated across stated and revealed preference data. The largest discrepancy involves the baseline brand intercept (β0) for the no-choice option, which is difficult to align because conjoint studies ask respondents to think about the next time they will be making a purchase in the category, while revealed preference data cannot exactly identify these shopping trips. Shopping trips limited to purchases in the category ignore occasions when shoppers may be in the market but decide not to purchase because prices are not sufficiently low, and the collection of all shopping trips to the store contains instances where consumers are not in the market for items in the category. The difference in the estimated baseline intercept results in a vertical translation of demand curves and heightened measures of economic willingness-to-pay.

This chapter demonstrates that a properly designed conjoint study, using valid economic models to analyze the data, can produce accurate estimates of economic demand in the marketplace. We identify practices that should be avoided, and demonstrate that ad-hoc estimates of value, such as the pseudo willingness-to-pay (p-WTP), provide poor estimates of the economic value of product features. Additional research is needed to better design conjoint studies to obtain valid estimates of budgetary allotments and the rate of satiation of purchase quantities.

Technical appendix: Computing expected demand for volumetric conjoint

The algorithm described here first determines the optimal amount of the outside good $z_{ht}$, which then allows computing the corresponding inside good quantities $x_{ht}$. From Eqs. (8) and (9) we have that:

$$\frac{p_j}{z} = u_j \quad \text{if } x_j > 0 \tag{A.1}$$

$$\frac{p_j}{z} \ge u_j \quad \text{if } x_j = 0 \tag{A.2}$$

At the optimum, $u_i/p_i = u_j/p_j$ for the $R$ goods with non-zero demand. Solving for $x$ yields an equation for optimal demand quantities for the inside goods:

$$x_k = \frac{\psi_k z - p_k}{\gamma p_k} \tag{A.3}$$

Substituting Eq. (A.3) into the budget constraint (2) yields:

$$z = \frac{\gamma E + \sum_{k=1}^{R} p_k}{\gamma + \sum_{k=1}^{R} \psi_k} \quad \text{if } R > 0 \tag{A.4}$$

$$z = E \quad \text{if } R = 0 \tag{A.5}$$

Re-arranging (A.1), with marginal utility $u_j = \psi_j s$, yields the following for $z$:

$$z s = \frac{p_j}{\psi_j} \quad \text{if } x_j > 0 \tag{A.6}$$

$$z s \le \frac{p_j}{\psi_j} \quad \text{if } x_j = 0 \tag{A.7}$$

where

$$s = \frac{1}{\gamma x_j + 1}$$

The algorithm needs $R$ iterations to complete. At each step $k$, we compute the corresponding quantity $x_k$ and $z$, as if $R = k$. Then checking Eqs. (A.6) and (A.7) will determine if the breakpoint has been reached. To implement this, let:

$$\rho_i = \frac{p_i}{\psi_i} \quad \text{for } 1 \le i \le K \tag{A.8}$$

$$\rho_0 = 0 \tag{A.9}$$

$$\rho_{K+1} = \infty \tag{A.10}$$

and order the values $\rho_i$ in ascending order so that $\rho_i \le \rho_{i+1}$ for $1 \le i \le K$. Then, $z > \rho_k$ implies $z > \rho_i$ for $i \le k$. At the optimum, $x_i > 0$ for $1 \le i \le k$, $x_i = 0$ for $k < i \le K$, and $\rho_k < z \le \rho_{k+1}$. The algorithm is guaranteed to stop at optimal $z$ and $0 \le k \le K$. The steps are as follows:

1. $a \leftarrow \gamma E$, $b \leftarrow \gamma$, $k \leftarrow 0$
2. $z \leftarrow a/b$
3. while $z \le \rho_k$ or $z > \rho_{k+1}$:
   (a) $k \leftarrow k + 1$
   (b) $a \leftarrow a + p_k$
   (c) $b \leftarrow b + \psi_k$
   (d) $z \leftarrow a/b$

Once the algorithm terminates, we can insert optimal $z$ into Eq. (A.3) to compute the optimal inside good quantities $x$.
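The steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `volumetric_demand`, its argument names, and the numbers in the usage example are hypothetical. Step (b) accumulates the price of the good entering the active set, which is what the numerator of Eq. (A.4) requires, and the extra `k < K` guard is an addition that protects against floating point ties. The function solves for demand given one set of baseline utilities ψ; averaging over draws of the error term to obtain expected demand is not shown.

```python
import math


def volumetric_demand(psi, price, gamma, budget):
    """Solve for the outside good z and inside quantities x.

    psi    -- baseline marginal utilities psi_k of the K inside goods (> 0)
    price  -- prices p_k of the K inside goods (> 0)
    gamma  -- satiation parameter (> 0)
    budget -- budgetary allotment E (> 0)
    """
    K = len(psi)
    # Goods sorted by rho_i = p_i / psi_i in ascending order (Eq. A.8).
    order = sorted(range(K), key=lambda i: price[i] / psi[i])
    # rho[0] = 0 and rho[K + 1] = infinity (Eqs. A.9 and A.10).
    rho = [0.0] + [price[i] / psi[i] for i in order] + [math.inf]

    a, b, k = gamma * budget, gamma, 0   # step 1
    z = a / b                            # step 2 (z = E when no good is bought)
    # Step 3: add goods until rho_k < z <= rho_{k+1}.
    while k < K and (z <= rho[k] or z > rho[k + 1]):
        k += 1
        i = order[k - 1]                 # k-th good in rho order
        a += price[i]                    # numerator of Eq. (A.4)
        b += psi[i]                      # denominator of Eq. (A.4)
        z = a / b

    x = [0.0] * K
    for i in order[:k]:                  # goods with non-zero demand, Eq. (A.3)
        x[i] = (psi[i] * z - price[i]) / (gamma * price[i])
    return z, x


# Hypothetical example with three goods at a common price of 1.
z, x = volumetric_demand(psi=[2.0, 1.0, 0.1], price=[1.0, 1.0, 1.0],
                         gamma=0.5, budget=10.0)
spent = sum(p * q for p, q in zip([1.0, 1.0, 1.0], x)) + z
print(z, x, spent)
```

In the example the third good has too low a baseline utility relative to its price to enter the active set, so the budget is split between the first two goods and the outside good.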

References

Allenby, Greg M., Arora, Neeraj, Ginter, James L., 1995. Incorporating prior knowledge into the analysis of conjoint studies. Journal of Marketing Research, 152–162.
Allenby, Greg M., Brazell, Jeff, Howell, John R., Rossi, Peter E., 2014a. Valuation of patented product features. The Journal of Law and Economics 57 (3), 629–663.
Allenby, Greg M., Brazell, Jeff D., Howell, John R., Rossi, Peter E., 2014b. Economic valuation of product features. Quantitative Marketing and Economics 12 (4), 421–456.
Allenby, Greg M., Kim, Jaehwan, Rossi, Peter E., 2017. Economic models of choice. In: Handbook of Marketing Decision Models. Springer, pp. 199–222.
Allenby, Greg M., Rossi, Peter E., 1991. Quality perceptions and asymmetric switching between brands. Marketing Science 10 (3), 185–204.
Allenby, Greg M., Rossi, Peter E., 1998. Marketing models of consumer heterogeneity. Journal of Econometrics 89 (1–2), 57–78.
Becker, Gordon M., DeGroot, Morris H., Marschak, Jacob, 1964. Measuring utility by a single-response sequential method. Behavioral Science 9 (3), 226–232.
Bernardo, José M., Smith, Adrian F.M., 2000. Bayesian Theory. Wiley.
Berry, Steven, Levinsohn, James, Pakes, Ariel, 1995. Automobile prices in market equilibrium. Econometrica, 841–890.
Box, George E.P., Hunter, William Gordon, Hunter, J. Stuart, et al., 1978. Statistics for Experimenters. John Wiley and Sons, New York.
Brazell, Jeff D., Diener, Christopher G., Karniouchina, Ekaterina, Moore, William L., Séverin, Válerie, Uldry, Pierre-Francois, 2006. The no-choice option and dual response choice designs. Marketing Letters 17 (4), 255–268.
Brownstone, David, Bunch, David S., Train, Kenneth, 2000. Joint mixed logit models of stated and revealed preferences for alternative-fuel vehicles. Transportation Research Part B: Methodological 34 (5), 315–338.
Burke, Raymond R., Harlam, Bari A., Kahn, Barbara E., Lodish, Leonard M., 1992. Comparing dynamic consumer choice in real and computer-simulated environments. Journal of Consumer Research 19 (1), 71–82.
Campbell, Donald T., Stanley, Julian C., 1963. Experimental and Quasi-Experimental Designs for Research. Houghton Mifflin Company, Boston.
Diamond, Shari Seidman, 2000. Reference guide on survey research. In: Reference Manual on Scientific Evidence, pp. 221–228.
Dickie, Mark, Fisher, Ann, Gerking, Shelby, 1987. Market transactions and hypothetical demand data: a comparative study. Journal of the American Statistical Association 82 (397), 69–75.
Ding, Min, 2007. An incentive-aligned mechanism for conjoint analysis. Journal of Marketing Research 44 (2), 214–223.
Ding, Min, Grewal, Rajdeep, Liechty, John, 2005. Incentive-aligned conjoint analysis. Journal of Marketing Research 42 (1), 67–82.
Dotson, Marc, Büschken, Joachim, Allenby, Greg, 2018. Explaining preference heterogeneity with mixed membership modeling. https://doi.org/10.2139/ssrn.2758644.
Green, Paul E., Rao, Vithala R., 1971. Conjoint measurement for quantifying judgmental data. Journal of Marketing Research, 355–363.
Hardy, Melissa A., 1993. Regression with Dummy Variables, vol. 93. Sage.
Howell, John R., Lee, Sanghak, Allenby, Greg M., 2015. Price promotions in choice models. Marketing Science 35 (2), 319–334.
Kim, Dong Soo, Bailey, Roger A., Hardt, Nino, Allenby, Greg M., 2016. Benefit-based conjoint analysis. Marketing Science 36 (1), 54–69.
Kim, Hyowon, Kim, Dongsoo, Allenby, Greg M., 2017. Benefit Formation and Enhancement. Working paper. Fisher College of Business, The Ohio State University.
Kotler, Philip, 2012. Kotler on Marketing. Simon and Schuster.
Lancsar, Emily, Savage, Elizabeth, 2004. Deriving welfare measures from discrete choice experiments: inconsistency between current methods and random utility and welfare theory. Health Economics 13 (9), 901–907.
Lancsar, Emily, Swait, Joffre, 2014. Reconceptualising the external validity of discrete choice experiments. PharmacoEconomics 32 (10), 951–965.
Lichtenstein, Sarah, Slovic, Paul, 2006. The Construction of Preference. Cambridge University Press.
Louviere, Jordan J., Hensher, David A., 1983. Using discrete choice models with experimental design data to forecast consumer demand for a unique cultural event. Journal of Consumer Research 10 (3), 348–361.
Luce, R. Duncan, Tukey, John W., 1964. Simultaneous conjoint measurement: a new type of fundamental measurement. Journal of Mathematical Psychology 1 (1), 1–27.
Manski, Charles F., McFadden, Daniel, et al., 1981. Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA.
Matzkin, Rosa L., 1991. Semiparametric estimation of monotone and concave utility functions for polychotomous choice models. Econometrica, 1315–1327.
Matzkin, Rosa L., 1993. Nonparametric identification and estimation of polychotomous choice models. Journal of Econometrics 58 (1–2), 137–168.
Netzer, Oded, Srinivasan, Visvanathan, 2011. Adaptive self-explication of multiattribute preferences. Journal of Marketing Research 48 (1), 140–156.
Ofek, Elie, Srinivasan, Venkataraman, 2002. How much does the market value an improvement in a product attribute? Marketing Science 21 (4), 398–411.
Orme, Bryan K., 2010. Getting Started with Conjoint Analysis: Strategies for Product Design and Pricing Research. Research Publishers.
Pachali, Max J., Kurz, Peter, Otter, Thomas, 2017. The perils of ignoring the budget constraint in single-unit demand models. https://doi.org/10.2139/ssrn.3044553.
Ranjan, Bhoomija, Lovett, Mitchell J., Ellickson, Paul B., 2017. Product launches with new attributes: a hybrid conjoint-consumer panel technique for estimating demand. https://doi.org/10.2139/ssrn.3045379.
Roe, Robert M., Busemeyer, Jermone R., Townsend, James T., 2001. Multialternative decision field theory: a dynamic connectionist model of decision making. Psychological Review 108 (2), 370.
Rossi, P.E., 2014. Bayesian Semi-Parametric and Non-Parametric Methods with Applications to Marketing and Micro-Econometrics. Princeton University Press.
Rossi, Peter E., Allenby, Greg M., McCulloch, Robert, 2005. Bayesian Statistics and Marketing. John Wiley and Sons Ltd.
Small, Kenneth A., Rosen, Harvey S., 1981. Applied welfare economics with discrete choice models. Econometrica, 105–130.
Srinivasan, Vinay, Park, Chan Su, 1997. Surprising robustness of the self-explicated approach to customer preference structure measurement. Journal of Marketing Research, 286–291.
Toubia, Olivier, Hauser, John R., Simester, Duncan I., 2004. Polyhedral methods for adaptive choice-based conjoint analysis. Journal of Marketing Research 41 (1), 116–131.
Trajtenberg, Manuel, 1989. The welfare analysis of product innovations, with an application to computed tomography scanners. Journal of Political Economy 97 (2), 444–479.
Yang, Cathy L., Toubia, Olivier, de Jong, Martijn G., 2018. Attention, information processing and choice in incentive-aligned choice experiments. Journal of Marketing Research 55 (6), 783–800.

CHAPTER 4

Empirical search and consideration sets✩

Elisabeth Honka^a, Ali Hortaçsu^{b,c,∗}, Matthijs Wildenbeest^d

a UCLA Anderson School of Management, Los Angeles, CA, United States
b University of Chicago, Chicago, IL, United States
c NBER, Cambridge, MA, United States
d Kelley School of Business, Indiana University, Bloomington, IN, United States
∗ Corresponding author.

Contents

1 Introduction
2 Theoretical framework
  2.1 Set-up
  2.2 Search method
    2.2.1 Simultaneous search
    2.2.2 Sequential search
    2.2.3 Discussion
3 Early empirical literature
  3.1 Consideration set literature
    3.1.1 Early 1990s
    3.1.2 Late 1990s and 2000s
    3.1.3 2010s – present
    3.1.4 Identification of unobserved consideration sets
  3.2 Consumer search literature
    3.2.1 Estimation of search costs for homogeneous products
    3.2.2 Estimation of search costs for vertically differentiated products
4 Recent advances: Search and consideration sets
  4.1 Searching for prices
    4.1.1 Mehta et al. (2003)
    4.1.2 Honka (2014)
    4.1.3 Discussion
    4.1.4 De los Santos et al. (2012)
    4.1.5 Discussion
    4.1.6 Honka and Chintagunta (2017)
  4.2 Searching for match values
    4.2.1 Kim et al. (2010) and Kim et al. (2017)
    4.2.2 Moraga-González et al. (2018)
    4.2.3 Other papers
5 Testing between search methods
  5.1 De los Santos et al. (2012)
  5.2 Honka and Chintagunta (2017)
6 Current directions
  6.1 Search and learning
  6.2 Search for multiple attributes
  6.3 Advertising and search
  6.4 Search and rankings
  6.5 Information provision
  6.6 Granular search data
  6.7 Search duration
  6.8 Dynamic search
7 Conclusions
References

✩ We thank Stephan Seiler and Raluca Ursu for their useful comments and suggestions.
Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.05.002
Copyright © 2019 Elsevier B.V. All rights reserved.

1 Introduction

“Prices change with varying frequency in all markets, and, unless a market is completely centralized, no one will know all the prices which various sellers (or buyers) quote at any given time. A buyer (or seller) who wishes to ascertain the most favorable price must canvass various sellers (or buyers)—a phenomenon I shall term ‘search’.” (Stigler, 1961, p. 213)

Dating back to the classic work of Stigler (1961), a large literature in economics and marketing documents the presence of substantial price dispersion for similar, even identical goods. For example, looking across 50,000 consumer products, Hitsch et al. (2017) find that, within a 3-digit zip code, the ratio of the 95th to the 5th percentile of prices for the median UPC (brand) is 1.29 (1.43). Substantial price dispersion has been reported in many different product categories including e.g. automobiles (Zettelmeyer et al., 2006), medical devices (Grennan and Swanson, 2018), financial products (Duffie et al., 2017; Hortaçsu and Syverson, 2004; Ausubel, 1991; Allen et al., 2013), and insurance products (Brown and Goolsbee, 2002; Honka, 2014). Again dating back to Stigler (1961), the presence and persistence of price dispersion for homogeneous goods has often been attributed to search/information costs. Understanding the nature of the search and/or information costs is a crucial step towards quantifying potential losses to consumer and social surplus induced by such frictions, and to assess the impact of potential policy interventions to improve market efficiency and welfare. Quantitative analyses of consumer and social welfare rely on empirical estimates of demand and supply parameters and comparing observed market outcomes to counterfactual efficient benchmarks. However, departures from the assumption that consumers have full information pose important methodological challenges to demand (and supply) estimation methods that have been the mainstay of quantitative marketing and economics. Consider, for example, a consumer who is observed to purchase a

1 Introduction

product for $5 when the identical product is available for purchase at $4 somewhere nearby. A naive analysis may conclude that demand curves are upward sloping, or, at the very least, that demand is very inelastic. Similarly, consider the observation that lower income but high-achieving high school seniors do not apply to selective four year colleges despite being admitted at high rates (see Hoxby and Turner, 2015). This may be because these seniors do not value a college education or because they are not aware of the financial aid opportunities or their chances of admission. Indeed, as in Stigler (1961), consumers are likely not perfectly informed about both prices and non-price attributes of all products available for purchase in a given market. It is therefore important, from the perspective of achieving an accurate understanding of preferences, to gain a deeper understanding of the choice process, and especially which subsets of products actually enter a consumer’s “consideration” set1 and how much a consumer knows about the price/non-price attributes. Understanding how consumers search for products and eventually settle on the product they are observed to purchase is the subject of a large and burgeoning literature in economics and marketing. Our focus in this essay is on the econometric literature that allows for the specification of a search process, leading to the formation of consideration sets, along with a model of preferences. In much of this literature, the specification of the search process is motivated by economic models of consumer search. While this constrains the specification of the search process, economic theory provides a useful guidepost as to how consumers may search under counterfactual scenarios that may be very far out of sample. 
We will thus start, in Section 2, with a brief survey of popular theoretical models of consumer search that motivate econometric specifications.2

While this chapter is centered around econometric methods and models, many of these methods and models are motivated by substantive findings. For example, in addition to the price dispersion/information cost argument by Stigler (1961), empirical marketing research in the 1980s found that many consumers effectively choose from (or “consider”) a surprisingly small number of alternatives – usually 2 to 5 – before making a purchase decision (see e.g. Hauser and Wernerfelt, 1990; Roberts and Lattin, 1991; Shocker et al., 1991). This empirical observation sparked a rich stream of literature in marketing that developed and estimated models that take consumers’ consideration into account. We discuss this stream of literature, which was pioneered by Hauser and Wernerfelt (1990) and Roberts and Lattin (1991), in Section 3.1. One of the main findings from this line of research, which has been validated in more recent work, is that advertising and other promotional activities create very little true consumption utility, but first and foremost affect awareness and consideration.

We then turn to the early work in economics in Section 3.2, which was primarily motivated by Stigler’s (1961) price dispersion observation and search/information cost argument. With the proliferation of the Internet around the turn of the century and increased availability of data, researchers worked on quantifying the amount of price dispersion in online markets as well as quantifying consumer search costs. One of the rather surprising results of these efforts was that the amount of price dispersion remained substantial even in online markets, i.e. prices did not seem to follow the Law of One Price. Starting with Sorensen (2000), Hortaçsu and Syverson (2004), and Hong and Shum (2006), researchers have utilized economic theories of search to rationalize observed price dispersion patterns and to infer search costs and preference parameters. The search cost estimates recovered in these papers appeared relatively large at first sight. However, subsequent work has confirmed that the costs consumers incur while gathering information remain quite high for a variety of markets.

One of the main shortcomings of the consideration set literature discussed in Section 3.1 was that it mostly used reduced-form modeling approaches. One of the main shortcomings of the early search literature in economics discussed in Section 3.2 was that it modeled consumers as randomly picking which alternatives to search. In Section 4 we turn to the more recent literature that aims to overcome both shortcomings by covering a more general setting in which, following discrete choice additive random utility models popular in demand estimation, products are both vertically and horizontally differentiated. Here an important distinction is made regarding what consumers are searching for: we discuss models in which consumers are searching for prices in Section 4.1 and models in which consumers search for a good product match or fit in Section 4.2. We also discuss approaches that utilize both individual level data and aggregate (market share) data.

1 Throughout this chapter, we use the terms “consideration set,” “search set,” “evoked set,” and “(endogenous) choice set” interchangeably unless stated otherwise.
2 Readers interested in much more exhaustive surveys of the theoretical research can refer to Baye et al. (2006) and Anderson and Renault (2018).
Papers discussed in this section have in common that they think carefully about what data are needed to identify search costs. They also advance estimation methodologies by developing approaches to handle the curse of dimensionality that appears in the simultaneous search model and the search path dimensionality problem of the sequential search model. However, many of these models are not straightforward to estimate, and more work is needed to obtain models that are both realistic and tractable in terms of estimation.

Since the beginning of the search literature, the question of how consumers search, i.e. whether consumers search in a simultaneous or sequential fashion, has been heavily debated. Because researchers did not think that the search method was identified using observational data, it was common to make an assumption on the type of search protocol that consumers were using (frequently driven by computational considerations). Starting with De los Santos et al. (2012) and Honka and Chintagunta (2017), researchers have begun empirically testing observable implications of sequential versus simultaneous search and the broader question of the identifiability of the search method utilized by consumers. This also highlighted the importance of expectations: which search method is supported by data patterns can change depending on whether researchers assume that consumers have rational expectations. How consumers search also has implications for the estimated search costs (and thus any subsequent analyses such as welfare calculations): if consumers search simultaneously (sequentially), but the researcher falsely assumes that they search sequentially (simultaneously), search costs will be overestimated (underestimated).

In Section 6, we discuss various extensions and applications of the econometric frameworks discussed in the prior sections. Section 6.1 explores generalizations of the modeling framework when consumers are not perfectly informed regarding the distribution of prices and/or match utilities and learn about these distributions as they search. Section 6.3 discusses how advertising interacts with search and choice. Section 6.4 discusses the very related setting where the ranking and/or spatial placement of different choices on for instance a webpage affect search and eventual choice. Section 6.5 considers an interesting emerging literature on the issue of information provision made available to consumers at different stages of search, e.g. at different layers of a website. Sections 6.6 and 6.7 discuss how the availability of more granular information on consumer behavior such as search duration can improve inference/testing regarding the search method and preferences. This is clearly an important area of growth for the literature as consumer actions online and, increasingly, offline are being monitored closely by firms. Finally, Section 6.8 discusses the important case in which dynamic demand considerations (such as consumer stock-piling) interact with consumer search. The econometric literature on consumer search and consideration sets is likely to grow much further beyond what is covered here as more and more data on the processes leading to eventual purchase become available for study. We therefore hope our readers will find what is to follow a useful roadmap into what has been done so far, but that they will ultimately agree with us that there are many more interesting questions to answer in this area than has been attempted so far.

2 Theoretical framework

2.1 Set-up

We start by presenting the general framework of search models.3 In these models, consumers are utility maximizers. Consumers know the values of all product characteristics but one (usually price or match value) prior to searching and have to engage in search to resolve uncertainty about the value of that one product characteristic. Search is costly so consumers only search a subset of all available products. Formally, we denote consumers by $i = 1, \ldots, N$, firms/products by $j = 1, \ldots, J$, and time periods by $t = 1, \ldots, T$. Consumer $i$'s search cost (per search) for product $j$ is denoted by $c_{ij}$ and the number of searches consumer $i$ makes is denoted by $k_i = 1, \ldots, K$ with $K = |J|$. Firm $j$'s marginal cost is denoted by $r_j$. Consumer $i$'s indirect utility for product $j$ is an independent draw $u_{ij}$ from a distribution $F_j$ with density $f_j$, where $u_{ij}$ is given by

$$u_{ij} = \alpha_j + X_{ij}\beta + \gamma p_{ij} + \epsilon_{ij}. \tag{1}$$

3 The two most common types of search models are price search models and match value search models. In the former, consumers search to resolve uncertainty about prices, while in the latter consumers search to resolve uncertainty about the match value or fit. The set-up of both price and match value search models fits under the general framework presented in this section. The set-up of both types of search models is identical with one exception denoted in footnote 4. However, the estimation approaches differ as discussed in Section 4.

The parameters $\alpha_j$ are the brand intercepts, $X_{ij}$ represents observable product and/or consumer characteristics, $p_{ij}$ is the price, and $\epsilon_{ij}$ is the part of the utility not observed by the researcher.4 The parameters $\alpha_j$, $\beta$, and $\gamma$ are the utility parameters. Although the framework is constructed to analyze differentiated goods, it can also capture special cases such as a price search model for homogeneous goods with identical consumers and firms. In this case, Eq. (1) simplifies to $u_{ij} = -p_{ij}$ with $F_j = F$.

We start by going through the set of assumptions that most search models share.

Assumption 1. Demand is unit-inelastic. In other words, each consumer buys at most one unit of the good. Assumption 1 holds for all papers discussed in this chapter.

Assumption 2. Prior to searching consumers know the (true) utility distribution, but not the specific utility a firm is going to offer on a purchase occasion. To put it differently, while consumer $i$ does not know the specific utility $u_{ij}$ he would get from buying from firm $j$ (potentially on a specific purchase occasion), he knows the shape and parameters of the utility distribution, i.e. the consumer knows $F_j$. This assumption is often referred to as the “rational expectations assumption” because it is assumed that consumers are rational and know the true distribution from which utilities are being drawn. We relax Assumption 2 in Section 6.1 and discuss papers which have studied consumer search in an environment in which consumers are uncertain about the utility distribution and learn about it while searching.

Assumption 3. All uncertainty regarding the utility from consuming a specific product is resolved during a single search. In other words, consumers learn a product’s utility in a single search. In Section 6.7, we present recent work that relaxes Assumption 3.

Assumption 4. The first search is free. This assumption is made for technical reasons. Two alternative assumptions are sometimes made in the literature: (search costs are sufficiently low so that) all consumers search at least once (e.g. Reinganum, 1979) or all consumers use the same search method, but it is optimal for some consumers to search/not to search depending on the level of their search costs (e.g. Janssen et al., 2005).

4 In price search models, $\epsilon_{ij}$ is observed by the consumer both prior to and post search. In match value search models, consumers do not know $\epsilon_{ij}$ prior to search, but know its value after searching. In both price and match value search models, the researcher observes $\epsilon_{ij}$ neither prior to nor post search.

2.2 Search method

The consumer search literature has predominantly focused on two search methods: simultaneous and sequential search. In this subsection, we introduce these two search methods.

2.2.1 Simultaneous search

Simultaneous search – also referred to as non-sequential, fixed sample, or parallel search – is a search method in which consumers commit to searching a fixed set of products (or stores or firms) before they begin searching. Consumers using this method will not stop searching until they have searched all firms in their predetermined search set. Note that – despite its name – simultaneous search does not mean that all firms have to be sampled simultaneously. Firms can be sampled one after another. What characterizes simultaneous search is the consumer’s commitment (prior to the beginning of the search process) to search a fixed set of firms.

In the most general case, consumer $i$'s search problem consists of picking a subset of firms $S_i$ that maximizes the expected maximum utility to consumer $i$ from searching that subset of firms net of search costs, i.e.,

$$S_i = \arg\max_{S} \left[ E\left[\max_{j \in S} u_{ij}\right] - \sum_{j \in S} c_{ij} \right], \tag{2}$$

where $E$ denotes the expectation operator. Unfortunately, a simple solution for how a consumer should optimally pick the firms to be included in his search set $S_i$ does not exist for the simultaneous search model. In general, it will not be optimal to search the firms randomly, so the question is: which firms should the consumer search? This is referred to as ordered or directed search in the literature.5

When searching simultaneously, to pick the optimal set of products to be searched, the consumer has to enumerate all combinatorially possible search sets (varying by their size and composition) and calculate the corresponding expected gains of search while taking the cost of sampling all products in the search set into account, i.e., calculate the expected maximum utility minus the cost of searching for every search set (as in Eq. (2)). The following example illustrates the problem. Suppose there are four companies A, B, C, and D in the market. Then the consumer has to choose among the following search sets: A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, and ABCD. The difficulty with this approach is that the number of possible search sets grows exponentially with the number of firms $|J|$, i.e. if there are $|J|$ firms in the market, the consumer chooses among $2^{|J|} - 1$ search sets.6 This exponential growth in the number of search sets is referred to as the curse of dimensionality of the simultaneous search model.

One avenue to deal with the curse of dimensionality is to only estimate the simultaneous search model for markets with relatively few products (see e.g. Mehta et al., 2003). Another avenue to overcome the curse of dimensionality is to make an additional assumption which allows one to derive a simple strategy for how the consumer should optimally choose his search set. The following two assumptions have been used in the literature:

1. Assumption of first-order stochastic dominance: Vishwanath (1992) showed that, for a simultaneous search model with first-order stochastic dominance among the utility distributions, the rules derived by Weitzman (1979) constitute optimal consumer search and purchase behavior.7
2. Assumption of second-order stochastic dominance8: Chade and Smith (2005) showed that, for a simultaneous search model with second-order stochastic dominance among the utility distributions, it is optimal for the consumer to
   a. rank firms in a decreasing order of their expected utilities,
   b. pick the optimal number of searches conditional on the ranking according to the expected utilities, and
   c. purchase from the firm with the highest utility among those searched.

The assumptions of first- or second-order stochastic dominance are typically implemented by assuming that the means or variances, respectively, of the price distributions are identical (see e.g. Honka, 2014). While adding an additional assumption restricts the flexibility of a model, making this additional assumption allows researchers to apply the simultaneous search model to markets with a large number of products. Furthermore, the appropriateness of these assumptions is empirically testable, i.e. using price data, researchers can test the hypothesis of identical means or variances across products.

5 See Armstrong (2017) for an overview of the theoretical literature on ordered search.
6 The researcher can reduce the number of search sets the consumer is choosing from by dropping all search sets that do not include the consumer’s chosen (purchased) option (see Mehta et al., 2003).
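The brute-force enumeration behind Eq. (2) can be sketched in a few lines of Python. This is only an illustration under loudly stated assumptions: the four firms, their two-point utility distributions, and the firm-specific search costs are hypothetical numbers chosen so that the expected maximum can be computed exactly, and none of the function names come from the literature discussed here.

```python
from itertools import combinations, product

# Hypothetical two-point utility distributions: firm -> [(utility, probability), ...].
# All four firms have the same expected utility (5) but different spreads.
FIRMS = {
    "A": [(0.0, 0.5), (10.0, 0.5)],
    "B": [(2.0, 0.5), (8.0, 0.5)],
    "C": [(4.0, 0.5), (6.0, 0.5)],
    "D": [(5.0, 1.0)],
}
# Hypothetical firm-specific search costs c_ij.
SEARCH_COST = {"A": 1.0, "B": 1.5, "C": 2.0, "D": 0.5}


def all_search_sets(firms):
    """Enumerate every non-empty subset of firms: 2^|J| - 1 candidates."""
    names = sorted(firms)
    return [subset
            for size in range(1, len(names) + 1)
            for subset in combinations(names, size)]


def expected_max_utility(search_set, firms):
    """Exact E[max_{j in S} u_ij] for independent discrete distributions."""
    total = 0.0
    for outcome in product(*(firms[j] for j in search_set)):
        prob = 1.0
        best = float("-inf")
        for utility, p in outcome:
            prob *= p
            best = max(best, utility)
        total += prob * best
    return total


def net_value(search_set, firms, costs):
    """Objective in Eq. (2): expected maximum utility minus total search cost."""
    return expected_max_utility(search_set, firms) - sum(costs[j] for j in search_set)


def optimal_search_set(firms, costs):
    """Pick the search set with the highest net value by full enumeration."""
    return max(all_search_sets(firms), key=lambda s: net_value(s, firms, costs))


if __name__ == "__main__":
    print(len(all_search_sets(FIRMS)), "candidate search sets")
    best = optimal_search_set(FIRMS, SEARCH_COST)
    print("best set:", best, "net value:", net_value(best, FIRMS, SEARCH_COST))
```

With four firms the enumeration is trivial, but the list returned by `all_search_sets` doubles in length with every additional firm, which is the curse of dimensionality described above.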
In the special case of homogeneous goods with identical firms, the search problem reduces to choosing the optimal number of products to search and the dimensionality problem disappears.9 The simultaneous search model for homogeneous goods was initially proposed by Stigler (1961). In Stigler’s model, the consumer has to decide which set of firms to search. Since firms are identical, i.e., $F_j = F$, in the setting he analyzes, it is optimal for the consumer to randomly pick the firms to be searched. Therefore, the only objective of the consumer is to determine the optimal number of firms to search. Since goods are homogeneous in Stigler’s model, the utility function in Eq. (1) simplifies to $u_{ij} = -p_{ij}$. The consumer’s objective is to minimize his cost of acquiring the good, i.e. to minimize the sum of the expected price paid and his search costs. Formally, a consumer’s objective function can be written as

$$\min_{k} \; \underbrace{\int_{\underline{p}}^{\bar{p}} k\, p\, (1 - F(p))^{k-1} f(p)\, dp}_{\text{expected min. price for } k \text{ searches}} + \underbrace{(k - 1)\, c}_{\text{search cost}}, \tag{3}$$

where $F(p)$ is the price distribution with a minimum price $\underline{p}$ and maximum price $\bar{p}$. The intuition behind the expression for the expected minimum price in Eq. (3) is as follows: the probability that all price draws are greater than $p$ is given by $\Pr(p_1 > p, \ldots, p_k > p) = (1 - F(p))^k$. This implies that the cdf of the minimum draw is $1 - (1 - F(p))^k$ and the pdf of the minimum draw is $k(1 - F(p))^{k-1} f(p)$. It can be shown that there is a unique optimal number of searches $k^*$ that minimizes Eq. (3) (see e.g. Hong and Shum, 2006). This optimal number of searches $k^*$ is the size of the consumer’s search set.

7 See Section 2.2.2 for a detailed discussion of the rules derived by Weitzman (1979).
8 Additionally, to apply the theory developed by Chade and Smith (2005), the researcher also has to assume that search costs are not company-specific.
9 This discussion and results hold when search costs are identical across consumers and when search costs are heterogeneous across consumers. The discussion and results do not hold when there is heterogeneity in search costs across firms.
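To make Eq. (3) concrete, the following sketch evaluates the objective numerically and picks the minimizing number of searches by direct comparison. The uniform price distribution on [0, 1], the search cost of 0.04, the midpoint-rule integration, and all function names are illustrative assumptions and are not taken from Stigler (1961) or Hong and Shum (2006).

```python
def expected_min_price(k, cdf, pdf, p_lo, p_hi, n=20_000):
    """Midpoint-rule approximation of the integral in Eq. (3):
    the expected minimum of k independent price draws."""
    h = (p_hi - p_lo) / n
    total = 0.0
    for i in range(n):
        p = p_lo + (i + 0.5) * h
        total += k * p * (1.0 - cdf(p)) ** (k - 1) * pdf(p)
    return total * h


def total_cost(k, c, cdf, pdf, p_lo, p_hi):
    """Objective in Eq. (3): expected minimum price plus (k - 1) * c,
    where the first search is free (Assumption 4)."""
    return expected_min_price(k, cdf, pdf, p_lo, p_hi) + (k - 1) * c


def optimal_number_of_searches(c, cdf, pdf, p_lo, p_hi, k_max=30):
    """Grid search over k = 1, ..., k_max (k_max is an arbitrary cap)."""
    return min(range(1, k_max + 1),
               key=lambda k: total_cost(k, c, cdf, pdf, p_lo, p_hi))


# Assumed example: prices uniformly distributed on [0, 1].
def uniform_cdf(p):
    return p


def uniform_pdf(p):
    return 1.0


if __name__ == "__main__":
    k_star = optimal_number_of_searches(0.04, uniform_cdf, uniform_pdf, 0.0, 1.0)
    print("optimal number of searches:", k_star)
```

For the uniform case the expected minimum price has the closed form 1/(k + 1), which gives a simple check on the numerical integral: one search yields an expected price of 0.5 and three searches yield 0.25.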

2.2.2 Sequential search

A main drawback of the simultaneous search method is that it assumes that the consumer will continue searching even after getting a high utility realization early during the search process. For example, consider a simultaneous search in which a consumer commits to searching three firms and in the first search he gets quoted the maximum utility $\bar{u}$. Because of Assumption 2, the consumer knows that this is the highest possible utility so it would not be optimal to continue searching. To address this drawback, the sequential search method has been developed. When searching sequentially, consumers determine, after each utility realization, whether to continue searching or to stop. Before we discuss the sequential search model in detail, we have to add another technical assumption to the list of assumptions laid out in Section 2.1:

Assumption 5. Consumers have perfect recall. In other words, once a consumer has searched a firm, he remembers the utility offered by this firm going forward. This assumption is equivalent to assuming that a consumer can costlessly revisit stores already searched. In Section 6.7, we present recent work that relaxes Assumption 5.

The sequential search problem in its most general form has been analyzed by Weitzman (1979). The problem of searching for the best outcome from a set of options that are independently distributed can be stated as the following dynamic programming problem

$$W(\tilde{u}_i, \bar{S}_i) = \max\left\{ \tilde{u}_i,\; \max_{j \in \bar{S}_i} \left[ -c_{ij} + F_j(\tilde{u}_i)\, W(\tilde{u}_i, \bar{S}_i - \{j\}) + \int_{\tilde{u}_i}^{\infty} W(u, \bar{S}_i - \{j\})\, f_j(u)\, du \right] \right\},$$

where $\tilde{u}_i$ is consumer $i$'s highest utility sampled so far and $\bar{S}_i$ is the set of firms consumer $i$ has not searched yet. Weitzman (1979) shows that the solution to this problem can be stated in terms of $J$ static optimization problems. Specifically, for each product $j$, consumer $i$ derives a reservation utility $z_{ij}$. This reservation utility $z_{ij}$ equates the benefit and cost of searching product $j$, i.e.,

$$c_{ij} = \int_{z_{ij}}^{\infty} (u - z_{ij})\, f_j(u)\, du.$$

This consumer- and product-specific reservation utility $z_{ij}$ can then be used to determine the order in which products should be searched as well as when to stop searching. Specifically, Weitzman (1979) shows that it is optimal for a consumer to follow three rules:

1. search companies in a decreasing order of their reservation utilities (“selection rule”),
2. stop searching when the maximum utility among the searched firms is higher than the largest reservation utility among the not-yet-searched firms (“stopping rule”), and
3. purchase from the firm with the highest utility among those searched (“choice rule”).

It is important to note that the consumer will not always purchase from the firm searched last. What follows is an example that shows when this can happen: in Table 1, we show the reservation utilities and utilities (which the consumer only knows after searching) for three firms.

Table 1 Example.
Option   Reservation utilities ($z_{ij}$)   Utilities ($u_{ij}$)
A        14                                 11
B        12                                 7
C        10                                 9

Given Weitzman’s (1979) selection rule, the consumer searches the firms in a decreasing order of their reservation utilities. The consumer first searches firm A and learns that the utility is 11. Using the stopping rule, the consumer determines that the maximum utility among the searched firms (11) is smaller than the largest reservation utility among the not-yet-searched firms (12) and thus decides to continue searching. In the second search, the consumer searches firm B and learns that the utility is 7. Using the stopping rule, the consumer determines that the maximum utility among the searched firms (11 from firm A) is higher than the largest reservation utility among the not-yet-searched firms (10 for firm C) and thus decides to stop searching. The consumer then purchases from the firm with the highest utility among those searched – firm A. Note that firm A is the firm the consumer searched in his first and not in his second (and last) search.
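The three rules can be traced step by step on the numbers in Table 1. The sketch below is a minimal illustration, assuming the reservation utilities are already known; the function name `weitzman_search` and its dictionary-based interface are hypothetical choices made for this example.

```python
def weitzman_search(reservation, realized):
    """Apply the selection, stopping, and choice rules of Weitzman (1979).

    reservation: dict firm -> reservation utility z_ij (known before search)
    realized:    dict firm -> utility u_ij (revealed only once the firm is searched)
    Returns (list of firms in the order searched, firm purchased from).
    """
    # Selection rule: search in decreasing order of reservation utilities.
    remaining = sorted(reservation, key=reservation.get, reverse=True)
    searched = []
    best_firm, best_utility = None, float("-inf")

    while remaining:
        # Stopping rule: stop once the best utility found so far is higher than
        # the largest reservation utility among the not-yet-searched firms.
        if searched and best_utility > reservation[remaining[0]]:
            break
        firm = remaining.pop(0)
        searched.append(firm)
        if realized[firm] > best_utility:
            best_firm, best_utility = firm, realized[firm]

    # Choice rule: purchase from the searched firm with the highest utility.
    return searched, best_firm


if __name__ == "__main__":
    z = {"A": 14, "B": 12, "C": 10}   # reservation utilities from Table 1
    u = {"A": 11, "B": 7, "C": 9}     # realized utilities from Table 1
    order, purchase = weitzman_search(z, u)
    print("searched:", order, "purchased from:", purchase)
```

Run on the Table 1 values, the function searches A and then B, never visits C, and returns A as the purchased option, matching the walk-through above.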

In the special case of homogeneous products and identical firms (i.e., $u_{ij} = -p_{ij}$ and $F_j = F$), just like for the simultaneous search model, the sequential search model greatly simplifies.10 Because firms are identical, the consumer randomly picks a firm to search. As in the more general case, the consumer needs to solve an optimal stopping problem, i.e. solve the problem of balancing the benefit of further search with the cost of searching. Following McCall (1970), the first-order condition for the optimal stopping problem is given by

$$\underbrace{c}_{\text{marginal cost}} = \underbrace{\int_{\underline{p}}^{z} (z - p)\, f(p)\, dp}_{\text{marginal benefit}} \tag{4}$$

where $z$ is the lowest price found in the search so far. According to Eq. (4), a consumer is indifferent between continuing to search and stopping the search when the marginal cost of an additional search equals the marginal benefit of performing an additional search given the lowest price found so far. A consumer thus searches as long as the marginal benefit from searching is greater than the marginal cost of searching. The marginal benefit in Eq. (4) is the expected savings from an additional search given the lowest price found so far. Eq. (4) implies that there is a unique price $z^*$ for which the marginal cost of searching equals the marginal benefit of searching. This unique price $z^*$ is the consumer's reservation price. Note that $z^*$ is a function of consumer search cost $c$. We can now describe the consumer’s decision rule: if the consumer gets a price draw above his reservation price, i.e. $p > z^*$, he continues to search. If he gets a price draw below his reservation price, i.e. $p \le z^*$, he stops searching and purchases. Note that in the case that all firms are identical (homogeneous good) and consumer search costs are identical across all firms, a consumer has a single (constant) reservation price $z^*$ (for all firms). The consumer stops searching after receiving the first price below his reservation price and makes a purchase. Thus the consumer always purchases from the firm searched last in such a setting.
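As an illustration of how the reservation price in Eq. (4) can be computed, the sketch below solves the condition by bisection, using the fact that the marginal benefit is increasing in $z$. The uniform price distribution on [0, 1], the search cost of 0.02, the midpoint-rule integration, and the function names are all assumptions made for this example rather than part of McCall (1970).

```python
def marginal_benefit(z, pdf, p_lo, n=2_000):
    """Midpoint-rule approximation of the right-hand side of Eq. (4):
    the integral of (z - p) f(p) over p in [p_lo, z]."""
    if z <= p_lo:
        return 0.0
    h = (z - p_lo) / n
    total = 0.0
    for i in range(n):
        p = p_lo + (i + 0.5) * h
        total += (z - p) * pdf(p)
    return total * h


def reservation_price(c, pdf, p_lo, p_hi, tol=1e-8):
    """Solve c = marginal_benefit(z) for z by bisection on [p_lo, p_hi].

    Assumption of this sketch: if even z = p_hi yields a benefit below c,
    further search never pays and p_hi is returned (any price is accepted).
    """
    if marginal_benefit(p_hi, pdf, p_lo) < c:
        return p_hi
    lo, hi = p_lo, p_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_benefit(mid, pdf, p_lo) < c:
            lo = mid   # benefit too small: reservation price is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def decision(price, z_star):
    """Decision rule: stop and buy if p <= z*, otherwise keep searching."""
    return "stop and purchase" if price <= z_star else "continue searching"


# Assumed example: prices uniformly distributed on [0, 1].
def uniform_pdf(p):
    return 1.0


if __name__ == "__main__":
    z_star = reservation_price(0.02, uniform_pdf, 0.0, 1.0)
    print("reservation price:", round(z_star, 4))
    print(decision(0.15, z_star), "|", decision(0.35, z_star))
```

In the uniform case the integral equals $z^2/2$, so the reservation price is $\sqrt{2c}$, i.e. 0.2 for a search cost of 0.02, which provides a check on the bisection.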

10 This discussion and results hold when search costs are identical across consumers and when search costs are heterogeneous across consumers. The discussion and results do not hold when there is heterogeneity in search costs across firms.

2.2.3 Discussion

A question that often comes up is whether one search method is “better” than the other, i.e. whether consumers should always search simultaneously or should always search sequentially. While in many settings searching sequentially is better for consumers because they can stop searching at any point of time if they get a good draw early on, Morgan and Manning (1985) showed that simultaneous search can be better than sequential search when a consumer is close to a deadline, i.e. needs to gather information quickly. Chade and Smith (2006) and Kircher (2009) further found that simultaneous search is better when the other side of the market might reject the individual (e.g. students searching for higher education by submitting college applications).

It is important to note that neither simultaneous nor sequential search by itself is the best search method for consumers. Morgan and Manning (1985) show that a combination of both simultaneous and sequential search (at the various stages of search) dominates either pure simultaneous or pure sequential search. In an experimental study, Harrison and Morgan (1990) make a direct comparison between such a hybrid search strategy and either simultaneous or sequential search strategies and find that their experimental subjects use the least restrictive search strategy if they are allowed to do so.

3 Early empirical literature

3.1 Consideration set literature

The marketing literature has long recognized that consumers may not consider all products in a purchase situation. Consideration has commonly been viewed as one of three stages (awareness, consideration, and choice/purchase) in a consumer’s purchase process.11 However, the marketing literature has varied in the approaches used to view and model consideration. We structure our discussion of this stream of literature in a chronological order and review three groups of papers from the early 90’s, the late 90’s and 00’s, and more recent work from the 10’s.12 This chronological structure is not driven by time itself, but rather by papers written around the same time sharing common themes. For example, the papers from the early 90’s are rooted in the economic search literature, while the papers from late 90’s and 00’s employ more descriptive, statistical models. The last group of papers from the 10’s contains a more diverse set of papers that, for example, uses experimentation in support of statistical modeling or studies under which circumstances unobserved consideration sets can be identified.

Before we dive into the details, it is important to note why consideration matters. When consumers have limited information, i.e. only consider a subset of all available products for purchase, and this limited information is not accounted for in the model and estimation, it will lead to biased preference estimates. Since preference estimates are used to calculate (price) elasticities, make recommendations on the employment of marketing mix elements, draw conclusions about the competitiveness of a market, etc., biased preference estimates might result in the wrong conclusions. This is a point that has been consistently made throughout this stream of literature.

11 These three stages are also sometimes referred to as the “purchase funnel.”
12 Roberts and Lattin (1997) provide an overview of the marketing literature on consideration sets between 1990 and 1997.

3.1.1 Early 1990s

This group of papers contains work by Hauser and Wernerfelt (1990) and Roberts and Lattin (1991). Both papers base their approaches on the consumer search literature.13 Hauser and Wernerfelt (1990) propose that consumers search sequentially to add (differentiated) products to their consideration sets.14 Through search, consumers resolve uncertainty about the net product utility, i.e. utility minus price. The authors then discuss aggregate, market-level implications of their model such as order-of-entry penalties and competitive promotion intensity. Hauser and Wernerfelt (1990) also provide an overview table with mean or median consideration set sizes from previously published studies and the Assessor database for a variety of product categories. They find most consideration sets to include 3 to 5 brands.

Roberts and Lattin (1991) develop a simultaneous search model in which consumers consider a brand as long as the utility of that brand is above an individual-specific utility threshold. To make the model estimable, the authors include a misspecification error. They calibrate their model using survey data from the ready-to-eat cereal market containing information on consumers’ considerations and purchases. Roberts and Lattin (1991) find a median consideration set size of 14 brands.15 Lastly, the authors compare the predictive ability of their two-stage model to several benchmark models.

3.1.2 Late 1990s and 2000s

This group of papers contains a body of work that focuses on more descriptive, statistical models. The main characteristics can be summarized as follows: first, consideration is neither viewed nor modeled as driven by uncertainty about a specific product attribute, e.g. price or match value. And second, there is limited empirical consensus on the drivers of consideration. Commonly, consideration is modeled as a function of marketing mix variables (advertising, display, feature, and, to a lesser extent, price). For example, Allenby and Ginter (1995) and Zhang (2006) estimate the effects of feature and display on consideration; Andrews and Srinivasan (1995) model consideration as a function of loyalty, advertising, feature, display, and price; Bronnenberg and Vanhonacker (1996) model consideration (saliency) as a function of promotion, shelf space, and recency (among others); Ching et al. (2009) model consideration as a function of price; and Goeree (2008) and Terui et al. (2011) model consideration as a function of advertising.

13 Ratchford (1980) develops a simultaneous search model for differentiated goods in which consumers have uncertainty about prices and other product attributes and estimates the gains to searching using data on four household appliances. However, search is not explicitly connected to consideration in this paper.
14 Hauser and Wernerfelt (1990) also propose that, conditional on consideration, consumers pick a smaller subset of products that they evaluate for purchase. This evaluation is costly to consumers and consumers form the smaller subset for purchase evaluation using a simultaneous search step.
15 Roberts and Lattin (1991) explain the larger consideration set sizes by clarifying that they equate consideration with awareness and that aided awareness was used to elicit the considered brands.

In the following, we discuss three aspects of this group of papers: modeling, decision rules, and data together with empirical results. Two approaches to modeling consideration emerge: one in which the probability of a consideration set is modeled (e.g. Andrews and Srinivasan, 1995; Chiang et al., 1999) and a second one in which the probability of a specific product being considered is modeled (e.g. Siddarth et al., 1995; Bronnenberg and Vanhonacker, 1996; Goeree, 2008). The papers also vary in terms of the specific model being estimated, ranging from a heteroscedastic logit model (Allenby and Ginter, 1995) through dogit models (e.g. Siddarth et al., 1995) and (utility) threshold models (e.g. Siddarth et al., 1995; Andrews and Srinivasan, 1995) to an aggregate random coefficients logit demand model based on Berry et al. (1995) (e.g. Goeree, 2008).16,17

Most of the consideration set papers published during this time period assume that consumers use compensatory decision rules, i.e. a product can “compensate” a very poor value in one attribute with a very good value in another attribute. However, a smaller set of papers models consideration using non-compensatory rules, i.e. a product attribute has to meet a certain criterion for the product to be considered and/or chosen (see also Aribarg et al., 2018). Non-compensatory rules are often proposed based on bounded rationality arguments, often in the form of decision heuristics. For example, Fader and McAlister (1990) develop an elimination-by-aspects model in which consumers screen brands depending on whether these brands are on promotion. Gilbride and Allenby (2004) propose a model that can accommodate several screening rules for consideration: conjunctive, disjunctive, and compensatory.
While Fader and McAlister (1990) find that their elimination-by-aspects and a compensatory model fit the data similarly (but result in different preference estimates), Gilbride and Allenby (2004) report that their conjunctive model fits the data best. In general, identification of compensatory and non-compensatory decision rules with non-experimental data is very difficult (see also Aribarg et al., 2018). Due to data restrictions, all the models suggested in the before-mentioned papers are estimated using choice data alone (usually supermarket scanner panel data). Therefore identification of consideration comes from functional form, i.e. nonlinearities of the model and modeling approaches are assessed based on model fit criteria. Few of the before-mentioned papers report predicted consideration set sizes. Two exceptions are Siddarth et al. (1995) and Bronnenberg and Vanhonacker (1996): the former paper reports that the average predicted consideration set includes 4.2 brands, while the latter predicts that the average consideration set for loyal and notloyal customers includes 1.5 and 2.8 brands, respectively. And lastly, most papers find

16 In a dogit model, a consumer probabilistically chooses from either the full set of alternatives or from a considered set of alternatives.
17 Threshold models such as in Siddarth et al. (1995) and Andrews and Srinivasan (1995) assume that a consumer’s utility for a product has to be above a certain value for the product to be in the consideration set. In contrast, models using non-compensatory rules assume that one or more attributes of a product have to have a value above or below a certain threshold for the product to be in the consideration set.

3 Early empirical literature

that marketing mix variables affect consideration rather than purchase. For example, Terui et al. (2011) find advertising to affect consideration, but not purchase.
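
The difference between the compensatory and non-compensatory screening rules discussed in this subsection can be made concrete with a small sketch. The attribute values, cutoffs, weights, and function names below are hypothetical and chosen only for illustration; they are not taken from any of the cited papers.

```python
import numpy as np

def conjunctive(x, cutoffs):
    """Considered only if every attribute meets its cutoff."""
    return bool(np.all(x >= cutoffs))

def disjunctive(x, cutoffs):
    """Considered if at least one attribute meets its cutoff."""
    return bool(np.any(x >= cutoffs))

def compensatory(x, weights, threshold):
    """Considered if a weighted sum of attributes (a utility index) clears a threshold."""
    return bool(np.dot(weights, x) >= threshold)

# Hypothetical product: very poor on attribute 1, very good on attribute 2.
x = np.array([1.0, 9.0])
cutoffs = np.array([3.0, 3.0])
weights = np.array([0.5, 0.5])

rules = {
    "conjunctive": conjunctive(x, cutoffs),
    "disjunctive": disjunctive(x, cutoffs),
    "compensatory": compensatory(x, weights, threshold=4.0),
}
print(rules)
```

In this made-up case the product fails the conjunctive screen because of its poor first attribute, but passes the disjunctive and compensatory screens, since the strong second attribute either suffices on its own or offsets the weak one.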

3.1.3 2010s – present

This group contains a more diverse set of papers that, for example, uses experimentation in support of statistical modeling, studies under which circumstances unobserved consideration sets can be identified or investigates preference heterogeneity estimates. Van Nierop et al. (2010) estimate a consideration and purchase model in which advertising affects the formation of consideration sets, but does not affect preferences. The authors combine scanner panel data with experimental data to show that consideration sets can be reliably recovered using choice data only and that feature and display affect consideration in their empirical context. And lastly, as discussed at the beginning of this section, not accounting for consumers’ limited information results in biased preference estimates (e.g. Bronnenberg and Vanhonacker, 1996; Chiang et al., 1999; Goeree, 2008; De los Santos et al., 2012; Honka, 2014). Two papers, Chiang et al. (1999) and Dong et al. (2018), put a special focus on preference heterogeneity estimates under full and limited information. Both papers find that the amount of preference heterogeneity is overestimated if consumers’ limited information is not accounted for.

3.1.4 Identification of unobserved consideration sets

An important identification question in the consideration set literature as well as the search literature is whether changes in demand originate from shifts in consideration or shifts in utility. In many empirical settings the researcher does not have access to data on consideration, or may not have access to an instrument that can be excluded from utility or consideration, which makes it important to know to what extent a consideration set model can still be separately identified from a full information model. Abaluck and Adams (2018) show that consideration set probabilities are identified from asymmetries of cross-derivatives of choice probabilities with respect to attributes of competing products. This means that, for identification, it is not necessary to use data on consideration sets or to assume that there are characteristics that affect consideration set probabilities but do not appear in the utility function. In a model in which consumers have full information, consumers will consider all available options. The full consideration assumption implies that there is a symmetry in cross derivatives with respect to one or more characteristics of the products: a consumer will be equally responsive to a change in the price of product A as to a similar change in the price of product B. However, under certain assumptions it can be shown that this symmetry breaks down when a change in the characteristic of the product changes the consideration set probability of that product. Abaluck and Adams (2018) provide formal identification results for two classes of consideration set models: the “Default-Specific Consideration” (DSC) model and the “Alternative-Specific Consideration” (ASC) model. The DSC model fits into a rational inattention framework and assumes that the probability of considering options other than a default option only


depends on the characteristics of the default option. The ASC model assumes that the probability of considering an option only depends on the characteristics of that good. In most theoretical search models, the probability of considering an option depends on the characteristics of all goods, which means that conventional search models do not fit in either the DSC or ASC framework. Even though this implies that their formal identification results do not apply directly to search models, Abaluck and Adams (2018) do suggest that cross-derivative asymmetries remain a source of identifying power for consideration probabilities in more complicated consideration set models in which the consideration probability for one good depends on the consideration probability for another good. Whether this indeed implies that conventional search models can be identified using asymmetric demand responses only is not formally shown, however, and remains an important area for future research. In a related paper, Crawford et al. (2018) show how to estimate preferences that are consistent with unobserved, heterogeneous consumer choice sets using the idea of sufficient sets. These sufficient sets are subsets of the unobserved choice sets and can be operationalized as products purchased by the consumer in the past or products contemporaneously purchased by other similar consumers. Kawaguchi et al. (2018) focus on advertising effectiveness and show how to use variation in product availability as a source of identification in consideration set models.
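
The cross-derivative argument can be illustrated numerically. The sketch below is a stylized example, not the estimator of Abaluck and Adams (2018): it assumes a logit utility with a hypothetical price coefficient `ALPHA`, a default good 0 that is always considered, and an assumed logistic form in which good 2's consideration probability falls with its own price. All names and numbers are illustrative assumptions.

```python
import itertools
import numpy as np

ALPHA = 1.0  # hypothetical price coefficient in utility

def choice_probs(p, phi_fn):
    """Choice probabilities when good 0 is always considered and each good
    j >= 1 enters the consideration set independently with prob. phi_fn(p)[j]."""
    J = len(p)
    expv = np.exp(-ALPHA * p)
    phi = phi_fn(p)
    shares = np.zeros(J)
    for mask in itertools.product([0, 1], repeat=J - 1):
        members = [0] + [j for j in range(1, J) if mask[j - 1]]
        prob_set = np.prod([phi[j] if mask[j - 1] else 1.0 - phi[j]
                            for j in range(1, J)])
        # Logit choice among the considered goods, weighted by the set probability.
        shares[members] += prob_set * expv[members] / expv[members].sum()
    return shares

def cross_derivative(j, k, p, phi_fn, h=1e-5):
    """Central-difference estimate of d s_j / d p_k."""
    up, down = p.copy(), p.copy()
    up[k] += h
    down[k] -= h
    return (choice_probs(up, phi_fn)[j] - choice_probs(down, phi_fn)[j]) / (2 * h)

def full_consideration(p):
    return np.ones_like(p)

def price_dependent_consideration(p):
    # Good 1: fixed consideration probability; good 2: decreasing in own price.
    return np.array([1.0, 0.5, 1.0 / (1.0 + np.exp(p[2] - 1.0))])

p = np.array([0.0, 1.0, 1.0])
gap_full = (cross_derivative(1, 2, p, full_consideration)
            - cross_derivative(2, 1, p, full_consideration))
gap_limited = (cross_derivative(1, 2, p, price_dependent_consideration)
               - cross_derivative(2, 1, p, price_dependent_consideration))
print(gap_full, gap_limited)
```

Under full consideration the two cross-derivatives coincide up to numerical error, whereas the gap is clearly non-zero once good 2's price also shifts its consideration probability.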

3.2 Consumer search literature

The early empirical literature in economics initially focused on documenting price dispersion as well as testing some of the comparative statics results that were derived from theoretical search models. For instance, Sorensen (2000) examines retail prices for prescription drugs and finds substantial price variation, even after controlling for differences among pharmacies. In addition, he finds evidence that prices and price dispersion are lower for prescriptions that are purchased more frequently. This finding is consistent with search theory predictions since the gains from search are higher for frequently purchased products. Driven by the rise of e-commerce around the turn of the millennium, subsequent work focused on price dispersion for goods sold on the Internet and how online prices compared to prices for products sold in traditional brick and mortar stores. Most of these studies found substantial price dispersion for products sold online, despite the popular belief around the time that online comparison shopping would lead to Bertrand pricing. For instance, Clay et al. (2001) find considerable heterogeneity in pricing strategies for online bookstores and Clemons et al. (2002) report that online travel agents charge substantially different prices, even when given the same customer request.

Starting with Hortaçsu and Syverson (2004) and Hong and Shum (2006), the literature began to move away from a reduced-form focused approach to more structural modeling. The idea was to use the structure of a theoretical search model to back out the search cost distribution from observed market data such as prices and quantities sold. In this section, we discuss both papers as well as several other studies that build


on these papers. Hong and Shum (2006) focus on the estimation of homogeneous goods search models, whereas Hortaçsu and Syverson (2004) allow for vertical product differentiation. However, in both papers, consumers search randomly across firms. This model feature distinguishes these two papers from more recent contributions in which consumers search a combination of horizontally and vertically differentiated firms, which makes consumers want to search in an ordered way.

3.2.1 Estimation of search costs for homogeneous products

Hong and Shum (2006) develop methods to estimate search cost distributions for both simultaneous and sequential search models using only price data. An attractive feature of their simultaneous search model, which is based on Burdett and Judd (1983), is that search costs can be non-parametrically identified using price data only. To identify search costs in their sequential search model, parametric assumptions are needed. Since most of the follow-up literature has focused on simultaneous search, we will now briefly describe the essence of their estimation method for that search model. The main idea is to use the equilibrium restrictions of the theoretical search model as well as observed prices to back out the search cost distribution that is consistent with the theoretical model. As in Burdett and Judd (1983), firms are assumed to be homogeneous. Price dispersion emerges as a symmetric mixed-strategy Nash equilibrium: firms have an incentive to set lower prices to attract consumers who are searching, but at the same time face an incentive to set higher prices to extract surplus from consumers who are not searching. By playing a mixed strategy in prices according to a distribution $F(p)$, firms can balance these two forces. Given such a price distribution $F(p)$, a firm’s profit when setting a price $p$ is given by

$$\Pi(p) = (p - r) \left[ \sum_{k=1}^{\infty} q_k (1 - F(p))^{k-1} \right],$$

where $q_k$ is the share of consumers who search $k$ times and $r$ is the firm’s unit cost. The mixed strategy equilibrium requires firms to be indifferent between setting any price in the support of the price distribution, which results in the following equilibrium profit condition:

$$(\bar{p} - r) q_1 = (p - r) \left[ \sum_{k=1}^{\infty} q_k (1 - F(p))^{k-1} \right], \tag{5}$$

where the expression on the left-hand side of this equation is the profit when setting a price equal to the upper bound $\bar{p}$ of the price distribution $F(p)$. Eq. (5) has to hold for any observed price that is consistent with this equilibrium condition, i.e.,

$$(\bar{p} - r) q_1 = (p_i - r) \left[ \sum_{k=1}^{K} q_k (1 - F(p_i))^{k-1} \right], \quad i = 1, \ldots, n - 1, \tag{6}$$


where $K$ is the maximum number of firms from which a consumer obtains price quotes and $n$ is the number of price observations. Since Eq. (6) implies $n - 1$ equations and $K$ unknowns, this system can be solved for the unknowns $\{r, q_1, \ldots, q_K\}$ as long as $K < n - 1$. Hong and Shum (2006) develop a Maximum Empirical Likelihood (MEL) approach to do so. To obtain a non-parametric estimate of the search cost distribution, the estimates of the $q_k$’s can then be combined with estimates of the critical search cost values $\Delta_i$, which are given by

$$\Delta_i = E p_{1:i} - E p_{1:i+1}, \quad i = 1, 2, \ldots, n - 1,$$

where $p_{1:i}$ is the lowest price out of $i$ draws from the price distribution $F(p)$. To illustrate their empirical approach, Hong and Shum (2006) use online prices for four economics and statistics textbooks for model estimation. The estimates of their nonsequential search model indicate that roughly half of consumers do not search beyond the initial free price quote.

Moraga-González and Wildenbeest (2008) extend Hong and Shum’s (2006) approach to the case of oligopoly. Besides allowing for a finite number of firms instead of infinitely many firms, the model is similar to the simultaneous search model in Hong and Shum (2006). However, instead of using Eq. (6) and a MEL approach, they use a maximum likelihood (MLE) procedure. Specifically, Moraga-González and Wildenbeest (2008) solve the first-order condition for the equilibrium price density, which is then used to construct the likelihood function. The density function is given by

$$f(p) = \frac{\sum_{k=1}^{N} k q_k (1 - F(p))^{k-1}}{(p - r) \sum_{k=1}^{N} k(k - 1) q_k (1 - F(p))^{k-2}},$$

where $N$ is the number of firms and $F(p)$ solves Eq. (6) for $K = N$. The log-likelihood function is then

$$LL = \sum_{i=1}^{n} \log f(p_i; q_1, \ldots, q_N),$$

where the parameters to be estimated are the shares $q_k$ of consumers searching $k$ times. Moraga-González and Wildenbeest (2008) estimate the model using online price data for computer memory chips and find that even though a small share of consumers of around ten percent searches quite intensively, the vast majority of consumers does not obtain more than three price quotes. Moreover, estimates of average price-cost margins indicate that market power is substantial, despite having more than twenty stores operating in this market. Although MEL has some desirable properties such as requiring fewer assumptions regarding the underlying distribution, estimating the model using MEL requires solving a computationally demanding high-dimensional constrained optimization problem, which may fail to converge when the number of search cost parameters is large. Indeed, Moraga-González and Wildenbeest (2008) compare the two approaches in a


Monte Carlo study and find the MLE approach to work better in practice, especially with respect to pinning down the consumers who search intensively. Moreover, they find that the MLE procedure outperforms the MEL procedure in terms of fit.

Several papers have extended the Hong-Shum approach. Most of these papers use a MLE approach as in Moraga-González and Wildenbeest (2008). A general finding is that in most of the markets studied, consumers either search very little (at most two times) or search a lot (close to all firms). This finding has been interpreted as some consumers using price comparison tools, which allow a consumer to get a complete picture of prices without having to visit each retailer individually.

Wildenbeest (2011) adds vertical product differentiation to the framework and derives conditions under which the model can still be estimated using price data only. Specifically, by assuming that consumers have identical preferences towards quality, that input markets are perfectly competitive, and that the quality production function has constant returns to scale, he maps a vertical product differentiation model into a standard homogeneous goods model with firms playing mixed strategies in utilities. The model is estimated using price data for a basket of staple grocery items that are sold across four major supermarket chains in the United Kingdom. The estimates indicate that approximately 39 percent of price variation is explained by search frictions, while the rest is due to quality differences among stores. About 91 percent of consumers search at most two stores, suggesting that there is not a lot of search going on in this market. Moreover, ignoring vertical product differentiation when estimating the model leads to higher search cost estimates.

Moraga-González et al. (2013) focus on the non-parametric identification of search costs in the simultaneous search model and show that the precision of the estimates can be improved by pooling price data from different markets. They propose a semi-nonparametric (SNP) density estimator that uses a flexible polynomial-type parametric function, which makes it possible to combine data from different markets with the same underlying distribution of search costs, but with different valuations, unit costs, and numbers of firms. The estimator is designed to maximize the joint likelihood from all markets, and as such the SNP procedure exploits the data more efficiently than the spline methods that are used in earlier papers (e.g. Hong and Shum, 2006; Moraga-González and Wildenbeest, 2008). To illustrate the estimation approach, Moraga-González et al. (2013) use a dataset of online prices for ten memory chips. Median search costs are estimated to be around $5. Search costs are dispersed, with most consumers having high enough search costs to find it optimal to search at most three stores, while a small fraction of consumers searches more than four times.

Blevins and Senney (2019) add dynamics to the model by allowing consumers to be forward looking. In addition to deciding how many times to search in each period, consumers have the option to continue searching in the next period. Per-period search costs can be estimated using the approach in Moraga-González and Wildenbeest (2008) or Wildenbeest (2011), but to estimate the bounds of the population search cost distribution, a specific policy function must be estimated. Blevins and Senney (2019) apply the estimation procedure to the online market for two popular


econometrics textbooks and find that median search costs for the dynamic model are much lower than for a static model, which suggests that search cost estimates are likely to be biased upwards when forward-looking behavior of consumers is ignored. Sanches et al. (2018) develop a minimum distance approach to estimate search costs, which is easier to implement than previous methods. In addition, they propose a two-step sieve estimator to estimate search costs when data from multiple markets are available. The sieve estimator only involves ordinary least squares estimation and is therefore easier to compute than other approaches that combine data from multiple markets, such as the SNP estimator in Moraga-González et al. (2013). As an illustration of their approach, Sanches et al. (2018) estimate search costs using online odds for English soccer matches as prices and find that search costs have fallen after bookmakers were allowed to advertise more freely as a result of a change in the law. Nishida and Remer (2018) provide an approach to combine search cost estimates from different geographic markets and show how to incorporate wholesale prices and demographics into the Hong-Shum framework. Specifically, they first nonparametrically estimate market-specific search cost distributions for a large number of geographically isolated gasoline markets using a vertical product differentiation model similar to Wildenbeest (2011). Then they use these estimates to parametrically estimate a search cost distribution that allows them to incorporate demographic information. Nishida and Remer (2018) find significant variation in search costs across the different geographic markets. Moreover, they find a positive relation between the estimated distribution of search costs and the income distribution. Zhang et al. (2018) use a MEL approach, as in Hong and Shum (2006), and show how to incorporate sales data into the estimation of both the simultaneous and the sequential search model. 
They show that including sales data results in estimates that are less sensitive to assumptions about the maximum number of searches consumers can conduct. Moreover, the sequential search model can be estimated nonparametrically when both price and sales data are used. The model is estimated using price and transaction data for a chemical product in a business-to-business environment. Findings show that the sequential search model provides a better fit.
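
As a rough illustration of the last step of the simultaneous search approach in this subsection, the sketch below computes the critical search cost values, defined as the difference between the expected minimum price of i and of i + 1 draws, from an empirical price distribution and pairs them with points on the search cost cdf. It is not the MEL or MLE estimator of the cited papers. The prices and search shares are hypothetical, the function names are ours, and the mapping from shares to the cdf rests on the assumption (standard in this class of models, but not spelled out in the text) that consumers with search costs above the first critical value search once, and those between the k-th and (k-1)-th critical values search k times.

```python
import numpy as np

def expected_min_price(prices, i):
    """Expected lowest price among i independent draws (with replacement)
    from the empirical distribution of `prices`."""
    p = np.sort(np.asarray(prices, dtype=float))
    n = len(p)
    ranks = np.arange(1, n + 1)
    # Probability that the minimum of i draws equals the j-th smallest price.
    weights = (1 - (ranks - 1) / n) ** i - (1 - ranks / n) ** i
    return float(np.dot(weights, p))

def critical_search_costs(prices, K):
    """Delta_i = E[p_{1:i}] - E[p_{1:i+1}] for i = 1, ..., K-1."""
    e_min = [expected_min_price(prices, i) for i in range(1, K + 1)]
    return -np.diff(e_min)

def search_cost_cdf_at_cutoffs(q):
    """G(Delta_k) = 1 - (q_1 + ... + q_k) under the assumption stated above."""
    return 1.0 - np.cumsum(q)[:-1]

prices = [10.0, 12.0, 15.0, 20.0]   # hypothetical price observations
q = np.array([0.5, 0.3, 0.2])       # hypothetical shares searching 1, 2, 3 times
deltas = critical_search_costs(prices, K=len(q))
cdf_points = search_cost_cdf_at_cutoffs(q)
for d, g in zip(deltas, cdf_points):
    print(f"search cost {d:.4f} -> cdf {g:.2f}")
```

Each printed pair is one point on a non-parametric estimate of the search cost distribution; in an application, the shares would come from the estimation step rather than being fixed by hand.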

3.2.2 Estimation of search costs for vertically differentiated products

Hortaçsu and Syverson (2004) extend the methodology of Hong and Shum (2006) to the case where products are allowed to be vertically differentiated and a sequential search protocol is followed. Vertical differentiation takes the form of an index based on observable product attributes and an unobservable attribute, where the index weightings, along with search cost parameters, can be estimated. Unlike Hong and Shum (2006), price data alone is not sufficient to identify model parameters; quantity and market share information, along with data on product characteristics are also necessary. Like Hong and Shum (2006), nonparametric identification results are obtained for the underlying search cost distribution and the quality index for each product (which, in a manner similar to Berry (1994) and Berry et al. (1995), can be projected onto observable product characteristics with a suitable instrumental variable for the unobserved product attribute). While the model allows for specification


of preference heterogeneity across different consumer segments, horizontal differentiation in the form of additive random utility shocks is not considered; the model rationalizes nonzero market shares for dominated products through search costs. Empirically, Hortaçsu and Syverson (2004) study the S&P 500 index fund market, where substantial dispersion in fees is documented. While this may be surprising given that all index funds have the goal of replicating the return characteristics of the S&P 500 index, some return deviations across funds may exist, along with nonfinancial drivers of differentiation. Thus, the model allows for vertical differentiation between funds with non-trivial market shares for dominated products arising from costly sequential search. The utility from investing in fund $j$ is a linear function of fund characteristics:

$$u_j = X_j \beta - p_j + \xi_j, \tag{7}$$

where $X_j$ are fund characteristics other than price $p_j$, and $\xi_j$ is an unobservable component. The coefficient on the price term is normalized to −1, so utilities are expressed in terms of basis points in fees (one hundredth of a percentage point). Thus one can think of $u_j$ as specifying fund utility per dollar of assets the investor holds in it. Search costs are heterogeneous in the investor population and follow distribution $G(c)$. As in Carlson and Preston McAfee (1983) investors search with replacement and are allowed to revisit previously researched funds. Defining investors’ belief about the distribution of funds’ indirect utilities as $H(u)$, the optimal search rule for an investor with search cost $c_i$ is given by the reservation utility rule, under which the investor keeps searching as long as

$$c_i \le \int_{u^*}^{\bar{u}} (u - u^*) \, dH(u),$$

where $\bar{u}$ is the upper bound of $H(u)$, and $u^*$ is the indirect utility of the highest-utility fund searched up to that point. Assuming that investors observe the empirical cumulative distribution function of funds’ utilities, $u_1 < \ldots < u_N$, the expression for $H(u)$ becomes

$$H(u) = \frac{1}{N} \sum_{j=1}^{N} I[u_j \le u].$$
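
With the empirical cdf, the integral in the reservation utility rule reduces to an average over funds that beat the best utility found so far. A minimal sketch, with hypothetical fund utilities and function names of our own choosing:

```python
import numpy as np

def expected_gain(u_star, utilities):
    """Integral of (u - u_star) dH(u) above u_star when H is the empirical
    cdf that puts mass 1/N on each fund's utility."""
    u = np.asarray(utilities, dtype=float)
    return float(np.sum(np.maximum(u - u_star, 0.0)) / len(u))

def keeps_searching(cost, u_star, utilities):
    """True if the expected gain from one more draw covers the search cost."""
    return cost <= expected_gain(u_star, utilities)

utilities = [0.0, 10.0, 20.0]   # hypothetical fund utilities, in basis points
print(expected_gain(0.0, utilities), keeps_searching(5.0, 0.0, utilities))
```

An investor holding the best available fund has an expected gain of zero and therefore stops for any positive search cost.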

The optimal search rule yields critical cut-off points in the search distribution given by

$$c_j = \sum_{k=j}^{N} \rho_k (u_k - u_j), \tag{8}$$

where $\rho_k$ is the probability that fund $k$ is sampled on each search and $c_j$ is the lowest possible search cost of any investor who purchases fund $j$ in equilibrium. Funds’ market shares can be written in terms of the search cost cdf by using the search-cost cutoffs from Eq. (8). Only investors with very high search costs ($c > c_1$)


purchase the lowest-utility fund, $u_1$; all others continue to search. However, not all investors with $c > c_1$ purchase the fund; only those who happen to draw fund 1 first, which happens with probability $\rho_1$. Thus the market share of the lowest-utility fund is given by

$$q_1 = \rho_1 (1 - G(c_1)) = \rho_1 \left[ 1 - G\left( \sum_{k=1}^{N} \rho_k (u_k - u_1) \right) \right]. \tag{9}$$

Analogous calculations produce a generalized market share equation for funds 2 to $N$:

$$q_j = \rho_j \left[ 1 + \frac{\rho_1 G(c_1)}{1 - \rho_1} + \sum_{k=2}^{j-1} \frac{\rho_k G(c_k)}{(1 - \rho_1 - \cdots - \rho_{k-1})(1 - \rho_1 - \cdots - \rho_k)} - \frac{G(c_j)}{1 - \rho_1 - \cdots - \rho_{j-1}} \right]. \tag{10}$$

These equations form a system of linear equations linking market shares to cutoffs in the search cost distribution. Eq. (10) maps observed market shares to the cdf of the search cost distribution evaluated at the critical values. Given the sampling probabilities $\rho_j$, all $G(c_j)$ can be calculated directly from market shares. Solving the linear system (10) to recover $G(c_1), \ldots, G(c_{N-1})$ and using the fact that $G(c_N) = 0$ (Eq. (8) implies $c_N = 0$ and search costs cannot be negative) gives all critical values of the cdf. If the sampling probabilities are unknown and must be estimated, the probabilities as well as the search cost distribution can be parameterized as $\rho(\omega_1)$ and $G(c; \omega_2)$, respectively. Given $\omega_1$ and $\omega_2$ of small enough dimension, observed market shares can be used to estimate these parameters. While market share data can be mapped into the cdf of the search cost distribution, market shares do not generally identify the level of the critical search cost values $c_1, \ldots, c_N$, but only their relative positions in the distribution. However, shares do identify search cost levels in the special but often-analyzed case of homogeneous (in all attributes but price) products with unit demands; i.e., when $u_j = u - p_j$, where $u$ is the common indirect utility delivered by the funds. In this case, Eq. (8) implies

$$c_j = \sum_{k=j}^{N} \rho_k (u - p_k - (u - p_j)) = \sum_{k=j}^{N} \rho_k (p_j - p_k). \tag{11}$$
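
The mapping between market shares and the search cost cdf in Eqs. (9)-(11) can be sketched in a few lines. The code below is an illustration under assumed inputs, not the authors' implementation: the sampling probabilities, prices, and cdf values are hypothetical, and the function names are ours. It uses the fact that Eqs. (9) and (10) can be rewritten as $q_j = \rho_j \sum_{k=1}^{j} [G(c_{k-1}) - G(c_k)] / (1 - \rho_1 - \cdots - \rho_{k-1})$ with $G(c_0) = 1$, which makes the inversion recursive.

```python
import numpy as np

def cutoffs_from_prices(p, rho):
    """Eq. (11): c_j = sum_{k >= j} rho_k (p_j - p_k). Funds are sorted from
    lowest to highest utility, i.e. from highest to lowest price."""
    p, rho = np.asarray(p, float), np.asarray(rho, float)
    return np.array([np.sum(rho[j:] * (p[j] - p[j:])) for j in range(len(p))])

def shares_from_cdf(G, rho):
    """Market shares implied by Eqs. (9)-(10), given G(c_1), ..., G(c_N)."""
    G, rho = np.asarray(G, float), np.asarray(rho, float)
    # remaining[k] = 1 - rho_1 - ... - rho_k (0-based: mass of funds not yet ruled out)
    remaining = 1.0 - np.concatenate(([0.0], np.cumsum(rho)[:-1]))
    steps = -np.diff(np.concatenate(([1.0], G))) / remaining
    return rho * np.cumsum(steps)

def cdf_from_shares(q, rho):
    """Invert the share equations recursively for G(c_1), ..., G(c_N)."""
    q, rho = np.asarray(q, float), np.asarray(rho, float)
    remaining = 1.0 - np.concatenate(([0.0], np.cumsum(rho)[:-1]))
    steps = np.diff(np.concatenate(([0.0], q / rho)))
    return 1.0 - np.cumsum(steps * remaining)

rho = np.full(3, 1 / 3)                # hypothetical equal sampling probabilities
prices = [30.0, 20.0, 10.0]            # hypothetical fees in basis points
G_true = np.array([0.5, 1 / 6, 0.0])   # hypothetical cdf values at c_1, c_2, c_3
q = shares_from_cdf(G_true, rho)
print(cutoffs_from_prices(prices, rho), q, cdf_from_shares(q, rho))
```

Because the shares sum to one exactly when the last cdf value is zero, the recursion returns $G(c_N) = 0$ without imposing it separately.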

Now, given sampling probabilities (either known or parametrically estimated), $c_1, \ldots, c_{N-1}$ can be calculated directly from observed fund prices using Eq. (11). In the more general case where products also differ in attributes other than price, information on fund companies’ optimal pricing decisions is required to identify cutoff search cost values. To do this, a supply side model has to be specified. Hortaçsu and Syverson (2004) assume that the $F$ funds choose prices to maximize current


static profits. Let $S$ be the total size of the market, $p_j$ and $mc_j$ be the price and (constant) marginal costs for fund $j$, and $q_j$ be fund $j$’s market share given the price and characteristics of all sector funds. Then the profits of fund $j$ are given by

$$\Pi_j = S q_j(p, X)(p_j - mc_j).$$

Profit maximization implies the standard first-order condition for $p_j$:

$$q_j(p, X) + (p_j - mc_j) \frac{\partial q_j(p, X)}{\partial p_j} = 0. \tag{12}$$

The elasticities $\partial q / \partial p$ faced by the fund are determined in part by the derivatives of the share equations (10). These derivatives are:

$$\frac{\partial q_j}{\partial p_j} = -\frac{\rho_1 \rho_j^2 g(c_1)}{1 - \rho_1} - \frac{\rho_2 \rho_j^2 g(c_2)}{(1 - \rho_1)(1 - \rho_1 - \rho_2)} - \sum_{k=3}^{j-1} \frac{\rho_k \rho_j^2 g(c_k)}{(1 - \rho_1 - \cdots - \rho_{k-1})(1 - \rho_1 - \cdots - \rho_k)} - \frac{\rho_j \left( \sum_{k=j+1}^{N} \rho_k \right) g(c_j)}{1 - \rho_1 - \cdots - \rho_{j-1}}. \tag{13}$$

The pdf of the search cost distribution (evaluated at the cutoff points) enters the derivatives of the market share equations with respect to price (see Eq. (13)). Under Bertrand-Nash competition, the first order conditions for prices (Eq. (12)) imply:

$$\frac{\partial q_j(p)}{\partial p_j} = -\frac{q_j(p)}{p_j - mc_j}. \tag{14}$$

Given knowledge of marginal costs $mc_j$, we can compute $\partial q_j / \partial p_j$ using the first-order condition in Eq. (14). From Eq. (13), these derivatives form a linear system of $N - 1$ equations that can be used to recover the values of the search cost density function $g(c)$ at the critical values $c_1, \ldots, c_{N-1}$. If marginal costs are not known, they can be parameterized along with the search cost distribution and estimated from the price and market share data. Once both the values of the search cost cdf and pdf (evaluated at the cutoff search costs) have been identified, the level of these cutoff search costs $c_j$ in the general case of heterogeneous products can be identified. By definition, the difference between the cdf evaluated at two points is the integral of the pdf over that span of search costs. This difference can be approximated using the trapezoid method, i.e.,

$$G(c_{j-1}) - G(c_j) = 0.5 [g(c_{j-1}) + g(c_j)](c_{j-1} - c_j).$$


This equation is inverted to express the differences between critical search cost values in terms of the cdf and pdf evaluated at those points, i.e.

$$c_{j-1} - c_j = \frac{2 [G(c_{j-1}) - G(c_j)]}{g(c_{j-1}) + g(c_j)}. \tag{15}$$
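
The two recovery steps, solving Eqs. (13)-(14) for the density at the cutoffs and then inverting the trapezoid rule in Eq. (15), can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' code: the function names are ours, funds are assumed to be sorted from lowest to highest utility, marginal costs are taken as known, and the value of the density at the last cutoff, which the data do not identify, has to be supplied by the user.

```python
import numpy as np

def slope_matrix(rho):
    """Coefficients on g(c_1), ..., g(c_{N-1}) in Eq. (13) for funds 1..N-1."""
    rho = np.asarray(rho, float)
    N = len(rho)
    R = np.concatenate(([0.0], np.cumsum(rho)))   # R[k] = rho_1 + ... + rho_k
    A = np.zeros((N - 1, N - 1))
    for j in range(N - 1):                        # 0-based index of fund j+1
        for k in range(j):                        # lower-utility funds
            A[j, k] = -rho[k] * rho[j] ** 2 / ((1 - R[k]) * (1 - R[k + 1]))
        A[j, j] = -rho[j] * rho[j + 1:].sum() / (1 - R[j])
    return A

def density_at_cutoffs(q, p, mc, rho):
    """Own-price slopes from Eq. (14), then solve the system in Eq. (13)."""
    q, p, mc = (np.asarray(a, float) for a in (q, p, mc))
    slopes = -q / (p - mc)
    return np.linalg.solve(slope_matrix(rho), slopes[:-1])

def cutoffs_from_trapezoid(G, g):
    """Eq. (15), accumulated backwards from c_N = 0. The last entry of `g`
    is the normalization chosen for the density at zero search costs."""
    c = np.zeros(len(G))
    for j in range(len(G) - 1, 0, -1):
        c[j - 1] = c[j] + 2.0 * (G[j - 1] - G[j]) / (g[j - 1] + g[j])
    return c

G = np.array([0.5, 1 / 6, 0.0])     # hypothetical cdf values at c_1, c_2, c_3
g = np.array([0.05, 0.05, 0.05])    # hypothetical pdf values; last one is the normalization
print(cutoffs_from_trapezoid(G, g))
```

The system in Eq. (13) is lower triangular, so the density values could also be obtained by forward substitution; a general linear solver is used here only for brevity.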

Given the critical values of $G(c)$ and $g(c)$ obtained from the data above, one can recover the $c_j$, and from these trace out the search cost distribution.18 In non-parametric specifications, a normalization is required: the demand elasticity equations do not identify $g(c_N)$, so a value must be chosen for the density at zero-search costs (recall that $c_N = 0$). Finally, the critical values of the search cost distribution can be used to estimate the indirect utility function (Eq. (7)). The implied indirect utilities of the funds $u_j$ are derived from the cutoff search costs via the linear system in Eq. (8) above.19 One can then regress the sum of these values and the respective fund’s price (because of the imposed unit price coefficient) on the observable characteristics of the fund to recover $\beta$, the weights of the characteristics in the indirect utility function. One must be careful, however, as the unobservable components $\xi$ are likely to be correlated with price, which would result in biased coefficients in ordinary least squares regressions. Therefore, as in Berry (1994) and Berry et al. (1995), one can use instrumental variables for price to avoid this problem. Estimation of the model using data on S&P 500 index funds between 1995 and 2000 reveals that product differentiation indeed plays an important role in this market: investors value funds’ non-financial characteristics such as fund age, total number of funds in the fund family, and tax exposure. After taking vertical product differentiation into account, fairly small but heterogeneous search costs (the difference between the 25th and 75th percentiles varies between 0.7 and 28 basis points) can rationalize the very substantial price dispersion (the 75th percentile fund charged more than three times the fee charged by the 25th percentile fund).
The estimates also suggest that search costs are shifting over time, consistent with the documented influx of high search cost and financially inexperienced mutual fund investors into the market during a period of sustained stock market gains. Roussanov et al. (2018) utilize the Hortaçsu and Syverson (2004) model to analyze the broader market for U.S. equity mutual funds and find that investor search and the marketing efforts of mutual fund managers to influence investor search towards their funds can explain a substantial portion of the empirical relationship between mutual fund performance and mutual fund flows. Using their structural estimates, the

18 Any monotonically increasing function between the identified cutoff points could be consistent with the true distribution; the trapezoid approximation essentially assumes this is linear. The approximated cdf converges to the true function as the number of funds increases.
19 In the current setup, Eq. (8) implies that $u_1 = 0$, so fund utility levels are expressed relative to the least desirable fund. This normalization results from the assumption that all investors purchase a fund; if there is an outside good that could be purchased without incurring a search cost, one could alternatively normalize the utility of this good to zero.

4 Recent advances: Search and consideration sets

authors find that marketing is a very important determinant along with performance, fees, and fund size. In a counterfactual exercise that bans marketing by mutual fund managers, Roussanov et al. (2018) find that capital shifts towards cheaper funds, and that capital is allocated in a manner more closely aligned with (estimated) manager skills. Econometrically estimated search models have found applications in several other important financial products markets, where products are complex and consumers are relatively inexperienced and/or uninformed about contract details. Allen et al. (2013) estimate search costs that rationalize the dispersion of mortgage rates in Canada. Woodward and Hall (2012) find substantial dispersion in mortgage closing/brokerage costs and, using a model of search with broker competition, estimate large gains (exceeding $1,000 for most borrowers) from getting one more quote. An important modeling challenge in many financial products markets is the fact that loans and most securities are priced through a process of negotiation. This poses an interesting econometric challenge in that the prices of alternatives that are not chosen by the consumer are not observed in the data. Woodward and Hall (2012), Allen et al. (2018), and Salz (2017) are recent contributions that address this problem by specifying an auction process between lenders/providers in the consumer’s consideration set. Given the importance of understanding choice frictions faced by consumers in these markets, which have been under much scrutiny and regulatory action since the 2008 financial crisis, future research in this area is very well warranted.

4 Recent advances: Search and consideration sets

In this section, we discuss recent empirical work which makes an explicit connection between search and consideration, i.e. search is viewed as the process through which consumers form their consideration sets. While this idea might appear intuitive, the two streams of literature on consideration sets (in marketing) and consumer search (in economics) have existed largely separately until recently. We organize this section by consumers’ source of uncertainty: In Section 4.1, we discuss papers in which consumers search to resolve uncertainty about prices and in Section 4.2, we discuss papers in which consumers search to resolve uncertainty about the match value or product fit.

4.1 Searching for prices

We structure our discussion of papers that have modeled consumer search for prices by search method: in Mehta et al. (2003), Honka (2014), and De los Santos et al. (2012), consumers search simultaneously, whereas consumers search sequentially in Honka and Chintagunta (2017).


CHAPTER 4 Empirical search and consideration sets

4.1.1 Mehta et al. (2003)

The goal of Mehta et al. (2003) is to propose a structural model of consideration set formation. The authors view searching for prices as the process that consumers undergo to form consideration sets. Mehta et al. (2003) apply their model to scanner panel data, i.e., data that contain consumer purchases together with marketing activities but do not contain information on consideration sets, from two categories (liquid detergent and ketchup) with four brands in each. The authors find average predicted consideration set sizes to vary between 1.8 and 2.8 across both product categories, pointing to consumers incurring sizable search costs to resolve uncertainty about prices. Further, they find that in-store display and feature ads significantly reduce search costs, while income significantly increases search costs. Lastly, Mehta et al. (2003) report that consumers’ price sensitivity is underestimated if consumers’ limited information is not taken into account.

In the following, we provide details on the modeling and estimation approach. As mentioned before, Mehta et al. (2003) develop a structural model in which consumers search simultaneously to learn about prices and this search process leads to the formation of consumers’ consideration sets. Consumer $i$’s utility function is given by

$$u_{ijt} = \theta q_{ijt} - p_{ijt},$$

where prices $p_{ijt}$ are assumed to follow an Extreme Value (EV) Type I distribution with $p_j \sim EV(\bar{p}_j, \sigma_{p_j}^2)$, and $q_{ijt}$ is the perceived quality of a product, which is observed by the consumer but not by the researcher.20 The parameter $\theta$ is consumer $i$’s sensitivity to quality and is estimated. Note that there is no error term in the above utility specification. If an error term were to be included, Mehta et al. (2003) would not be able to separately identify the baseline search cost $c_0$ from the true quality of brands $q_j$.
20 The perceived quality is assumed to be updated in a Bayesian fashion after each product purchase. We refer the reader to Mehta et al. (2003) for details on this process.

Given the distributional assumption for prices, consumer $i$’s utility also follows an EV Type I distribution with $u_{ijt} \sim EV(\theta q_{ijt} - \bar{p}_j, \sigma_{u_j}^2)$ and $\sigma_{p_j} = \sigma_{u_j}$. Mehta et al. (2003) use the choice model approach described in Section 2.2 to model consumers’ choices of consideration sets, i.e. consumers calculate the net benefit of every possible consideration set and pick the one that gives them the largest net benefit. The choice model approach is feasible despite the curse of dimensionality of the simultaneous search model because Mehta et al. (2003) apply their model to scanner panel data from two categories and focus on the four main brands in each category.21 The expected net benefit of a specific consideration set $\Upsilon$ (determined by its size $k$ and its composition) is given by22

$$EB_{\Upsilon} = \frac{\sqrt{6}\,\sigma_u}{\pi} \ln\left( \sum_{l=1}^{k} \exp\left( \frac{\pi}{\sqrt{6}\,\sigma_u} \left( \theta w_{ilt} - \bar{p}_{il} \right) \right) \right) - \sum_{l=1}^{k} c_{ilt} \tag{16}$$

with $c_{ilt} = c_0 + W_{ilt}\delta$. The consumer picks the consideration set (determined by its size and composition) that maximizes the net benefit of searching. Once consumer $i$ has searched all products in his consideration set, he has learned about their prices and all uncertainty is resolved. The consumer then picks the product that provides him with the largest utility among the considered ones.

Next, we describe how Mehta et al. (2003) estimate their model. Consumer $i$’s unconditional purchase probability is the product of consumer $i$’s consideration and conditional purchase probabilities, i.e. $P_{ij} = P_{C_i} P_{ij|C_i}$. The probability that consumer $i$ considers consideration set $\Upsilon$ (determined by its size $k$ and its composition) can be written as $P(C_i = \Upsilon) = P(EB_{\Upsilon} \ge EB_{\Upsilon'} \ \forall\, \Upsilon' \ne \Upsilon)$. Lastly, given that the qualities $q_{ijt}$ are truncated normal random variables (see Mehta et al., 2003), the conditional choice probabilities are given by probit probabilities.
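The choice model approach described above can be illustrated with a minimal sketch that evaluates the expected net benefit in Eq. (16) for every non-empty subset of four brands and keeps the best one. The function names, the dictionary inputs, and all numerical values are hypothetical and chosen only for illustration; the sketch also assumes a single common $\sigma_u$ and is not the authors' estimation code.

```python
import math
from itertools import combinations


def expected_net_benefit(brands, theta, sigma_u, w, p_bar, c):
    """Expected net benefit of one consideration set, following Eq. (16).

    w, p_bar and c map each brand to its quality term, mean price and search cost.
    """
    scale = math.sqrt(6.0) * sigma_u / math.pi
    log_sum = math.log(sum(math.exp((theta * w[l] - p_bar[l]) / scale) for l in brands))
    return scale * log_sum - sum(c[l] for l in brands)


def best_consideration_set(all_brands, theta, sigma_u, w, p_bar, c):
    """Enumerate every non-empty subset and return the one with the largest net benefit."""
    candidates = [
        subset
        for size in range(1, len(all_brands) + 1)
        for subset in combinations(all_brands, size)
    ]
    return max(candidates, key=lambda s: expected_net_benefit(s, theta, sigma_u, w, p_bar, c))


# Hypothetical inputs for four brands (illustrative values, not estimates).
brands = ("A", "B", "C", "D")
w = {"A": 3.0, "B": 2.5, "C": 2.0, "D": 1.0}
p_bar = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
free_search = {b: 0.0 for b in brands}
costly_search = {b: 10.0 for b in brands}

print(best_consideration_set(brands, 1.0, 1.0, w, p_bar, free_search))
print(best_consideration_set(brands, 1.0, 1.0, w, p_bar, costly_search))
```

With four brands there are only 15 non-empty subsets, which is why full enumeration is practical here; the number of subsets doubles with every additional brand.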

21 Mehta et al. (2003) can reduce the number of consideration sets by dropping those that do not include the purchased brand.

22 Since prices and thus utilities (see Eq. (4.1)) follow an EV Type I distribution, the maximum utility of a set of EV Type I distributions also follows an EV Type I distribution (Johnson et al., 1995).

4.1.2 Honka (2014)

Honka (2014) studies the auto insurance market. Insurance markets can be inefficient for several reasons, with adverse selection and moral hazard being the two most extensively studied. Honka (2014) investigates a different source of market inefficiency: market frictions. She focuses on two types of market frictions, namely search and switching costs, and estimates demand for auto insurance in their presence. Honka (2014) uses an individual-level data set in which, in addition to information on purchases, prices, and marketing activities, she also observes consumers’ consideration sets. She finds search costs to vary from $35 to $170 and switching costs of $40. Further, she reports that search costs are the main driver of the very high customer retention rate in this industry and that their elimination is the main lever to increase consumer welfare.


In the following, we provide details on the modeling and estimation approach. As in Mehta et al. (2003), Honka (2014) estimates a structural model in which consumers search simultaneously to resolve uncertainty about prices. She models search as the process through which consumers form their consideration sets. Consumer $i$’s indirect utility for company $j$ is given by

$$u_{ij} = \alpha_{ij} + \beta_i I_{ij,t-1} + \gamma p_{ij} + X_{ij}\rho + W_i\phi + \epsilon_{ij},$$

with $\alpha_{ij}$ being consumer-specific brand intercepts. $\beta_i$ captures consumer inertia and can be decomposed into $\beta_i = \tilde{\beta} + Z_i\kappa$ with $Z_i$ being observable consumer characteristics. $I_{ij,t-1}$ is a dummy variable indicating whether company $j$ is consumer $i$’s previous insurer. Note that as observed heterogeneity interacts with $I_{ij,t-1}$, it plays a role in the conditional choice decisions. The parameter $\gamma$ captures a consumer’s price sensitivity and $p_{ij}$ denotes the price charged by company $j$. Note that, in contrast to the consumer packaged goods industry, prices in the auto insurance industry depend on consumer characteristics. Prices $p_{ij}$ follow an EV Type I distribution with location parameter $\eta_{ij}$ and scale parameter $\mu$.23 Given that consumers know the distributions of prices in the market, they know $\eta_{ij}$ and $\mu$. $X_{ij}$ is a vector of product- and consumer-specific attributes and $W_i$ contains regional fixed effects, demographic and psychographic factors that are common across $j$. Although these factors drop out of the conditional choice decision, they may play a role in the search and consideration decisions. And lastly, $\epsilon_{ij}$ captures the component of the utility that is observed by the consumer but not the researcher.

Given that Honka (2014) studies a market with 17 companies, the curse of dimensionality of the simultaneous search model has to be overcome (see also Section 2.2). To do so, she assumes first-order stochastic dominance among the price belief distributions and uses the optimal selection strategy for consideration sets suggested by Chade and Smith (2005).24 She assumes a specific form of first-order stochastic dominance, namely, that the price belief distributions have consumer- and company-specific means but the same variance across all companies, and tests the appropriateness of this assumption using data on prices.

23 This means the PDF is given by $f(x) = \mu \exp(-\mu(x - \eta)) \exp(-\exp(-\mu(x - \eta)))$ and the CDF is given by $F(x) = \exp(-\exp(-\mu(x - \eta)))$ with location parameter $\eta$ and scale parameter $\mu$. The mean is $\eta + \frac{e_c}{\mu}$ and the variance is $\frac{\pi^2}{6\mu^2}$, where $e_c$ is the Euler constant (Ben-Akiva and Lerman, 1985).

24 Honka (2014) also assumes that search costs are not company-specific, an assumption that also has to be made to apply the theoretical results developed by Chade and Smith (2005).

Note that the consumer makes the decisions of which and how many companies to search at the same time. For expository purposes, we first discuss the consumer’s decision of which companies to search followed by the consumer’s decision of how many companies to search. Both decisions are jointly estimated. A consumer’s decision regarding which companies to search depends on the expected indirect utilities (EIU; Chade and Smith, 2005), where the expectation is taken with respect to the characteristic the consumer is searching for, in this case prices. So consumer $i$’s EIU is given by

$$E[u_{ij}] = \alpha_{ij} + \beta_i I_{ij,t-1} + \gamma E[p_{ij}] + X_{ij}\rho + W_i\phi + \epsilon_{ij}.$$

Consumer $i$ observes these EIUs for every company in his market (including $\epsilon_{ij}$). To decide which companies to search, consumer $i$ ranks all companies other than his previous insurance provider (because the consumer gets a free renewal offer from the previous insurer) according to their EIUs (Chade and Smith, 2005) and then picks the top $k$ companies to search. $R_{ik}$ denotes the set of top $k$ companies consumer $i$ ranks highest according to their EIU. For example, $R_{i1}$ contains the company with the highest expected utility for consumer $i$, $R_{i2}$ contains the companies with the two highest expected utilities for consumer $i$, etc.

To decide on the number of companies $k$ a consumer searches, the consumer calculates the net benefit of all possible search sets given the ranking of EIUs, i.e. if there are $N$ companies in the market, the consumer can choose among $N - 1$ search sets (one free quote comes from the previous insurer). A consumer’s benefit of a searched set is then given by the expected maximum utility among the searched brands. Given the EV distribution of prices, the maximum utility also has an EV distribution

$$\max_{j \in R_{ik}} u_{ij} \sim EV\left( \frac{1}{b} \ln \sum_{j \in R_{ik}} \exp\left(b\, a_{ij}\right),\ b \right) \tag{17}$$

with $a_{ij} = \alpha_{ij} + \beta_i I_{ij,t-1} + \gamma \eta_{ij} + X_{ij}\rho + W_i\phi + \epsilon_{ij}$ and $b = \frac{\mu}{\gamma}$. If we further define $\tilde{a}_{R_{ik}} = \frac{1}{b} \ln \sum_{j \in R_{ik}} \exp\left(b\, a_{ij}\right)$, then the benefit of a searched set is given by

$$E\left[ \max_{j \in R_{ik}} u_{ij} \right] = \tilde{a}_{R_{ik}} + \frac{e_c}{b},$$

where $e_c$ denotes the Euler constant. The consumer picks $S_{ik}$ which maximizes his net benefit of searching, denoted by $\Gamma_{i,k+1}$, i.e. the expected maximum utility among the considered companies minus the cost of search, given by

$$\Gamma_{i,k+1} = E\left[ \max_{j \in R_{ik} \cup \{ j_{I_{ij,t-1}} \}} u_{ij} \right] - k c_i. \tag{18}$$

The consumer picks the number of searches $k$ which maximizes his net benefit of search. If a consumer decides to search $k$ companies, he pays $k c_i$ in search costs and has $k + 1$ companies in his consideration set.

Consumers can be heterogeneous in both preferences and search costs. Consumer-specific effects in both the utility function and search costs are not identified because of the linear relationship between utilities and search costs in Eq. (18). If we increase, for example, the effect of a demographic factor in the utility function and decrease its effect on search costs by an appropriate amount, the net benefit of a consideration set, $\Gamma_{i,k+1}$, would remain the same. In the empirical specification, Honka (2014) therefore controls for observed and unobserved heterogeneity in the utility function and for quoting channels (e.g. agent, insurer website) in search costs.

This concludes the description of how a consumer forms his consideration set. Once a consumer has formed his consideration set and received all price quotes he requested, all price uncertainty is resolved. Both the consumer and the researcher observe prices. The consumer then picks the company with the highest utility among the considered companies, with the utilities now including the quoted prices for consumer $i$ by company $j$.

Next, we describe how Honka (2014) estimates her model. The crucial differences between what the consumer observes and what the researcher observes are as follows:

1. Whereas the consumer knows each company’s position in the EIU ranking, the researcher only partially observes the ranking by observing which companies are being searched and which ones are not being searched.
2. In contrast to the consumer, the researcher does not observe $\alpha_{ij}$ and $\epsilon_{ij}$.

Honka (2014) tackles the first point by pointing out that partially observing the ranking contains information that allows her to estimate the composition of consideration sets. Because the consumer ranks the companies according to their EIU and only searches the highest ranked companies, the researcher knows from observing which companies are searched that the EIUs among all the searched companies have to be larger than the EIUs of the non-searched companies or, to put it differently, that the minimum EIU among the searched companies has to be larger than the maximum EIU among the non-searched companies, i.e.

$$\min_{j \in S_i} E[u_{ij}] \ge \max_{j' \notin S_i} E[u_{ij'}].$$

As a consumer decides simultaneously which and how many companies to search, the following condition has to hold for any searched set

$$\min_{j \in S_i} E[u_{ij}] \ge \max_{j' \notin S_i} E[u_{ij'}] \ \cap\ \Gamma_{ik} \ge \Gamma_{ik'} \quad \forall k \ne k' \tag{19}$$

i.e. the minimum EIU among the searched brands is larger than the maximum EIU among the non-searched brands and the net benefit of the chosen searched set of size $k$ is larger than the net benefit of any other search set of size $k'$.

And finally, Honka (2014) accounts for the fact that the researcher does not observe $\alpha_{ij}$ and $\epsilon_{ij}$ by integrating over their distributions. She assumes that $\alpha \sim MVN(\bar{\alpha}, \Sigma_\alpha)$, where $\bar{\alpha}$ and $\Sigma_\alpha$ contain parameters to be estimated, and $\epsilon_{ij} \sim$ EV Type I$(0, 1)$. Then the probability that a consumer picks a consideration set $\Upsilon$ is given by

$$P_{i\Upsilon|\alpha,\epsilon} = \Pr\left( \min_{j \in S_i} E[u_{ij}] \ge \max_{j' \notin S_i} E[u_{ij'}] \ \cap\ \Gamma_{i,k+1} \ge \Gamma_{i,k'+1} \quad \forall k \ne k' \right). \tag{20}$$
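The two-step logic behind Eqs. (17) to (20), ranking companies and then choosing how many of the top-ranked ones to search, can be sketched as follows. This is an illustrative sketch only: the function names and inputs are hypothetical, $b$ is treated as a positive scale parameter, a single consumer with a common search cost is assumed, and the option of renewing without searching at all is left out.

```python
import math

EULER_CONSTANT = 0.5772156649015329


def expected_max_utility(a_values, b):
    """Mean of the EV distribution in Eq. (17): (1/b) ln(sum exp(b a)) + e_c / b."""
    return math.log(sum(math.exp(b * a) for a in a_values)) / b + EULER_CONSTANT / b


def choose_search_set(a_other, a_previous, b, search_cost):
    """Return (companies to search, net benefit) over k = 1, ..., N - 1.

    a_other maps each company other than the previous insurer to a_ij.
    Because the price scale is common across companies, ranking by a_ij
    gives the same order as ranking by expected indirect utility.
    """
    ranked = sorted(a_other, key=a_other.get, reverse=True)
    best_set, best_value = None, -math.inf
    for k in range(1, len(ranked) + 1):
        searched = ranked[:k]
        # The previous insurer's free quote is always part of the considered set.
        considered = [a_previous] + [a_other[j] for j in searched]
        net_benefit = expected_max_utility(considered, b) - k * search_cost  # Eq. (18)
        if net_benefit > best_value:
            best_set, best_value = searched, net_benefit
    return best_set, best_value


# Hypothetical values for three competing insurers and the previous insurer.
a_other = {"X": 1.0, "Y": 0.5, "Z": -1.0}
print(choose_search_set(a_other, a_previous=0.8, b=1.0, search_cost=0.0)[0])
print(choose_search_set(a_other, a_previous=0.8, b=1.0, search_cost=5.0)[0])
```

Because only the top-$k$ sets along the ranking are evaluated, the consumer compares $N - 1$ candidate sets instead of all subsets, which is what keeps the problem tractable in a market with many companies.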

Note that the quote from the previous insurer directly influences the consumer’s choice of the size of a consideration set. A consumer renews his insurance policy with his previous provider if the utility of doing so is larger than the expected net benefit $\Gamma_{i,k+1}$ of any number of searches. Next, she turns to the purchase decision given consideration. The consumer’s choice probability conditional on his consideration set is

$$P_{ij|\Upsilon,\alpha,\epsilon} = \Pr\left( u_{ij} \ge u_{ij'} \ \forall j \ne j',\ j, j' \in C_i \right) \tag{21}$$

where $u_{ij}$ now contains the quoted prices. Note that there is a selection issue: Given a consumer’s search decision, $\epsilon_{ij}$ do not follow an EV Type I distribution and the conditional choice probabilities do not have a closed-form expression. The consumer’s unconditional choice probability is given by

$$P_{ij|\alpha,\epsilon} = P_{i\Upsilon|\alpha,\epsilon}\, P_{ij|\Upsilon,\alpha,\epsilon}. \tag{22}$$

In summary, the researcher estimates the price distributions, only partially observes the utility rankings, and does not observe $\alpha_{ij}$ and $\epsilon_{ij}$ in the consumer’s utility function. Accounting for these issues, Honka (2014) derives an estimable model with the consideration set probability given by (20) and the conditional and unconditional purchase probabilities given by (21) and (22). Parameters are estimated by maximizing the joint likelihood of consideration and purchase given by

$$L = \prod_{i=1}^{N} \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \left( \prod_{l=1}^{L} \prod_{j=1}^{J} P_{i\Upsilon|\alpha,\epsilon}^{\vartheta_{il}}\, P_{ij|\Upsilon,\alpha,\epsilon}^{\delta_{ij}} \right) f(\alpha) f(\epsilon) \, d\alpha \, d\epsilon$$

where $\vartheta_{il}$ indicates the chosen consideration set and $\delta_{ij}$ the chosen company. Neither the consideration set probability nor the conditional purchase probability has a closed-form solution. Honka (2014) therefore uses a simulation approach to calculate them. In particular, she simulates from the distributions of $\alpha_{ij}$ and $\epsilon_{ij}$. She uses a kernel-smoothed frequency simulator (McFadden, 1989) in the estimation and smooths the probabilities using a multivariate scaled logistic CDF (Gumbel, 1961)

$$F(w_1, \ldots, w_T; s_1, \ldots, s_T) = \frac{1}{1 + \sum_{t=1}^{T} \exp(-s_t w_t)} \quad \forall\, t = 1, \ldots, T \tag{23}$$

where $s_1, \ldots, s_T$ are scaling parameters. McFadden (1989) suggests this kernel-smoothed frequency simulator, which satisfies the summing-up condition, i.e. that probabilities sum up to 1, and is asymptotically unbiased.
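A minimal sketch of how the smoothing function in Eq. (23) can replace a hard indicator inside a frequency simulator is given below. The helper names, the toy event, the number of draws, and the scaling value are all assumptions made for illustration; this is not the estimation code of Honka (2014), and the mapping of her model's conditions into the terms $w_t$ is not shown.

```python
import math
import random


def scaled_logistic_cdf(w, s):
    """Eq. (23): 1 / (1 + sum_t exp(-s_t * w_t))."""
    return 1.0 / (1.0 + sum(math.exp(-s_t * w_t) for w_t, s_t in zip(w, s)))


def smoothed_frequency(draw_w, scales, n_draws, seed=0):
    """Average the smoothed indicator over simulated draws.

    draw_w(rng) must return the tuple (w_1, ..., w_T); the event of interest
    is that every w_t is non-negative.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += scaled_logistic_cdf(draw_w(rng), scales)
    return total / n_draws


# Toy check: Pr(x >= 0) for a standard normal x is 0.5.
estimate = smoothed_frequency(
    lambda rng: (rng.gauss(0.0, 1.0),), scales=(10.0,), n_draws=5000
)
print(estimate)
```

Larger scaling parameters bring the smoothed function closer to the 0/1 indicator but make the simulated probability less smooth in the parameters, so the choice of $s_t$ trades off bias against ease of optimization.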


4.1.3 Discussion

In discussing Mehta et al. (2003) and Honka (2014), we start by pointing out the similarities: both papers estimate structural demand models, view search as the process through which consumers form their consideration sets, and use simultaneous search models. However, there are also multiple differences between the two papers.

First, Mehta et al. (2003) only have access to purchase data and thus search cost identification comes from functional form, i.e. the data do not contain any direct search outcome measures (e.g. number of searches) and search costs are identified based on model non-linearities and fit. In contrast, Honka (2014) observes the sizes of consumers’ consideration sets and thus search costs are identified by variation in consumers’ consideration set sizes in her empirical application.

Second, the utility function in Mehta et al. (2003) does not have an error term for identification reasons. The addition of such an error term would preclude Mehta et al. (2003) from separately identifying baseline search cost and the true quality of brands.

Third, Mehta et al. (2003) use an exclusion restriction: the authors assume that promotional activities such as in-store display and feature affect consumers’ search costs but not their utilities. In contrast, advertising enters consumers’ utilities (but not their search costs) in Honka (2014). Note that, without exogenous variation, the effects of advertising and promotional activities on both utility and search cost are only identified based on model non-linearities and fit.

And lastly, Mehta et al. (2003) and Honka (2014) use different approaches to deal with the curse of dimensionality of the simultaneous search model. While Mehta et al. (2003) use the choice model approach, which makes estimation feasible only in categories with a small number of options, Honka (2014) applies the theory developed by Chade and Smith (2005) and develops an estimation approach that is feasible in categories with a large number of options. However, she has to assume first-order stochastic dominance among the price distributions and that search costs do not vary across firms, two assumptions that Mehta et al. (2003) do not have to make.

4.1.4 De los Santos et al. (2012)

De los Santos et al. (2012) develop empirically testable predictions of data patterns that identify whether consumers search simultaneously or sequentially (see also Section 5). They use online browsing and purchase data for books to test these predictions and find evidence that consumers search simultaneously. Next, De los Santos et al. (2012) estimate a simultaneous search model and find average search costs of $1.35.

In the following, we provide details on the modeling and estimation approach. De los Santos et al. (2012) estimate a model in which consumers search across differentiated online bookstores using simultaneous search. As in Mehta et al. (2003) and Honka (2014), consumers search for prices. Consumer $i$’s indirect utility of buying the product at store $j$ is given by

$$u_{ij} = \mu_j + X_i\beta_j + \alpha_i p_j + \varepsilon_{ij},$$

where $\mu_j$ are store fixed effects, $X_i$ are consumer characteristics, $p_j$ is store $j$’s price for the product, and $\varepsilon_{ij}$ is an EV Type I-distributed utility shock that is observed by the consumer. Consumer search costs $c_i$ are consumer-specific and depend on consumer characteristics. Let $m_{iS}$ denote the expected maximum utility of visiting the stores in $S$ net of search costs, i.e., $m_{iS} = E\left[\max_{j \in S}\{u_{ij}\}\right] - k \cdot c_i$, with $k$ the number of stores in $S$. By adding an EV Type I choice-set specific error term $\zeta_{iS}$ to $m_{iS}$, the probability that consumer $i$ finds it optimal to search a subset of stores $S$ can be written as

$$P_{iS} = \frac{\exp[m_{iS}/\sigma_\zeta]}{\sum_{S' \in \mathcal{S}} \exp[m_{iS'}/\sigma_\zeta]}, \tag{24}$$

where $\sigma_\zeta$ is the scale parameter for $\zeta_{iS}$. Conditional on visiting stores in $S$, the probability of purchasing from store $j$ is then $P_{ij|S} = \Pr(u_{ij} > u_{ik} \ \forall\, k \ne j \in S)$. This probability does not have a closed-form solution, because a store with a higher $\varepsilon$ draw is more likely to be selected in the choice-set selection stage. The probability of observing consumer $i$ visiting all stores in $S$ and buying from store $j$ is found by multiplying the two probabilities, i.e.,

$$P_{ijS} = P_{iS} P_{ij|S}. \tag{25}$$

De los Santos et al. (2012) estimate the model using simulated maximum likelihood. They observe the stores visited by consumers in their data, and use $P_{ijS}$ in Eq. (25) to construct the log-likelihood function, i.e., the log-likelihood function is

$$LL = \sum_i \log \hat{P}_{ijS},$$

where $\hat{P}_{ijS}$ is the probability that individual $i$ bought at store $j$ from the observed choice set $S$. To obtain a closed-form expression for $E\left[\max_{j \in S}\{u_{ij}\}\right]$, De los Santos et al. (2012) follow Mehta et al. (2003) and Honka (2014) in their assumption that prices follow an EV Type I distribution with known parameters $\gamma_j$ and $\sigma$, i.e.

$$E\left[\max_{j \in S}\{u_{ij}\}\right] = \alpha_i \sigma \log\left( \sum_{j \in S} \exp\left( \frac{\mu_j + X_i\beta_j + \varepsilon_{ij} + \alpha_i\gamma_j}{\alpha_i \sigma} \right) \right).$$

Note that the utility shock εij that appears in both probabilities is integrated out using simulation, as part of a simulated maximum likelihood procedure.
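The search-set probabilities in Eq. (24) have the familiar logit form, which a short sketch can make concrete. In the sketch below, `v[j]` is a hypothetical stand-in for the term $\mu_j + X_i\beta_j + \varepsilon_{ij} + \alpha_i\gamma_j$ and `scale` for $\alpha_i\sigma$, assumed positive here; the store labels and all numbers are made up for illustration, and the integration over $\varepsilon_{ij}$ is not shown.

```python
import math
from itertools import combinations


def search_set_probabilities(v, scale, search_cost, sigma_zeta):
    """Logit probabilities over all non-empty store subsets, as in Eq. (24)."""
    stores = sorted(v)
    subsets = [s for k in range(1, len(stores) + 1) for s in combinations(stores, k)]
    m = {}
    for subset in subsets:
        # Closed-form expected maximum utility of the subset, net of search costs.
        expected_max = scale * math.log(sum(math.exp(v[j] / scale) for j in subset))
        m[subset] = expected_max - len(subset) * search_cost
    top = max(m.values())  # subtract the maximum for numerical stability
    weights = {s: math.exp((m_s - top) / sigma_zeta) for s, m_s in m.items()}
    total = sum(weights.values())
    return {s: weight / total for s, weight in weights.items()}


# Hypothetical values for three stores.
v = {"s1": 1.0, "s2": 0.8, "s3": 0.2}
probs = search_set_probabilities(v, scale=1.0, search_cost=0.5, sigma_zeta=1.0)
for subset, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(subset, round(p, 3))
```

The closed form comes entirely from the search-set specific error term; without it, the probability of each search set would itself have to be simulated.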

4.1.5 Discussion

There are several similarities and differences between De los Santos et al. (2012) and the two previously discussed papers, Mehta et al. (2003) and Honka (2014). First, De los Santos et al. (2012) have more detailed data on consumer search than Mehta et al. (2003) or Honka (2014): while Mehta et al. (2003) do not observe consumer search at all in their data and Honka (2014) observes consumers’ consideration sets, De los Santos et al. (2012) observe consumers’ consideration sets and the sequence of searches.

Second, the error structures in the three papers are different: the utility function in Mehta et al. (2003) does not contain a classic error term. While there is an error term in the utility function in Honka (2014) and De los Santos et al. (2012), in De los Santos et al. (2012) there is also a search-set specific error term (see also Moraga-González et al., 2015). As shown in Eq. (24), the latter error term gives De los Santos et al. (2012) a closed-form solution for the search set probabilities. This closed-form solution makes estimation of the model easier, but may necessitate a discussion of what this search-set specific error term, which is assumed to be independent across (possibly similar) search sets, represents.

And lastly, the positioning and contributions of the three papers are different: Mehta et al. (2003) is one of the first structural search models estimated with individual-level data on purchases. The contribution of this paper lies in the model development. Honka (2014) extends the model of Mehta et al. (2003) and develops an estimation approach that is feasible even in markets with a large number of alternatives. De los Santos et al.’s (2012) primary contribution is to show that consumers’ search method can be identified when the sequence of searches (and characteristics of searched products) made by individual consumers is observed in the data (see also Section 5).

4.1.6 Honka and Chintagunta (2017)

Similar to De los Santos et al. (2012), Honka and Chintagunta (2017) are also primarily interested in the question of search method identification. They analytically show that consumers’ search method is identified by patterns of prices in consumers’ consideration sets (see also Section 5). They use the same data as Honka (2014), i.e. cross-sectional data in which consumers’ purchases and consideration sets are observed, to empirically test whether consumers search simultaneously or sequentially in the auto insurance industry. They find the price pattern to be consistent with simultaneous search. Then Honka and Chintagunta (2017) estimate both a simultaneous and a sequential search model and find preference and search cost estimates to be severely biased when the incorrect assumption on consumers’ search method is made.

In the following, we discuss the details of the modeling and estimation approach for the sequential search model. Honka and Chintagunta (2017) develop an estimation approach for situations in which the researcher has access to individual-level data on consumers’ consideration sets (but not the sequence of searches) and purchases. Suppose consumer $i$’s utility for company $j$ is given by

$$u_{ij} = \alpha_{ij} + \beta p_{ij} + X_{ij}\gamma + \epsilon_{ij},$$

where $\epsilon_{ij}$ are iid and observed by the consumer, but not by the researcher. $\alpha_{ij}$ are brand intercepts and $p_{ij}$ are prices which follow a normal distribution with mean $\mu_{ij}^p$. Even though the sequence of searches is not observed, observing a consumer’s consideration set allows the researcher to draw two conclusions based on Weitzman’s (1979) rules: First, the minimum reservation utility among the searched companies has to be larger than the maximum reservation utility among the non-searched companies (based on the selection rule), i.e.

$$\min_{j \in S_i} z_{ij} \ge \max_{j' \notin S_i} z_{ij'}. \tag{26}$$

Otherwise, the consumer would have chosen to search a different set of companies. And second, the stopping and choice rules can be combined to the following condition

$$\max_{j \in S_i} u_{ij} \ge u_{ij'},\ \max_{j'' \notin S_i} z_{ij''} \quad \forall j' \in S_i \setminus \{j\} \tag{27}$$

i.e. that the maximum utility among the searched companies is larger than any other utility among the considered companies and the maximum reservation utility among the non-considered companies.

Eqs. (26) and (27) are conditions that have to hold based on Weitzman’s (1979) rules for optimal behavior under sequential search and given the search and purchase outcome that is observed in the data. At the same time, it must also have been optimal for the consumer not to stop searching and purchase earlier given Weitzman’s (1979) rules. The challenge is that the order in which the consumer collected the price quotes is not observed. The critical realization is that, given the parameter estimates, the observed behavior must have a high probability of having been optimal.

To illustrate, suppose a consumer searches three companies. Then the parameter estimates also have to satisfy the conditions under which it would have been optimal for the consumer to continue searching after his first and second search. Formally, in the estimation, given a set of estimates for the unknown parameters, for each consumer $i$, let us rank all searched companies $j$ according to their reservation utilities $\hat{z}_{it}$ (the “^” symbol refers to quantities computed at the current set of estimates), where $t = 1, \ldots, k$ indicates the rank of a consumer’s reservation utility among the searched companies. Note that $t = 1$ ($t = k$) denotes the company with the largest (smallest) reservation utility $\hat{z}_{it}$. Further rank all utilities of searched companies in the same order as the reservation utilities, i.e. $\hat{u}_{i,t=1}$ denotes the utility for the company with the highest reservation utility $\hat{z}_{i,t=1}$. Then given the current parameter estimates, the following conditions have to hold

$$\hat{u}_{i,t=1} < \hat{z}_{i,t=2} \ \cap\ \max_{t=1,2} \hat{u}_{it} < \hat{z}_{i,t=3}.$$

In other words, although by definition the reservation utility of the company with $t = 1$ is larger than that with $t = 2$, the utility of the company with $t = 1$ is smaller than the reservation utility of the company with $t = 2$, thereby prompting the consumer to do a second search. Similarly, the maximum utility from the (predicted) first and second search has to be smaller than the reservation utility from the (predicted) third search; otherwise the consumer would not have searched a third time. Generally, for a consumer searching $t = 2, \ldots, k$ companies, the following set of conditions has to hold

$$\bigcap_{l=2}^{k} \max_{t<l} \hat{u}_{it} < \hat{z}_{i,t=l}.$$

$z_{K+1}$, which implies a lower bound of $z_{K+1} - \delta_j$ in the integral in Eq. (33).

27 For J = K the term (1 − (z − δ ))π has to be added to Eq. (33). j j j j


Product $j$’s predicted market share $\hat{s}_j$ is obtained by averaging the buying probabilities $\Pr(j)$ across consumers. The relation between market shares and sales ranks for pairs of products is modeled as follows:

$$I_{jl}^S = \begin{cases} 1 & \text{if } s_j > s_l; \\ 0 & \text{otherwise.} \end{cases}$$

Assuming that the difference between the actual and predicted market share is a normally distributed measurement error, i.e., $s_j = \hat{s}_j + \varepsilon_j^S$ with $\varepsilon_j^S \overset{iid}{\sim} N(0, \tau_S^2/2)$, we get

$$\Pr(I_{jl}^S = 1 \mid \Theta, X) = \Phi\left( \frac{\hat{s}_j(\Theta, X) - \hat{s}_l(\Theta, X)}{\tau_S} \right). \tag{34}$$

The data used for estimation also contain information on the top products that were purchased conditional on searching for a specific product. These choices conditional on search correspond to the probability that product $j$ is chosen conditional on searching an option $l$, i.e.,

$$\Pr(j \mid l) = \frac{\sum_{K=\max(j,l)}^{J} \Pr(j, S_K)}{\pi_l},$$

where $K = j$ if $j$ has a lower reservation utility than $l$ and $K = l$ otherwise, and $\pi_l$ is the probability of searching the $l$th option. Assuming that the difference between the predicted and observed conditional choice share data represents a measurement error, i.e., $s_{j|l} = \hat{s}_{j|l}(\Theta \mid X) + \varepsilon_{jl}^C$ with $\varepsilon_{jl}^C \overset{iid}{\sim} N(0, \tau_C^2)$, we get

$$\Pr(s_{j|l} \mid \Theta, X) = \phi\left( \frac{\hat{s}_{j|l}(\Theta, X) - s_{j|l}}{\tau_C} \right). \tag{35}$$

Combining the probabilities in Eqs. (32), (34), and (35) and summing over all relevant products gives the following log-likelihood function, which is used to estimate the model by maximum likelihood:

$$LL(\Theta \mid Y, X) = \sum_{j} \sum_{l \ne j} \sum_{k \ne l} \Pr(I_{j,lk}^V = 1 \mid \Theta, X) + \sum_{j} \sum_{l \ne j} \Pr(I_{jl}^S = 1 \mid \Theta, X) + \sum_{l} \sum_{j} \Pr(s_{j|l} \mid \Theta, X).$$

Kim et al. (2017) estimate their model using view rank data, sales rank data, and data on choices conditional on search. They find mean and median search costs of $1.30 and $0.25, respectively, and predict median and mean search set sizes conditional on choice of 17 and 10 products, respectively. Kim et al. (2017) use their results to investigate substitution patterns in the camcorder market and conduct a market structure analysis using the framework of clout and vulnerability.
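The sales-rank component of this likelihood can be illustrated with a small sketch. Under the stated assumption that each share is measured with an independent $N(0, \tau_S^2/2)$ error, the difference of two errors has variance $\tau_S^2$, so the probability that product $j$ is observed to outsell product $l$ is a standard normal CDF evaluated at the scaled difference in predicted shares. The function names and the numerical values below are hypothetical.

```python
import math


def standard_normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_outsells(s_hat_j, s_hat_l, tau_s):
    """Probability that product j is observed to outsell product l, as in Eq. (34)."""
    return standard_normal_cdf((s_hat_j - s_hat_l) / tau_s)


# Hypothetical predicted market shares and measurement-error scale.
print(prob_outsells(0.12, 0.10, tau_s=0.05))
print(prob_outsells(0.10, 0.12, tau_s=0.05))
```

The two printed probabilities sum to one, reflecting that exactly one of the two products is ranked above the other whenever the observed shares differ.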


4.2.2 Moraga-González et al. (2018)
Moraga-González et al. (2018) develop a structural model of demand for car purchases in the Netherlands. The starting point for their search model is an aggregate random coefficients logit demand model in the spirit of Berry et al. (1995). However, whereas Berry et al. (1995) assume consumers have full information, Moraga-González et al. (2018) assume that consumers have to search to obtain information on $\varepsilon_{ij}$ in Eq. (30). As in Kim et al. (2017), consumers are assumed to search sequentially. To deal with the aforementioned search path dimensionality problem, which arises because the number of search paths that result in a purchase increases factorially in the number of products, Moraga-González et al. (2018) use insights from Armstrong (2017) and Choi et al. (2018) that make it possible to treat the sequential search problem as a discrete choice problem in which it is not necessary to keep track of which firms are visited by the consumer. Specifically, for every alternative (i.e. dealer) $f$, define the random variable $w_{if} = \min\{z_{if}, u_{if}\}$, where $z_{if}$ is the reservation utility for alternative $f$ and $u_{if}$ is its realized utility. According to Armstrong (2017) and Choi et al. (2018), the solution to the sequential search problem is equivalent to picking the alternative with the highest $w_{if}$ among all firms. To see that this is indeed optimal, consider the following example. Suppose there are three products, A, B, and C. The reservation and (ex-ante unobserved) realized utilities are 5 and 2 for product A, 10 and 4 for product B, and 3 and 7 for product C, respectively. Using Weitzman’s optimal search rules, the consumer first searches product B because it has the highest reservation utility, and then continues by searching product A because the realized utility for product B of 4 is smaller than product A’s reservation utility of 5. The consumer then buys product B because the next-best product in terms of reservation utilities, product C, has a reservation utility of 3, which means that the highest observed realized utility of 4 does not justify searching further. Characterizing the search problem in terms of $w$ avoids having to go through the specific ordering of firms in terms of reservation utilities and immediately reveals which product will be bought: since $w_B = \min\{10, 4\} = 4$ exceeds $w_A = \min\{5, 2\} = 2$ as well as $w_C = \min\{3, 7\} = 3$, product B will be purchased. Note that no additional assumptions have been made to resolve the search path dimensionality problem: all that is used is a re-characterization of Weitzman’s optimal search rules. To use this alternative characterization in practice, the distribution of $w_{if}$ has to be derived. It can be obtained as the CDF of the minimum of two independent random variables: $F^w_{if}(x) = 1 - (1 - F^z_{if}(x))(1 - F_{if}(x))$, where $F_{if}$ is the utility distribution and $F^z_{if}$ is the distribution of reservation utilities. Since $F^z_{if}(x) = 1 - F^c_{if}(H(x))$ with $F^c_{if}$ being the search cost CDF and $H(x)$ being
the gains from search, this can be written as $F^w_{if}(x) = 1 - F^c_{if}(H(x))(1 - F_{if}(x))$. The probability that consumer $i$ buys from dealer $f$ is then given by
$$P_{if} = \Pr(w_{ig} < w_{if} \;\forall\, g \neq f) = \int \prod_{g \neq f} F^w_{ig}(w_{if}) \, f^w_{if}(w_{if}) \, dw_{if},$$

where $F^w_{if}$ and $f^w_{if}$ are the CDF and PDF of $w_{if}$, respectively. The probability that consumer $i$ buys car $j$ conditional on buying from seller $f$ is given by
$$P_{ij|f} = \frac{\exp(\delta_{ij})}{\sum_{h \in G_f} \exp(\delta_{ih})}.$$
The probability that buyer $i$ buys product $j$ is thus $s_{ij} = P_{ij|f} P_{if}$. Note that these expressions are not necessarily closed-form. Although one can use numerical methods to directly estimate these expressions, this may slow down model estimation enough to make using it impractical. To speed up the estimation, Moraga-González et al. (2018) provide an alternative approach by working backwards and deriving a search cost distribution that gives a tractable closed-form expression for the buying probabilities. Specifically, a Gumbel distribution for $w$ with location parameter $\delta_{if} - \mu_{if}$, where $\mu_{if}$ contains search cost shifters and parameters, can be obtained using the following search cost CDF:
$$F^c_{if}(c) = \frac{1 - \exp(-\exp(-H_0^{-1}(c) - \mu_{if}))}{1 - \exp(-\exp(-H_0^{-1}(c)))},$$
where $H_0(z) = \int_z^{+\infty} (u - z)\, dF(u)$ represents the (normalized) gains from search at $z$. Product $j$’s purchase probability then simplifies to
$$s_{ij} = \frac{\exp(\delta_{ij} - \mu_{if})}{1 + \sum_{k=1}^{J} \exp(\delta_{ik} - \mu_{ig})}.$$
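As a rough illustration of the closed-form purchase probability above, the following Python sketch evaluates $s_{ij}$ for a single consumer. The function name `purchase_shares`, the mapping `dealer_of`, and all numerical values are hypothetical and are not taken from Moraga-González et al. (2018); the sketch only shows how a higher search cost index $\mu_{if}$ for one dealer shifts purchase probability away from that dealer's products.

```python
import math

def purchase_shares(delta, mu, dealer_of):
    """Closed-form purchase probabilities s_ij for one consumer.

    delta[j]     : mean utility of product j (hypothetical values)
    mu[f]        : search cost index of dealer f (hypothetical values)
    dealer_of[j] : index of the dealer that sells product j
    """
    # exp(delta_j - mu_f) for every product; the outside option contributes 1.
    weights = [math.exp(d - mu[dealer_of[j]]) for j, d in enumerate(delta)]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]

delta = [1.0, 0.5, 0.8]   # three products
dealer_of = [0, 0, 1]     # products 0 and 1 at dealer 0, product 2 at dealer 1

# Same utilities, but dealer 1 is costlier to search in the second case.
near = purchase_shares(delta, mu=[0.2, 0.2], dealer_of=dealer_of)
far = purchase_shares(delta, mu=[0.2, 1.5], dealer_of=dealer_of)
```

In this sketch, raising the search cost index of dealer 1 lowers the purchase probability of its product and raises the probabilities of the products sold by dealer 0, exactly as a lower mean utility would in a standard logit model.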

The closed-form expression for the purchase probabilities makes estimating the model about as difficult as estimating most full-information discrete choice models of demand. The estimation closely resembles the estimation in Berry et al. (1995): the most basic version of the model can be estimated using market shares, prices, product characteristics, and a search cost shifter (e.g. distance from the consumer to the car dealer is used in the application). The similarity with the framework in Berry et al. (1995) allows for dealing with price endogeneity: as in Berry
et al. (1995), Moraga-González et al. (2018) allow for an unobserved quality component in the utility function, i.e., $\delta_{ij} = \alpha_i p_j + X_j \beta_{ij} + \xi_j$, and allow $\xi_j$ to be correlated with prices. When the model is estimated using aggregate data, the essence of the estimation method is to match predicted market shares to observed market shares, i.e., $s_j(\xi_j, \theta) - \hat{s}_j = 0$ for all products $j$, which gives a nonlinear system of equations in $\xi$. As in Berry et al. (1995), this system can be solved for $\xi$ through a contraction mapping. The identification assumption is that the demand unobservables are mean independent of a set of exogenous instruments $W$, i.e., $E[\xi_j | W] = 0$, so that the model can be estimated using GMM while allowing for price endogeneity, as in Berry et al. (1995). An important limitation of estimating the model using data on market shares and product characteristics is that variables that enter the search cost specification have to be excluded from the utility function. For instance, distance cannot affect both utility and search costs when only purchase data (either aggregate or individual specific) is used to estimate the model. However, Moraga-González et al. (2018) show that when similar covariates appear in both the utility specification and the search cost specification, in their model it is possible to separate the effects of these common shifters using search data. Search behavior depends on reservation values, which respond differently to changes in utility than to changes in search costs; variation in observed search decisions therefore allows one to separately estimate the effects of common utility shifters and search cost shifters. To exploit this fully, Moraga-González et al. (2018) use moments based on individual purchase and search data for their main specification, which are constructed by matching predicted search probabilities to consumer-specific information on store visits from a survey. 
The aggregate sales data is then used to obtain the linear part of utility, following the two-step procedure in Berry et al. (2004). Search costs are found to be sizable, which is consistent with the limited search activity observed in this market. Moreover, demand is estimated to be more inelastic in the search model than in a full information setting. The counterfactual results suggest that the price of the average car would be €341 lower in the absence of search frictions.
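The share-matching step described in this subsection can be sketched in a few lines of Python. The code below is a simplified illustration, not the authors' estimation routine: it assumes that each product is sold by its own dealer, that consumers differ only in a hypothetical search cost index `mu[i][j]` (here 0.5 times a simulated distance), and that mean utilities `delta` are recovered with the fixed-point iteration of Berry et al. (1995), $\delta \leftarrow \delta + \ln \hat{s} - \ln s(\delta)$. All names and numbers are assumptions made for the example.

```python
import math
import random

def predicted_shares(delta, mu):
    """Average the closed-form shares over simulated consumers.

    delta[j] : mean utility of product j
    mu[i][j] : search cost index consumer i faces for product j (assumed)
    """
    n_products = len(delta)
    shares = [0.0] * n_products
    for mu_i in mu:
        weights = [math.exp(delta[j] - mu_i[j]) for j in range(n_products)]
        denom = 1.0 + sum(weights)  # the 1 is the outside option
        for j in range(n_products):
            shares[j] += weights[j] / denom
    return [s / len(mu) for s in shares]

def invert_shares(observed, mu, tol=1e-12, max_iter=10_000):
    """Find delta such that predicted_shares(delta, mu) matches observed."""
    delta = [0.0] * len(observed)
    for _ in range(max_iter):
        predicted = predicted_shares(delta, mu)
        step = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
        delta = [d + s for d, s in zip(delta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return delta

rng = random.Random(0)
# Hypothetical search cost indices: 0.5 * distance, with distance ~ U(0, 2).
mu = [[0.5 * rng.uniform(0.0, 2.0) for _ in range(3)] for _ in range(500)]

true_delta = [1.0, 0.2, -0.5]
observed = predicted_shares(true_delta, mu)   # stands in for observed market shares
recovered = invert_shares(observed, mu)       # should reproduce true_delta
```

In an actual application the recovered mean utilities would then be regressed on prices and product characteristics with instruments to obtain the linear parameters; that step is omitted here.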

4.2.3 Other papers
Here, we discuss several other papers that have also modeled consumer search for a match value. We start by reviewing papers that assume that consumers search sequentially and then discuss papers that assume that consumers search simultaneously. Koulayev (2014) analyzes search decisions on a hotel comparison site using clickstream data, i.e. individual-level data with observed search sequences. The paper models the decision to click on one of the listed hotels or to move to the next page of search results. Koulayev (2014) derives inequality conditions that are implied by search and click decisions and which are used to derive the likelihood function for the joint click and search behavior. He finds that search costs are relatively large: median search costs are around $10 per result page. Note that Koulayev’s approach leads to
multi-dimensional integrals for choice and search probabilities, which is manageable in settings with a small number of search options, but is potentially problematic in settings with larger choice sets. Jolivet and Turon (2018) derive a set of inequalities from Weitzman’s (1979) characterization of the optimal sequential search procedure and use these inequalities to set identify distributions of demand side parameters. The model is estimated using transaction data for CDs sold at a French e-commerce platform. Findings suggest that positive search costs are needed to explain 22 percent of the transactions and that there is heterogeneity in search costs. Dong et al. (2018) point out that search costs may lead to persistence in purchase decisions by amplifying preference heterogeneity and show that ignoring search frictions leads to an overestimation of preference heterogeneity. The authors use search and purchase decisions of individual consumers shopping for cosmetics at a large Chinese online store to separately identify preference heterogeneity from search cost heterogeneity. Two drawbacks are that the authors only observe a small proportion of consumers making repeat purchases in their data and that they have to add an error term to search costs to be able to estimate the model. Dong et al. (2018) find that the standard deviations of product intercepts are overestimated by 30 percent if search frictions are ignored, which has implications for targeted marketing strategies such as personalized pricing. Several studies have used a simultaneous instead of a sequential search framework when modeling consumer search for the match value. An advantage of simultaneous search is that search decisions are made before search outcomes are realized, which typically makes it easier to obtain closed-form expressions for purchase probabilities. 
However, as discussed in Section 2, the number of choice sets increases exponentially in the number of alternatives that can be searched (curse of dimensionality of the simultaneous search model). Moraga-González et al. (2015) show that tractability can be achieved by adding an EV Type I distributed choice-set specific error term to the search costs that correspond to a specific subset of firms. Murry and Zhou (2018) use this framework in combination with individual-level transaction data for new cars to quantify how geographical concentration among car dealers affects competition and search behavior. Donna et al. (2018) estimate the welfare effects of intermediation in the Portuguese outdoor advertising industry using a demand model that extends this framework to allow for nested logit preferences. Finally, Ershov (2018) develops a structural model of supply and demand to estimate the effects of search frictions in the mobile app market and uses the search model in Moraga-González et al. (2015) as a demand side model.

5 Testing between search methods
In this section, we discuss the identification of the search method consumers are using. Beyond intellectual curiosity, the search conduct, i.e. search method, matters for consumers’ decision-making. It influences which and how many products a consumer searches. More specifically, holding a consumer’s preferences and search cost constant, a consumer might end up with a different consideration set depending on whether he searches simultaneously or sequentially.²⁸ In fact, it can be shown that, again holding a consumer’s preferences and search cost constant, a consumer who searches sequentially has, on average, a smaller search set than when the same consumer searches simultaneously.²⁹ From a researcher’s perspective, this implies that estimates of consumer preferences and search cost will be biased under an incorrect assumption on the search method. This bias of consumer preferences and search cost estimates can, in turn, lead to, for example, biased (price) elasticities and different results in counterfactual simulations. For example, Honka and Chintagunta (2017) show that consideration set sizes and purchase market shares of the largest companies are overpredicted when it is wrongly assumed that consumers search sequentially. For a long time, it was believed that the search method is not identified in observational data. In a first attempt to test between search methods, Chen et al. (2007) develop nonparametric likelihood ratio model selection tests which allow them to test between simultaneous and sequential search models. Using the Hong and Shum (2006) framework and data on prices, they do not find significant differences between the simultaneous and sequential search models using the usual significance levels in their empirical application. Next, we discuss two papers that developed tests to distinguish between simultaneous and sequential search using individual-specific data on search behavior. In Section 5.1, we discuss De los Santos et al. 
(2012), who use data on the sequence of searches to discriminate between simultaneous and sequential search, whereas in Section 5.2, we discuss Honka and Chintagunta (2017), who develop a test that does not require the researcher to observe search sequences. Jindal and Aribarg (2018) apply the key identification insights from De los Santos et al. (2012) and Honka and Chintagunta (2017) to a situation of search with learning (see also Section 6.1). Using their experimental data, the authors find that, under the assumption of rational price expectations, consumers appear to search simultaneously, while the search patterns are consistent with sequential search conditional on consumers’ elicited price beliefs. This finding highlights the importance of the rational expectations assumption for search method identification.

28 If a consumer has different consideration sets under simultaneous and sequential search, he might also end up purchasing a different product.
29 This is the case because, under sequential search, a consumer can stop searching when he gets a good draw early on.

5.1 De los Santos et al. (2012)
One of the objectives of De los Santos et al. (2012) is to provide methods to empirically test between sequential and simultaneous search. The authors use data on web browsing and online purchases of a large panel of consumers, which allows them
to observe the online stores consumers visited as well as the store they ultimately bought from. Thus De los Santos et al. (2012) observe the sequence of visited stores, which is useful for distinguishing between sequential and simultaneous search. De los Santos et al. (2012) first investigate the homogeneous goods case with a market-wide price distribution. Recall that, under simultaneous search, a consumer samples all alternatives in his search sets before making a purchase decision. In contrast, under sequential search, a consumer stops searching as soon as he gets a price below his reservation price (see also Section 2.2). Since the latter implies that the consumer purchases from the last store he searched, revisits should not be observed when consumers search sequentially, while they may be observed when consumers search simultaneously (a consumer may revisit a previously searched store to make a purchase). Whether or not consumers revisit stores can be directly explored with data on search sequences. De los Santos et al. (2012) find that approximately one-third of consumers revisit stores – a finding that is inconsistent with a model of sequential search for homogeneous goods. Recall that the no-revisit property of the sequential search model for homogeneous goods does not necessarily apply to more general versions of the sequential search model, including models in which stores are differentiated. Specifically, if goods are differentiated, the optimal sequential search strategy is to search until a utility level is found that exceeds the reservation utility of the next-best alternative. As pointed out in Section 2, when products are differentiated, reservation utilities are generally declining, so a product that was not good enough early on in the search may pass the bar after a number of different products have been inspected, triggering a revisit to make a purchase. In the following, we show a simple example of how that can happen. 
Suppose a consumer wants to purchase one of five products denoted by A, B, C, D, and E. Table 2 shows (realized) utilities $u$ and reservation utilities $z$ for all five alternatives. Note that the alternatives are ranked in a decreasing order of their reservation utilities $z$. In this example, the consumer first searches A, the alternative with the highest reservation utility. Since the reservation utility of the next-best alternative B is larger than the highest utility realized so far, i.e. $z_B > u_A$, he continues to search. The consumer also decides to continue searching after having sampled options B and C since $z_C > \max\{u_A, u_B\}$ and $z_D > \max\{u_A, u_B, u_C\}$. However, after having searched alternative D, the consumer stops because the reservation utility of option E is smaller than the maximum utility realized so far, i.e., $z_E < \max\{u_A, u_B, u_C, u_D\}$.³⁰ The maximum realized utility among the searched options is offered by alternative B with $u_B = 9$. The consumer therefore revisits and purchases B. Thus, for differentiated goods, revisits can happen when consumers search sequentially. For the researcher, this means that evaluating the revisit behavior of consumers does not help to discriminate between simultaneous and sequential search in such a setting.

Table 2 Example.
Option   Utility (u)   Reservation utility (z)
A        7             15
B        9             13
C        8             11
D        6             10
E        11            7

30 Here the assumption of perfect recall made in Section 2.2 comes into play.

De los Santos et al. (2012) point out that a more robust difference between sequential and simultaneous search is that the search behavior depends on observed search outcomes under the former, but not under the latter search method. This insight forms the basis of a second test which uses the following idea: if consumers search sequentially and know the price distribution(s), they should be more likely to stop searching after getting a below-mean price draw as opposed to an above-mean price draw. The reason is as follows: due to the negative price coefficient in the utility function, a below-mean price draw results in an above-mean utility, i.e. $u \geq E[u]$, and an above-mean price draw results in a below-mean utility, i.e. $u \leq E[u]$. Holding everything else constant, the consumer is more likely to stop the search with an above-mean utility draw than a below-mean utility draw since the stopping rule is more likely to be satisfied, i.e. the maximum utility among the searched options is more likely to be larger than the maximum reservation utility among the non-searched options. Based on this idea, the search method can be (empirically) identified as follows: if consumers search sequentially, consumers who get a below-mean price draw should be significantly more likely to stop searching after that below-mean price draw. To address store differentiation, this test can be carried out within a store, i.e. if a product has a high price relative to the store’s price distribution, sequentially searching consumers are more likely to continue searching, while the high price (relative to the store’s price distribution) should not affect the behavior of simultaneously searching consumers. In their empirical application, De los Santos et al. (2012) do not find any evidence for search decisions being dependent on observed prices, even within stores. They conclude that a simultaneous search model fits the data better than a sequential search model.
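Returning to the differentiated goods example in Table 2, the search path and the revisit can be traced mechanically. The Python sketch below applies Weitzman's (1979) selection and stopping rules to the values in Table 2 and, for comparison, to the three-product example of Section 4.2.2. The function names `weitzman_search` and `highest_w` are hypothetical, and the sketch assumes perfect recall, no outside option, and no ties.

```python
def weitzman_search(z, u):
    """Apply Weitzman's (1979) rules for one consumer.

    z, u : dicts mapping option name -> reservation utility / realized utility.
    Returns (searched options in order, purchased option, revisit flag).
    """
    order = sorted(z, key=z.get, reverse=True)  # search in decreasing order of z
    searched = []
    for option in order:
        best = max((u[o] for o in searched), default=float("-inf"))
        if best >= z[option]:
            break  # best utility in hand beats the next reservation utility
        searched.append(option)
    purchased = max(searched, key=u.get)      # buy the best searched option
    revisit = purchased != searched[-1]       # bought from an earlier stop?
    return searched, purchased, revisit

def highest_w(z, u):
    """Pick the option with the highest w = min(z, u)."""
    return max(z, key=lambda o: min(z[o], u[o]))

# Values from Table 2.
z2 = {"A": 15, "B": 13, "C": 11, "D": 10, "E": 7}
u2 = {"A": 7, "B": 9, "C": 8, "D": 6, "E": 11}
path2, bought2, revisit2 = weitzman_search(z2, u2)

# Three-product example from Section 4.2.2.
z3 = {"A": 5, "B": 10, "C": 3}
u3 = {"A": 2, "B": 4, "C": 7}
path3, bought3, revisit3 = weitzman_search(z3, u3)
```

For Table 2 the consumer searches A, B, C, and D, and then returns to B, so the purchase is a revisit. In both examples the purchased option coincides with the option that has the highest $\min\{z, u\}$, in line with the re-characterization used by Moraga-González et al. (2018).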

5.2 Honka and Chintagunta (2017)
As stated previously, Honka and Chintagunta (2017) also study search method identification. This paper provides an analytical proof for search method identification. It also shows that it is not necessary to observe the sequences of consumers’ searches (which was a crucial assumption in De los Santos et al., 2012); only data on search sets, purchases, and purchase prices are needed for search method identification. Suppose prices follow some well-defined (potentially company-specific and/or consumer-specific) distributions and define $\Pr(p < \mu^p_{ij}) = \lambda$, i.e. the probability that a price draw is below the expected price is $\lambda$. Further define the event $X = 1$ as a below-price expectation price draw and $X = 0$ as an above-price expectation price draw. Recall that, under simultaneous search, the consumer commits to searching a
set $S_i$ consisting of $k_i$ companies. Then Honka and Chintagunta (2017) calculate the expected proportion of below-price expectation prices in a consumer’s consideration set of size $k_i$ as
$$E\left[\frac{1}{k_i} \sum_{m=1}^{k_i} X_m\right] = \frac{1}{k_i} \sum_{m=1}^{k_i} E[X_m] = \frac{\lambda k_i}{k_i} = \lambda. \tag{36}$$

Thus, if consumers search simultaneously, a researcher can expect a fraction $\lambda$ of the price draws in consumers’ consideration sets to be below and a fraction $1 - \lambda$ to be above the expected price(s). The crucial ingredients for identification are that the researcher observes the means of the price distributions $\mu^p_{ij}$, the actual prices in consumers’ consideration sets $p_{ij}$, and the probability of a price draw being below its mean $\lambda$. Under sequential search, Honka and Chintagunta (2017) show that, for both homogeneous and differentiated goods allowing for consumer- and company-specific search costs, the expected proportion of below-price expectation prices in consumers’ consideration sets of size one, $X_1$, is always larger than $\lambda$, i.e. $X_1 > \lambda$, under the necessary condition that a positive number of consumers is observed to make more than one search. The characteristic price patterns for simultaneous and sequential search described above hold for all models that satisfy the following set of assumptions: (a) prices are the only source of uncertainty for the consumer which he resolves through search; (b) consumers know the distribution of prices and have rational expectations for these prices; (c) price draws are independent across companies; (d) there is no learning about the price distribution from observing other variables (e.g. advertising); (e) (search costs are sufficiently low so that) all consumers search at least once; and (f) consumers have non-zero search costs. Models that satisfy this set of assumptions include (1) models for homogeneous goods, (2) models for differentiated products, (3) models that include unobserved heterogeneity in preferences and/or search costs, (4) models with correlations among preferences and search costs, and (5) models with observed heterogeneity in price distribution means $\mu^p_{ij}$. 
On the other hand, the researcher would not find the characteristic price patterns when there is unobserved heterogeneity in the price distribution means as the researcher would no longer be able to judge whether a price draw is above or below the mean. Note also that the identification arguments in Honka and Chintagunta (2017) are based on the first moments of prices; in principle there could be identification rules based on higher moments as well. Lastly, Honka and Chintagunta (2017) discuss the modeling assumptions stated in the previous paragraph and to what extent the search method identification results depend on them. Assumptions (a) through (e) are standard in both the theoretical and empirical literature on price search. With regard to assumption (f) that consumers have non-zero search costs, note that search costs have to be only marginally larger than zero for search method identification to hold in all model specifications. Alternatively, if the researcher believes that the assumption of non-zero search costs is not appropriate in an empirical setting, search method identification is also given under
the assumption that the search cost distribution is continuous, i.e. has support, from 0 to a positive number A > 0.³¹
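The price patterns that underlie this test can be illustrated with a small simulation. The Python sketch below is a stylized homogeneous goods example, not the model estimated by Honka and Chintagunta (2017): prices are assumed to be uniform on [0, 1] (so the mean is 0.5 and $\lambda = 0.5$), search set sizes under simultaneous search are drawn from an arbitrary distribution, and sequential searchers use the reservation price $r$ that solves $r^2/2 = c$ for a search cost $c$ drawn uniformly from [0, 0.5]. All of these choices are assumptions made for illustration.

```python
import math
import random

rng = random.Random(1)
MEAN_PRICE = 0.5       # prices ~ U(0, 1), so lambda = Pr(p < mean) = 0.5
N_CONSUMERS = 100_000

def below_mean_share(prices):
    """Proportion of price draws that fall below the expected price."""
    return sum(p < MEAN_PRICE for p in prices) / len(prices)

# Simultaneous search: the set size k_i is fixed before any price is seen,
# so every price in the set is an unconditional draw.
simultaneous_prices = []
for _ in range(N_CONSUMERS):
    k = rng.randint(1, 4)  # hypothetical distribution of set sizes
    simultaneous_prices.extend(rng.random() for _ in range(k))

# Sequential search: stop at the first price at or below the reservation price.
size_one_prices = []
for _ in range(N_CONSUMERS):
    cost = rng.uniform(0.0, 0.5)
    reservation = math.sqrt(2.0 * cost)  # solves r**2 / 2 = cost for U(0, 1) prices
    first_price = rng.random()
    if first_price <= reservation:       # consideration set of size one
        size_one_prices.append(first_price)

share_simultaneous = below_mean_share(simultaneous_prices)
share_sequential_size_one = below_mean_share(size_one_prices)
```

Under these assumptions the simulated share of below-mean prices is close to $\lambda$ under simultaneous search, whereas among sequential searchers who stop after one search it is clearly above $\lambda$, because stopping after the first draw is itself evidence of a favorable price.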

6 Current directions
In this section, we review some of the current directions in which the search literature is moving. As discussed in Section 2, this largely involves relaxing some of the rather stringent assumptions made in that section and/or developing new modeling approaches to understand more detailed data on consumer search especially from the online environment. We start by discussing papers which study search with learning, i.e. research that relaxes the assumption that consumers know the true price or match value distribution (Assumption 2 in Section 2.1). Instead, these papers try to characterize optimal consumer search in the presence of concurrent learning of the price or match value distribution. In the following section, we discuss papers which investigate search for multiple attributes, e.g. price and match value. In Section 6.3, we review papers that study the role of advertising in a market which is characterized by consumer search. In the three subsections that follow, we describe research that focuses on how consumers search in an online environment. This includes papers that look at search and rankings in Section 6.4, papers that try to quantify the optimal amount of information shown to consumers in Section 6.5, and papers that work with granular, i.e., extremely detailed, data on consumer search in Section 6.6. We conclude this section by discussing papers that investigate the intensive margin of search, i.e. search duration, allowing for re-visits of sellers in Section 6.7 (relaxing Assumptions 3 and 5 of Sections 2.1 and 2.2) and papers that incorporate dynamic aspects of consumer search in Section 6.8.

31 Note that it is not required that the search cost distribution is continuous over its full range. It is only required that it is continuous over the interval 0 to A > 0. The search method identification goes through when a search cost distribution has support e.g. from 0 to A and from B to C with C ≥ B > A > 0.

6.1 Search and learning
A standard assumption in the consumer search literature is that consumers know the distribution from which they sample (Assumption 2 in Section 2.1). However, starting with Rothschild (1974), several theoretical papers have studied search behavior in the case that consumers have uncertainty about the distribution from which they sample (Rosenfield and Shapiro, 1981; Bikhchandani and Sharma, 1996). Although the empirical literature has largely followed Stigler’s (1961) initial assumption that the distribution is known, several recent studies have departed from it and assume that consumers learn about the distribution of prices or utilities while searching. To quickly recap, Rothschild (1974) studies optimal search rules when individuals are searching from unknown distributions and use Bayesian updating to revise their
priors when new information arrives. An important example in his paper is the case of a Dirichlet prior distribution: if prior beliefs follow a Dirichlet distribution, then the reservation value property continues to hold, i.e. the qualitative properties of optimal search rules that apply to models in which the distribution is known carry over to the case of an unknown distribution. Koulayev (2013) uses this result to derive a model of search for homogeneous products with Dirichlet priors that can be estimated using only aggregate data such as market shares and product characteristics. An attractive feature is that Dirichlet priors imply that search decisions can be characterized by the number of searches carried out to date as well as the best offer observed up to that point. This feature simplifies integrating out search histories (which are unobserved in Koulayev’s (2013) application) and makes it possible to derive a closed-form expression for ex-ante buying probabilities. A Dirichlet prior, however, is a prior over discrete distributions of offers. In settings in which consumers search for a good product match, which is typically modeled as an IID draw from a continuous distribution, a continuous prior may be more appropriate. Bikhchandani and Sharma (1996) extend the Rothschild (1974) model to allow for a continuous distribution of offerings by using a Dirichlet process, a generalization of the Dirichlet distribution. Dirichlet process priors also imply that the only parts of the search history that matter for search decisions are the identity of the best alternative observed so far and the number of searches to date, which simplifies the estimation of such a model. The search model in Häubl et al. (2010) features learning of this type and is empirically tested using data from two experiments. De los Santos et al. (2017) also use this property to develop a method to estimate search costs in differentiated products markets. 
Specifically, the paper uses observed search behavior to derive bounds on a consumer’s search cost: if a consumer stops searching, this implies that she found a product with a higher realized utility than her reservation utility. If she continues searching, her search costs should have been lower than the gains from search relative to the best utility found so far. Learning enters through the equation describing the gains from search, i.e.,
$$G(\hat{u}_{it}) = \frac{W}{W + t} \int_{\hat{u}_{it}}^{\infty} (u - \hat{u}_{it}) \cdot h(u) \, du, \tag{37}$$
where $h(u)$ is the density of the initial prior distribution, $W$ is the weight put on the initial prior, and $t$ represents the number of searches to date. The term $\frac{W}{W+t}$ differentiates Eq. (37) from the non-learning case and reflects the updating process of consumers. Intuitively, every time a utility lower than $\hat{u}_{it}$ is drawn, the consumer puts less weight on offers that exceed $\hat{u}_{it}$. If $t = 0$ and $h(u)$ corresponds to the utility distribution, then Eq. (37) equals the gains from search equation in the standard sequential search model. Dzyabura and Hauser (2017) study product recommendations and point out that, in an environment in which consumers are learning about their preference weights while searching, it may not be optimal to recommend the product with the highest probability to be chosen or the product with the highest option value. Instead, the optimal recommendation system encourages consumers to learn by suggesting products
with diverse attribute levels, undervalued products, and products that are most likely to change consumers’ priors. Synthetic data experiments show that recommendation systems that have these elements perform well. Jindal and Aribarg (2018) conduct a lab experiment during which consumers search and learn the price distribution for a household appliance at the same time. The authors elicit each consumer’s belief about the price distribution before the first search and after every search the consumer decides to make. Jindal and Aribarg (2018) observe that consumers update their beliefs about the price distribution while searching. Using their experimental data, the authors show that not accounting for belief updating or assuming rational expectations biases search cost estimates. The direction of the bias depends on the position of prior beliefs relative to the true price distribution. Further, Jindal and Aribarg (2018) find that accounting for the means of the belief distribution mitigates the bias in search cost estimates substantially; the standard deviation of the belief distribution has a relatively small impact on the distribution of search costs, and hence, the bias. Most of the previously mentioned papers (e.g. Rothschild, 1974; De los Santos et al., 2017; Dzyabura and Hauser, 2017) assume that consumers are learning while searching and then derive implications for optimal consumer search and/or supply side reactions for such an environment. Crucial, unanswered questions remain: with observational data, is it possible to identify whether consumers know or learn the distribution of interest while searching? What data would be necessary to do so and what would be the identifying data patterns? Furthermore, how quickly do consumers learn? Can companies influence the learning process and how? These and other unanswered questions related to concurrent search and learning represent ample opportunity for future research.
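To give a sense of how learning changes the gains from search in Eq. (37), the following Python sketch evaluates $G(\hat{u}_{it})$ for a few values of $t$. It assumes, purely for illustration, that the initial prior density $h(u)$ is standard normal, in which case the integral has the closed form $\phi(\hat{u}) - \hat{u}\,(1 - \Phi(\hat{u}))$; the function names and the prior weight $W = 2$ are hypothetical and do not come from De los Santos et al. (2017).

```python
import math

def standard_normal_gain(u_hat):
    """Integral of (u - u_hat) * phi(u) over u > u_hat for a standard normal prior."""
    pdf = math.exp(-0.5 * u_hat ** 2) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(u_hat / math.sqrt(2.0))  # 1 - Phi(u_hat)
    return pdf - u_hat * upper_tail

def gains_from_search(u_hat, t, prior_weight):
    """Eq. (37): W / (W + t) times the expected improvement over u_hat."""
    return prior_weight / (prior_weight + t) * standard_normal_gain(u_hat)

# Gains from search after t = 0, 1, ..., 4 searches, holding the best
# utility found so far fixed at 0.5 (hypothetical values).
gains = [gains_from_search(u_hat=0.5, t=t, prior_weight=2.0) for t in range(5)]

# A consumer with search cost c keeps searching only while c is below the gain.
search_cost = 0.1
keeps_searching = [search_cost < g for g in gains]
```

Holding the best utility found so far fixed, the gains from search shrink with every additional search, so a consumer with a given search cost eventually stops even without finding a better offer. With $t = 0$ the expression reduces to the standard no-learning gains from search.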

6.2 Search for multiple attributes
So far, work discussed in this chapter has modeled consumers’ search for the specific value of a single attribute, e.g. price or match value. However, in practice, consumers might be searching to resolve uncertainty about more than one attribute.³² Researchers have developed different approaches to address this issue. For example, De los Santos et al. (2012) derive a closed-form solution for the benefit of searching for the case that consumers search simultaneously for both prices and epsilon. However, their solution requires the researcher to assume that both prices and epsilon follow EV Type I distributions and that both distributions are independent. Chen and Yao (2017) and Yao et al. (2017) pursue a different approach: while consumers search for multiple attributes in their sequential search models, in both papers, the authors assume that consumers know the joint distribution of these attributes. Consumers then search to resolve uncertainty about the (one) joint distribution. Thus Chen and Yao (2017) and Yao et al. (2017) model search for multiple attributes by imposing an additional assumption, i.e. that consumers know the joint distribution of the attributes, which allows them to apply the standard solution for a sequential search model for a single attribute. While assuming that consumers know the joint distribution of multiple attributes or developing closed-form solutions under specific distributional assumptions are important steps forward, ample research opportunities remain to develop empirical models of search for multiple attributes with less stringent assumptions. On a different note, Bronnenberg et al. (2016) describe the values of attributes consumers observe while searching. However, an unanswered question is how many and which attributes a consumer searches for, i.e. resolves uncertainty about, versus simply observes because their values are shown to the consumer by default. Field experiments might help shed light on this issue.

32 Match value search models sometimes describe the match value as a summary measure for multiple attributes.

6.3 Advertising and search

Researchers have long been interested in how advertising affects consumers’ decision-making in markets that are characterized by consumers’ limited information, i.e. markets in which consumers search and form consideration sets. The consideration set literature (Section 3.1) has often modeled consideration as a function of advertising and has frequently found advertising to have a significant effect on consideration, in many cases larger than its effect on choice. For example, Terui et al. (2011) report advertising to significantly affect consideration but not choice. In the economics literature, Goeree (2008) assumes that advertising affects consideration but not choice. In a recent paper, using data on individual consumers’ awareness, consideration, and choices in the auto insurance industry over a time period of nine years, Tsai and Honka (2018) find advertising to significantly affect consumer awareness, but not conditional consideration or conditional choice.33 Further, the authors report that the advertising content that leads to consumers’ increased awareness is of a non-informational nature, i.e. fun/humorous and/or brand name focused, implying that the effect on awareness is coming from non-informational content leading to better brand recall.

33 Tsai and Honka (2018) observe unaided and aided awareness sets to, on average, contain 4.15 and 12.02 brands, respectively. Looking at shoppers and non-shoppers separately, as expected, shoppers have larger unaided and aided awareness sets than non-shoppers. Consideration sets, on average, contain 3.12 brands (which includes the previous insurance provider).

Honka et al. (2017) develop a structural model which describes the three stages of the purchase process: awareness, consideration, and choice. It is one of the first papers that accounts for endogeneity – here: of advertising – within a model of consumer search. Potential endogeneity of advertising may arise in any or all stages of the purchase process and is addressed using the control function approach (Petrin and Train, 2010). The model is calibrated with individual-level data from the U.S. retail banking industry in which the authors observe consumers’ (aided) awareness and consideration sets as well as their purchase decisions. Consumers are, on average, aware of 6.8 banks and consider 2.5 banks. In modeling consumer behavior, Honka et al. (2017) view awareness as a passive occurrence, i.e. the consumer does not exert any costly effort to become aware of a bank. A consumer can become aware of a bank by, for example, seeing an ad or driving by a bank branch. Consideration is an active occurrence, i.e. the consumer exerts effort and incurs costs to learn about the interest rates offered by a bank. The consumer’s consideration set is thus modeled as the outcome of a simultaneous search process given the consumer’s awareness set (à la Honka, 2014). And finally, purchase is an active, but effortless occurrence in which the consumer chooses the bank which gives him the highest utility. The consumer’s purchase decision is modeled as a choice model given the consumer’s consideration set. Consideration and choice are modeled in a consistent manner by specifying the same utility function for both stages. This assumption is supported by Bronnenberg et al. (2016) who find that consumers behave similarly during the search and purchase stages. Honka et al. (2017) find that advertising primarily serves as an awareness shifter. While the authors also report that advertising significantly affects utility, the latter effect is much smaller in terms of magnitude. Advertising makes consumers aware of more options; thus consumers search more and find better alternatives than they would otherwise. In turn, this increases the market share of smaller banks and makes the U.S. banking industry more competitive. Further study of how advertising interacts with the process through which consumers search/consider products is a potentially very fruitful area for future research.
For example, whether advertising in which explicit comparisons with competing products’ attributes and/or prices are made enlarges consumers’ consideration sets is a very interesting question (though how consumers may evaluate firms’ choices of which competitors they compare themselves against is a very interesting question for the literature on strategic information transmission). More broadly, understanding and quantifying the mechanisms through which different types of advertising affect the “purchase funnel” is likely to benefit from the availability of detailed data sets especially on online shopping behavior.

6.4 Search and rankings

Most of the papers discussed so far assume that the order in which consumers obtain search outcomes is either random or, in the case of differentiated products, the outcome of a consumer’s optimal search procedure. However, in certain markets the order in which alternatives are searched may be largely determined by an intermediary or platform. For instance, search engines, travel agents, and comparison sites all present their search results ordered in a certain way and, as such, affect the way in which consumers search. In this section, we discuss several recent papers that study how rankings affect online search behavior. A particular challenge when estimating how rankings affect search is that rankings are endogenous. More relevant products are typically ranked higher by the intermediary. This endogeneity makes it difficult to estimate the causal effect of rankings
on search: being ranked higher makes purchase or clicking more likely, which inflates the effect of relevance or quality. Ursu (2018) deals with this simultaneity problem by using data from a field experiment run by the online travel agent Expedia. Specifically, she compares click and purchase decisions of consumers who were either shown Expedia’s standard hotel ranking or a random ranking. Her findings suggest that rankings affect consumers’ search decisions in both settings, but conditional purchase decisions are only affected when hotels are ranked according to the Expedia ranking. This finding implies that the position effect of rankings is overestimated when rankings are not randomly generated. De los Santos and Koulayev (2017) focus on the intermediary’s decision of how to rank products. The authors propose a ranking method that optimizes click-through rates: it takes into account that, even though the intermediary typically knows very little about the characteristics of its consumers, the intermediary observes search refinement actions as well as other search actions. De los Santos and Koulayev (2017) find that their proposed ranking method almost doubles click-through rates for a hotel booking website in comparison to the website’s default ranking. Using an analytical model, Ursu and Dzyabura (2018) also study how intermediaries should rank products to maximize searches or sales. The authors incorporate the aspect that search costs increase for lower-ranked products. Contrary to common practice, Ursu and Dzyabura (2018) find that intermediaries should not always show the product with the highest expected utility first. Most online intermediaries give consumers the option to use search refinement tools when going through search results. 
These tools allow consumers to sort and filter the initial search rankings according to certain product characteristics, and can therefore significantly alter how products are ranked in comparison to the initial search results. Chen and Yao (2017) develop a sequential search model that incorporates consumers’ search refinement decisions. Their model is estimated using clickstream data from a hotel booking website. A key finding is that refinement tools result in 33% more searches, leading to a 17% higher utility of purchased products. The intersection of the research areas on rankings and consumer search offers plenty of opportunities for future work. For example, a more detailed investigation of the different types of rankings (e.g. search engine for information, intermediary offering products by different sellers, retailer selling own products) guided by different goals (e.g. maximize click-through, sales, profits) might result in different optimal rankings. Further, do ranking entities want consumers to search more or less? And lastly, some ranking entities offer many filtering and refinement tools, while others offer none or only a few. The optimality of these decisions and the reasons behind them remain open questions.

6.5 Information provision

The Internet provides a unique environment in which companies can relatively easily and inexpensively change which and how much information to make (more or less) accessible to consumers and, through these website design decisions, to influence consumer search behavior.

Gu (2016) develops a structural model of consumer search describing consumer behavior on the outer and the inner layer of a website. For example, on a hotel booking website, the outer layer refers to the hotel search results page and the inner layer refers to the individual hotel pages. Gu (2016) studies how the amount of information (measured by entropy) displayed on each layer affects consumer search behavior. The amount of information on the outer layer influences the likelihood of searching on the inner layer, i.e. clicking on the hotel page. More information on the outer layer reduces the need to visit the inner layer. At the same time, more information on the outer layer makes it more complex to process, which could decrease the likelihood of consumers searching. Thus there is a cost and a benefit to information on each layer. Gardete and Antill (2019) propose a dynamic model in which consumers search over alternatives with multiple characteristics. They apply their model to click-stream data from the website of a used car seller. In counterfactual exercises, Gardete and Antill (2019) predict the effects of different website designs and find that the amount of information and the characteristics of the information shown to consumers upfront affect search and conversion rates. More research is needed to understand how different information provision strategies affect consumer search and equilibrium outcomes. This is a domain where the burgeoning recent theoretical literature on information design (e.g. Kamenica and Gentzkow, 2011) may provide guideposts for empirical model building.
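
To make the idea of measuring the amount of information by entropy concrete, the following sketch computes the Shannon entropy of a discrete distribution. How Gu (2016) constructs the entropy measure for each website layer is not described here, so the function name and the example distributions are assumptions made only for illustration.

```python
import math


def shannon_entropy_bits(weights):
    """Shannon entropy, in bits, of a discrete distribution.

    `weights` are non-negative numbers; they are normalized to sum to one.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        p = w / total
        if p > 0.0:  # the term 0 * log(0) is taken to be zero
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical example: an attribute shown on the results page that takes
# four equally likely values, versus one that is the same for every hotel.
uniform_four_levels = shannon_entropy_bits([1, 1, 1, 1])
no_variation = shannon_entropy_bits([1, 0, 0, 0])
skewed = shannon_entropy_bits([0.7, 0.1, 0.1, 0.1])

print(uniform_four_levels, no_variation, skewed)
```

An attribute that never varies across listings has zero entropy, while an attribute spread evenly over many values has high entropy. That is one way to read the trade-off in the text: more information on the outer layer reduces the need to click through, but it is also costlier to process.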

6.6 Granular search data

In most studies that use online search data, the search data are either aggregated to the domain level or are restricted to only one retailer. For instance, the comScore data used in De los Santos et al. (2012) and De los Santos (2018) only allow the researcher to observe which domains have been visited, but not the browsing activity within a specific domain. This data limitation means that search is only partially observed, which could lead to biased estimates of search costs. Bronnenberg et al. (2016) use a much more detailed data set in which browsing is available at the URL-level. Their data track consumers’ online searches across and within different websites and are used to provide a detailed description of how consumers search when buying digital cameras. Their main finding is that consumers focus on a very small attribute space during the search and purchase process. This pattern can be interpreted as supporting the structural demand model assumption that consumers have the same utility for search and purchase. Moreover, consumers search more extensively than previous studies with more aggregate search data have found: consumers search, on average, 14 times prior to buying a digital camera. It is typically much easier to obtain search data for online than for brick-and-mortar environments: online browsing data can be used as a proxy for search, whereas data on how consumers move within and across brick-and-mortar stores is typically not available. Seiler and Pinna (2017) use data obtained from radio-frequency identification tags that are attached to shopping carts to infer how consumers search within
a grocery store.34 Specifically, the data tell them how much time consumers spent in front of a shelf, which is then used as a proxy for search duration. Using the consumers’ walking speed and basket size as instruments for search duration, the authors find that each additional minute spent searching results in a price paid that is lowered by $2.10. In a related paper, Seiler and Yao (2017) use similar path-track data to study the impact of advertising. They find that, even though advertising leads to more sales, it does not bring more consumers to the category being advertised. This finding suggests that advertising only increases conversion conditional on visiting the category. Moreover, Seiler and Yao (2017) find that search duration, i.e. time spent in front of a shelf, is not affected by advertising. This emerging literature shows that access to more and more detailed data on consumers’ actions on and across websites can enable researchers to get a detailed look into how consumers are actually searching for products. Further attempts to connect these exciting data sets to existing theoretical work on search, and collaboration between theorists and empiricists to build search models that better describe empirical patterns appear to be avenues for fruitful future research.

34 An alternative way to capture search behavior is by using eye-tracking equipment. For instance, Stüttgen et al. (2012) use shelf images to analyze search behavior for grocery products and test whether consumers are using satisficing decision rules (see also Shi et al., 2013 for a study that analyzes information acquisition using eye-tracking data in an online environment).

6.7 Search duration

In many instances, consumers are observed to re-visit the same seller multiple times before making a purchase – a pattern that cannot be rationalized by the standard simultaneous and sequential search models (see Section 2). To explain this empirical pattern, Ursu et al. (2018) propose that consumers only partially resolve their uncertainty through a single search, thus necessitating multiple searches of the same product if the consumer desires to learn more about the product before making a purchase decision. Practically, Ursu et al. (2018) combine approaches from two streams of literature: the literature on consumer learning (using Bayesian updating) and the consumer search literature (more specifically, sequential search). Suppose a consumer wants to resolve the uncertainty about his match value with a vacation package. He has some prior belief or expectation of the match value. After searching once, e.g. by reading a review, the consumer receives a signal about the match value and updates his belief about it. The more the consumer searches, i.e. the more signals he receives, the less uncertainty he has about the match value. Thus, at each point in time during the search process, intuitively speaking, the consumer decides whether to search a previously unsearched option or to spend another search on a previously already searched option – allowing the model to capture the empirical observation of re-visits.

A crucial question in this context relates to the characterization of optimal behavior in such a model of sequential search, i.e. how do consumers optimally decide which product to search next (including the same one), when to stop, and which one to purchase. For the standard sequential search model, Weitzman (1979) developed the well-known selection, stopping, and choice rules. For the model of sequential search with re-visits, Ursu et al. (2018) point to Chick and Frazier (2012) who developed analogous selection, stopping, and choice rules. The authors take the theoretical results by Chick and Frazier (2012) and apply them within their empirical context. The latter is a restaurant review site. Ursu et al. (2018) observe browsing behavior on this website and assume that a unit of search is represented by spending one minute on a restaurant page. Using this definition, the authors document that both the extensive and the intensive margins of search matter. They find that consumers search very few restaurants, but those that are searched are searched extensively. Studying search intensity is a new research subfield with plenty of opportunities for future research. Search intensity plays an important role not only in the online, but also in the offline environment. For example, many consumers take multiple test drives with the same car before making a purchase decision or look at a piece of furniture multiple times before buying it. These are big-ticket items and a better understanding of the search process might help both consumers and sellers. From a methodological perspective, collaboration between theorists and empiricists to rationalize search intensity as well as search sequences appears to be an interesting economic modeling challenge that is relatively unexplored.
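
The learning step described in this section, in which each additional search delivers a signal and shrinks the consumer's uncertainty about the match value, can be sketched with textbook normal-normal Bayesian updating. The normality of the prior and the signal noise, the known signal variance, and all numbers below are illustrative assumptions rather than the specification used by Ursu et al. (2018).

```python
def update_belief(prior_mean, prior_var, signal, signal_var):
    """One step of normal-normal Bayesian updating.

    The match value has prior N(prior_mean, prior_var); the signal equals the
    match value plus independent N(0, signal_var) noise.
    """
    posterior_precision = 1.0 / prior_var + 1.0 / signal_var
    posterior_var = 1.0 / posterior_precision
    posterior_mean = posterior_var * (prior_mean / prior_var + signal / signal_var)
    return posterior_mean, posterior_var


# Hypothetical values: prior belief about the match value and three signals
# obtained from three successive searches of the same product.
mean, var = 0.0, 1.0
SIGNAL_VAR = 1.0
signals = [1.0, 0.5, 1.5]

history = [(mean, var)]
for s in signals:
    mean, var = update_belief(mean, var, s, SIGNAL_VAR)
    history.append((mean, var))

for n, (m, v) in enumerate(history):
    print(f"after {n} searches: mean = {m:.3f}, variance = {v:.3f}")
```

The posterior variance falls with every signal whatever the signal's value, so repeated searches of the same option always reduce uncertainty. Whether another signal is worth its cost is the trade-off that the selection and stopping rules discussed above are meant to resolve.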

6.8 Dynamic search

All papers discussed so far assume static consumer behavior. This assumption is largely driven by data constraints: most papers have access to cross-sectional individual-level data or aggregate data. However, in many instances, consumers are observed to conduct multiple search spells over the course of a month or a year. For example, consumers might search for yoghurt once a week or for laundry detergent once a month. Seiler (2013) develops a structural model for storable goods in which it is costly for consumers to search, i.e. consumers have limited information. The model is estimated using data on purchases of laundry detergent. Seiler (2013) models the buying process as consisting of two stages: in the first stage, the consumer has to decide whether to search. Search is of the all-or-nothing type, i.e. the consumer can either search and obtain information on all products or not search at all. Only consumers who have searched can purchase. In the second stage, which is similar to a standard dynamic demand model for storable goods, the consumer then decides which product to buy. Dynamic models of demand are typically computationally burdensome, and adding an extra layer in which consumers make search decisions makes the model even more complex. To obtain closed-form solutions for the value functions in the search and purchase stages, Seiler (2013) adds a separate set of errors to each stage, which are unknown to the consumer before entering that stage. Moreover, he only allows for limited preference heterogeneity and the model does not allow price expectations to be affected by past price realizations, which can be a limitation for some markets.

Seiler (2013) finds that search frictions play an important role in the market for laundry detergent: consumers are unaware of prices in 70% of shopping trips. Further, lowering search costs by 50% would increase the elasticity of demand in absolute value, from −2.21 to −6.56. Pires (2016) builds on Seiler (2013), but instead of all-or-nothing search behavior, Pires (2016) allows consumers to determine which set of products to inspect using a simultaneous search strategy. The author only has access to data on purchases and adds a choice-set specific error term to the one-period flow utility from searching to deal with the curse of dimensionality of the simultaneous search model. The error term follows an EV Type I distribution and can be integrated out to obtain closed-form expressions. Pires (2016) finds that search costs are substantial. Further, the effects of ignoring search on price elasticity depend on how often a product appears in consumers’ consideration sets. Since many products are purchased multiple times, understanding search behavior across purchase instances is clearly an important avenue for further research. This is an area where models of search come into contact with models of switching costs or customer inertia as well; thus we anticipate further work in this important interface.
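
The role of a choice-set specific EV Type I error can be illustrated in a stripped-down static setting. This is a sketch under strong assumptions and not the dynamic model of Pires (2016). It assumes that product utilities carry i.i.d. EV Type I taste shocks, so that the expected maximum utility within a set is a log-sum (Euler's constant is omitted because it is common to all sets). It also assumes a constant search cost per inspected product and excludes the empty set. Adding an EV Type I error to the net value of each candidate set then yields logit probabilities over sets. The mean utilities and cost values are hypothetical.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log of the sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def consideration_set_probabilities(mean_utils, search_cost):
    """Logit probabilities over all non-empty sets of products to inspect."""
    products = range(len(mean_utils))
    candidate_sets = [
        subset
        for size in range(1, len(mean_utils) + 1)
        for subset in itertools.combinations(products, size)
    ]
    # Net value of a set: expected maximum utility minus total search cost.
    net_values = [
        logsumexp([mean_utils[j] for j in subset]) - search_cost * len(subset)
        for subset in candidate_sets
    ]
    log_denominator = logsumexp(net_values)
    return {
        subset: math.exp(value - log_denominator)
        for subset, value in zip(candidate_sets, net_values)
    }


def singleton_share(probabilities):
    """Total probability that exactly one product is inspected."""
    return sum(p for subset, p in probabilities.items() if len(subset) == 1)


MEAN_UTILS = [1.0, 0.5, 0.0]  # hypothetical mean utilities of three products
low_cost = consideration_set_probabilities(MEAN_UTILS, search_cost=0.1)
high_cost = consideration_set_probabilities(MEAN_UTILS, search_cost=1.0)

print(len(low_cost), singleton_share(low_cost), singleton_share(high_cost))
```

With J products there are 2^J − 1 candidate sets, which is the curse of dimensionality mentioned in the text. The closed form removes the need for numerical integration over the set-specific errors, but it does not remove the enumeration. In this sketch a higher search cost shifts probability toward smaller sets.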

7 Conclusions

Although the theoretical literature on search is almost sixty years old, empirical and econometric work in the area continues to develop at a very fast pace thanks to the ever increasing availability of data on actual search behavior. Abundant data on consumers’ web searching/browsing/shopping behavior has become available through multiple channels on the Internet, along with the ability to utilize field experiments and randomized controlled trials. The availability of data is not restricted to the online/e-commerce domain; there is increasingly more data on consumer shopping patterns across and within physical retail stores. Data on consumer search and choice patterns is also becoming readily available in finance, insurance, and energy markets. On the public side of the economy, too, search and choice data are becoming increasingly available (e.g. in educational choices – Kapor, 2016, health insurance – Ketcham et al., 2015, and public housing – van Dijk, 2019), creating more applications for econometric models that incorporate search and consideration and bringing methodological challenges of their own. While our focus was on econometric methods to test between existing theories of search and to estimate the structural parameters of these models, the abundance and complexity of search data also pushes search theory to new frontiers. We also note that, while models of optimizing agents with rational expectations have been used fruitfully to make sense of search data, “behavioral” theories of search that combine insights from both economics and psychology may prove very fruitful in explaining observed patterns in the data.

The mechanisms through which consumers search for and find the products they eventually purchase are important determinants of equilibrium outcomes in product markets. While most of the empirical search models have focused on the demand side, how firms optimize in the presence of consumer search needs further investigation. As we noted in Section 6.5, how firms choose to design their websites or advertising strategies to aid/direct search is a very important and relatively unexplored area of research, at least from the perspective of economics. Firms spend a lot of time and resources to design and optimize their websites or mobile interfaces. Quantifying how these design decisions affect market outcomes requires careful modeling of the (strategic) interaction between consumer and firm behavior. Another interesting area of research, highlighted in Section 3.1, is to investigate whether and how one can identify the presence of information/search frictions using conventional data sets (on prices, quantities, and product attributes) that do not contain information on search or credible variation in consumer information. Along with such investigations, efforts to supplement such conventional data sets with data on actual search or credible shifters of search/consumer information will help enrich the analyses that can be done with such data sets.

References Abaluck, Jason, Adams, Abi, 2018. What Do Consumers Consider Before They Choose? Identification from Asymmetric Demand Responses. Working Paper. Yale University. Allen, Jason, Clark, Robert, Houde, Jean-François, 2013. The effect of mergers in search markets: evidence from the Canadian mortgage industry. The American Economic Review 104 (10), 3365–3396. Allen, Jason, Clark, Robert, Houde, Jean-François, 2018. Search frictions and market power in negotiated price markets. Journal of Political Economy. Forthcoming. Allenby, Greg, Ginter, James, 1995. The effects of in-store displays and feature advertising on consideration sets. International Journal of Research in Marketing 12 (1), 67–80. Anderson, Simon, Renault, Régis, 2018. Firm pricing with consumer search. In: Corchón, L., Marini, M. (Eds.), Handbook of Game Theory and Industrial Organization, vol. 2. Edward Elgar, pp. 177–224. Andrews, Rick, Srinivasan, T., 1995. Studying consideration effects in empirical choice models using scanner panel data. Journal of Marketing Research 32 (1), 30–41. Aribarg, Anocha, Otter, Thomas, Zantedeschi, Daniel, Allenby, Greg, Bentley, Taylor, Curry, David, Dotson, Marc, Henderson, Ty, Honka, Elisabeth, Kohli, Rajeev, Jedidi, Kamel, Seiler, Stephan, Wang, Xin, 2018. Advancing non-compensatory choice models in marketing. Customer Needs and Solutions 5 (1), 82–91. Armstrong, Mark, 2017. Ordered consumer search. Journal of the European Economic Association 15 (5), 989–1024. Ausubel, Lawrence, 1991. The failure of competition in the credit card market. The American Economic Review 81 (1), 50–81. Baye, Michael, Morgan, John, Scholten, Patrick, 2006. Information, search, and price dispersion. In: Handbook on Economics and Information Systems, vol. 1, pp. 323–377. Ben-Akiva, Moshe, Lerman, Steven, 1985. Discrete Choice Analysis. MIT Press, Cambridge, MA. Berry, Steven, 1994. Estimating discrete-choice models of product differentiation. 
The Rand Journal of Economics 25 (2), 242–262. Berry, Steven, Levinsohn, James, Pakes, Ariel, 1995. Automobile prices in market equilibrium. Econometrica 63 (4), 841–890. Berry, Steven, Levinsohn, James, Pakes, Ariel, 2004. Differentiated products demand systems from a combination of micro and macro data: the new car market. Journal of Political Economy 112 (1), 68–105.

Bikhchandani, Sushil, Sharma, Sunil, 1996. Optimal search with learning. Journal of Economic Dynamics and Control 20 (1–3), 333–359. Blevins, Jason, Senney, Garrett, 2019. Dynamic selection and distributional bounds on search costs in dynamic unit-demand models. Quantitative Economics. Forthcoming. Bronnenberg, Bart, Kim, Jun, Mela, Carl, 2016. Zooming in on choice: how do consumers search for cameras online? Marketing Science 35 (5), 693–712. Bronnenberg, Bart, Vanhonacker, Wilfried, 1996. Limited choice sets, local price response, and implied measures of price competition. Journal of Marketing Research 33 (2), 163–173. Brown, Jeffrey, Goolsbee, Austan, 2002. Does the Internet make markets more competitive? Evidence from the life insurance industry. Journal of Political Economy 110 (3), 481–507. Burdett, Kenneth, Judd, Kenneth, 1983. Equilibrium price dispersion. Econometrica 51 (4), 955–969. Carlson, John, Preston McAfee, R., 1983. Discrete equilibrium price dispersion. Journal of Political Economy 91 (3), 480–493. Chade, Hector, Smith, Lones, 2005. Simultaneous Search. Working Paper No. 1556. Department of Economics, Yale University. Chade, Hector, Smith, Lones, 2006. Simultaneous search. Econometrica 74 (5), 1293–1307. Chen, X., Hong, Han, Shum, Matt, 2007. Nonparametric likelihood ratio model selection tests between parametric likelihood and moment condition models. Journal of Econometrics 141 (1), 109–140. Chen, Yuxin, Yao, Song, 2017. Sequential search with refinement: model and application with click-stream data. Management Science 63 (12), 4345–4365. Chiang, Jeongwen, Chib, Siddhartha, Narasimhan, Chakravarthi, 1999. Markov chain Monte Carlo and models of consideration set and parameter heterogeneity. Journal of Econometrics 89 (1–2), 223–248. Chick, Stephen, Frazier, Peter, 2012. Sequential sampling with economics of selection procedures. Management Science 58 (3), 550–569. Ching, Andrew, Erdem, Tulin, Keane, Michael, 2009. 
The price consideration model of brand choice. Journal of Applied Econometrics 24 (3), 393–420. Choi, Michael, Dai, Anovia Yifan, Kim, Kyungmin, 2018. Consumer search and price competition. Econometrica 86 (4), 1257–1281. Clay, Karen, Krishnan, Ramayya, Wolff, Eric, 2001. Prices and price dispersion on the web: evidence from the online book industry. Journal of Industrial Economics 49 (4), 521–539. Clemons, Eric K., Hann, Il-Horn, Hitt, Lorin M., 2002. Price dispersion and differentiation in online travel: an empirical investigation. Management Science 48 (4), 534–549. Crawford, Gregory, Griffith, Rachel, Iaria, Alessandro, 2018. Preference Estimation with Unobserved Choice Set Heterogeneity Using Sufficient Sets. Working Paper. University of Zurich. De los Santos, Babur, 2018. Consumer search on the Internet. International Journal of Industrial Organization 58, 66–105. De los Santos, Babur, Hortaçsu, Ali, Wildenbeest, Matthijs R., 2012. Testing models of consumer search using data on web browsing and purchasing behavior. The American Economic Review 102 (6), 2955–2980. De los Santos, Babur, Hortaçsu, Ali, Wildenbeest, Matthijs R., 2017. Search with learning for differentiated products: evidence from E-commerce. Journal of Business & Economic Statistics 35 (4), 626–641. De los Santos, Babur, Koulayev, Sergei, 2017. Optimizing click-through in online rankings with endogenous search refinement. Marketing Science 36 (4), 542–564. Dong, Xiaojing, Morozov, Ilya, Seiler, Stephan, Hou, Liwen, 2018. Estimation of Preference Heterogeneity in Markets with Costly Search. Working Paper. Stanford University. Donna, Javier D., Pereira, Pedro, Pires, Pedro, Trindade, Andre, 2018. Measuring the Welfare of Intermediaries in Vertical Markets. Working Paper. The Ohio State University. Duffie, Darrell, Dworczak, Piotr, Zhu, Haoxiang, 2017. Benchmarks in search markets. The Journal of Finance 72 (5), 1983–2044. Dzyabura, Daria, Hauser, John R., 2017. 
Recommending Products When Consumers Learn Their Preference Weights. Working Paper. New York University. Elberg, Andres, Gardete, Pedro, Macera, Rosario, Noton, Carlos, 2019. Dynamic effects of price promotions: field evidence, consumer search, and supply-side implications. Quantitative Marketing and Economics 17 (1), 1–58.

Ershov, Daniel, 2018. The Effects of Consumer Search Costs on Entry and Quality in the Mobile App Market. Working Paper. Toulouse School of Economics. Fader, Peter, McAlister, Leigh, 1990. An elimination by aspects model of consumer response to promotion calibrated on UPC scanner data. Journal of Marketing Research 27 (3), 322–332. Gardete, Pedro, Antill, Megan, 2019. Guiding Consumers Through Lemons and Peaches: A Dynamic Model of Search over Multiple Attributes. Working Paper. Stanford University. Gilbride, Timothy, Allenby, Greg, 2004. A choice model with conjunctive, disjunctive, and compensatory screening rules. Marketing Science 23 (3), 391–406. Goeree, Michelle Sovinsky, 2008. Limited information and advertising in the US personal computer industry. Econometrica 76 (5), 1017–1074. Grennan, Matthew, Swanson, Ashley, 2018. Transparency and Negotiated Prices: The Value of Information in Hospital-Supplier Bargaining. Working Paper. University of Pennsylvania. Gu, Chris, 2016. Consumer Online Search with Partially Revealed Information. Working Paper. Georgia Tech. Gumbel, Emil, 1961. Bivariate logistic distributions. Journal of the American Statistical Association 56 (294), 335–349. Harrison, Glenn, Morgan, Peter, 1990. Search intensity in experiments. The Economic Journal 100 (401), 478–486. Häubl, Gerald, Dellaert, Benedict, Donkers, Bas, 2010. Tunnel vision: local behavioral influences on consumer decisions in product search. Marketing Science 29 (3), 438–455. Hauser, John, Wernerfelt, Birger, 1990. An evaluation cost model of consideration sets. Journal of Consumer Research 16 (4), 393–408. Hitsch, Guenter, Hortascu, Ali, Lin, Xiliang, 2017. Prices and Promotions in U.S. Retail Markets: Evidence from Big Data. Working Paper. University of Chicago. Hong, Han, Shum, Matthew, 2006. Using price distributions to estimate search costs. The Rand Journal of Economics 37 (2), 257–275. Honka, Elisabeth, 2014. Quantifying search and switching costs in the U.S. 
auto insurance industry. The Rand Journal of Economics 45 (4), 847–884. Honka, Elisabeth, Chintagunta, Pradeep, 2017. Simultaneous or sequential? Search strategies in the U.S. auto insurance industry. Marketing Science 36 (1), 21–40. Honka, Elisabeth, Hortaçsu, Ali, Vitorino, Maria Ana, 2017. Advertising, consumer awareness, and choice: evidence from the U.S. banking industry. The Rand Journal of Economics 48 (3), 611–646. Hortaçsu, Ali, Syverson, Chad, 2004. Product differentiation, search costs, and competition in the mutual fund industry: a case study of S&P 500 index funds. The Quarterly Journal of Economics 119 (2), 403–456. Hoxby, Caroline, Turner, Sarah, 2015. What high-achieving low-income students know about college. The American Economic Review 105 (5), 514–517. Janssen, Maarten, Moraga-Gonzalez, Jose Luis, Wildenbeest, Matthijs, 2005. Truly costly sequential search and oligopolistic pricing. International Journal of Industrial Organization 23 (5–6), 451–466. Jindal, Pranav, Aribarg, Anocha, 2018. The Importance of Price Beliefs in Consumer Search. Working Paper. University of North Carolina. Johnson, Norman, Kotz, Samuel, Balakrishnan, N., 1995. Continuous Univariate Distributions, 2nd ed. John Wiley and Sons Inc. Jolivet, Grégory, Turon, Hélène, 2018. Consumer search costs and preferences on the Internet. The Review of Economic Studies. Forthcoming. Kamenica, Emir, Gentzkow, Matthew, 2011. Bayesian persuasion. The American Economic Review 101 (6), 2590–2615. Kapor, Adam, 2016. Distributional Effects of Race-Blind Affirmative Action. Working Paper. Princeton University. Kawaguchi, Kohei, Uetake, Kosuke, Watanabe, Yasutora, 2018. Designing Context-Based Marketing: Product Recommendations Under Time Pressure. Working Paper. Hong Kong University of Science and Technology.


Ketcham, Jonathan, Lucarelli, Claudio, Powers, Christopher, 2015. Paying attention or paying too much in Medicare part D. The American Economic Review 105 (1), 204–233. Kim, Jun, Albuquerque, Paulo, Bronnenberg, Bart, 2010. Online demand under limited consumer search. Marketing Science 29 (6), 1001–1023. Kim, Jun, Albuquerque, Paulo, Bronnenberg, Bart, 2017. The probit choice model under sequential search with an application to online retailing. Management Science 63 (11), 3911–3929. Kircher, Philipp, 2009. Efficiency of simultaneous search. Journal of Political Economy 117 (5), 861–913. Koulayev, Sergei, 2013. Search with Dirichlet priors: estimation and implications for consumer demand. Journal of Business & Economic Statistics 31 (2), 226–239. Koulayev, Sergei, 2014. Search for differentiated products: identification and estimation. The Rand Journal of Economics 45 (3), 553–575. McCall, John, 1970. Economics of information and job search. The Quarterly Journal of Economics 84 (1), 113–126. McFadden, Daniel, 1989. A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57 (5), 995–1026. Mehta, Nitin, Rajiv, Surendra, Srinivasan, Kannan, 2003. Price uncertainty and consumer search: a structural model of consideration set formation. Marketing Science 22 (1), 58–84. Moraga-González, José Luis, Wildenbeest, Matthijs, 2008. Maximum likelihood estimation of search costs. European Economic Review 52 (5), 820–848. Moraga-González, José Luis, Sándor, Zsolt, Wildenbeest, Matthijs R., 2013. Semi-nonparametric estimation of consumer search costs. Journal of Applied Econometrics 28, 1205–1223. Moraga-González, José Luis, Sándor, Zsolt, Wildenbeest, Matthijs, 2015. Consumer Search and Prices in the Automobile Market. Working Paper. Indiana University. Moraga-González, José Luis, Sándor, Zsolt, Wildenbeest, Matthijs, 2018. Consumer Search and Prices in the Automobile Market. Working Paper. Indiana University. 
Morgan, Peter, Manning, Richard, 1985. Optimal search. Econometrica 53 (4), 923–944. Murry, Charles, Zhou, Yiyi, 2018. Consumer search and automobile dealer co-location. Management Science. Forthcoming. Nishida, Mitsukuni, Remer, Marc, 2018. The determinants and consequences of search cost heterogeneity: evidence from local gasoline markets. Journal of Marketing Research 55 (3), 305–320. Petrin, Amil, Train, Kenneth, 2010. A control function approach to endogeneity in consumer choice models. Journal of Marketing Research 47 (1), 3–13. Pires, Tiago, 2016. Costly search and consideration sets in storable goods markets. Quantitative Marketing and Economics 14 (3), 157–193. Ratchford, Brian, 1980. The value of information for selected appliances. Journal of Marketing Research 17 (1), 14–25. Reinganum, Jennifer, 1979. A simple model of equilibrium price dispersion. Journal of Political Economy 87 (4), 851–858. Roberts, John, Lattin, James, 1991. Development and testing of a model of consideration set composition. Journal of Marketing Research 28 (4), 429–440. Roberts, John, Lattin, James, 1997. Consideration: review of research and prospects for future insights. Journal of Marketing Research 34 (3), 406–410. Rosenfield, Donald, Shapiro, Roy, 1981. Optimal adaptive price search. Journal of Economic Theory 25 (1), 1–20. Rothschild, Michael, 1974. Searching for the lowest price when the distribution of prices is unknown. Journal of Political Economy 82 (4), 689–711. Roussanov, Nikolai, Ryan, Hongxun, Wei, Yanhao, 2018. Marketing Mutual Funds. Working Paper. University of Pennsylvania. Salz, Tobias, 2017. Intermediation and Competition in Search Markets: An Empirical Case Study. Working Paper. Columbia University. Sanches, Fabio, Junior, Daniel Silva, Srisuma, Sorawoot, 2018. Minimum distance estimation of search costs using price distribution. Journal of Business & Economic Statistics 36 (4).

Seiler, Stephan, 2013. The impact of search costs on consumer behavior: a dynamic approach. Quantitative Marketing and Economics 11 (2), 155–203. Seiler, Stephan, Pinna, Fabio, 2017. Estimating search benefits from path-tracking data: measurement and determinants. Marketing Science 36 (4), 565–589. Seiler, Stephan, Yao, Song, 2017. The impact of advertising along the conversion funnel. Quantitative Marketing and Economics 15, 241–278. Shi, Savannah, Wedel, Michel, Pieters, Rik, 2013. Information acquisition during online decision making: a model-based exploration using eye-tracking data. Management Science 59 (5), 1009–1026. Shocker, Allen, Ben-Akiva, Moshe, Boccara, Bruno, Nedungadi, Prakash, 1991. Consideration set influences on consumer decision-making and choice: issues, models, and suggestions. Marketing Letters 2 (3), 181–197. Siddarth, S., Bucklin, Randolph, Morrison, Donald, 1995. Making the cut: modeling and analyzing choice set restriction in scanner panel data. Journal of Marketing Research 32 (3), 255–266. Sorensen, Alan, 2000. Equilibrium price dispersion in retail markets for prescription drugs. Journal of Political Economy 108 (4), 833–850. Stigler, George, 1961. The economics of information. Journal of Political Economy 69 (3), 213–225. Stüttgen, Peter, Boatwright, Peter, Monroe, Robert, 2012. A satisficing choice model. Marketing Science 31 (6), 878–899. Terui, Nobuhiko, Ban, Masataka, Allenby, Greg, 2011. The effect of media advertising on brand consideration and choice. Marketing Science 30 (1), 74–91. Tsai, Yi-Lin, Honka, Elisabeth, 2018. Non-Informational Advertising Informing Consumers: How Advertising Affects Consumers’ Decision-Making in the U.S. Auto Insurance Industry. Working Paper. UCLA. Ursu, Raluca, 2018. The power of rankings: quantifying the effect of rankings on online consumer search and purchase decisions. Marketing Science 37 (4), 530–552. Ursu, Raluca, Dzyabura, Daria, 2018. Product Rankings with Consumer Search. Working Paper. 
New York University. Ursu, Raluca, Wang, Qingliang, Chintagunta, Pradeep, 2018. Search Duration. Working Paper. New York University. van Dijk, Winnie, 2019. The Socio-Economic Consequences of Housing Assistance. Working Paper. University of Chicago. Van Nierop, Erjen, Bronnenberg, Bart, Paap, Richard, Wedel, Michel, Franses, Philip Hans, 2010. Retrieving unobserved consideration sets from household panel data. Journal of Marketing Research 47 (1), 63–74. Vishwanath, Tara, 1992. Parallel search for the best alternative. Economic Theory 2 (4), 495–507. Weitzman, Martin, 1979. Optimal search for the best alternative. Econometrica 47 (3), 641–654. Wildenbeest, Matthijs, 2011. An empirical model of search with vertically differentiated products. The Rand Journal of Economics 42 (4), 729–757. Wolinsky, Asher, 1986. True monopolistic competition as a result of imperfect information. The Quarterly Journal of Economics 101 (3), 493–512. Woodward, Susan, Hall, Robert, 2012. Diagnosing consumer confusion and sub-optimal shopping effort: theory and mortgage-market evidence. The American Economic Review 102 (7), 3249–3276. Yao, Song, Wang, Wenbo, Chen, Yuxin, 2017. TV channel search and commercial breaks. Journal of Marketing Research 54 (5), 671–686. Zettelmeyer, Florian, Morton, Fiona Scott, Silva-Risso, Jorge, 2006. How the Internet lowers prices: evidence from matched survey and automobile transaction data. Journal of Marketing Research 43 (2), 168–181. Zhang, Jie, 2006. An integrated choice model incorporating alternative mechanisms for consumers’ reactions to in-store display and feature advertising. Marketing Science 25 (3), 278–290. Zhang, Xing, Chan, Tat, Xie, Ying, 2018. Price search and periodic price discounts. Management Science 64 (2), 495–510.

CHAPTER 5

Digital marketing✩

Avi Goldfarb (a,b,∗), Catherine Tucker (c,b)

a Rotman School of Management, University of Toronto, Toronto, ON, Canada
b NBER, Cambridge, MA, United States
c MIT Sloan School of Management, Cambridge, MA, United States
∗ Corresponding author: e-mail address: [email protected]

Contents

1 Reduction in consumer search costs and marketing
1.1 Pricing: Are prices and price dispersion lower online?
1.2 Placement: How do low search costs affect channel relationships?
1.3 Product: How do low search costs affect product assortment?
1.4 Promotion: How do low search costs affect advertising?
2 The replication costs of digital goods is zero
2.1 Pricing: How can non-rival digital goods be priced profitably?
2.2 Placement: How do digital channels – some of which are illegal – affect the ability of information good producers to distribute profitably?
2.3 Product: What are the motivations for providing digital products given their non-excludability?
2.4 Promotion: What is the role of aggregators in promoting digital goods?
3 Lower transportation costs
3.1 Placement: Does channel structure still matter if transportation costs are near zero?
3.2 Product: How do low transportation costs affect product variety?
3.3 Pricing: Does pricing flexibility increase because transportation costs are near zero?
3.4 Promotion: What is the role of location in online promotion?
4 Lower tracking costs
4.1 Promotion: How do low tracking costs affect advertising?
4.2 Pricing: Do lower tracking costs enable novel forms of price discrimination?
4.3 Product: How do markets where the customer’s data is the ‘product’ lead to privacy concerns?
4.4 Placement: How do lower tracking costs affect channel management?
5 Reduction in verification costs
5.1 Pricing: How willingness to pay is bolstered by reputation mechanisms
5.2 Product: Is a product’s ‘rating’ now an integral product feature?
5.3 Placement: How can channels reduce reputation system failures?
5.4 Promotion: Can verification lead to discrimination in how goods are promoted?
6 Conclusions
References

✩ This builds heavily on our Journal of Economic Literature paper, ‘Digital Economics.’ Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.004 Copyright © 2019 Elsevier B.V. All rights reserved.

Digital technology is the representation of information in bits, reducing the costs of collecting, storing, and parsing customer data. Such technologies span TCP/IP and other communications standards, improvements in database organization, improvements in computer memory, faster processing speeds, fiber optic cable, wireless transmission, and advances in statistical reasoning. These new digital technologies can be seen as reducing the costs of certain marketing activities. Digital marketing explores how traditional areas of marketing such as pricing, promotion, product, and placement change as certain costs fall substantially, perhaps approaching zero. Using the framework in our recent summary of the digital economics literature (Goldfarb and Tucker, 2019), we emphasize a shift in five different costs in addressing the needs of customers.

1. Lower search costs for customers.
2. Lower replication costs for certain digital goods.
3. Lower transportation costs in transporting digital goods.
4. Lower tracking costs enabling personalization and targeting.
5. Lower verification costs of customers’ wishes and firms’ reputations.

We argue that each of these costs had the distinction of affecting marketing earlier and more dramatically than many other firm functions or sectors. As a consequence, marketing has become a testing lab for understanding how these shifts in costs may affect the broader economy.

This link between marketing and economics is important because each of these shifts in costs draws on familiar modeling frameworks from economics. For example, the search cost literature goes back to Stigler (1961). Search costs are lower in digital environments, enabling customers to find products and firms to find customers. Non-rivalry is another key concept, as digital goods can be replicated at zero cost. Transportation cost models, such as the Hotelling Model, provide a useful framework for the literature on the low cost of transportation of digital goods. Digital technologies make it easy to track any one consumer’s behavior, a theme of advertising models at least since Grossman and Shapiro (1984). Last, information models that emphasize reputation and trust help frame research that shows that digitization can make verification easier.

Early work in digital economics and industrial organization emphasized the role of lower costs (Shapiro and Varian, 1998; Borenstein and Saloner, 2001; Smith et al., 2001; Ellison and Ellison, 2005). Goldfarb and Tucker (2019) analyzed how these shifts have been studied in the economics literature. We aim to focus on the extent to which quantitative marketing has led, and has been the first empirical testing ground for, many of these changes. As such, we will focus on work in quantitative marketing aiming to understand the effect of technology. In doing so, we will not emphasize work from the consumer behavior literature or methodology-focused work from the marketing statistics literature. We will also not emphasize studies in the marketing strategy literature that document correlations which are of managerial importance, rather than measuring the causal effects of digital technologies.

1 Reduction in consumer search costs and marketing

Search costs matter in marketing because they represent the costs consumers incur looking for information regarding products and services. The most important effect of lower search costs with respect to digital marketing is that it is easier to find and compare information about potential products and services online than offline. Many of the earliest reviews about the impact of the internet on the economy emphasized low search costs in the retail context (Borenstein and Saloner, 2001; Bakos, 2001) and the resulting impact on prices, price dispersion, and inventories. These papers built on a long-established economic literature on search costs (Stigler, 1961; Diamond, 1971; Varian, 1980). Recent work in marketing has examined the search process in depth, documenting the clickstream path and underlying search strategies (Bronnenberg et al., 2016; Honka and Chintagunta, 2017).

1.1 Pricing: Are prices and price dispersion lower online?

Perhaps the dominant theme in the early literature was the impact of low search costs on prices and price dispersion. Brynjolfsson and Smith (2000) hypothesized that low internet search costs should lower both prices and price dispersion. They empirically tested these ideas, comparing the prices of books and CDs online and offline. They found that online prices were lower. Similarly, Brown and Goolsbee (2002) showed that insurance prices are lower online and Orlov (2011) found that airline prices are lower online. A series of related studies (Zettelmeyer et al., 2001; Scott Morton et al., 2003; Zettelmeyer et al., 2006) showed how digitization reduced automobile prices, though not equally for all types of consumers.

While prices fell, the results of the literature on price dispersion have been more mixed. Brynjolfsson and Smith (2000) show substantial price dispersion online. Nevertheless, they find that online price dispersion is somewhat lower than offline price dispersion. Baye and Morgan (2004) emphasize persistently high levels of price dispersion online. Orlov (2011) suggests that online price dispersion is higher.

The persistence of price dispersion is a puzzle. Broadly, the literature gives two main answers. First, the earlier economics literature has emphasized that retailers differ, so the service provided for the same item differs across retailers. Firms with stronger brands command higher prices, though this has been decreasing somewhat over time (Waldfogel and Chen, 2006). This decline in the importance of brands in the digital environment, as shown in Hollenbeck (2018), is related, as we discuss later, to the reduction in verification costs.


Second, as a counterpoint to the notion of exogenously given differences in seller quality that formed the basis of the early economics literature, the marketing literature has emphasized the extent to which search can be influenced by the seller. In other words, in marketing we recognize that search costs are endogenous and a reflection of a firm’s marketing strategy. Honka (2014) and De los Santos et al. (2012) provide surprisingly large estimates of the cost of each click in the online context. By forcing customers to conduct an extra click or two, sellers can increase the relative cost of search in areas where they are weak. For example, a high-quality, high-price firm might make it easy to compare product quality but difficult to find prices. Chu et al. (2008) show that price sensitivity is lower in online grocery compared to offline grocery. Fradkin (2017) has shown a similar phenomenon in the context of Airbnb.

A number of scholars have shown that such endogenous increases in search costs can be sustained in equilibrium (Ellison and Ellison, 2009) and can be profitable (Hossain and Morgan, 2006; Dinerstein et al., 2018; Moshary et al., 2017). Ellison and Ellison (2009) showed how firms can obfuscate prices. They emphasize a setting where search costs should be very low: an online price comparison website. They show that retailers that display prices on that website emphasize their relatively low-priced products. Then, when consumers click the link and arrive at the retailer’s own website, they are shown offers for higher-priced and higher-margin goods. Thus, price dispersion is low at the price comparison website where search costs are low, but dispersion is high where comparison is more difficult.

More recently, Moshary et al. (2017) demonstrate the effectiveness of similar price obfuscation in the context of a massive field experiment at StubHub. The experiment compared purchase prices and demand estimates when the service fees for using StubHub were shown early in the search process versus immediately before purchase. The experiment showed that customers were less sensitive to the same fee when it was shown late in the process. The company deliberately made some price information more difficult to find, and this increased quantity demanded at the same price.

Another area where search costs are endogenous to firm marketing strategy involves the use of devices. Firms recognize that tablets and mobile devices, with smaller screens, may facilitate this process of restricting the information that consumers see initially (Xu et al., 2017; Ghose et al., 2013). A developing area of marketing is trying to understand, given this different environment, how best to present price information to consumers to maximize profits in a mobile environment (Andrews et al., 2015; Fong et al., 2015).

The early online price dispersion literature and the more recent literature demonstrating endogenous online search costs show where a close study of marketing contexts has been able to add nuance to a puzzle noted in the economics literature, by exploring how firms can increase search costs for consumers in a digital environment.
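The price dispersion comparisons discussed in this subsection rest on a summary statistic computed over the prices posted for the same item by different sellers. The following is a minimal sketch, assuming two common choices (the coefficient of variation and the price range relative to the mean); the function name and the price lists are hypothetical illustrations, not data or measures taken from any of the cited studies.

```python
from statistics import mean, pstdev


def dispersion(prices):
    """Return (coefficient of variation, relative range) for one item's prices.

    Both measures are scaled by the mean price so that channels with
    different price levels can be compared.
    """
    avg = mean(prices)
    cv = pstdev(prices) / avg                      # standard deviation / mean
    rel_range = (max(prices) - min(prices)) / avg  # (max - min) / mean
    return cv, rel_range


# Hypothetical posted prices for the same book at five sellers per channel.
online_prices = [16.0, 17.0, 18.0, 19.0, 20.0]
offline_prices = [16.0, 18.0, 20.0, 22.0, 24.0]

online_cv, online_range = dispersion(online_prices)
offline_cv, offline_range = dispersion(offline_prices)

print(f"online:  mean={mean(online_prices):.2f}  cv={online_cv:.3f}  range={online_range:.3f}")
print(f"offline: mean={mean(offline_prices):.2f}  cv={offline_cv:.3f}  range={offline_range:.3f}")
```

In this made-up example the online channel has both a lower mean price and lower dispersion. The studies above disagree on the second point, which is why the choice of measure and of comparison set matters.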

1.2 Placement: How do low search costs affect channel relationships?

Reduced search costs facilitate exchange more generally, often enabled by large digital platforms. Many major technology firms can be seen as platform-based businesses. For example, Google and Facebook are platforms for advertisers and buyers. Jullien (2012) highlighted that digital markets give rise to platforms because of low search costs that facilitate matching and enable trade. Horton and Zeckhauser (2016) emphasize that many large digital platforms are driven by low search costs that enable efficient use of unused capacity for durable goods. This emphasis on unused capacity, and the need to match supply and demand, means that much research takes a market design perspective (Einav et al., 2018). Cullen and Farronato (2016) emphasize the challenges of matching supply and demand over time and the importance of economies of scale in matching. Zervas et al. (2017) emphasize how supply changes in response to changes in demand. Bapna et al. (2016) emphasize that platform design can also be informed by consumer theory.

These platforms often provide an alternative type of distribution channel, through which sellers can reach buyers. This can enable new markets and affect incumbents. For example, several papers have examined the accommodation industry (Fradkin, 2017; Farronato and Fradkin, 2018; Zervas et al., 2017). Zervas et al. (2017) examine how the introduction of Airbnb as a channel for selling accommodations reduced demand in the incumbent hotel industry in a particular way. Airbnb provided a channel for selling temporary accommodation. This enabled accommodations to go on and off the market as demand fluctuated. Consequently, the impact of Airbnb is largest in periods of peak demand (such as the SXSW festival in Austin, Texas). In these periods, hotel capacity is constrained. Airbnb ‘hosts’ play a role in providing additional capacity. This means that hotel prices do not rise as much.
Digital platforms serve as distribution channels in a wide variety of industries, including airlines (Dana and Orlov, 2014), books (Ellison and Ellison, 2017), food trucks (Anenberg and Kung, 2015), entertainment (Waldfogel, 2018), and cars (Hall et al., 2016). In many of these cases, a key role of online platforms is to provide an additional channel to overcome capacity constraints (Farronato and Fradkin, 2018). These constraints may be regulatory, as in the case of limited taxi licensing, related to fixed costs and technological limits as in the case of YouTube as a substitute for television, or both, as in the case of accommodation, where hotel rooms have high fixed costs and short term rentals are constrained by regulation. Given that a key role of these online platforms is to overcome capacity constraints in offline distribution channels, this provides a structure for understanding where new online platforms may arise. They will likely appear in places where existing distribution channels generate capacity constraints, particularly in the presence of large demand fluctuations. Furthermore, it provides a structure for identifying which existing channels and incumbent sellers will be most affected by online platforms: Those in which capacity constraints generate a key source of their profits. As Farronato and Fradkin (2018) show, hotels lost their ability to charge unusually high prices during periods of peak demand.


In summary, digital platforms facilitate a reduction in search costs. This creates an opportunity for sellers working at a small scale to find buyers. By enabling an influx of sellers, online platforms overcome capacity constraints, creating new opportunities for sellers, new benefits to buyers, and new threats to the existing distribution channels and the larger incumbent sellers. Much of the initial literature on platform or two-sided networks was led by economists inspired by antitrust litigation in credit cards (Rochet and Tirole, 2003; Armstrong, 2006). However, recently the literature has exploded in marketing because so many large platforms are primarily marketing channels – such as Amazon, Facebook, and Google. This means that the digital marketing literature is at the core of much of the debate about the extent to which such platforms represent a challenge for antitrust regulators (Chiou and Tucker, 2017b).

1.3 Product: How do low search costs affect product assortment?

Anderson (2006) emphasized that the internet increases the purchase of niche or ‘long tail’ products relative to mainstream or superstar products. Consistent with this hypothesis, Brynjolfsson et al. (2011) find that the variety of products available and purchased online is higher than offline. Zentner et al. (2013) use a quasi-experimental estimation strategy to show that consumers are more likely to rent niche movies online and blockbusters offline. Datta et al. (2018) demonstrate that the move to streaming, rather than purchasing, music has led to a wider variety of music consumption and increased product discovery, which in turn increases the variety of music available (Aguiar and Waldfogel, 2018). Zhang (2018) links this discovery of relatively unknown products to low search costs.

Empirical evidence suggests that this increase in variety increased consumer surplus (Brynjolfsson et al., 2003). However, Quan and Williams (2018) suggest that the increase in the variety of products purchased by consumers has been overestimated by the literature. In particular, they note that tastes are spatially correlated, and examine the consequences of spatially correlated tastes on the distribution of product assortment both online and offline. The key finding is that offline product assortment has been mis-measured, because products that might appear to be rarely purchased in a national sample could still have sufficient local demand in certain markets that they would be available. Drawing on this insight, they build a structural model of demand and show that the welfare effects of the internet through the long tail are much more modest than many previous estimates. These relatively low welfare benefits of the long tail, or of increased variety, contrast with Ershov’s (2019) research in the context of online software downloads from the Google Play store, which emphasizes the general benefits of a reduction in search costs for consumers.

While much of the popular discussion has emphasized the long tail, the effect of search costs on product assortment is ambiguous. If there are vertically differentiated products, low search costs mean that consumers will all be able to identify the best product. Bar-Isaac et al. (2012) provide a theoretical framework that combines superstar and long tail effects as search costs fall, demonstrating that lower search costs hurt middle-tier products while helping extremes. Elberse and Eliashberg (2003) document both effects in the entertainment industry.

As noted above, search costs can be endogenously chosen by firms. Recommendation engines are one tool through which firms choose which attributes to emphasize, lowering search costs in some dimensions and not others. Fleder and Hosanagar (2009) show that simple changes to recommendation engine algorithms can bias purchases toward superstar or long tail effects. Superstar effects occur when the recommendation engine primarily suggests ‘people who bought this also bought’. In contrast, long tail effects occur when the engine instead suggests ‘people who bought this disproportionately bought’. Consistent with this framing, Zhang and Liu (2012) and Agrawal et al. (2015) show how recommendation engines can lead to a small number of products receiving the most attention when they focus on showing which products are most popular. In contrast, Tucker and Zhang (2011) provide an example in which a recommendation engine which highlights the popularity of a digital choice has asymmetrically large effects for niche products. This occurs for a different reason than the one highlighted in Fleder and Hosanagar (2009). In this case, the release of popularity information allowed niche sellers to appear to be relatively popular and consequently signal their quality or general attractiveness. Overall, reduced search costs appear to increase product assortment while also increasing sales at the top of the distribution. We have empirical evidence of both long tail and superstar effects, probably at the expense of products in the middle of the distribution. While the variety of products offered has increased, Quan and Williams (2018) highlight that the welfare consequences of this appear to be small in the context of a particular set of online products. This contrasts with the evidence summarized in Waldfogel (2018) who argues that digitization has led to a substantial increase in consumer welfare in the entertainment industry. 
Currently, the literature does not have a systematic structure for identifying when increased product assortment will have a large welfare impact. The evidence presented by both Quan and Williams (2018) and Waldfogel (2018) is compelling. It suggests that the particular characteristics of the product category will determine welfare effects. It remains an open research question what those characteristics might be, and this is something we feel that marketing contexts are well able to exploit.
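The contrast drawn above between a recommendation engine that suggests ‘people who bought this also bought’ and one that suggests ‘people who bought this disproportionately bought’ can be illustrated with a small sketch. It is a stylized illustration, not the simulation in Fleder and Hosanagar (2009): the first rule is assumed to rank by raw co-purchase counts, the second is assumed to rank by lift (the co-purchase rate divided by the item’s overall purchase rate), and the baskets, item names, and function name are hypothetical.

```python
from collections import Counter


def recommend(baskets, item, rule):
    """Rank the other items for `item` under one of two stylized rules.

    rule="also_bought": rank by how often an item appears alongside `item`.
    rule="disproportionately_bought": rank by lift, P(j | item) / P(j).
    """
    baskets = [set(b) for b in baskets]
    n = len(baskets)
    bought = Counter(i for b in baskets for i in b)   # baskets containing each item
    with_item = [b for b in baskets if item in b]     # baskets containing `item`
    co = Counter(j for b in with_item for j in b if j != item)

    if rule == "also_bought":
        score = dict(co)
    elif rule == "disproportionately_bought":
        score = {j: (c / len(with_item)) / (bought[j] / n) for j, c in co.items()}
    else:
        raise ValueError(f"unknown rule: {rule}")
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(score, key=lambda j: (-score[j], j))


# Hypothetical purchase histories: "hit" is a superstar, "jazz_b" a niche title.
baskets = [
    ["jazz_a", "hit"],
    ["jazz_a", "hit"],
    ["jazz_a", "hit", "jazz_b"],
    ["jazz_a", "jazz_b"],
    ["hit"],
    ["hit"],
    ["hit"],
    ["hit", "pop"],
]

print(recommend(baskets, "jazz_a", "also_bought"))
print(recommend(baskets, "jazz_a", "disproportionately_bought"))
```

With these baskets the count-based rule puts the superstar first, because nearly everyone buys it, while the lift-based rule puts the niche title first, because it is bought far more often by buyers of `jazz_a` than by customers in general.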

1.4 Promotion: How do low search costs affect advertising?

Advertising is often modeled as a process for facilitating search (Bagwell, 2007). Online search costs affect advertising in a variety of ways. For example, low search costs online can also change the nature and effectiveness of offline advertising. Joo et al. (2014) show that television advertising leads to online search. Such searches can in turn lead to better information about products and higher sales. In other words, the ability to search online can make other advertising more effective, and it can enable advertisers to include an invitation to search in their messaging. Modeling advertising as a search process is particularly useful in the context of search engine advertising. This is advertising that responds directly to consumer search behavior. Consumers enter what they are looking for into the search engine. Advertisers respond to that statement of intent. Search engine advertising allows both
online and offline advertisers to find customers. Kalyanam et al. (2017) show how search engine ads affect offline stores. Even within search engine advertising, search costs vary. Ghose et al. (2012) demonstrate the importance of rank in search outcomes. Higher ranked products get purchased more. Narayanan and Kalyanam (2015) and Jeziorski and Moorthy (2017) both document that rank matters in search engine advertising in particular, but that this effect varies across advertisers and contexts. Despite the widespread use of search advertising, there is some question about whether search advertising is effective at all. Li et al. (2016) discuss how industry attributes consumer purchases to particular advertising. Search engine advertising appears most effective because people who click on search ads are very likely to end up purchasing and because the click on the search ad is often the ‘last click’ before purchase. Many industry models treat this last click as the most valuable and hence search engine advertising is seen as particularly effective. Li et al. (2016) argue that the industry models overestimate the effectiveness of search ads. Blake et al. (2015) describe why the effectiveness of search advertising may be overestimated. They emphasize the importance of the counterfactual situation where the ad did not appear. If the search engine user would click the algorithmic link instead of the advertisement, then the advertiser would receive the same result for free. The paper shows the result of a field experiment conducted by eBay, in which eBay stopped search engine advertising in a randomly selected set of local markets in the United States. Generally, eBay sales did not fall in markets without search engine advertising compared to markets with search engine advertising. In the absence of search ads, it appears that users clicked the algorithmic links and purchased at roughly the same rate. This was particularly true of the branded keyword search term ‘eBay’. 
In other words, careful analysis of the counterfactual suggested that search engine advertising generally did not work (except in a small number of specialized situations). This research led eBay to substantially reduce its search engine advertising. Simonov et al. (2018a) revisit the effectiveness of search engine advertising and focus on advertisements for the branded keyword, but for less prominent advertisers than eBay. Using data from search results at Bing’s search engine, they replicate the result that search engine advertising is relatively ineffective for very well known brands. They then demonstrate that search engine advertising is effective for less well known brands, particularly for those that do not show up high in the algorithmic listings. Overall, these papers have shown that understanding the search process – for example through examining heterogeneity in the counterfactual options when search advertising is unavailable – is key to understanding when advertising serves to lower search costs. Coviello et al. (2017) also find that search advertising is effective for a less well-known brand. Simonov et al. (2018b) show that competitive advertising provides a further benefit of search engine advertising: If competitors are bidding on a keyword (even if that keyword is a brand name), then there can be a benefit to paying for search engine advertising even for advertisers who appear as the top algorithmic link. In other words, Blake et al. (2015), Simonov et al. (2018a), and Simonov et al. (2018b) together demonstrate what might seem obvious ex post: Search advertising meaningfully lowers search costs for products that are relatively difficult to find through other means. This finding is nevertheless important. Search advertising is a multi-billion dollar industry and many marketers appear to have been mis-attributing sales to the search advertising channel. These three papers provide a coherent picture of how search engine advertising works. The open questions relate to how changes in the nature of search advertising – and the addition of new search channels such as mobile devices and personal assistants – might affect this picture. As we move from larger screens to more limited bandwidth devices, search costs may rise and even strong brands may benefit from search advertising. Tools and questions from economics – such as thinking about the right counterfactual – have led to an extensive and important literature on search engines that spans marketing and economics. Although search engines are recent, the speed with which this literature has sprung up reflects the growing importance of search engines and other search mechanisms in the digital economy.
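The gap between last-click attribution and the experimental counterfactual discussed in this subsection can be sketched in a few lines of Python. The market-level numbers below are invented for illustration and are not taken from Blake et al. (2015) or Li et al. (2016); the function names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical market-level data (made-up numbers, for illustration only).
# "last_click_paid" is the number of sales whose final click was a paid search ad.
ads_on = [  # markets where paid search ads kept running
    {"sales": 100, "last_click_paid": 40},
    {"sales": 103, "last_click_paid": 42},
    {"sales": 100, "last_click_paid": 38},
]
ads_off = [  # markets where paid search ads were paused
    {"sales": 100, "last_click_paid": 0},
    {"sales": 99, "last_click_paid": 0},
    {"sales": 101, "last_click_paid": 0},
]


def last_click_credit(markets):
    """Average sales per market that a last-click rule credits to paid search."""
    return mean(m["last_click_paid"] for m in markets)


def experimental_lift(treated, control):
    """Difference in average sales between ads-on and ads-off markets."""
    return mean(m["sales"] for m in treated) - mean(m["sales"] for m in control)


print("Last-click credit per market:", last_click_credit(ads_on))
print("Experimental lift per market:", experimental_lift(ads_on, ads_off))
```

In this toy setup the last-click rule credits paid search with 40 sales per market, while the comparison with ads-off markets suggests a lift of only 1 sale per market, because most of those buyers would have arrived through the algorithmic link anyway.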

2 The replication costs of digital goods is zero

Digital goods are non-rival. They can be consumed by one person without reducing the amount or quality available to others. Fire is a non-rival good. If one person starts a fire, they can use it to light someone else’s fire without diminishing their own. The non-rival nature of digital goods leads to important implications for marketing, particularly with respect to copyright and privacy. The internet is, in many ways, a “giant, out of control copying machine” (Shapiro and Varian, 1998). This means that a key challenge for marketers in the era of digitization is controlling product reproduction – free online copying – by consumers.

2.1 Pricing: How can non-rival digital goods be priced profitably?

Non-rival goods create pricing challenges. If customers can give their purchases away without decreasing the quality of what they bought, it becomes difficult to charge a positive price. The initial response by many producers of digital products was both legal (through copyright enforcement) and technological (through digital rights management). The effectiveness of such policies on consumer purchases is theoretically ambiguous, and the empirical evidence is mixed (Varian, 2005; Vernik et al., 2011; Danaher et al., 2013; Li et al., 2015; Zhang, 2018). Non-rivalry can lead to opportunities for price discrimination. Lambrecht and Misra (2016) examine price discrimination in sports news. In this context, the highest willingness to pay customers appear all year, while casual fans primarily read news in-season. Therefore, it is profitable for a sports website to provide more free articles during periods of peak demand. During the off-season, more content should require a subscription because off-season readers are disproportionately the highest value customers. Rao and Hartmann (2015) examine price discrimination in digital video, comparing options to rent or buy a digital movie. The paper shows that in the zero marginal cost digital context, dynamic considerations play an important role. The marketing literature has long been focused on tactical and practical questions about how to price (Rao, 1984). Therefore, it is not surprising that scholars at the boundary between marketing and economics have been exploring the new frontier question of how to price non-rival digital goods.

2.2 Placement: How do digital channels – some of which are illegal – affect the ability of information good producers to distribute profitably?

Digital channels affect the ability of the producers of information goods to distribute profitably. For example, music industry revenue began to fall in 1999 and this has been widely blamed on the impact of digitization generally and free online copying in particular (Waldfogel, 2012). This leads to a question of optimal restrictions on free online copying by governments through copyright and by firms through digital rights management. While the direct effect of free online copying is to reduce revenues, such free copying may induce consumers to sample new music and buy later (Peitz and Waelbroeck, 2006). Furthermore, Mortimer et al. (2012) show that revenues for complementary goods (like live performances) could rise. Despite this ambiguity, the vast majority of the empirical literature has shown that free online copying does reduce revenue across a wide variety of industries (Zentner, 2006; Waldfogel, 2010; Danaher and Smith, 2014; Godinho de Matos et al., 2018; Reimers, 2016).

The core open marketing questions therefore relate to the development and distribution of complementary goods. To the extent that free and even illegal channels for distribution are inevitable for many digital goods, what are the opportunities for intermediaries to facilitate profitable exchange? In other words, besides selling tickets to live music, it is important to understand the ways in which industry has reacted to these changes. As free video distribution becomes widespread through platforms like YouTube and through illegal channels, it may generate incentives to offer subscription bundles (as in Netflix) rather than charging per view (as in the cinema). It may also generate incentives to produce merchandizable content, and then earn profits through toy and clothing licensing, theme parks, and other channels.
This in turn may affect the role of entertainment conglomerates in the industry. If merchandizing is necessary, then companies like Disney may have an advantage because they own theme parks, retail stores, and other channels. Aside from Mortimer et al. (2012), our empirical understanding is limited as to how complements to digital information goods arise, how they work, and how they change the nature of the firm.

2.3 Product: What are the motivations for providing digital products given their non-excludability?

Intellectual property laws exist because they can generate incentives to innovate and create new products. The non-rival nature of digital goods leads to widespread violation of copyright and questions about what constitutes fair use. Many people consume digital products without paying for them. While the owners of copyrighted works are harmed, the provision of a product at zero price increases consumer surplus and eliminates deadweight loss. It also allows for valuable derivative works. In a static model, this is welfare-enhancing. Consumers benefit more than producers are hurt. Therefore, the key question with respect to digitization and copyright concerns the creation of new products. Waldfogel (2012) provides evidence that suggests that the quality of music has not fallen since Napster began facilitating free online copying in 1999. While digitization did reduce incentives to produce because of online copying, the costs of production and distribution fell as well. For distribution, low marginal costs of reproduction meant that early-stage artists could distribute their music widely and get known, even without support from a music label or publisher (Waldfogel, 2016; Waldfogel and Reimers, 2015). The title of the new book ‘Digital Renaissance’ (Waldfogel, 2018) summarizes a decade of Waldfogel’s research emphasizing that digitization has led to more and better quality entertainment despite increased copying, largely because of reduced production and distribution costs. In our view, the argument Waldfogel presents is convincing. The challenge for marketing scholars is to extend this literature by better understanding the profitable provision of goods that serve as complements to digital goods.
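The static welfare argument above (consumers gain more than producers lose when an existing good becomes available at zero price) can be checked with a short worked example. The sketch below assumes a linear demand curve q = a - p and zero marginal cost; both are illustrative assumptions rather than anything estimated in the literature cited here, and the function name is hypothetical.

```python
def surplus_at_price(a, price):
    """Consumer surplus, producer surplus, and deadweight loss.

    Assumes linear demand q = a - p, zero marginal cost, and 0 <= price <= a.
    """
    quantity = a - price
    consumer = 0.5 * quantity * quantity   # triangle between demand curve and price
    producer = price * quantity            # revenue equals profit at zero marginal cost
    max_total = 0.5 * a * a                # total surplus when price equals marginal cost
    deadweight = max_total - consumer - producer
    return consumer, producer, deadweight


a = 10.0
monopoly_price = a / 2  # maximizes p * (a - p)

cs_paid, ps_paid, dwl_paid = surplus_at_price(a, monopoly_price)
cs_free, ps_free, dwl_free = surplus_at_price(a, 0.0)

print("Paid:", cs_paid, ps_paid, dwl_paid)
print("Free:", cs_free, ps_free, dwl_free)
print("Consumer gain:", cs_free - cs_paid, "Producer loss:", ps_paid - ps_free)
```

With a = 10, moving from the monopoly price to a zero price raises consumer surplus by 37.5 and lowers producer surplus by 25, with the difference equal to the eliminated deadweight loss of 12.5. The sketch is static by construction: it takes the product as already created, which is exactly why the incentive to create new products is the key open question.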

2.4 Promotion: What is the role of aggregators in promoting digital goods?

Non-rivalry means that it is easier for companies to replicate and aggregate the digital content of other firms. Such aggregators both compete with and promote the producing firm’s content (Dellarocas et al., 2013). Thus, the distinction between advertisement and product can become ambiguous. This tension has been empirically examined in the context of news aggregators. Examining policy changes in Europe, three different studies have shown that news aggregators served more as promotion tools than as cannibalizers of the revenues of producing firms (Calzada and Gil, 2017; Chiou and Tucker, 2017a; Athey et al., 2017b). For example, Calzada and Gil (2017) show that shutting down Google News in Spain substantially reduced visits to Spanish news sites. Chiou and Tucker (2017a) found similar evidence of market expansion looking at a contract dispute between the Associated Press and Google News. Therefore, the empirical evidence generally suggests that news aggregators expand the market rather than cannibalize it.

3 Lower transportation costs

Information stored in bits can be transported at the speed of light. Therefore digital goods and digital information can be transported anywhere at near-zero cost. Furthermore, the transportation cost to the consumer of buying physical goods online can be relatively low. As emphasized by Balasubramanian (1998), the transportation costs of traveling to an offline retailer are reduced, even if an online retailer still needs to ship a physical product.

3.1 Placement: Does channel structure still matter if transportation costs are near zero?

Digitization added a new marketing channel (Peterson et al., 1997). For digital goods, this channel is available to anyone with an internet connection. For physical goods bought online and shipped, this channel is available to anyone within the shipping range. In the United States, this means just about anybody with a mailing address. A variety of theory papers examined the impact of the online channel on marketing strategy (Balasubramanian, 1998; Liu and Zhang, 2006). These papers build on existing models of transportation costs that themselves build on Hotelling (1929). They model online retailers as being equidistant from all consumers, while consumers have different costs of visiting offline retailers, depending on their location. Empirical work has generally supported the use of these models. The new channel competed with the existing offline channels for goods that needed to be shipped (Goolsbee, 2001; Prince, 2007; Brynjolfsson et al., 2009; Forman et al., 2009; Choi and Bell, 2011) and for goods that could be consumed digitally (Sinai and Waldfogel, 2004; Gentzkow, 2007; Goldfarb and Tucker, 2011a,d; Seamans and Zhu, 2014; Sridhar and Sriram, 2015). Forman et al. (2009) aim to explicitly test the applicability of Balasubramanian (1998) to the context of online purchasing on Amazon. Using weekly data on top-selling Amazon books by US city, the paper examines changes in locally top-selling books when offline stores open (specifically Walmart, Target, Barnes & Noble, and Borders). The paper shows that when offline stores open, books that are relatively likely to appear in those stores disproportionately fall out of the top sellers list. The paper interprets this as evidence that consumers face transportation costs in visiting offline retailers: When a retailer opens nearby, consumers become more likely to buy books offline.
If the retailer is relatively far away, then consumers are more likely to buy online. A key limitation of this paper is that the data are ranked sales, rather than actual purchase data. Work with purchase data is more limited, though Choi and Bell (2011) show similar evidence of online-offline substitution in the context of online diaper purchasing. This result of online-offline substitution is not always evident. For multi-channel retailers, while substitution does occur in many situations, there are particular situations in which the offline channel enhances the online channel, such as when a brand is relatively unfamiliar in a location where a new store opens (Wang and Goldfarb, 2017; Bell et al., 2018). In particular, Wang and Goldfarb (2017) examined US sales at a large clothing retailer with a substantial presence both online and offline. During the period of study, the retailer substantially expanded the number of offline stores. Using internal sales data, as well as information on website visits, the analysis compares locations in which sales were high at the beginning of the sample period with locations in which sales were low. For places that already had high sales, opening an offline store reduced online purchasing. In these places, online and offline served as

competing channels, consistent with the prior literature. In contrast, in locations in which sales were low, the opening of offline stores led to an increase in online sales. This increase occurred in a variety of product categories, not only those that required the customer to know whether the clothes fit. The evidence suggests a marketing communications role for the offline channel. These results suggest more nuance than simply ‘online is a substitute for offline.’ They suggest some validity to the widespread use among practitioners of the jargon term ‘omnichannel’ (Verhoef et al., 2015). In particular, while the previous paragraph summarized a long and careful literature that suggests the arrival of online competition reduced offline sales – and that new offline competitors reduce online sales – within a firm the results are more nuanced. The offline store can drive traffic to the online channel and in doing this it serves two roles: Sales channel and communications channel. This suggests the possibility that an online store can also drive traffic to an offline channel – there is a nascent literature that explores this but, as might be expected, establishing causality is hard.
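The Hotelling-style channel choice underlying the theory papers in this subsection can be sketched as follows. This is a deliberately stripped-down illustration, not the model in Balasubramanian (1998): the linear travel cost, the flat shipping fee, the parameter values, and the function names are all assumptions.

```python
def offline_cost(price_offline, travel_cost_per_km, distance_km):
    """Total cost of buying offline: shelf price plus the cost of the trip."""
    return price_offline + travel_cost_per_km * distance_km


def online_cost(price_online, shipping_fee):
    """Total cost of buying online: the same for every consumer, wherever they live."""
    return price_online + shipping_fee


def preferred_channel(distance_km, price_offline=10.0, price_online=9.0,
                      shipping_fee=3.0, travel_cost_per_km=0.5):
    """Return the cheaper channel for a consumer living `distance_km` from the store."""
    off = offline_cost(price_offline, travel_cost_per_km, distance_km)
    on = online_cost(price_online, shipping_fee)
    return "offline" if off <= on else "online"


def cutoff_distance(price_offline=10.0, price_online=9.0,
                    shipping_fee=3.0, travel_cost_per_km=0.5):
    """Distance at which a consumer is indifferent between the two channels."""
    return (price_online + shipping_fee - price_offline) / travel_cost_per_km


# A store opening nearby shortens the trip from 6 km to 2 km.
print(preferred_channel(6.0))  # before the store opens
print(preferred_channel(2.0))  # after the store opens
print(cutoff_distance())
```

Under these assumed parameters the consumer switches from the online to the offline channel when the nearest store moves inside the 4 km cutoff, which mirrors the substitution pattern that Forman et al. (2009) infer from local bestseller lists. The sketch captures only the substitution channel; it has no counterpart to the marketing communications role of offline stores described above.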

3.2 Product: How do low transportation costs affect product variety?

In the absence of the online channel, all purchases would be made offline. Each person would be constrained to purchase the products available locally. As highlighted above in the context of the long tail, the online channel provides access to a much wider variety of products and services. Sinai and Waldfogel (2004) show that online media enable non-local news consumption. In particular, they show that digitization makes it relatively easy for African Americans living in primarily white neighborhoods to read similar news to African Americans living in African American neighborhoods. In addition, digitization makes it relatively easy for whites living in African American neighborhoods to read similar news to whites living in white neighborhoods. Similarly, Gandal (2006) shows that online media enable local language minorities to read news in their language of choice. Choi and Bell (2011) document that sales of niche diaper brands are higher online in zipcodes where such brands are generally not available offline. Low transportation costs enable product variety by reducing geographic barriers to distribution. While tastes are spatially correlated (Blum and Goldfarb, 2006; Quan and Williams, 2018), distribution is not limited by local tastes. As discussed earlier, Quan and Williams (2018) show that spatially correlated tastes are reflected in offline offerings. This means that the welfare impact of online product variety is smaller than it might seem if measured by number of varieties available. Combined, these results suggest that the welfare impact of increased product variety will disproportionately accrue to people with distinct preferences from their neighbors, what Choi and Bell (2011) call ‘preference minorities’. This provides an additional layer for interpreting the results of Quan and Williams (2018).
If the welfare impact of increased online variety accrues to local minorities, then it might indicate a larger benefit than straight utilitarian analysis might suggest.


3.3 Pricing: Does pricing flexibility increase because transportation costs are near zero?

Low transportation costs constrain online pricing in several ways. First, there is the competition highlighted above, both between the various online retailers and between online and offline retailers. Second, one aspect of low online transportation costs involves the reduced physical effort when consumers are not required to carry items home from the store. Pozzi (2013) shows that online grocery buyers stockpile more than offline grocery buyers, purchasing in bulk when a discount appears. This ability to stockpile further restricts online pricing strategies. Third, it is difficult, though not impossible, to charge different prices for the same item at different locations; the media has been critical of retailers caught charging different online prices to buyers in different locations, even when the purpose was to match local offline store prices (Valentino-Devries et al., 2012; Cavallo, 2017).

3.4 Promotion: What is the role of location in online promotion?

Location matters in online promotion. This is partly because – as mentioned above – tastes are spatially correlated. In addition, a long sociology literature, at least since Hampton and Wellman (2003), shows that social networks are highly local. Marketers have long known that word of mouth is perhaps the most effective form of promotion (Dellarocas, 2003). Online word of mouth has become increasingly important, as we discuss in the context of verification costs, but offline word of mouth remains a key tool for promotion even for products sold entirely online. Even though individuals can communicate with anyone anywhere, much online communication is between people who live in the same household or work in the same building. Promotion through local social networks can be effective (Bell and Song, 2007; Choi et al., 2010). For example, in the context of online crowdfunding of music, Agrawal et al. (2015) show that local social networks provided early support that helped promote musicians to distant strangers. There is also suggestive evidence that online recommendations are more effective if provided by people who live nearby (Forman et al., 2008). In other words, although the transportation costs for digital goods are near zero, and the transportation costs for consumers of visiting stores are reduced, a different type of transportation cost persists. This leads to spatially correlated social networks, which in turn leads to spatially correlated word-of-mouth promotion. While the online word-of-mouth literature has grown rapidly, there is still little understanding of how online and offline social networks interact. We expect the quantitative marketing literature to be well placed to address this. As Facebook and other online social network platforms become increasingly important promotion channels, this gap in understanding limits our ability to design online promotion strategies.

4 Lower tracking costs

Literatures on search, replication, and transportation all began in the 1990s and were well established in the early digital marketing literature. More recently, it has become clear that two additional cost shifts have occurred: Tracking costs and verification costs have fallen. It is easy to track digital activity. Tracking is the ability to link an individual’s behavior digitally across multiple different media, content venues, and purchase contexts. Often, information is collected and stored automatically. Tracking enables extremely fine segmentation, and even personalization (Ansari and Mela, 2003; Murthi and Sarkar, 2003; Hauser et al., 2014). This has created new opportunities for marketers in promotion, pricing, and product offerings. The effect in placement has been weaker simply because often there are coordination difficulties between different vertical partners that make tracking harder.

4.1 Promotion: How do low tracking costs affect advertising?

Marketing scholars have been particularly prolific in studying the impact of low tracking costs on advertising. The improved targeting of advertising through digital media is perhaps the dominant theme in the online advertising literature (Goldfarb and Tucker, 2011b; Goldfarb, 2014). Many theoretical models on how digitization would affect advertising emphasize targeting (Chen et al., 2001; Iyer et al., 2005; Gal-Or and Gal-Or, 2005; Anand and Shachar, 2009). Much of this work has emphasized online-offline competition when online advertising is targeted, and the scarcity of advertising space online and offline (Bergemann and Bonatti, 2011; Athey et al., 2016). A large empirical literature has explored various strategies for successful targeting. Goldfarb and Tucker (2011c) show that targeted banner advertising is effective, but only as long as it does not take over the screen too much. Targeting works when it is subtle, in the sense that it has the biggest impact on plain banner ads, relative to how it increases the effectiveness of other types of ads. Tucker (2014) shows a related result in the context of social media targeting. Targeting works when it is not too obvious to the end consumer that an ad is closely targeted. Other successful targeting strategies include retargeting (to a partial extent) (Lambrecht and Tucker, 2013; Bleier and Eisenbeiss, 2015; Johnson et al., 2017a), targeting by stage in the purchase funnel (Hoban and Bucklin, 2015), time between ad exposures (Sahni, 2015), search engine targeting (Yao and Mela, 2011), and targeting using information on mobile devices (Bart et al., 2014; Xu et al., 2017). In each case, digitization facilitates targeting and new opportunities for advertising. In addition to better targeting, better tracking enables the measurement of advertising effectiveness (Goldfarb and Tucker, 2011b).
Early attempts to measure banner advertising effectiveness included Manchanda et al. (2006) and Rutz and Bucklin (2012). Tracking makes it relatively straightforward to identify which customers see ads, to track purchases, and to randomize advertising between treatment and control groups. More generally, prior to the diffusion of the internet, advertising measurement relied on aggregate correlations (with the exception of a small number of expensive experiments such as Lodish et al., 1995). Perhaps the clearest result of the increased ability to run advertising experiments because of better tracking is the finding that correlational studies of advertising effectiveness are deeply flawed. For example, Lewis et al. (2011) use data from banner ads on Yahoo to show the existence of a type of selection bias that they label ‘activity bias’. This occurs because users who are online at the time an advertisement is shown are disproportionately likely to undertake other online activities, including those used as outcome measures in advertising effectiveness studies. They show activity bias by comparing a randomized field experiment to correlational individual-level analysis. Measured advertising effectiveness is much lower in the experimental setting. One interpretation of this result would be to treat correlational analysis as an upper bound on the effectiveness of advertising. Gordon et al. (2019) demonstrate that this is not correct, and instead it is best to treat correlational analysis as having no useful information for measuring advertising effectiveness in the context they study. They examine a series of advertising field experiments on Facebook. Consistent with Lewis et al. (2011), they show that correlational analysis fails to measure advertising effectiveness properly. Importantly, they show that sometimes correlational analysis underestimates the effectiveness of an advertisement. Schwartz et al. (2017) demonstrate the usefulness of reframing experimental design as a multi-armed bandit problem. Measurement challenges extend beyond the need to run experiments. Ideally, advertising effectiveness would be measured based on the increase in long term profits caused by advertising. Given the challenge in measuring long term profits, research has focused on various proxies for advertising success.
For example, in measuring the effectiveness of banner advertising, Goldfarb and Tucker (2011c) used data from thousands of online advertising campaigns and randomized advertising into treatment and control groups. The analysis delivered on the promise of better measurement, but the outcome measure was far from a measure of long term profits. In order to get a systematically comparable outcome measure across many campaigns, the paper used the stated purchase intent of people who took a survey after having been randomly allocated to see either the advertisement or a public service announcement. Advertising effectiveness was measured as the difference in stated purchase intent between the treatment and control groups. This is a limited measure of effectiveness in at least two ways. First, only a small fraction of those who saw the ads (whether treatment or control) are likely to take the survey and so the measure is biased toward the type of people who take online surveys. Second, purchase intent is different from sales (which in turn is different from long term profits). In our view, for the purpose of comparing the effectiveness of different types of campaigns, this measure worked well. We were able to show that contextually targeted advertising increases purchase intent compared to other kinds of advertising, and that obtrusive advertising works better than plain advertising. Furthermore, we found that ads that were both targeted and obtrusive lifted purchase intent less than ads that were either targeted or obtrusive but not both. At the same time, this measure would not be useful for measuring the return on advertising investment or for determining the efficient allocation of advertising spending.

To address questions like these, subsequent research has built new tools for measuring actual sales. Lewis and Reiley (2014) link online ads to offline sales using internal data from Yahoo! and a department store. The paper linked online user profiles to the loyalty program of the department store using email addresses. With this measure, they ran a field experiment on 1.6 million users that showed that online advertising increases offline sales in the department store. While still not a measure of long term profits, this outcome measure is more directly related to the true outcome of interest. This came at the cost of challenges in comparing across types of campaigns and across categories. This study was possible because the research was conducted by scholars working in industry. Such industry research has been important in developing better measures of outcomes, as well as more effective experimentation. Other examples include Lewis and Nguyen (2015), who show spillovers from display advertising to consumer search; Johnson et al. (2017a), who provide a substantially improved method for identifying the control group in the relevant counterfactual to firms that choose not to advertise; and Johnson et al. (2017a), who examine hundreds of online display ad campaigns to show that they have a positive effect on average.

Even in the presence of experiments and reliable outcome measures, Lewis and Rao (2015) show that estimates of advertising effects are relatively low powered. In other words, the effect of seeing one banner ad once on an eventual purchase is small. It is meaningful and can deliver a positive return on investment, but demonstrating that requires a large number of observations. Johnson et al. (2017b) show that better controls can increase the power of the estimated effects, though this effect is modest.
In addition, they found that careful experimental design and sample selection can lead to a substantial boost in power. Given these findings, advancing the literature poses some challenges for marketing scholars. Because advertising effectiveness has high variance and effect sizes are small, it appears increasingly necessary to work with marketing platforms to calibrate effects. This need is magnified by the use of advertising algorithms on these platforms, which makes it difficult to establish a counterfactual (Eckles et al., 2018). However, it is unlikely that advertising platforms would encourage researchers to study newer issues facing their platforms, such as algorithmic bias (Lambrecht and Tucker, 2018) or the spread of misinformation through advertising (Chiou and Tucker, 2018). This is important because some of the biggest open research questions in digital marketing communications are no longer simply about advertising effectiveness. Instead, there are now large policy issues about the consequences of the ability to track and target consumers in this way. An example of the challenges facing the online targeting policy debate is the extent to which regulators should be worried about advertising that is deceptive or distortionary. Though there has been much discussion about the actions of firms such as Cambridge Analytica that use Facebook data to target political ads, as yet there has been limited discussion in marketing about the issues of deceptive uses of targeting. Again, we expect this will be a fruitful avenue of research.
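The point made earlier in this section, that small advertising effects can only be demonstrated with a large number of observations, can be illustrated with a textbook two-sample power calculation. This is a minimal sketch and not the calculation used in Lewis and Rao (2015); the function name `users_per_arm` and all sales figures are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def users_per_arm(effect, sd, alpha=0.05, power=0.8):
    """Approximate users needed in each arm of a two-arm experiment to detect
    a difference in mean sales of size `effect` with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical inputs: sales per user have a standard deviation of $20,
# and the ad raises mean sales per user by either $0.10 or $1.00.
small_effect_n = users_per_arm(effect=0.10, sd=20.0)
large_effect_n = users_per_arm(effect=1.00, sd=20.0)

print(small_effect_n, large_effect_n)
```

Under these assumed numbers, the smaller effect requires hundreds of thousands of users per arm, roughly one hundred times as many as the larger effect, because the required sample grows with the inverse square of the effect size.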

4.2 Pricing: Do lower tracking costs enable novel forms of price discrimination? Low tracking costs can enable new ways to price discriminate. Early commentators on the impact of digitization emphasized this potential (Shapiro and Varian, 1998; Smith et al., 2001; Bakos, 2001). Tracking means that firms can observe customer behavior and keep tabs on customers over time. This enables behavioral price discrimination (see Fudenberg and Villas-Boas (2012) and Fudenberg and Villas-Boas (2007) for reviews). This literature emphasizes how identifying previous customers affects pricing strategy and profitability (Villas-Boas, 2004; Shin and Sudhir, 2010; Chen and Zhang, 2011). While digital price discrimination has received a great deal of attention in the theory literature, empirical support is limited. Other examples of online price discrimination include Celis et al. (2014) and Seim and Sinkinson (2016). Perhaps the best example is Dube and Misra (2017), who document that targeting many prices to different customers can be profitable in the context of an online service. This paper relies on a large scale field experiment to learn the optimal price discrimination policy. It then demonstrates that the learned policy outperforms other pricing strategies, using an experiment. In other words, the paper demonstrates the opportunity in price targeting and convincingly shows it works in a particular context using experimental design. One area where we have seen high levels of price discrimination is online advertising. Individual-level tracking means that there are thousands of advertisements to price to millions of consumers. Price discrimination is feasible but price discovery is difficult. As a consequence, digital markets typically use auctions to determine prices for advertising. Auctions facilitate price discovery when advertisements can be targeted to individuals based on their current and past behavior. 
In the 1990s, online advertising was priced according to a standard rate in dollars (or cents) per thousand impressions. Early search engine Goto.com was the first to recognize that an auction could be used to price discriminate in search advertising. Rather than a fixed price per thousand on the search page, prices could vary by search term. Today, both search and display advertising run on this insight, and a large literature has explored various auction formats for online advertising (Varian, 2007; Edelman et al., 2007; Levin and Milgrom, 2010; Athey and Ellison, 2011; Zhu and Wilbur, 2011; Arnosti et al., 2016). As long as an auction is competitive, the platform is able to price discriminate with much more detail than before. While this might generate more efficient advertising in the sense that the highest bidder values the advertisement the most, it also may enable the platform to capture more of the surplus from advertising. In other words, by enabling better price discrimination, advertising auctions likely lead to the familiar welfare effects of price discrimination between buyers and sellers, in this case the buyers and sellers of advertising. The impact on

consumer welfare is ambiguous and likely depends on the particular way in which advertising enters the utility function.
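The idea that auction prices vary by search term can be illustrated with a stylized version of the generalized second-price auction studied by Edelman et al. (2007). The sketch below is a simplification that assumes no reserve price, no quality scores, and no differences in click rates across slots; the function name, advertisers, search terms, and bids are all hypothetical.

```python
def gsp_outcome(bids, n_slots):
    """Rank advertisers by bid and charge each winner the next-highest bid.

    bids: dict mapping advertiser name to bid per click.
    Returns a list of (slot, advertiser, price_per_click) tuples.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    outcome = []
    for position in range(min(n_slots, len(ranked))):
        advertiser = ranked[position][0]
        # Assumption: with no lower bidder and no reserve price, the price is zero.
        price = ranked[position + 1][1] if position + 1 < len(ranked) else 0.0
        outcome.append((position + 1, advertiser, price))
    return outcome


# Hypothetical bids for two search terms. Prices differ by term because
# the competition for each term differs.
bids_by_term = {
    "car insurance": {"A": 4.0, "B": 2.5, "C": 1.0},
    "fishing rods": {"A": 0.4, "D": 0.3},
}
results = {term: gsp_outcome(bids, n_slots=2) for term, bids in bids_by_term.items()}
for term, outcome in results.items():
    print(term, outcome)
```

In this example advertiser A wins the top slot for both terms but pays a very different price per click for each, which is the sense in which the auction lets the platform price at the level of the search term rather than the page.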

4.3 Product: How do markets where the customer’s data is the ‘product’ lead to privacy concerns? Tracking is an opportunity for marketers to segment. It also creates privacy concerns. Therefore, low tracking costs have led to a resurgence of policy interest in privacy. A core question in the privacy literature is whether privacy is an intermediate good that is only valuable because it affects consumers indirectly (such as through higher prices) or whether privacy is a final good that is valued in and of itself (Farrell, 2012). The theoretical literature has focused on privacy as an intermediate good (Taylor, 2004; Acquisti and Varian, 2005; Hermalin and Katz, 2006), while policy discussions often emphasize privacy as a final good. Research outside of marketing, such as Acquisti et al. (2013, 2015), has argued that this discussion has been complicated by consumers’ inconsistent behavior with respect to privacy, leading to a privacy paradox in which consumers behave in ways that contradict their stated preferences (Athey et al., 2017a). Many examples of privacy regulation have been aimed at marketers. Such regulation limits what marketers can do with data. It affects the nature and distribution of outcomes (Goldfarb and Tucker, 2012). For example, European privacy regulation in the early 2000s substantially reduced the effectiveness of online advertising in Europe (Goldfarb and Tucker, 2011e). Assuming that opt-in policies mean that fewer users can be tracked, Johnson (2014) builds a structural model to estimate the financial costs of opt-in privacy policies relative to opt-out. The estimates suggest that opt-in policies can have substantial financial costs to platforms.
While negative effects of privacy regulation have been shown in a variety of contexts (Miller and Tucker, 2009, 2011; Goldfarb and Tucker, 2011e; Miller and Tucker, 2018; Johnson et al., 2017c), firm-implemented policies that protect the privacy of their consumers can have strongly positive effects (Tucker, 2012, 2014). Privacy regulation also affects the nature of product market competition (Campbell et al., 2015). It can either constrain the ability of smaller firms to compete cost-effectively (Campbell et al., 2015), or lead firms to intentionally silo data about consumers (Miller and Tucker, 2014). In our view, the empirical privacy literature in marketing is surprisingly sparse. Marketers have an important role to play in the debate about data flows because we are among the primary users of data. While there has been some progress on research with respect to marketing policy, we have little empirical understanding of the strategic challenges that relate to privacy. How should firms balance customer demands for privacy and the usefulness of data to provide better products? What is the best way to measure the benefits of privacy to consumers, given that short term measures suggest consumers are often not willing to pay much to protect their privacy, while the policy debate suggests consumers may care in the longer term? Overall, there are a number of opportunities for marketing scholars to provide a deeper understanding of

when increased privacy protection will generate strategic advantage. We expect that one such opportunity will be regulations, such as the EU General Data Protection Regulation (GDPR) which came into effect in May 2018. It was significant as the first privacy regulation which has had a truly global impact and therefore affects not just firms within the EU but across the world.

4.4 Placement: How do lower tracking costs affect channel management? Lower tracking costs can make it easier for a manufacturer to monitor behavior in retail channels by tracking prices available online. Israeli (2018) discusses the usefulness of minimum advertised pricing restrictions that manufacturers sometimes impose on retailers to reduce downstream price competition. Using a quasi-experimental setting, the paper demonstrates that easier tracking of online prices makes minimum advertised pricing policies more effective. Easier tracking enables different levels of control in channel relationships. We believe there are opportunities for further research in this area, especially in understanding how conflicts over control of digital technologies affect channel conflict. A recent example of such work is Cao and Ke (2019), who investigate how channel conflict emerges when it is possible to pinpoint precisely a pair of eyeballs that may be interested in a particular search query and try to advertise to them.

5 Reduction in verification costs Reduced tracking costs have had an additional effect of improving verification. This was not anticipated in the early literature which emphasized online anonymity. Perhaps the most familiar verification technology in marketing is the brand (Shapiro, 1983; Erdem and Swait, 1998; Tadelis, 1999; Keller, 2003). The ability to verify online identity and reputation without the need to invest in mass market branding has affected marketing in a variety of ways. Verification is likely to continue to improve, with the advent of new digital verification technologies such as blockchain (Catalini and Gans, 2016).

5.1 Pricing: How willingness to pay is bolstered by reputation mechanisms Digital markets involve small players who may be unfamiliar to potential customers. An estimated 88% of online Visa transactions are with a merchant that the customer does not visit offline (Einav et al., 2017). While brands do play a role online (Brynjolfsson and Smith, 2000; Waldfogel and Chen, 2006), for small players to thrive, other verification mechanisms are needed. Online reputation mechanisms reduce the importance of established brands and enable consumers to trust small online sellers. Furthermore, Hollenbeck (2018) provides evidence that online reputation mechanisms can reduce the importance of offline brands. In particular, the paper demonstrates that high online ratings lead to higher sales in offline independent hotels. Luca (2016) finds a similar result for restaurants. There are many ways that a platform might regulate the behavior of its users. This includes systems that ban users who behave undesirably. However, the majority of platforms lean on online ratings systems. In such systems, past buyers and sellers post ratings for future market participants to see. There is a large literature on the importance of eBay’s online rating system to its success, as well as a variety of papers that explore potential changes and improvements to that system and their impact on prices, market outcomes, and willingness to pay (Resnick and Zeckhauser, 2002; Ba and Pavlou, 2002; Lucking-Reiley et al., 2007; Cabral and Hortacsu, 2010; Hui et al., 2016). For example, Hui et al. (2016) demonstrate that eBay’s reputation system is effective in reducing bad behavior on the part of sellers, but it needs to be combined with eBay’s ability to punish the worst behavior in order to create a successful marketplace on which small sellers can thrive. Perhaps the key theme of this literature is that online reputation mechanisms increase willingness to pay and sometimes enable markets that otherwise would not exist.

5.2 Product: Is a product’s ‘rating’ now an integral product feature? In addition to enhancing trust and willingness-to-pay, ratings systems provide information on product quality. The rating becomes a key feature of a platform. Ratings inform consumers about the best products available within the platform, and are therefore a key element of the overall product offering. Platforms benefit because rating information guides consumers to the highest quality products. For example, Chevalier and Mayzlin (2006) demonstrate that positive reviews lead to higher sales in the context of online retail. Even online identities that are consistent over time but not connected to a name or home address can influence consumption (Yoganarasimhan, 2012). For some online platforms, such as Yelp, their product is to provide ratings about offline settings. As noted above, Luca (2016) and Hollenbeck (2018) show that high online ratings improve sales in offline restaurants and hotels, particularly for independents. In both cases, the online rating system is a substitute for a widely known chain brand. Godes and Silva (2012) also show that such ratings have the potential to exhibit dynamics that reflect real economic effects. This insight is built on by Muchnik et al. (2013), who document herding in ratings behavior on a news website. In addition to the idea of a rating system controlled by the platform as being an integral product feature, organic and digital forms of word-of-mouth are also essential heuristics that consumers use when making purchase decisions about a product (Godes and Mayzlin, 2009). Work such as Toubia and Stephen (2013) has also studied why it is that consumers post word of mouth on platforms such as Twitter, and has drawn a distinction between the intrinsic and extrinsic utility that consumers derive from posting. Lambrecht et al. 
(2018), however, suggest that some of the most attractive potential spreaders of word-of-mouth, people who start memes on social platforms, are also the most resistant to advertising.

5.3 Placement: How can channels reduce reputation system failures? In addition to understanding the successes of reputation systems, a wide literature has explored when reputation systems fail. A key source of failure is the inability to verify whether the person doing the online rating actually experienced the product. Mayzlin et al. (2014) and Luca and Zervas (2016) show evidence that firms seem to give themselves high ratings while giving low ratings to their competitors. A related issue is selection bias in who chooses to provide ratings (Nosko and Tadelis, 2015). Anderson and Simester (2014) show evidence of a related problem: Many reviewers never purchase the product. They review anyway and these reviews distort the information available. In response to these and other concerns, platforms regularly update their reputation systems. For example, Fradkin et al. (2017) document two experiments made at Airbnb to improve their reputation system. What was striking about these experiments is that rather than too many ‘fake’ reviews being a problem, instead here the challenge the platform faced was incentivizing users to give accurate accounts of negative experiences. This paper established that too much ‘favorable’ opinion can be a problem in such settings. The existing literature has provided a broad sense of when and how online reputation systems might fail. This suggests new opportunities for scholars focused on market design. Given the challenges in building online reputation systems, it is important to carefully model and build systems that are robust to these failures.
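The selection problem described above, in which users with negative experiences are less likely to post a review, can be illustrated with a short calculation. This is a stylized sketch and not the design or data of any cited study; the function name, the shares of true experiences, and the posting probabilities are all hypothetical.

```python
def observed_mean_rating(true_share, post_prob):
    """Mean of posted ratings when the chance of posting depends on the rating.

    true_share: dict mapping a rating (1-5) to the share of buyers with that experience.
    post_prob: dict mapping a rating to the probability that such a buyer posts a review.
    """
    weights = {rating: true_share[rating] * post_prob[rating] for rating in true_share}
    total = sum(weights.values())
    return sum(rating * weight for rating, weight in weights.items()) / total


# Hypothetical distribution of true experiences across all buyers.
true_share = {1: 0.10, 2: 0.10, 3: 0.20, 4: 0.30, 5: 0.30}
# Hypothetical posting behavior: satisfied buyers are far more likely to review.
post_prob = {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.20, 5: 0.40}

true_mean = sum(rating * share for rating, share in true_share.items())
observed_mean = observed_mean_rating(true_share, post_prob)

print(round(true_mean, 2), round(observed_mean, 2))
```

Under these assumptions the posted average is well above the average true experience, even though no review is fake. A reputation system that is robust to this failure would need to raise the posting probability of dissatisfied users or correct for it when aggregating.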

5.4 Promotion: Can verification lead to discrimination in how goods are promoted? Improved verification technology has meant that the early expectations of online anonymity have not been met. For example, early literature showed that online car purchases could avoid the transmission of race and gender information, thereby leading to a reduction of discrimination based on these characteristics (Scott Morton et al., 2003). As verification technology has improved, this anonymity has largely disappeared from many online transactions. This has led to concerns that online identities can be used to discriminate. For example, when information about race or gender is revealed online, consumers receive advertisements for different products and may even receive offers of different prices (Pope and Sydnor, 2011; Doleac and Stein, 2013; Edelman and Luca, 2014). One recent example of this has been the question of algorithmic bias in the way that advertising is distributed, something that has been highlighted by computer scientists (Sweeney, 2013; Datta et al., 2015). In marketing and economics, Lambrecht and Tucker (2018) show that an ad intended to highlight careers in the STEM fields was shown to more men than women because of the price mechanism underlying the distribution of ads. Male eyeballs are cheaper than female eyeballs, so an ad algorithm that is trying to be cost-effective will show any ad to fewer women than men.

This type of apparent algorithmic bias is a surprising consequence of improvements in verification technology. In the past, it was not possible to verify gender easily. Instead, firms used content to separate out likely gender affiliation, such as assuming men were more likely to read fishing magazines and women more likely to read beauty magazines. However, in a digital ecosystem where characteristics such as gender can be verified, the ability to classify gender may inadvertently lead to perceptions of bias in areas where distributing content in a non-gender-neutral way is problematic.
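The price mechanism described above can be illustrated with a stylized budget allocation. This is not the ad platform’s algorithm or the empirical setting of Lambrecht and Tucker (2018); it is a minimal sketch of a cost-minimizing buyer, and the function name, prices, inventory, and budget are hypothetical.

```python
def allocate_impressions(budget, inventory):
    """Buy the cheapest impressions first until the budget is exhausted.

    budget: whole dollars available.
    inventory: list of (group, price_per_thousand, thousands_available),
               with integer prices to avoid rounding issues.
    Returns a dict mapping each group to thousands of impressions bought.
    """
    bought = {}
    remaining = budget
    for group, price, available in sorted(inventory, key=lambda row: row[1]):
        quantity = min(available, remaining // price)
        bought[group] = quantity
        remaining -= quantity * price
    return bought


# Hypothetical market: impressions shown to women cost more than those shown to men.
inventory = [("women", 8, 100), ("men", 5, 100)]
shown = allocate_impressions(budget=1000, inventory=inventory)

print(shown)
```

The advertiser in this sketch expresses no preference over gender, yet the allocation is uneven because the cheaper audience is bought first. The imbalance comes entirely from the price difference, which depends on gender being observable to the market.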

6 Conclusions Digital marketing is inherently different to offline marketing due to a reduction of five categories of costs: search, reproduction, transportation, tracking, and verification. In defining the scope of this article, we drew boundaries. We focus on understanding the impact of the technology on marketing using an economic perspective. Therefore, we did not discuss much work written in marketing that focuses on methodology, such as the literature on statistical modeling in digital environments (Johnson et al., 2004; Moe and Schweidel, 2012; Netzer et al., 2012). We also did not detail the consumer behavior literature on the effect of digital environments (Berger and Milkman, 2012; Castelo et al., 2015). This overview highlights that changes to marketing that result from the changes in costs inherent in the digital context are not as obvious as initial economic models may imply. Instead, as may be expected, the complexities of both firm and consumer behavior have led to less than predictable outcomes. It is these less predictable outcomes which have allowed marketing contexts to inform the economics literature on the likely effects of digitization outside of marketing. Going forward, we anticipate the most influential work to fall into one of three categories. First, there are still many opportunities to unpack the existing models and identify new complexities in how the drop in search, reproduction, transportation, tracking, and verification costs affects various aspects of marketing. Many recent papers fall in this category, including Blake et al. (2015), Simonov et al. (2018a), Hollenbeck (2018), and Farronato and Fradkin (2018). In the above discussion, we have highlighted some areas that we see as particularly important topics for future research. Second, as policies change, new business models arise, and new technologies diffuse, there will be opportunities to understand these changes in light of existing models. 
Recent papers of this type include Bart et al. (2014), Miller and Tucker (2018), Lambrecht and Tucker (2018), and Johnson et al. (2017c). Third, some of the changes brought by digitization and other advances in information technology will require recognition of different types of cost changes. Just as the early internet literature emphasized search, replication, and transportation costs, and only later were tracking and verification costs recognized as important consequences, we anticipate technological change to lead to the application of other well-established

models into new contexts. For example, one recent hypothesis is that recent advances in machine learning can be framed as a drop in the cost of prediction which can be modeled as a reduction in uncertainty (Agrawal et al., 2018). For each of these categories, economic theory plays a fundamental role. Search theory provided much of the initial impetus for the digital marketing literature. It provided hypotheses on prices, price dispersion, and product variety. Some of these hypotheses were supported, but others were not. In turn, this generated new models that could explain the data, and the cycle continued. Models of reproduction costs, transportation, tracking, and verification played similar roles. This led to a much deeper understanding of the consequences of digitization on marketing.

References Acquisti, A., Brandimarte, L., Loewenstein, G., 2015. Privacy and human behavior in the age of information. Science 347 (6221), 509–514. Acquisti, A., John, L.K., Loewenstein, G., 2013. What is privacy worth? The Journal of Legal Studies 42 (2), 249–274. Acquisti, A., Varian, H.R., 2005. Conditioning prices on purchase history. Marketing Science 24 (3), 367–381. Agrawal, A., Catalini, C., Goldfarb, A., 2015. Crowdfunding: geography, social networks, and the timing of investment decisions. Journal of Economics and Management Strategy 24 (2), 253–274. Agrawal, A., Gans, J., Goldfarb, A., 2018. Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Press. Aguiar, L., Waldfogel, J., 2018. Quality predictability and the welfare benefits from new products: evidence from the digitization of recorded music. Journal of Political Economy 126 (2), 492–524. Anand, B., Shachar, R., 2009. Targeted advertising as a signal. Quantitative Marketing and Economics 7 (3), 237–266. Anderson, C., 2006. The Long Tail. Hyperion. Anderson, E.T., Simester, D.I., 2014. Reviews without a purchase: low ratings, loyal customers, and deception. Journal of Marketing Research 51 (3), 249–269. Andrews, M., Luo, X., Fang, Z., Ghose, A., 2015. Mobile ad effectiveness: hyper-contextual targeting with crowdedness. Marketing Science 35 (2), 218–233. Anenberg, E., Kung, E., 2015. Information technology and product variety in the city: the case of food trucks. Journal of Urban Economics 90, 60–78. Ansari, A., Mela, C., 2003. E-customization. Journal of Marketing Research 40 (2), 131–145. Armstrong, M., 2006. Competition in two-sided markets. The Rand Journal of Economics 37 (3), 668–691. Arnosti, N., Beck, M., Milgrom, P., 2016. Adverse selection and auction design for Internet display advertising. The American Economic Review 106 (10), 2852–2866. Athey, S., Calvano, E., Gans, J.S., 2016. 
The impact of consumer multi-homing on advertising markets and media competition. Management Science 64 (4), 1574–1590. Athey, S., Catalini, C., Tucker, C.E., 2017a. The Digital Privacy Paradox: Small Money, Small Costs, Small Talk. Working Paper. MIT. Athey, S., Ellison, G., 2011. Position auctions with consumer search. The Quarterly Journal of Economics 126 (3), 1213–1270. Athey, S., Mobius, M., Pal, J., 2017b. The Impact of News Aggregators on Internet News Consumption: The Case of Localization. Mimeo. Stanford University. Ba, S., Pavlou, P., 2002. Evidence of the effect of trust building technology in electronic markets: price premiums and buyer behavior. Management Information Systems Quarterly 26, 243–268.

Bagwell, K., 2007. The economic analysis of advertising. In: Armstrong, M., Porter, R. (Eds.), Handbook of Industrial Organization, vol. 3. Elsevier, pp. 1701–1844 (Chap. 28). Bakos, Y., 2001. The emerging landscape for retail e-commerce. The Journal of Economic Perspectives 15 (1), 69–80. Balasubramanian, S., 1998. Mail versus mall: a strategic analysis of competition between direct marketers and conventional retailers. Marketing Science 17 (3), 181–195. Bapna, R., Ramaprasad, J., Shmueli, G., Umyarov, A., 2016. One-way mirrors in online dating: a randomized field experiment. Management Science 62 (11), 3100–3122. Bar-Isaac, H., Caruana, G., Cunat, V., 2012. Search, design, and market structure. The American Economic Review 102 (2), 1140–1160. Bart, Y., Stephen, A.T., Sarvary, M., 2014. Which products are best suited to mobile advertising? A field study of mobile display advertising effects on consumer attitudes and intentions. Journal of Marketing Research 51 (3), 270–285. Baye, M.R., Morgan, J., 2004. Price dispersion in the lab and on the Internet: theory and evidence. The Rand Journal of Economics 35 (3), 449–466. Bell, D.R., Gallino, S., Moreno, A., 2018. Offline experiences and value creation in omnichannel retail. Available at SSRN. Bell, D.R., Song, S., 2007. Neighborhood effects and trial on the Internet: evidence from online grocery retailing. Quantitative Marketing and Economics 5 (4), 361–400. Bergemann, D., Bonatti, A., 2011. Targeting in advertising markets: implications for offline versus online media. The Rand Journal of Economics 42 (3), 417–443. Berger, J., Milkman, K.L., 2012. What makes online content viral? Journal of Marketing Research 49 (2), 192–205. Blake, T., Nosko, C., Tadelis, S., 2015. Consumer heterogeneity and paid search effectiveness: a large scale field experiment. Econometrica 83 (1), 155–174. Bleier, A., Eisenbeiss, M., 2015. Personalized online advertising effectiveness: the interplay of what, when, and where. 
Marketing Science 34 (5), 669–688. Blum, B., Goldfarb, A., 2006. Does the Internet defy the law of gravity? Journal of International Economics 70 (2), 384–405. Borenstein, S., Saloner, G., 2001. Economics and electronic commerce. The Journal of Economic Perspectives 15 (1), 3–12. Bronnenberg, B.J., Kim, J.B., Mela, C.F., 2016. Zooming in on choice: how do consumers search for cameras online? Marketing Science 35 (5), 693–712. Brown, J., Goolsbee, A., 2002. Does the Internet make markets more competitive? Evidence from the life insurance industry? Journal of Political Economy 110 (3), 481–507. Brynjolfsson, E., Hu, Y., Rahman, M., 2009. Battle of the retail channels: how product selection and geography drive cross-channel competition. Management Science 55 (11), 1755–1765. Brynjolfsson, E., Hu, Y., Simester, D., 2011. Goodbye Pareto principle, hello long tail: the effect of search costs on the concentration of product sales. Management Science 57 (8), 1373–1386. Brynjolfsson, E., Hu, Y.J., Smith, M.D., 2003. Consumer surplus in the digital economy: estimating the value of increased product variety at online booksellers. Management Science 49 (11), 1580–1596. Brynjolfsson, E., Smith, M., 2000. Frictionless commerce? A comparison of Internet and conventional retailers. Management Science 46 (4), 563–585. Cabral, L., Hortacsu, A., 2010. Dynamics of seller reputation: theory and evidence from eBay. Journal of Industrial Economics 58 (1), 54–78. Calzada, J., Gil, R., 2017. What Do News Aggregators Do? Evidence from Google News in Spain and Germany. Mimeo. University of Barcelona. Campbell, J., Goldfarb, A., Tucker, C., 2015. Privacy regulation and market structure. Journal of Economics & Management Strategy 24 (1), 47–73. Cao, X., Ke, T., 2019. Cooperative search advertising. Marketing Science 38 (1), 44–67. Castelo, N., Hardy, E., House, J., Mazar, N., Tsai, C., Zhao, M., 2015. Moving citizens online: using salience & message framing to motivate behavior change. 
Behavioral Science and Policy 1 (2), 57–68.

Catalini, C., Gans, J.S., 2016. Some Simple Economics of the Blockchain. SSRN Working Paper 2874598. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2874598. Cavallo, A., 2017. Are online and offline prices similar? Evidence from large multi-channel retailers. The American Economic Review 107 (1), 283–303. Celis, L.E., Lewis, G., Mobius, M., Nazerzadeh, H., 2014. Buy-it-now or take-a-chance: price discrimination through randomized auctions. Management Science 60 (12), 2927–2948. Chen, Y., Narasimhan, C., Zhang, Z.J., 2001. Individual marketing with imperfect targetability. Marketing Science 20 (1), 23–41. Chen, Y., Zhang, T., 2011. Equilibrium price dispersion with heterogeneous searchers. International Journal of Industrial Organization 29 (6), 645–654. Chevalier, J., Mayzlin, D., 2006. The effect of word of mouth online: online book reviews. Journal of Marketing Research 43, 345–354. Chiou, L., Tucker, C., 2017a. Content aggregation by platforms: the case of the news media. Journal of Economics and Management Strategy 26 (4), 782–805. Chiou, L., Tucker, C., 2017b. Search Engines and Data Retention: Implications for Privacy and Antitrust. Discussion Paper. National Bureau of Economic Research. Chiou, L., Tucker, C., 2018. Fake News and Advertising on Social Media: A Study of the Anti-Vaccination Movement. Working Paper 25223. National Bureau of Economic Research. Choi, J., Bell, D., 2011. Preference minorities and the Internet. Journal of Marketing Research 58 (3), 670–682. Choi, J., Hui, S.K., Bell, D.R., 2010. Spatiotemporal analysis of imitation behavior across new buyers at an online grocery retailer. Journal of Marketing Research 47 (1), 75–89. Chu, J., Chintagunta, P., Cebollada, J., 2008. Research note – a comparison of within-household price sensitivity across online and offline channels. Marketing Science 27 (2), 283–299. Coviello, L., Gneezy, U., Goette, L., 2017. 
A Large-Scale Field Experiment to Evaluate the Effectiveness of Paid Search Advertising. CESifo Working Paper Series No. 6684. Cullen, Z., Farronato, C., 2016. Outsourcing Tasks Online: Matching Supply and Demand on Peer-to-Peer Internet Platforms. Working Paper. Harvard University. Dana, James D.J., Orlov, E., 2014. Internet penetration and capacity utilization in the US airline industry. American Economic Journal: Microeconomics 6 (4), 106–137. Danaher, B., Smith, M.D., 2014. Gone in 60 seconds: the impact of the megaupload shutdown on movie sales. International Journal of Industrial Organization 33, 1–8. Danaher, B., Smith, M.D., Telang, R., 2013. Piracy and Copyright Enforcement Mechanisms. University of Chicago Press, pp. 25–61. Datta, A., Tschantz, M.C., Datta, A., 2015. Automated experiments on ad privacy settings. Proceedings on Privacy Enhancing Technologies 2015 (1), 92–112. Datta, H., Knox, G., Bronnenberg, B.J., 2018. Changing their tune: how consumers’ adoption of online streaming affects music consumption and discovery. Marketing Science 37 (1), 5–21. De los Santos, B.I., Hortacsu, A., Wildenbeest, M., 2012. Testing models of consumer search using data on web browsing and purchasing behavior. The American Economic Review 102 (6), 2955–2980. Dellarocas, C., 2003. The digitization of word of mouth: promise and challenges of online feedback mechanisms. Management Science 49 (10), 1407–1424. Dellarocas, C., Katona, Z., Rand, W., 2013. Media, aggregators, and the link economy: strategic hyperlink formation in content networks. Management Science 59 (10), 2360–2379. Diamond, P., 1971. A model of price adjustment. Journal of Economic Theory 3 (2), 156–168. Dinerstein, M., Einav, L., Levin, J., Sundaresan, N., 2018. Consumer price search and platform design in Internet commerce. The American Economic Review 108 (7), 1820–1859. Doleac, J.L., Stein, L.C., 2013. The visible hand: race and online market outcomes. The Economic Journal 123 (572), F469–F492. 
Dube, J.-P., Misra, S., 2017. Scalable Price Targeting. Working Paper. University of Chicago. Eckles, D., Gordon, B.R., Johnson, G.A., 2018. Field studies of psychologically targeted ads face threats to internal validity. Proceedings of the National Academy of Sciences, 201805363. Edelman, B., Luca, M., 2014. Digital Discrimination: The Case of Airbnb.com. HBS Working Paper.

Edelman, B., Ostrovsky, M., Schwarz, M., 2007. Internet advertising and the generalized second-price auction: selling billions of dollars worth of keywords. The American Economic Review 97 (1), 242–259. Einav, L., Farronato, C., Levin, J., Sundaresan, N., 2018. Auctions versus posted prices in online markets. Journal of Political Economy 126 (1), 178–215. Einav, L., Klenow, P., Klopack, B., Levin, J., Levin, L., Best, W., 2017. Assessing the Gains from ECommerce. Working Paper. Stanford University. Elberse, A., Eliashberg, J., 2003. Demand and supply dynamics for sequentially released products in international markets: the case of motion pictures. Marketing Science 22 (3), 329–354. Ellison, G., Ellison, S.F., 2005. Lessons about markets from the Internet. The Journal of Economic Perspectives 19 (2), 139–158. Ellison, G., Ellison, S.F., 2009. Search, obfuscation, and price elasticities on the Internet. Econometrica 77 (2), 427–452. Ellison, G., Ellison, S.F., 2017. Match Quality, Search, and the Internet Market for Used Books. Working Paper. MIT. Erdem, T., Swait, J., 1998. Brand equity as a signaling phenomenon. Journal of Consumer Psychology 7 (2), 131–157. Ershov, D., 2019. The Effect of Consumer Search Costs on Entry and Quality in the Mobile App Market. Working Paper. Toulouse School of Economics. Farrell, J., 2012. Can privacy be just another good? Journal on Telecommunications and High Technology Law 10, 251. Farronato, C., Fradkin, A., 2018. The Welfare Effects of Peer Entry in the Accommodation Market: The Case of Airbnb. Discussion Paper. National Bureau of Economic Research. Fleder, D.M., Hosanagar, K., 2009. Blockbuster culture’s next rise or fall: the impact of recommender systems on sales diversity. Management Science 55 (5), 697–712. Fong, N.M., Fang, Z., Luo, X., 2015. Geo-conquesting: competitive locational targeting of mobile promotions. Journal of Marketing Research 52 (5), 726–735. Forman, C., Ghose, A., Goldfarb, A., 2009. 
Competition between local and electronic markets: how the benefit of buying online depends on where you live. Management Science 55 (1), 47–57. Forman, C., Ghose, A., Wiesenfeld, B., 2008. Examining the relationship between reviews and sales: the role of reviewer identity disclosure in electronic markets. Information Systems Research 19 (3), 291–313. Fradkin, A., 2017. Search, Matching, and the Role of Digital Marketplace Design in Enabling Trade: Evidence from Airbnb. Mimeo. MIT Sloan School of Business. Fradkin, A., Grewal, E., Holtz, D., 2017. The determinants of online review informativeness: evidence from field experiments on Airbnb. Working Paper. MIT Sloan School of Management. Fudenberg, D., Villas-Boas, J.M., 2007. Behaviour-based price discrimination and customer recognition. In: Hendershott, T. (Ed.), Economics and Information Systems, vol. 1. Elsevier Science, Oxford. Fudenberg, D., Villas-Boas, J.M., 2012. Price discrimination in the digital economy. In: Oxford Handbook of the Digital Economy. Oxford University Press. Gal-Or, E., Gal-Or, M., 2005. Customized advertising via a common media distributor. Marketing Science 24, 241–253. Gandal, N., 2006. The effect of native language on Internet use. International Journal of the Sociology of Language 182, 25–40. Gentzkow, M., 2007. Valuing new goods in a model with complementarity: online newspapers. The American Economic Review 97 (32), 713–744. Ghose, A., Goldfarb, A., Han, S.P., 2013. How is the mobile Internet different? Search costs and local activities. Information Systems Research 24 (3), 613–631. Ghose, A., Ipeirotis, P.G., Li, B., 2012. Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Science 31 (3), 493–520. Godes, D., Mayzlin, D., 2009. Firm-created word-of-mouth communication: evidence from a field test. Marketing Science 28 (4), 721–739. Godes, D., Silva, J.C., 2012. Sequential and temporal dynamics of online opinion. 
Marketing Science 31 (3), 448–473.

285

286

CHAPTER 5 Digital marketing

Godinho de Matos, M., Ferreira, P., Smith, M.D., 2018. The effect of subscription video-on-demand on piracy: evidence from a household-level randomized experiment. Management Science 64 (12), 5610–5630. Goldfarb, A., 2014. What is different about online advertising? Review of Industrial Organization 44 (2), 115–129. Goldfarb, A., Tucker, C., 2011a. Advertising bans and the substitutability of online and offline advertising. Journal of Marketing Research 48 (2), 207–227. Goldfarb, A., Tucker, C., 2011b. Online advertising. In: Zelkowitz, M.V. (Ed.), Advances in Computers, vol. 81. Elsevier, pp. 289–315. Goldfarb, A., Tucker, C., 2011c. Online display advertising: targeting and obtrusiveness. Marketing Science 30, 389–404. Goldfarb, A., Tucker, C., 2011d. Search engine advertising: channel substitution when pricing ads to context. Management Science 57 (3), 458–470. Goldfarb, A., Tucker, C., 2011e. Privacy regulation and online advertising. Management Science 57 (1), 57–71. Goldfarb, A., Tucker, C., 2012. Privacy and innovation. In: Innovation Policy and the Economy, vol. 12. National Bureau of Economic Research, Inc. NBER Chapters. Goldfarb, A., Tucker, C., 2019. Digital economics. Journal of Economic Literature 57 (1), 3–43. Goolsbee, A., 2001. Competition in the computer industry: online versus retail. Journal of Industrial Economics 49 (4), 487–499. Gordon, B., Zettelmeyer, F., Bhargava, N., Chapsky, D., 2019. A comparison of approaches to advertising measurement: evidence from big field experiments at Facebook. Marketing Science 38 (2), 193–225. Grossman, G.M., Shapiro, C., 1984. Informative advertising with differentiated products. The Review of Economic Studies 51 (1), 63–81. Hall, J., Kendrick, C., Nosko, C., 2016. The Effects of Uber’s Surge Pricing: A Case Study. Working Paper. Uber. Hampton, K., Wellman, B., 2003. Neighboring in Netville: how the Internet supports community and social capital in a wired suburb. City and Community 2 (4), 277–311. 
Hauser, J.R., Liberali, G., Urban, G.L., 2014. Website morphing 2.0: switching costs, partial exposure, random exit, and when to morph. Management Science 60 (6), 1594–1616. Hermalin, B., Katz, M., 2006. Privacy, property rights and efficiency: the economics of privacy as secrecy. Quantitative Marketing and Economics 4 (3), 209–239. Hoban, P.R., Bucklin, R.E., 2015. Effects of Internet display advertising in the purchase funnel: modelbased insights from a randomized field experiment. Journal of Marketing Research 52 (3), 375–393. Hollenbeck, B., 2018. Online reputation mechanisms and the decreasing value of brands. Journal of Marketing Research 55 (5), 636–654. Honka, E., 2014. Quantifying search and switching costs in the US auto insurance industry. The Rand Journal of Economics 45 (4), 847–884. Honka, E., Chintagunta, P., 2017. Simultaneous or sequential? Search strategies in the U.S. auto insurance industry. Marketing Science 36 (1), 21–42. Horton, J.J., Zeckhauser, R.J., 2016. Owning, Using and Renting: Some Simple Economics of the “Sharing Economy”. NBER Working Paper No. 22029. Hossain, T., Morgan, J., 2006. ... Plus shipping and handling: revenue (non)equivalence in field experiments on eBay. Advances in Economic Analysis and Policy 6 (3). Hotelling, H., 1929. Stability in competition. The Economic Journal 39 (153), 41–57. Hui, X., Saeedi, M., Shen, Z., Sundaresan, N., 2016. Reputation and regulations: evidence from eBay. Management Science 62 (12), 3604–3616. Israeli, A., 2018. Online MAP enforcement: evidence from a quasi-experiment. Marketing Science, 539–564. Iyer, G., Soberman, D., Villas-Boas, M., 2005. The targeting of advertising. Marketing Science 24 (3), 461. Jeziorski, P., Moorthy, S., 2017. Advertiser prominence effects in search advertising. Management Science 64 (3), 1365–1383.

References

Johnson, E.J., Moe, W.W., Fader, P.S., Bellman, S., Lohse, G.L., 2004. On the depth and dynamics of online search behavior. Management Science 50 (3), 299–308. Johnson, G., 2014. The impact of privacy policy on the auction market for online display advertising. Available at SSRN 2333193. Johnson, G.A., Lewis, R.A., Nubbemeyer, E.I., 2017a. Ghost ads: improving the economics of measuring online ad effectiveness. Journal of Marketing Research 54 (6), 867–884. Johnson, G.A., Lewis, R.A., Reiley, D.H., 2017b. When less is more: data and power in advertising experiments. Marketing Science 36 (1), 43–53. Johnson, G.A., Shriver, S., Du, S., 2017c. Consumer Privacy Choice in Online Advertising: Who Opts Out and at What Cost to Industry? Mimeo. Boston University. Joo, M., Wilbur, K.C., Cowgill, B., Zhu, Y., 2014. Television advertising and online search. Management Science 60 (1), 56–73. Jullien, B., 2012. Two-sided B to B platforms. In: Peitz, M., Waldfogel, J. (Eds.), Oxford Handbook of the Digital Economy. Oxford University Press, pp. 161–185. Kalyanam, K., McAteer, J., Marek, J., Hodges, J., Lin, L., 2017. Cross channel effects of search engine advertising on brick & mortar retail sales: meta analysis of large scale field experiments on Google.com. Quantitative Marketing and Economics 16 (1), 1–42. Keller, K.L., 2003. Strategic Brand Management, second edition. Prentice Hall. Lambrecht, A., Misra, K., 2016. Fee or free: when should firms charge for online content? Management Science 63 (4), 1150–1165. Lambrecht, A., Tucker, C., 2013. When does retargeting work? Information specificity in online advertising. Journal of Marketing Research 50 (5), 561–576. Lambrecht, A., Tucker, C., Wiertz, C., 2018. Advertising to early trend propagators: evidence from Twitter. Marketing Science 37 (2), 177–199. Lambrecht, A., Tucker, C.E., 2018. Algorithmic bias? An empirical study into apparent gender-based discrimination in the display of STEM career ads. Management Science. 
Forthcoming. Levin, J., Milgrom, P., 2010. Online advertising: heterogeneity and conflation in market design. The American Economic Review 100 (2), 603–607. Lewis, R., Nguyen, D., 2015. Display advertising’s competitive spillovers to consumer search. Quantitative Marketing and Economics 13 (2), 93–115. Lewis, R.A., Justin, M.R., Reiley, D.H., 2011. Here, there, and everywhere: correlated online behaviors can lead to overestimates of the effects of advertising. In: Proceedings of the 20th ACM International World Wide Web Conference [WWW’11], pp. 157–166. Lewis, R.A., Rao, J.M., 2015. The unfavorable economics of measuring the returns to advertising. The Quarterly Journal of Economics 130 (4), 1941. Lewis, R.A., Reiley, D.H., 2014. Online ads and offline sales: measuring the effect of retail advertising via a controlled experiment on Yahoo! Quantitative Marketing and Economics 12 (3), 235–266. Li, H.A., Kannan, P.K., Viswanathan, S., Pani, A., 2016. Attribution strategies and return on keyword investment in paid search advertising. Marketing Science 35 (6), 831–848. Li, X., MacGarvie, M., Moser, P., 2015. Dead Poet’s Property – How Does Copyright Influence Price? Working Paper 21522. National Bureau of Economic Research. Liu, Y., Zhang, Z.J., 2006. Research note—the benefits of personalized pricing in a channel. Marketing Science 25 (1), 97–105. Lodish, L.M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B., Stevens, M.E., 1995. How T.V. advertising works: a meta-analysis of 389 real world split cable T.V. advertising experiments. Journal of Marketing Research 32 (2), 125–139. Luca, M., 2016. Reviews, Reputation, and Revenue: The Case of Yelp.com. Harvard Business School NOM Unit Working Paper 12-016. Luca, M., Zervas, G., 2016. Fake it till you make it: reputation, competition, and Yelp review fraud. Management Science 62 (12), 3412–3427. Lucking-Reiley, D., Bryan, D., Prasad, N., Reeves, D., 2007. 
Pennies from eBay: the determinants of price in online auctions. Journal of Industrial Economics 55 (2), 223–233.

287

288

CHAPTER 5 Digital marketing

Manchanda, P., Dube, J.-P., Goh, K.Y., Chintagunta, P.K., 2006. The effect of banner advertising on Internet purchasing. Journal of Marketing Research 43 (1), 98–108. Mayzlin, D., Dover, Y., Chevalier, J., 2014. Promotional reviews: an empirical investigation of online review manipulation. The American Economic Review 104 (8), 2421–2455. Miller, A., Tucker, C., 2011. Can healthcare information technology save babies? Journal of Political Economy 119 (2), 289–324. Miller, A., Tucker, C., 2014. Health information exchange, system size and information silos. Journal of Health Economics 33 (2), 28–42. Miller, A.R., Tucker, C., 2009. Privacy protection and technology diffusion: the case of electronic medical records. Management Science 55 (7), 1077–1093. Miller, A.R., Tucker, C., 2018. Privacy protection, personalized medicine, and genetic testing. Management Science 64 (10), 4648–4668. Moe, W.W., Schweidel, D.A., 2012. Online product opinions: incidence, evaluation, and evolution. Marketing Science 31 (3), 372–386. Mortimer, J.H., Nosko, C., Sorensen, A., 2012. Supply responses to digital distribution: recorded music and live performances. Information Economics and Policy 24 (1), 3–14. Moshary, S., Blake, T., Sweeney, K., Tadelis, S., 2017. Price Salience and Product Choice. Working Paper. University of Pennsylvania. Muchnik, L., Aral, S., Taylor, S.J., 2013. Social influence bias: a randomized experiment. Science 341 (6146), 647–651. Murthi, B., Sarkar, S., 2003. The role of the management sciences in research on personalization. Management Science 49 (10), 1344–1362. Narayanan, S., Kalyanam, K., 2015. Position effects in search advertising and their moderators: a regression discontinuity approach. Marketing Science 34 (3), 388–407. Netzer, O., Feldman, R., Goldenberg, J., Fresko, M., 2012. Mine your own business: market-structure surveillance through text mining. Marketing Science 31 (3), 521–543. Nosko, C., Tadelis, S., 2015. 
The Limits of Reputation in Platform Markets: An Empirical Analysis and Field Experiment. Working Paper 20830. National Bureau of Economic Research. Orlov, E., 2011. How does the Internet influence price dispersion? Evidence from the airline industry. Journal of Industrial Economics 59 (1), 21–37. Peitz, M., Waelbroeck, P., 2006. Why the music industry may gain from free downloading – the role of sampling. International Journal of Industrial Organization 24 (5), 907–913. Peterson, R.A., Balasubramanian, S., Bronnenberg, B.J., 1997. Exploring the implications of the Internet for consumer marketing. Journal of the Academy of Marketing Science 25 (4), 329–346. Pope, D.G., Sydnor, J.R., 2011. What’s in a picture? Evidence of discrimination from Prosper.com. The Journal of Human Resources 46 (1), 53–92. Pozzi, A., 2013. E-commerce as a stockpiling technology: implications for consumer savings. International Journal of Industrial Organization 31 (6), 677–689. Prince, J., 2007. The beginning of online/retail competition and its origins: an application to personal computers. International Journal of Industrial Organization 25 (1), 139–156. Quan, T.W., Williams, K.R., 2018. Product variety, across-market demand heterogeneity, and the value of online retail. The Rand Journal of Economics 49 (4), 877–913. Rao, A., Hartmann, W.R., 2015. Quality vs. variety: trading larger screens for more shows in the era of digital cinema. Quantitative Marketing and Economics 13 (2), 117–134. Rao, V.R., 1984. Pricing research in marketing: the state of the art. Journal of Business, S39–S60. Reimers, I., 2016. Can private copyright protection be effective? Evidence from book publishing. The Journal of Law and Economics 59 (2), 411–440. Resnick, P., Zeckhauser, R., 2002. Trust among strangers in Internet transactions: empirical analysis form eBay auctions. In: Baye, M. (Ed.), Advances in Applied Microeconomics (vol. 11). Elsevier Science, Amsterdam, pp. 667–719. Rochet, J.-C., Tirole, J., 2003. 
Platform competition in two-sided markets. Journal of the European Economic Association 1 (4), 990–1029.

References

Rutz, O.J., Bucklin, R.E., 2012. Does banner advertising affect browsing for brands? Clickstream choice model says yes, for some. Quantitative Marketing and Economics 10 (2), 231–257. Sahni, N.S., 2015. Effect of temporal spacing between advertising exposures: evidence from online field experiments. Quantitative Marketing and Economics 13 (3), 203–247. Schwartz, E.M., Bradlow, E.T., Fader, P.S., 2017. Customer acquisition via display advertising using multiarmed bandit experiments. Marketing Science 36 (4), 500–522. Scott Morton, F., Zettelmeyer, F., Silva-Risso, J., 2003. Consumer information and discrimination: does the Internet affect the pricing of new cars to women and minorities? Quantitative Marketing and Economics 1 (1), 65–92. Seamans, R., Zhu, F., 2014. Responses to entry in multisided markets. The impact of craigslist on newspapers. Management Science 60 (2), 476–493. Seim, K., Sinkinson, M., 2016. Mixed pricing in online marketplaces. Quantitative Marketing and Economics 14 (2), 129–155. Shapiro, C., 1983. Premiums for high quality products as returns to reputation. The Quarterly Journal of Economics 98 (4), 659–680. Shapiro, C., Varian, H.R., 1998. Information Rules: A Strategic Guide to the Network Economy. Harvard Business School Press, Boston. Shin, J., Sudhir, K., 2010. A customer management dilemma: when is it profitable to reward one’s own customers? Marketing Science 21 (4), 671–689. Simonov, A., Nosko, C., Rao, J.M., 2018a. Competition and crowd-out for brand keywords in sponsored search. Marketing Science 37 (2), 200–215. Simonov, A., Nosko, C., Rao, J.M., 2018b. Competition and crowd-out for brand keywords in sponsored search. Marketing Science 37 (2), 200–215. Sinai, T., Waldfogel, J., 2004. Geography and the Internet: is the Internet a substitute or a complement for cities? Journal of Urban Economics 56 (1), 1–24. Smith, M.D., Bailey, J., Brynjolfsson, E., 2001. Understanding digital markets: review and assessment. 
In: Brynjolfsson, E., Kahin, B. (Eds.), Understanding the Digital Economy: Data, Tools, and Research. MIT Press, pp. 99–136. Sridhar, S., Sriram, S., 2015. Is online newspaper advertising cannibalizing print advertising? Quantitative Marketing and Economics 13 (4), 283–318. Stigler, G.J., 1961. The economics of information. Journal of Political Economy 69 (3), 213–225. Sweeney, L., 2013. Discrimination in online ad delivery. ACM Queue 11 (3), 10. Tadelis, S., 1999. What’s in a name? Reputation as a tradeable asset. The American Economic Review 89 (3), 548–563. Taylor, C.R., 2004. Consumer privacy and the market for customer information. The Rand Journal of Economics 35 (4), 631–650. Toubia, O., Stephen, A.T., 2013. Intrinsic vs. image-related utility in social media: why do people contribute content to Twitter? Marketing Science 32 (3), 368–392. Tucker, C., 2012. The economics of advertising and privacy. International Journal of Industrial Organization 30 (3), 326–329. Tucker, C., 2014. Social networks, personalized advertising, and privacy controls. Journal of Marketing Research 51 (5), 546–562. Tucker, C., Zhang, J., 2011. How does popularity information affect choices? A field experiment. Management Science 57 (5), 828–842. Valentino-Devries, J., Singer-Vine, J., Soltan, A., 2012. Websites vary prices, deals based on users’ information. The Wall Street Journal. Varian, H., 2007. Position auctions. International Journal of Industrial Organization 25 (6), 1163–1178. Varian, H.R., 1980. A model of sales. The American Economic Review 70 (4), 651–659. Varian, H.R., 2005. Copying and copyright. The Journal of Economic Perspectives 19 (2), 121–138. Verhoef, P.C., Kannan, P.K., Inman, J.J., 2015. From multi-channel retailing to omni-channel retailing: introduction to the special issue on multi-channel retailing. Journal of Retailing 91 (2), 174–181. Vernik, D.A., Purohit, D., Desai, P.S., 2011. Music downloads and the flip side of digital rights management. 
Marketing Science 30 (6), 1011–1027.

289

290

CHAPTER 5 Digital marketing

Villas-Boas, J.M., 2004. Price cycles in markets with customer recognition. The Rand Journal of Economics 35 (3), 486–501. Waldfogel, J., 2010. Music file sharing and sales displacement in the iTunes era. Information Economics and Policy 22 (4), 306–314. Waldfogel, J., 2012. Copyright research in the digital age: moving from piracy to the supply of new products. The American Economic Review 102 (3), 337–342. Waldfogel, J., 2016. Cinematic explosion: new products, unpredictability and realized quality in the digital era. Journal of Industrial Economics 64 (4), 755–772. Waldfogel, J., 2018. Digital Renaissance: What Data and Economics Tell Us About the Future of Popular Culture. Princeton University Press. Waldfogel, J., Chen, L., 2006. Does information undermine brand? Information intermediary use and preference for branded web retailers. Journal of Industrial Economics 54 (4), 425–449. Waldfogel, J., Reimers, I., 2015. Storming the gatekeepers: digital disintermediation in the market for books. Information Economics and Policy 31 (C), 47–58. Wang, K., Goldfarb, A., 2017. Can offline stores drive online sales? Journal of Marketing Research 54 (5), 706–719. Xu, K., Chan, J., Ghose, A., Han, S.P., 2017. Battle of the channels: the impact of tablets on digital commerce. Management Science 63 (5), 1469–1492. Yao, S., Mela, C.F., 2011. A dynamic model of sponsored search advertising. Marketing Science 30 (3), 447–468. Yoganarasimhan, H., 2012. Impact of social network structure on content propagation: a study using YouTube data. Quantitative Marketing and Economics 10 (1), 111–150. Zentner, A., 2006. Measuring the effect of file sharing on music purchases. The Journal of Law and Economics 49 (1), 63–90. Zentner, A., Smith, M., Kaya, C., 2013. How video rental patterns change as consumers move online. Management Science 59 (11), 2622–2634. Zervas, G., Proserpio, D., Byers, J.W., 2017. The rise of the sharing economy: estimating the impact of Airbnb on the hotel industry. 
Journal of Marketing Research 54 (5), 687–705. Zettelmeyer, F., Scott Morton, F., Silva-Risso, J., 2001. Internet car retailing. Journal of Industrial Economics 49 (4), 501–519. Zettelmeyer, F., Scott Morton, F., Silva-Risso, J., 2006. How the Internet lowers prices: evidence from matched survey and automobile transaction data. Journal of Marketing Research 43 (2), 168–181. Zhang, J., Liu, P., 2012. Rational herding in microloan markets. Management Science 58 (5), 892–912. Zhang, L., 2018. Intellectual property strategy and the long tail: evidence from the recorded music industry. Management Science 64 (1), 24–42. Zhu, Y., Wilbur, K.C., 2011. Hybrid advertising auctions. Marketing Science 30 (2), 249–273.

CHAPTER 6

The economics of brands and branding✩

Bart J. Bronnenberg (a, b, ∗), Jean-Pierre Dubé (c, d), Sridhar Moorthy (e)

a Tilburg School of Economics and Management, Tilburg University, Tilburg, The Netherlands
b CEPR, London, United Kingdom
c Booth School of Business, University of Chicago, Chicago, IL, United States
d NBER, Cambridge, MA, United States
e Rotman School of Management, University of Toronto, Toronto, ON, Canada
∗ Corresponding author: e-mail address: [email protected]

Contents
1 Introduction
2 Brand equity and consumer demand
  2.1 Consumer brand equity as a product characteristic
  2.2 Brand awareness, consideration, and consumer search
    2.2.1 The consumer psychology view on awareness, consideration, and brand choice
    2.2.2 Integrating awareness and consideration into the demand model
    2.2.3 An econometric specification
    2.2.4 Consideration and brand valuation
3 Consumer brand loyalty
  3.1 A general model of brand loyalty
  3.2 Evidence of brand choice inertia
  3.3 Brand choice inertia, switching costs, and loyalty
  3.4 Learning from experience
  3.5 Brand advertising goodwill
4 Brand value to firms
  4.1 Brands and market structure
  4.2 Measuring brand value
    4.2.1 Reduced-form approaches using price and revenue premia
    4.2.2 Structural models
5 Branding and firm strategy
  5.1 Brand as a product characteristic
  5.2 Brands and reputation
  5.3 Branding as a signal
  5.4 Umbrella branding
    5.4.1 Empirical evidence
    5.4.2 Umbrella branding and reputation
    5.4.3 Umbrella branding and product quality signaling
  5.5 Brand loyalty and equilibrium pricing
  5.6 Brand loyalty and early-mover advantage
6 Conclusions
References

✩ Dubé acknowledges the support of the Kilts Center for Marketing and Moorthy acknowledges the support of the Social Sciences and Humanities Research Council of Canada. The authors thank Tülin Erdem, Pedro Gardete, Avi Goldfarb, Brett Hollenbeck, Carl Mela, Helena Pedrotti, Martin Peitz, Sudhir Voleti, and two anonymous reviewers for comments and suggestions.
Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.003
Copyright © 2019 Elsevier B.V. All rights reserved.

1 Introduction

The economics literature has long puzzled over the concept of brand preference and consumer willingness to pay a price premium for a product differentiated by little other than its brand. In blind taste tests, consumers are often unable to distinguish between their preferred brands and other competing products (Husband and Godfrey, 1934; Thumin, 1962; Allison and Uhl, 1964, p. 336). Nevertheless, branding and brand advertising are perceived to be important investments in sustainable market power:

“A well-known soap-flake which is a branded article costs £150,000 per year to advertise. The price of two unadvertised soap-flakes is considerably less (one of them by more than 50 per cent) than that of the advertised product. Chemically there is absolutely no difference between the advertised product and the two unadvertised soap-flakes. Advertisement alone maintains the fiction that this soap-flake is something superfine. If the advertisement were stopped, the product would become merely one of a number of soap-flakes and would have to be sold at ordinary soap-flake prices. Yet the success of the undertaking, from the producer’s point of view, may be seen from the fact that this product brings in half a million net profit per year.” (Braithwaite, 1928, p. 30)

Brands have also long been recognized as invaluable assets to firms that create barriers to entry and contribute to supranormal economic profits:

“The advantage to established sellers accruing from buyer preferences for their products as opposed to potential-entrant products is on the average larger and more frequent in occurrence at large values than any other barrier to entry.” (Bain, 1956, p. 216)

The conceptual meaning of a brand has evolved over time. According to the Oxford English Dictionary, the word “brand” originated in the 10th century. During the 1600s, the term brand was used in the American colonies to designate a “mark of ownership impressed on cattle” (Kerschner and Geraghty, 2006, p. 21). Since the nineteenth century, the term brand has taken on a commercial role as “a trademark, whether made by burning or otherwise” on items ranging from wine and liquor to timber and metals (Murray, 1887, p. 1055). Current marketing practice interprets the brand as “a name, symbol, design, or mark that enhances the value of a product beyond its functional purpose,” where the added value of these enhancements to the basic product is often broadly termed “brand equity” (Farquhar, 1989, p. 24). On the demand side, this added value can comprise consumption benefits, such as image, and information benefits, such as quality reputation. On the supply side, industry experts associate very high economic value with the commercial rights to leading brands, with reported valuations in the billions of US dollars.¹,²

This chapter discusses the economics of brands and branding to understand their impact on the formation of industrial market structures in consumer goods industries. We review the academic literature analyzing the underlying economic mechanisms through which consumers form brand preferences, on the demand side, and the economic incentives for firms to invest in the creation and maintenance of brands, on the supply side. Our discussion builds on earlier surveys of marketing science models of brand equity (e.g., Erdem and Swait, 2014). However, we refer readers seeking a psychological foundation of brands and branding to Muthukrishnan (2015) and Schmitt (2012).

We have organized this chapter around the following topics. Section 2 discusses two principal roles of brands in affecting demand. First, we discuss how brands affect preferences, and incorporate brand preferences into a neoclassical “characteristics” model of demand. Here, we discuss how consumer brand equity is estimated from consumer choice data. Second, we discuss the role of brands in generating awareness, directing attention and consumer search, and determining the composition of the consideration sets from which brand choices are made. In Section 3, we focus on the formation of consumer brand preferences over time and the emergence of “brand loyalty.” Section 4 discusses brand value estimation from the firm’s point of view. In Section 5 we discuss the strategic considerations for firms to create brand value through reputations, the investment in brand capital, and potentially extending the use of a brand name across products marketed under a common brand umbrella. Finally, Section 6 concludes.

¹ For instance, according to Forbes magazine, the 100 most valuable brands in 2017 represent a global value of US$ 1.95 trillion.
² The broad use of the term “brand equity” in reference to both consumer and firm benefits creates confusion. In some of the literature, the added value of brand enhancements to consumers is termed “brand equity” whereas the added value of brand enhancements to firms is termed “brand value” (see for instance Goldfarb et al., 2008).

2 Brand equity and consumer demand

2.1 Consumer brand equity as a product characteristic

In this subsection, we focus on characteristics models of demand and the role of brand as a quality-enhancing product feature. The incorporation of product quality into the modeling of consumer preferences represented a turning point in the consumption literature, allowing for a more granular analysis of product-level demand as opposed to commodity-group-level demand (Houthakker, 1953). The role of quality was formalized into a “characteristics approach.” The product is defined as a bundle of characteristics. Consumers have identical perceptions of the objectively measured characteristics comprising a product, and have potentially heterogeneous and subjective preferences for these characteristics (Lancaster, 1971; Baumol, 1967; Rosen, 1974).

Early work in characteristics models of demand focused purely on objective attributes and did not consider brand. Unlike objective product characteristics, consumer brand preferences (or “brand equity”) typically comprise intangible, psychological factors and benefits. For instance, Keller’s (1993, p. 3) conceptual model of brand equity starts with a consumer’s brand knowledge, or “brand node in her memory to which a variety of associations are linked.” These associations in memory include the consumer’s brand awareness and her perceptions of the brand, or “brand image.” But the psychological mechanism through which brand equity affects a consumer’s utility from a product presents a challenge for the neoclassical economic model. Economists have historically shied away from the psychological foundations of preferences:

“The economist has little to say about the formation of wants; this is the province of the psychologist. The economist’s task is to trace the consequences of a given set of wants.” (Friedman, 1962, p. 13)

Not surprisingly, early micro-econometric work took a simplified view of the brand as a mark that merely identifies a specific product and links it to a supplier. In his hedonic specification, Rosen (1974, p. 36) explained: “The terms ‘product,’ ‘model,’ ‘brand,’ and ‘design’ are used interchangeably to designate commodities of given quality or specification.” Accordingly, Rosen (1974, p. 37) assumed: “If two brands offer the same bundle, but sell for different prices, consumers only consider the less expensive one, and the identity of sellers is irrelevant to their purchase decisions.” So the traditional characteristics approach assumes the consumer derives no preference from the brand itself other than through the objective product characteristics. Brand choice (i.e. “demand”) is therefore governed entirely by the brand’s objective characteristics. A micro-econometric demand specification that excludes brand preferences would have limited predictive power in many product markets. According to the standard characteristics model, “two brands which have approximately the same attribute values should have approximately the same market shares” (Srinivasan, 1979, p. 12), a prediction that is frequently rejected by actual market share data (see, for example, the brand share analysis in Bronnenberg et al., 2007). Blind taste tests with experienced consumers also reveal a strong role for brand. In comparisons of blinded and unblinded taste tests that hold all the attributes of popular national brands fixed except the brand labeling on the packaging, experienced consumers routinely exhibit different preference orderings (Husband and Godfrey, 1934; Thumin, 1962; Allison

2 Brand equity and consumer demand

and Uhl, 1964). In Allison and Uhl’s (1964) study, subjects (males who drank beer at least three times a week) tasted six bottles of beer over a week, first blind, with no brand identifiers, and then non-blind, with all the brand identifiers present. In the blind tasting, the six bottles of beer were actually three different brands with “taste differences discernible to expert taste testers.” In the non-blind tasting, the six bottles were actually six different brands: the three that they had originally tasted blind, plus three additional brands. After each tasting, subjects were asked to evaluate the beers, overall, and on particular product attributes such as “after-taste,” “aroma,” and “carbonation.” In the blind tasting, subjects generally rated all the beers to be about the same quality, including the brand that they drank most often. However, unblinded, subjects rated each of the original three beer brands higher and the increases in evaluation varied across brands. Subjects generally rated “their” brands as significantly better than the others even though they could not distinguish them in the blind test. Allison and Uhl conclude: “Participants, in general, did not appear to be able to discern the taste differences among the various beer brands, but apparently labels, and their associations, did influence their evaluations.”3 Ratchford (1975) was an early study that acknowledged the close connection between the characteristics approach in economics and the multi-attribute psychometric approaches (e.g., Green and Srinivasan, 1978; Wilkie and Pessemier, 1974) used in consumer psychology to describe and measure brand preferences and brand attitudes. The lab-based nature of psychometric and stated-preference measures limited their broad applicability to the analysis of consumer purchase data in the field.
A parallel literature using stated-preference data, or conjoint analysis,4 instead defined the brand equity as a residual which can be measured as a separate brand fixed effect in addition to the other objective product characteristics (Green and Wind, 1975; Srinivasan, 1979). With the advent of consumer shopping panel data, this same approach to brand equity was incorporated into empirical brand choice models (Guadagni and Little, 1983) derived from random utility theory. We now explore such quantitative models of brand choice. More formally, we consider the following discrete choice or “brand choice” formulation of demand. Suppose consumers have unit-elastic demands for j = 1, ..., J perfectly substitutable branded goods in a category.5 We also allow for a J + 1st “outside good” which we interpret for now as the non-purchase choice. Assume the consumer derives the following choice-specific utilities (i.e., conditional indirect util-

3 These findings would later inspire the famous “Pepsi Challenge” campaign during the 1970s in which subjects exhibited a more than 50% chance of choosing Pepsi over Coca Cola in a blind taste test (http://www.businessinsider.com/pepsi-challenge-business-insider-2013-5).
4 The conjoint approach to preference estimation defines a consumer’s product preference by conjoining her tastes for the product’s underlying attributes, much like the “characteristics approach.”
5 Following Rosen (1974), we make the discrete choice assumption for ease of presentation.


CHAPTER 6 The economics of brands and branding

ities):

$$
\begin{aligned}
v_j &= U(\psi_j,\, y - p_j) + \varepsilon_j, \quad j = 1, \ldots, J \\
v_{J+1} &= U(0,\, y) + \varepsilon_{J+1}
\end{aligned}
\tag{1}
$$

where εj is a random utility component for product j, y is the consumer’s budget, and pj is the price of product j. It is straightforward to include additional controls for point-of-sale marketing variables, such as in-store merchandizing like displays, in the model. The key object of interest in our discussion of brand preference is ψj, the consumer’s total perceived value from brand j (Guadagni and Little, 1983; Kamakura and Russell, 1989; Louviere and Johnson, 1988; Kamakura and Russell, 1993). In principle, the sign and magnitude of ψj can vary across customers so that branding can lead to both horizontal and vertical sources of differentiation. In Section 5, we discuss how firms endogenously make branding decisions on the supply side. The brand choice literature has proposed various methods to extract the intrinsic perceived brand value from ψj. Kamakura and Russell (1993) propose a framework to reconcile the gap between the psychological components of brand preference and the objective product attributes. They use a hierarchical structure that decomposes total brand value as follows:

$$\psi_j = x_j \beta + \gamma_j \tag{2}$$

where xj are the objectively measured product attributes, β is a vector of corresponding attribute tastes, and γj is an intrinsic utility for the intangible and psychological components of brand j. For the remainder of our discussion, we will refer to γj as the intrinsic value of brand j in reference to the added benefits beyond the usual consumption benefits associated with the objective attributes, xj. This decomposition reveals a potential identification problem if the attributes of a given brand j do not vary over time or across consumers. In this case, the marginal utilities of all the attributes, β, and the perceptual features of brand j, γj, are not separately identified. Kamakura and Russell (1993) impose additional parameter restrictions to resolve the problem in an application to consumer purchase data. Using stated-preference data, such as a conjoint experiment, circumvents the problem by randomizing the attributes xj. Alternatively, when more granular, individual product- or so-called stock-keeping-unit (SKU)-level data are available, the researcher can exploit the fact that a common brand name may be applied across multiple SKUs with different objective attributes such as pack size, packaging format, and flavor (e.g., Fader and Hardie, 1996). In Section 5, we discuss how firms can create brand differentiation through reputation even when products are otherwise undifferentiated (i.e., xj = xk, ∀j, k, for all objective attributes). The total intrinsic brand value in Eq. (2) can be augmented to include subjective and perceptual aspects of the brand, such as biases in consumer perceptions of the objective attributes (Park and Srinivasan, 1994) and image associations. Typically these psychological attributes are elicited through consumer surveys (see Keller and Lehmann, 2006 for a discussion).
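The identification problem in Eq. (2) can be illustrated with a simple rank check. The sketch below is only an illustration under stated assumptions: it treats the total brand values ψj as if they were already known up to a normalization, and asks whether β and the γj could then be recovered by projecting ψ on the attributes and a full set of brand dummies. The function name `design_rank`, the single attribute (pack size), and all numbers are hypothetical.

```python
import numpy as np

def design_rank(attributes, brand_ids):
    """Return (rank, n_params) for the regressors [x_j, brand dummies] implied by Eq. (2)."""
    x = np.asarray(attributes, dtype=float)
    ids = np.asarray(brand_ids)
    brands = np.unique(ids)
    # One dummy column per brand: its coefficient plays the role of gamma_j.
    dummies = (ids[:, None] == brands[None, :]).astype(float)
    design = np.hstack([x, dummies])
    return int(np.linalg.matrix_rank(design)), design.shape[1]

# Hypothetical brand-level data: one product per brand, one attribute (pack size).
rank_brand, n_params_brand = design_rank([[6.0], [12.0]], ["A", "B"])

# Hypothetical SKU-level data: each brand name spans three pack sizes.
rank_sku, n_params_sku = design_rank(
    [[6.0], [12.0], [24.0], [6.0], [12.0], [24.0]],
    ["A", "A", "A", "B", "B", "B"],
)

print("brand level:", rank_brand, "of", n_params_brand, "parameters")
print("SKU level:  ", rank_sku, "of", n_params_sku, "parameters")
```

When the rank falls short of the number of parameters, β and the γj cannot be separated without further restrictions; within-brand variation in attributes across SKUs restores full rank in this toy case.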


The estimated residual, γ̂j, is typically interpreted as the brand equity or brand-related value. We can then derive a natural micro-econometric measure of the total economic value to the consumer associated with the brand equity of product j using the classic Hicksian compensating differential (Hicks, 1939). The Hicksian compensating differential consists of the monetary transfer to the consumer to make her indifferent between the factual choice set in which brand j offers brand equity and a counter-factual choice set in which brand j no longer offers brand equity, all else equal. Researchers often use the term willingness-to-pay (WTP) since the compensating differential is equivalent to the maximum dollar amount a customer should objectively be willing to pay to retain brand j’s equity in the choice set, all else equal.6 Suppose we assume quasi-linear utility, U(ψj, y − pj) = ψj + θ(y − pj), where θ is the marginal utility of income (see Chapter 1 of this volume for more discussion). When the random utility shocks are also assumed ε ∼ i.i.d. EV(0, 1), we get the multinomial logit demand system and the willingness-to-pay for brand j’s equity is:

$$
WTP_j^{brand} = \int \left[ \frac{\ln\left(1 + \sum_{k=1}^{J} \exp\left(U(\cdot;\, \beta, \gamma_k)\right)\right)}{\theta} - \frac{\ln\left(1 + \sum_{k=1}^{J} \exp\left(U(\cdot;\, \beta, \gamma_{k \neq j}, \gamma_j = 0)\right)\right)}{\theta} \right] dF(\Theta) \tag{3}
$$

where F(Θ) is the distribution reflecting the researcher’s statistical uncertainty over all the model parameters, Θ.7 Swait et al. (1993) propose a similar measure, termed “Equalization Price,” which measures the compensating differential without taking into account the role of the random utility shocks, ε. Since the estimation of the brand intercepts typically requires a normalization, the exact interpretation of $WTP_j^{brand}$ depends on the definition of the base choice against which the brand intercepts are measured, typically the “no purchase” choice which is assumed to offer no brand equity. A more comprehensive set of survey-based, perceptual measures such as brand attitudes, consumer opinions, perceived fit, and reliability can also be incorporated into the analysis (e.g., Swait et al., 1993). In practice, some researchers use a simpler monetary measure of the brand equity based on the equivalent price reduction (e.g., Louviere and Johnson, 1988; Sonnier et al., 2007):

$$BE_j = \frac{\gamma_j}{\theta}. \tag{4}$$

6 The terminology WTP dates back at least to Trajtenberg (1989) and is used throughout the literature on the value of new products and the value of product features.
7 Many applications also allow the equilibrium prices to adjust in response to the demand shift associated with the removal of brand j’s equity, γj = 0.


Holding all else constant, this price reduction ensures that the consumer has the same expected probability of buying brand j in the counterfactual scenario where γj = 0, i.e., where the intrinsic utility for the intangible, psychological components of brand j is absent. In practice, researchers typically plug point estimates of γ and θ into (4). Formally, one ought to use the correct expected incremental utility that takes into account the statistical uncertainty in the estimates:

$$BE_j = \int \frac{\gamma_j}{\theta} \, dF(\Theta) \tag{5}$$

where, as before, F(Θ) is the distribution reflecting the researcher’s statistical uncertainty over the model parameters, Θ. It is straightforward to show that (5) is identical to the willingness-to-pay for brand j’s equity only in the extreme case where consumer utility is deterministic (i.e. there is no random utility component), brand j is the only available product and the consumer is forced to purchase it.8 Another advantage of using $WTP_j^{brand}$ as in (3) versus (5) to measure brand equity is that the former will vary depending on how we combine brand j with other product features and prices. In many demand studies, the intrinsic brand value, γj, is treated as a nuisance parameter that controls for all the intangible aspects of a product that are either difficult or impossible to measure objectively. In this regard, the brand intercepts improve the predictive power of the model. The non-parametric manner in which γj controls for brand preference is, however, both a blessing and a curse. Brand value research often interprets the estimated residual, γ̂j, as the marketing-based component of brand equity (e.g., Park and Srinivasan, 1994), in contrast with the product-based component captured by the objective product attributes, xj. An obvious limitation of this approach is that any omitted product characteristics will be loaded into γ̂j. So brand equity measures like (4) and (3) should probably be interpreted as noisy measures of the marketing-based component of brand equity. An additional limitation is that the model treats perceived brand equity as a static feature of the product without providing any insight into the formation of brand preferences. In Section 5.2, we discuss an alternative informative theory of brand equity that assumes there is no intrinsic brand preference. Rather, the brand name conveys an informative signal used by the consumer to infer product quality through the brand’s marketing and/or reputation.9 In Section 3, we extend our discussion to a dynamic setting in which the consumer’s

8 Once we include random utility shocks into the model, BEj is no longer a welfare measure.
9 As noted by Nelson (1974) and Sutton (1991), firms spend large amounts of money in seemingly uninformative advertising in categories that are essentially commodities. While Nelson (1974) has argued that this sort of advertising might signal product quality indirectly, the empirical evidence for his hypothesis is mixed at best. Caves and Greene (1996a, p. 50), after examining 196 product categories, conclude that “[t]hese results suggest that quality-signalling is not the function of most advertising of consumer goods”; Bagwell’s (2007, p. 1748) review is only slightly more circumspect: “... the studies described here do not offer strong support for the hypothesis of a systematic positive relationship between advertising and product quality.”


brand equity evolves over time through her past consumption and marketing experiences. A final concern is that the measures above fail to account for the supply side. If demand for a branded good is fundamentally altered or if a branded good is excluded from the market, then equilibrium prices would likely re-adjust on the supply side (along with other marketing decisions). In Section 4, we discuss firms’ branding strategies. The “characteristics approach” to consumer brand value described above is the most common approach to deriving and measuring the economic value for a brand. Becker and Murphy (1993) proposed an alternative “complementary goods” theory of brand value whereby the market good and its brand are both complementary consumption goods in the sense of Hicks and Allen (1934). In this framework, brand equity would need to be defined through the complementary effect of the consumption of the brand/branding and the consumption of the corresponding physical good. To the best of our knowledge, Kamenica et al. (2013) provide the only direct evidence for this theory of brands.10 They conduct randomized clinical trials to test whether direct-to-consumer advertising has a causal effect on a subject’s physiological reaction to a drug. In particular, a branded antihistamine was found to be more effective when subjects were exposed to that brand’s advertising as opposed to a competitor brand’s advertising. Tests for brand effects as complementary goods (Becker and Murphy, 1993) and the specification of a demand system with complementary goods (e.g., Song and Chintagunta, 2007) are beyond the scope of this chapter.
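To make the difference between the willingness-to-pay measure in Eq. (3) and the equivalent price reduction in Eq. (4) concrete, the following minimal sketch evaluates both at a single parameter value under the quasi-linear logit assumptions stated earlier. The function names and all numerical values are hypothetical. The income term θy is omitted because it cancels in the difference of inclusive values, and the integral over F(Θ) is not computed; averaging the function over parameter draws would approximate it.

```python
import numpy as np

def inclusive_value(delta):
    """ln(1 + sum_k exp(delta_k)); the outside good's utility index is normalized to 0."""
    return np.log1p(np.exp(delta).sum())

def wtp_brand(j, x_beta, gamma, price, theta):
    """Integrand of Eq. (3) at one parameter value: value lost when gamma_j is set to 0."""
    x_beta, gamma, price = (np.asarray(a, dtype=float) for a in (x_beta, gamma, price))
    delta = x_beta + gamma - theta * price        # factual utility indices
    gamma_cf = gamma.copy()
    gamma_cf[j] = 0.0                             # counterfactual: brand j has no equity
    delta_cf = x_beta + gamma_cf - theta * price
    return (inclusive_value(delta) - inclusive_value(delta_cf)) / theta

def equivalent_price_reduction(j, gamma, theta):
    """Eq. (4): BE_j = gamma_j / theta, evaluated at point estimates."""
    return float(np.asarray(gamma, dtype=float)[j]) / theta

# Hypothetical two-brand category.
x_beta = [0.5, 0.2]   # attribute utility x_j * beta
gamma = [1.0, 0.3]    # intrinsic brand values
price = [2.0, 1.5]
theta = 1.0           # marginal utility of income

wtp = wtp_brand(0, x_beta, gamma, price, theta)
be = equivalent_price_reduction(0, gamma, theta)
print("WTP for brand 0's equity:", wtp)
print("Equivalent price reduction:", be)
```

In this example the WTP is smaller than the equivalent price reduction, because the consumer values brand j's equity only in the states of the world in which she would choose brand j.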

2.2 Brand awareness, consideration, and consumer search

2.2.1 The consumer psychology view on awareness, consideration, and brand choice

The Lancasterian model described above takes the extreme view that the consumer has complete information about the set of available brands and their attributes at the time of purchase. In the psychology literature on consumer behavior, Lynch and Srull (1982) refer to this scenario as “stimulus-based choice.” At the opposite extreme, the consumer uses a pure “memory-based choice” (Bettman, 1979), whereby all relevant choice information must be recalled from memory. As explained in Alba et al. (1991), in practice most brand purchase contexts will require at least some reliance on recalled information. Even when the purchase environment (e.g. the shelf display in a store) contains all the relevant brand and attribute information, the complexity

10 An indirect test of Becker and Murphy’s (1993) theory exploits the Slutsky symmetry condition by testing whether a shift in demand for the physical good increases the consumption of the brand’s advertising. Tuchman et al. (2015) use data that match household-level time-shifted television viewing on digital video recorders with in-store shopping behavior. They find that in-store promotions that increase a household’s consumption of a brand cause an increase in the household’s propensity to watch (i.e. not skip) that same brand’s commercials.


of the task, the ease with which certain brands are noticed relative to others, and the consumer’s time cost or effort can all lead to reliance on recalled information. In the brand choice literature, researchers studying brand choice under incomplete information distinguish between limited awareness and consideration. A consumer’s brand awareness comprises the set of brands recalled from her memory. This set may be much broader than the subset of brands the consumer evaluates more seriously for choice (Campbell, 1969), the so-called “evoked set” (Howard and Sheth, 1969), or “consideration set” (Wright and Barbour, 1977). However, the concept of awareness precedes that of consideration, i.e., any brand associations that facilitate recall will, in turn, influence a brand’s inclusion in the consideration set (Keller, 1993).

Awareness

The extent of awareness for a given brand in the market has been studied in a number of ways. Laurent et al. (1995) define the unaided awareness for a brand as the fraction of households who spontaneously recall a specific brand when asked about choice options in a category. A related measure, top-of-mind awareness for a brand, indicates the fraction of households who spontaneously recall that brand as the first one when prompted. Aided awareness measures the fraction of households that recognize a specific brand name from a given list of brands in a category. Most studies find that a consumer’s brand awareness within a product category is quite limited. Laurent et al. (1995) report that unaided brand awareness in a product category is 15-20%, even for those brand names recognized by 75% of consumers once prompted. The literature also reports that unaided brand awareness, in addition to being limited, varies over time within households (Day and Pratt, 1971; Draganska and Klapper, 2011). The relevance of awareness in this chapter on branding stems from research showing that a consumer’s ability to recall a specific product from memory is affected by the corresponding brand name. For instance, preferred brands tend to be recalled earlier than non-preferred brands (Axelrod, 1968; Nedungadi and Hutchison, 1985). Further, as consumers accumulate more knowledge about a product category, they tend to structure their memory around brands (Bettman and Park, 1980), largely due to the fact that for consumer goods, most experiences are brand-based (e.g., advertising, in-store merchandizing, and consumption experience). Even factors as simple as lack of name recognition can block a brand from being recalled and, subsequently, from entering a consumer’s consideration set (see the discussion in, e.g., Alba et al., 1991).
The branding literature has viewed brand awareness as a necessary but insufficient condition for brand consideration and choice, at least since Axelrod (1968). In a study of the German ground coffee market, Draganska and Klapper (2011) report that even in a heavily advertised category like coffee, the typical consumer spontaneously recalls only three brands from the total available set consisting of five major national brands and many fringe brands. Furthermore, Draganska and Klapper (2011) report that the set of recalled brands varies across respondents and accounts for a large part of the heterogeneity in choices (we deliberately avoid using the term “heterogeneity in preferences”).


Consumer psychologists assign a distinct role to brand awareness versus brand preferences on brand choices. For instance, in lab experiments that manipulate the level of brand awareness and product quality, Hoyer and Brown (1990) find that subjects picked the “familiar” brand 77% of the time, even though the familiar brand was frequently not the one with the highest quality. Surprisingly, subjects were more likely to choose the high-quality alternative when none of the brands was “familiar.” Nedungadi (1990) also finds that choice outcomes can be affected by factors that affect brand recall but not brand preference. Consumer expertise likely moderates these effects. For instance, Heilman et al. (2000) find that first-time consumers in a product category are more likely to purchase more familiar brands than experienced consumers.

Consideration

A separate literature has explicitly studied consumers’ brand consideration sets for choice. Even though a consumer may be aware of a number of brands, she may only consider a subset of them on any given purchase occasion (Narayana and Markin, 1975; Bettman and Park, 1980; Ratchford, 1980; Shugan, 1980). For an overview of early literature on empirical consideration sets, see the discussion in Shocker et al. (1991). In an empirical study of brand choices, Hauser (1978) found that consumers’ consideration sets explained 78% of the variance in their brand choices; only 22% was explained by preferences within consideration sets. Empirical researchers have typically found that consideration sets in brand choice settings range in size from only 2 to 8 alternatives (Bronnenberg et al., 2016; Hauser and Wernerfelt, 1990; Honka, 2014; Moorthy et al., 1997; Newman and Staelin, 1972; Punj and Staelin, 1983; Ratchford et al., 2007; Urban, 1975). These limited consideration sets are consistent with the psychological theory that individuals’ ability to evaluate choices may be cognitively limited to a maximum of about seven (Miller, 1956). In sum, consumer psychologists make a distinction between brand awareness, which is recalled from memory, and brand consideration, which reflects the consumer’s deliberation process of narrowing down the set of options before making a brand choice. The literature has further documented strikingly limited degrees of awareness and consideration. An interesting direction for future research might consist of testing the extent to which the limited varieties purchased by households in most categories of consumer goods reflect a lack of awareness of all the available brands. One recent study of over 32 million US shoppers found that over a 52-week period ending in June 2013, even the most frequent shoppers purchased only 260 of the 35,372 stock-keeping units available in supermarkets, about 0.7%. Across categories, the amount varied from as low as 0.2% in Health & Beauty to as high as 1.7% in Dairy (Catalina Media, 2013). Moreover, awareness has been shown to influence brand choices independently of brand preferences. The literature has not yet studied whether consumers rationally plan their awareness, by informing themselves strategically about brands or, alternatively, whether this awareness set is exogenous to consumer decision making.


A recent empirical literature has used data on consumers’ consideration sets to show that the assumption that consumers consider all the available brands in a market will likely result in biased estimates of brand preferences. In practice, if consumers are more likely to consider branded goods, which typically charge higher prices, a naive model of full consideration may generate a downward bias in the estimated price sensitivity (Honka, 2014). Similarly, a naive model that ignores the consideration stage may generate an upward bias in the degree of estimated preference heterogeneity (Dong et al., 2017).

2.2.2 Integrating awareness and consideration into the demand model

We now formalize the notions of awareness and consideration into our economic model of consumer demand. We build on the Lancasterian framework from Section 2.1 that assumed consumers were fully aware of all available brands and considered each variant for choice. Throughout this section, we maintain the assumption that all products in a category are perfect substitutes and, hence, that consumers will make a pure discrete choice purchase decision. We assume that at the time of purchase from a commodity group, the consumer recalls the brand alternatives in the set Sa ⊆ S, where S is the full set of available products, but is unaware of brand alternatives in its complement, S \ Sa = S − Sa. The consumer is uninformed about the availability or the characteristics of products in the complement of Sa and does not take this uncertainty into account when making a decision. In this section, we take a static view of the consumer that treats Sa as exogenous.11 In Section 2.2.3 below, we also allow for a consumer’s awareness set, Sa, to be influenced by the endogenous branding and marketing activities of firms, on the supply side. Conditional on her awareness set Sa, we assume the consumer’s purchase decision is the outcome of a two-stage sequential process: (1) the search and evaluation stage, and (2) the choice stage.12 During the first stage, or search and evaluation stage, the consumer forms her consideration set, Sc ⊆ Sa, by evaluating ex ante which products to include in the consideration set so as to maximize her expected consumption utility net of search and evaluation costs (Shugan, 1980; Roberts, 1989; Roberts and Lattin, 1991).13 Formally,

$$S_c = \arg\max_{S_c \subseteq S_a} \left\{ E\left[\max_{j \in S_c} v_j\right] - C(S_c) \right\}, \tag{6}$$

11 In Section 3 below, we also offer a dynamic view of the consumer that considers her purchase history. In the multi-period setting, awareness can form endogenously through a consumer’s past purchase experiences.
12 A broad literature has documented evidence that consumers use such two-step “consider-then-choose” decision-making (e.g., Payne, 1976, 1982).
13 While consumers generally use cost/benefit decision rules, they may rely on simpler heuristic approaches in situations with more complex decision tasks (e.g., Payne, 1976, 1982).


where vj is the indirect utility from consuming alternative j , and C (Sc ) is a product evaluation or search cost associated with assembling the consideration set Sc . Unlike the traditional Lancasterian model in which branding affected choices through preferences for the branded goods, we now consider the possibility that branding plays another, complementary, role. Let the incremental cost of gathering information for brand j be denoted by Cj . We allow the cost of gathering and/or interpreting information Cj to depend on branding. This effect of branding could reflect explicit factors at the point of purchase, such as shelf placement, that can aid consumers in processing information. It could also reflect branding efforts outside the store, like advertising, which may affect the consumer’s ability to recall a specific brand from memory. The exact composition of the consideration set Sc depends on the consumer’s search conduct. The traditional search literature assumes that consumers randomly sample information (e.g. prices) from a set of ex-ante identical sellers at a fixed and constant cost for each of the firms sampled (Stigler, 1961). Weitzman (1979) was the first to consider search over differentiated products, allowing consumers to prioritize search systematically for those products with the highest anticipated utility. If awareness or familiarity make brands easier to search, then consumers prioritize their search across brands in an order determined by brand awareness, or “prominence” (Arbatskaya, 2007). As Armstrong and Zhou (2011) put it: “In many circumstances, however, consumers consider options in a non-random manner and choose first to investigate those sellers or products which have high brand recognition, which are recommended by an intermediary, which are prominently displayed within a retail environment, which are known to have low price, or from which the consumer has purchased previously.” (Armstrong and Zhou, 2011, p. F368)

Similarly, Erdem and Swait (2004) find that self-reported measures of brand credibility affect the likelihood that a brand enters a consumer’s consideration set. Because of the presence of search and evaluation costs in the first stage of the choice process, Eq. (6), information gathering often ends before exhausting all options in Sa and Sc ⊂ Sa. During the second stage, or “choice stage,” the consumer picks the alternative in her consideration set, k ∈ Sc, that yields the highest utility:

$$k = \arg\max_{j \in S_c} v_j. \tag{7}$$

The decision in Eq. (7) is a reformulation of the discrete choice problem in Eq. (1) from Section 2.1, where all brands entered the choice problem. Models in the literature typically assume that search fully resolves the uncertainty about considered alternatives.14

14 In Section 3.4 below, we will formally distinguish between search and experience characteristics (e.g., Nelson, 1970). We will then allow for ex post uncertainty about experience characteristics at the time of


A discussion of additional literature, in Sections 3.2 and 3.3 below, suggests that brand reputation, loyalty, and pioneering advantage can cause a brand to be more likely to enter the awareness and consideration sets.
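The two-stage process in Eqs. (6) and (7) can be sketched numerically. The example below is a simplified illustration, not the specification used in any of the cited papers: it assumes a fixed-sample (simultaneous) search rule rather than sequential search, additive search costs, Gumbel-distributed utility shocks, and a small awareness set so that every non-empty subset can be enumerated. The function names and numbers are hypothetical, and E[max v_j] is approximated by Monte Carlo.

```python
import itertools
import numpy as np

def best_consideration_set(mean_utility, search_cost, n_draws=20000, seed=0):
    """Brute-force Eq. (6): maximize E[max_{j in Sc} v_j] - C(Sc) over non-empty subsets."""
    rng = np.random.default_rng(seed)
    n = len(mean_utility)
    # Common random numbers: the same utility draws are reused for every candidate set.
    utilities = np.asarray(mean_utility, dtype=float) + rng.gumbel(size=(n_draws, n))
    best_set, best_value = None, -np.inf
    for size in range(1, n + 1):
        for subset in itertools.combinations(range(n), size):
            idx = list(subset)
            expected_max = utilities[:, idx].max(axis=1).mean()
            cost = sum(search_cost[i] for i in idx)   # assumed additive search costs
            if expected_max - cost > best_value:
                best_set, best_value = subset, expected_max - cost
    return best_set, best_value

def choose(realized_utility, consideration_set):
    """Eq. (7): pick the considered alternative with the highest realized utility."""
    return max(consideration_set, key=lambda j: realized_utility[j])

# Hypothetical awareness set of three equally attractive brands; brand 2 is costly to evaluate.
considered, net_value = best_consideration_set([1.0, 1.0, 1.0], [0.1, 0.1, 5.0])
print("consideration set:", considered)
print("choice:", choose([0.2, 3.0, 9.0], considered))
```

In this example the hard-to-evaluate brand is never chosen even when its realized utility is highest, because it never enters the consideration set. Enumeration is only feasible for small awareness sets, since the number of subsets grows as 2^n.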

2.2.3 An econometric specification

To illustrate, we now modify the econometric modeling framework from Section 2.1. An early literature abstracted entirely from the formation of consideration sets, modeling them in reduced-form instead (see Shocker et al., 1991). In this literature, the consideration set is treated as an additional random variable and the likelihood that a consumer chooses brand j is as follows:

$$\Pr(j;\, \Theta) = \sum_{S_c \in \mathcal{P}(S)} \Pr(j \mid S_c;\, \Theta) \times \Pr(S_c;\, \Theta) \tag{8}$$

where Pr(j|Sc) is the brand choice probability conditional on the consideration set, Pr(Sc) is the probability of observing consideration set Sc ∈ P(S), and P(S) is the power set of S. If Sc = S and all brands are considered, then Pr(S) = 1 and Pr(j|S) is the choice probability from Section 2. In practice, consideration sets are unobserved and the likelihoods Pr(j|Sc) and Pr(Sc) are not separately identified without strong, and often ad hoc, functional form assumptions.15 While a literature has estimated models of consideration and choice without observing the consideration set, this approach is clearly prone to severe model misspecification concerns. Even when consideration sets are observed, the standard discrete choice models like logit and probit are probably not the correct reduced form for the conditional brand choices.16 In particular, the choice problem

$$\Pr(j \mid S_c) = \Pr\left(v_j \geq v_k, \ \text{for } k, j \in S_c\right) \tag{9}$$

selects on realizations of random utility shocks in vj for products j that were systematically considered. So the choice problem (9) cannot simply “integrate out” the random utility shocks under the usual i.i.d. assumptions to obtain the discrete choice probabilities because Sc is also a function of the realizations of ε for searched brands. Mehta et al. (2003) and Joo (2018) use exclusion restrictions based on the assumption that in-store promotional variables, like display and feature, affect the consumer’s information about products and not consumption utility and preferences: Pr(Sc; Z) where Z contains the variables excluded from the brand choices conditional on consideration.17

purchase. This uncertainty will be resolved slowly over time across repeated purchase and consumption experiences.
15 Recently, Abaluck and Adams (2017) show that symmetry in the cross-derivatives of choice probabilities only holds when the consumer considers all possible options. They propose to identify consideration probabilities from the violations of symmetry in the cross-derivatives.
16 For instance, Srinivasan et al. (2005) require the strong assumption that brand awareness is independent of brand preferences in order to retain the conventional logit functional form.

In addition, the model in Eq. (8) suffers from a curse of dimensionality since the dimension of P(S) becomes unmanageable as the set of available products grows. To provide an illustrative model, at the consideration stage, we assume that the consumer’s indirect utility from purchase is additively separable in the factors that are known deterministically to the consumer and the factors about which she is still uncertain. Formally, as in the discrete choice problem of Eq. (1) above, assume the consumer’s choice-specific indirect utilities vj have a component xjβ + γj + εj that is known deterministically to the consumer (the econometrician only observes the distribution of ε and E(ε) = 0). In addition, the indirect utilities contain the vector of unknown match values, ξ ∼ F(ξ), about which the consumer is uncertain. F(ξ) represents the consumers’ beliefs about the unknown match values. Thus,

$$v_j = x_j \beta + \gamma_j + \varepsilon_j + \xi_j, \quad j \in S_a. \tag{10}$$

During the search stage, the consumer endogenously resolves her uncertainty $\xi_j$ for a set of considered products, $S_c \subseteq S_a$. Products in the consideration set are selected based on (1) their respective option values (e.g., the variance of $\xi_j$), (2) known indirect utilities ($x_j\beta + \gamma_j + \varepsilon_j$), and (3) search costs ($C_j$). In contrast to the Lancasterian approach above, the deterministic component of preferences only partially determines the chosen product. Search costs and the option value from additional search and evaluation also influence the overall considered set and, therefore, the chosen product alternative. In our illustrative model, we assume that the total search and evaluation cost associated with a consideration set $S_c$ is $C(S_c) = \sum_{j \in S_c} c_j$, where $C_j \equiv c_j$ is independent of $c_k$ ($\forall k \neq j$), and $c_j \geq 0$.18 In turn, the search costs, $c_j$, reflect a consumer’s past experiences with the available products in the category or a firm’s advertising strategy.19 For instance, we could assume:

$$c_j = c_0 + G_{1j}(H, A; \Gamma_1), \tag{11}$$

where the function $G_{1j}$ captures the effect of the state vectors summarizing a consumer’s brand purchase history, $H$, and past exposure to advertising, $A$, and $\Gamma_1$ is a vector of parameters.

17 At least since Guadagni and Little (1983), the brand choice literature has found that these promotional variables affect choices. The exclusion restriction assumes, logically, that the effect reflects the ease of search and product evaluation as opposed to preference. Indeed, it seems unlikely that consumers derive consumption utility from an in-store display. However, to the best of our knowledge, this point has not been tested empirically.
18 This specification treats search and evaluation costs as a fixed parameter. Alternatively, the rational inattention literature on consumer choice (e.g., Matejka and McKay, 2015) uses Shannon entropy to model the costs associated with the precision of the signals a consumer endogenously collects to learn about product values.
19 To the extent that advertising influences brand knowledge and psychological associations, it could also facilitate recall.

Thus, shopping history, purchase experience, and advertising exposure
influence her costs, cj , of gathering and evaluating information about the choice alternative. For instance, in a case study of residential plumbers, McDevitt (2014) finds that low-quality firms systematically use easier-to-find brand names that start with A or a number and tend to be located at the top of directories. Furthermore, in a study of consumers’ retail bank choices, Honka et al. (2017) find that advertising has a much larger effect on awareness and consideration than on consumers’ choices from their respective consideration sets. The framework herein points towards a major limitation of the extant empirical literature and an opportunity for future research. While some progress has been made in the collection of consideration set data, a consumer’s awareness set, Sa , is not observed in typical choice datasets. Consequently, the consideration papers derived from search theory generally assume that consumers are aware of all the product alternatives and search over an i.i.d. match value. This assumption is at odds with laboratory studies conducted by consumer psychologists that question the plausibility of “full awareness” and document evidence suggesting very limited consumer brand awareness, even at the point of purchase. Econometrically, the distinction between consideration and awareness offers a potential direction for future research, along with a push to integrate research on memory into our demand models. If consumers only search and, thus, only consider brands they can recall from memory (i.e., brands in their awareness set), then awareness and marketing investments that stimulate awareness may create barriers to entry for new products.

2.2.4 Consideration and brand valuation The possibility of a brand effect in the pre-purchase search and product evaluation process raises concerns about the measurement of brand equity measures like those in Section 2.1 above. Standard discrete choice models that ignore the search and consideration aspects of demand will load the entire brand effect, including that of recall γ and search, into preferences for the brand γj , defining brand equity as BEj = θj . The omission of the consideration stage could bias estimates of brand value. Consider the illustrative case where the consumer has homogeneous ex-ante beliefs about her indirect utility for each of the products in her awareness set, i.e., E vj = v, ∀j ∈ Sa , but where the search costs, cj , to resolve the match value ξj are lower for branded than unbranded alternatives among the products in Sa . In this case, the consumer’s consideration set and observed brand choices would systematically contain branded products over unbranded ones; not because of higher utility, vj , but because of lower search costs cj . The omission of the consideration stage could confound search costs and brand preferences. The extent of this confound would be exacerbated by the number of alternatives available and/or the magnitude of search costs. To the extent that search stops before any unbranded alternatives are discovered and considered, a traditional Lancasterian model would generate strong estimated brand preferences and potentially high estimates of brand equity. Substantively, this scenario could lead to the conclusion that brand value stems from preferences, as opposed to from the ease of search.
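The confound described above can be illustrated with a small simulation. This is a minimal sketch under hypothetical assumptions that are not part of the chapter's model: each consumer draws a benefit of search $b \sim U(0,1)$ and considers product $j$ only if $b > c_j$, and every considered product has the same expected utility plus an i.i.d. standard normal match value. The function names and the cost values are illustrative. With $c_{branded} = 0.1$ and $c_{unbranded} = 0.6$, half of consumers consider only the branded product and 40% consider both, so the expected shares are 0.7 and 0.2 even though preferences are identical.

```python
import math
import random


def simulate_shares(cost_branded, cost_unbranded, n_consumers=100_000, seed=0):
    """Simulate choices when products differ only in search cost.

    Hypothetical set-up: a consumer with search benefit b ~ U(0, 1) considers
    product j only if b > c_j. Considered products share the same mean utility
    and have i.i.d. N(0, 1) match values that are revealed by search.
    """
    rng = random.Random(seed)
    costs = {"branded": cost_branded, "unbranded": cost_unbranded}
    counts = {name: 0 for name in costs}
    for _ in range(n_consumers):
        benefit = rng.random()
        considered = [name for name, c in costs.items() if benefit > c]
        if not considered:
            continue  # nothing was searched, so nothing is purchased
        # Mean utility is normalized to zero; only the match value differs.
        match = {name: rng.gauss(0.0, 1.0) for name in considered}
        counts[max(match, key=match.get)] += 1
    return {name: n / n_consumers for name, n in counts.items()}


def naive_brand_premium(shares):
    """Intercept gap that a full-consideration logit would infer from shares."""
    return math.log(shares["branded"] / shares["unbranded"])


shares = simulate_shares(cost_branded=0.1, cost_unbranded=0.6)
premium = naive_brand_premium(shares)
print(shares, premium)
```

A model that assumes full consideration attributes the entire share gap to a positive brand intercept, although in this simulated data the gap is produced by search costs alone.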


3 Consumer brand loyalty 3.1 A general model of brand loyalty In the previous section, we used a static perspective on brand choice that treated brand equity as a persistent residual (or “fixed effect”) in a characteristics specification of the economic model of consumer demand. However, the model abstracted from the manner in which the brand preference was formed. If brand preference is merely a nuisance or control, this may be sufficient for predicting demand. However, as we show below, the dynamics associated with the formation of brand preferences may be important for understanding product differentiation and the foundations of market structure and concentration. In this section, we discuss various dynamic theories of the formation of brand preferences. Consumer psychologists have studied how a consumer develops a brand preference through positive associations between the brand and the consumption benefits of the underlying product. Such associative learning could arise, for instance, through signals whereby the consumer learns that the brand predicts a positive consumption experience. Alternatively, under evaluative conditioning, the consumer forms a positive preference for a brand through repeated co-occurrences with positive stimuli, like good mood, affect, or a popular celebrity. In the same vein, a consumer may learn about a brand through her memory of positive experiences with similar products. We refer the interested reader to Van Osselaer (2008) for a survey of the consumer psychology literature on consumer learning processes. This chapter will not discuss the deeper psychological mechanisms through which preferences are formed. Instead, we focus on how different sources of brand preference formation create dependence on past choices or state dependence in consumer demand. 
Empirically, state dependence can lead to inertia in a consumer’s observed sequence of brand choices: consumers have a higher probability of choosing products that they previously purchased. Brand choice inertia is one of the oldest and most widely-studied empirical phenomena in the marketing literature (e.g., Brown, 1952, 1953) as it has typically been interpreted as “brand loyalty.” Below we survey the empirical evidence for inertia in consumer brand choices and discuss the econometric challenges associated with disentangling spurious sources of inertia from genuine structural state dependence, such as loyalty. We then discuss several consumer theoretic mechanisms that can generate brand choice inertia as a form of structural state dependence. To formalize our discussion of the empirical literature, we consider the choices of a household, $h$, over brands, $j$, at time, $t$. We use $X_t^h$ to denote the contemporaneous factors such as product characteristics and marketing considerations like prices, promotions, and shelf space. We use the state vector $H_t^h$ to denote a consumer’s historic brand experiences. We include these state variables, $X_t^h$ and $H_t^h$, in consumer $h$’s indirect utility for brand $j$ on date $t$:

$$v_{jt}^h = \mu_j(X_t^h; \Theta^h) + F_j(H_t^h; \Lambda^h), \quad j = 1, \ldots, J \tag{12}$$

where we now decompose the consumer’s brand equity into the static components discussed in the last section, $\mu_j(X_t^h; \Theta^h)$, and the consumer’s past experiences, $F_j(H_t^h; \Lambda^h)$, comprising a stock of historically formed brand “capital.” The vectors $\Theta^h$ and $\Lambda^h$ are parameters to be estimated. Theorists have analyzed various mechanisms through which current willingness to pay for brands reflects past brand experiences. In the subsections below, we explore several formulations of the brand capital stock, $F_j(H_t^h; \Lambda^h)$, such as switching costs (e.g., Farrell and Klemperer, 2007), advertising and branding goodwill (e.g., Doraszelski and Markovich, 2007; Schmalensee, 1983), evolving quality beliefs through learning (Schmalensee, 1982), habit formation (e.g., Becker and Murphy, 1988; Pollak, 1970), and peer influence (e.g., Ellison and Fudenberg, 1995).

3.2 Evidence of brand choice inertia The empirical analysis of brand loyalty, or inertia in brand choice, has been one of the central themes of the quantitative marketing research on brand choices. Most of the literature has focused on short-term forms of persistence in brand choices over time horizons of no more than one or two years. Early work by Brown (1952, 1953) exploited household-level diary purchase panel data to document the high incidence of spells during which a household repeat-purchased the same brand over time. Such persistent, repeat-purchase of the same brand has been detected subsequently across a wide array of industries, including those dominated by sellers with products differentiated mainly by brand names rather than objective features. Empirical generalizations across a broad array of CPG categories have found low rates of household brand switching (Dekimpe et al., 1997) and high rates of expenditure concentration with typically over 50% of spending allocated to the most preferred brand in a category (Hansen and Singh, 2015). Similar patterns of inertia in choices have been documented in other industries such as insurance (Handel, 2013; Honka, 2014), broadband services (Liu et al., 2010), cellular services (Grubb and Osborne, 2015), and financial services (Allen et al., 2016). While early work interpreted short-term brand re-purchase spells as evidence of loyalty, the mere incidence of repeat-buying need not imply inertia per se. A consumer with a strong preference for Coca-Cola has a high probability of repeatpurchasing Coca-Cola over time, even if her shopping behavior is memoryless and static. A test for inertia consists of testing for non-zero-order behavior in a consumer’s choice sequence. Early work tested for higher-order behavior using the within-household variation in choices, often with a non-parametric analysis of the observed runs20 within a given consumer’s choice sequence (e.g., Frank, 1962; Massy, 1966; Bass et al., 1984). 
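The logic of a within-household zero-order test can be sketched as follows. This is an illustrative simplification, not the procedure of any of the cited studies: it compares the observed share of adjacent repeat purchases with the share expected if each choice were an independent draw from the consumer's own brand shares $p_j$, which is $\sum_j p_j^2$. For example, a memoryless consumer who picks one brand 90% of the time still repeats in $0.9^2 + 0.1^2 = 82\%$ of adjacent pairs. The function names are hypothetical, and using in-sample shares ignores small-sample bias.

```python
from collections import Counter


def observed_repeat_rate(choices):
    """Share of adjacent purchase pairs in which the same brand was bought."""
    pairs = list(zip(choices, choices[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def zero_order_repeat_rate(choices):
    """Repeat rate expected if each choice were an independent draw.

    Uses the consumer's own brand shares p_j as draw probabilities, so the
    expected repeat rate is the sum of p_j squared.
    """
    n = len(choices)
    return sum((count / n) ** 2 for count in Counter(choices).values())


streaky = "AAAAABBBBB"      # long runs: more repeats than zero-order implies
alternating = "ABABABABAB"  # no repeats: fewer than zero-order implies
for seq in (streaky, alternating):
    print(seq, observed_repeat_rate(seq), zero_order_repeat_rate(seq))
```

Both sequences have equal brand shares, so only the ordering of purchases separates them. An observed repeat rate above the zero-order benchmark points toward inertia and one below it toward variety-seeking, although with sequences this short such a comparison has little statistical power.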
20 A run is broadly defined as a sequence of repeat-purchases of the same brand. Typically, researchers look at pairs of adjacent shopping trips during which the same brand was purchased.

Unfortunately, small sample sizes limited the power of these within-household tests and the findings were typically mixed or inconclusive, although early work often interpreted a failure to reject the null hypothesis of a
zero-order choice process as evidence against loyalty. Alternative testing approaches that pooled choice sequences across consumers ran into the well-known identification problem of distinguishing between choice inertia and heterogeneity in consumer tastes (e.g., Massy, 1966; Heckman, 1981). More recent structural approaches have applied non-linear panel methods to test for choice inertia while controlling for heterogeneity between consumers using detailed consumer shopping panels (Roy et al., 1996; Keane, 1997; Seetharaman et al., 1999; Shum, 2004; Dubé et al., 2010; Osborne, 2011).21 This literature has documented surprisingly high levels of inertia in brand choices. For instance, Dubé et al. (2010) find a substantial decline in the predictive fit of a choice model when, all else equal, the exact sequence of a consumer’s purchases is randomized. This evidence confirms that the observed sequence of choices contains information for predicting demand. We discuss these methods further below in Section 3.3. Patterns of brand choice persistence have also been measured over much longer time horizons, spanning decades or even an individual’s lifetime. For instance, Guest (1955) surveyed 813 school children on their preferred brands in early 1941. Twelve years later, in the spring of 1953, he repeated the same brand preference survey among the 20% of original respondents that he was able to contact. Across 16 product categories, a respondent indicated the same preferred brand in both waves in 39% of the cases. In 1961, a third wave of the same survey continued to find the same preferred brand in 26% of cases. These survey results suggest that brand preferences developed during childhood partially persist into adulthood. However, “obviously, one cannot simply assume that what is learned during childhood somehow ‘transfers intact’ to adult life” (Ward, 1974). Returning to our model in Eq. 
(12), under this extreme view, a consumer’s preferences throughout her lifetime are entirely driven by $F_j(H_0^h; \Lambda^h)$, where $H_0^h$ represents her initial experiences in life, and $\mu_j(X_t^h; \Theta^h) = 0$. A proponent of this view, Berkman et al. (1997, pp. 422-423) suggest that preferences may be inter-generational: “[i]f Tide laundry detergent is the family favorite, this preference is easily passed on to the next generation. The same can be said for brands of toothpaste, running shoes, golf clubs, preferred restaurants, and favorite stores.”

21 These methods also control for other causal factors, such as prices and point-of-purchase marketing, that could potentially confound evidence of inertia.

The consumer socialization literature has studied mechanisms through which adult brand preferences are formed early in life during childhood (Moschis and Moore, 1979), especially through inter-generational transfer and parental influence (Ward, 1974; Moschis, 1985; Carlson et al., 1990; Childers and Rao, 1992; Moore et al., 2002) and peer influence (Reisman and Roseborough, 1955; Peter and Olson, 1996). Anderson et al. (2015) document a strong correlation in the automobile brand preferences of parents and their adult children. Sudhir and Tewari (2015) use a twenty-year survey panel of individual Chinese consumers and find that growing up in a region that experienced rapid economic growth during one’s adolescence is correlated with consumption of non-traditional “aspirational” goods and
brands during adulthood.22 Similarly, having a birth year of 1962 or 1978 is a very strong predictor of whether a male Facebook user “likes” the New York Mets in the mid-2000s, implying the user was seven to eight years old when the Mets won a World Series (in 1969 and 1986), an age at which team preferences are typically formed (Stephens-Davidowitz, 2017). Bronnenberg et al. (2012) match current and historic brand market share data across US cities.23 These data confirm (1) that consumer brands had very different shares across regions in the 1950s and 1960s, and (2) that the local market leaders of the 1950s and 1960s remain dominant in their respective markets today. In practice, decades-long panels are difficult to maintain and rarely available for research purposes. Therefore, the available within-household purchase histories are too short to learn about the formation of preferences. Instead, Bronnenberg et al. (2012) surveyed over 40,000 US households to learn the migration histories of each household’s primary shopper, including her birth market, year of move, and her age. They exploit the historic migration behavior of households and the long-term regional differences in brand preference to study the long-term formation of brand preference and loyalty. Studying the two top brands across 238 product categories, Bronnenberg et al. (2012) document two striking regularities. First, immediately after a migrant family moves, 60% of the difference in brand shares between the state of origin and current state of residence is eliminated. This finding holds both within and between households, suggesting that a significant portion of brand preferences is determined by the local choice environment. Second, the remaining 40% of the preference gap is very persistent, with migrants exhibiting statistically significant differences in brand preference from non-migrants even 50 years after moving. Collapsing the data by age cohorts, Bronnenberg et al. (2012) find that “migrants who moved during childhood have relative shares close to those of non-migrants in their current states, while those who move later look closer to non-migrants in their birth states.” This finding is consistent with the brand capital stock theory whereby older migrants, having accumulated more brand capital in their birth state, should exhibit more inertia in brand choice.24 Even migrants that moved before the age of 6 exhibit some persistence in the local preference from the birth location, suggesting a role for some inter-generational transfer of brand preferences.

22 These aspirational goods consist primarily of western brands consumed socially.
23 The current brand shares were collected through AC Nielsen’s scanner data. The historic brand shares were obtained from the Consolidated Consumer Analysis (CCA) database, collected by a group of participating newspapers from 1948 until 1968 in their respective markets. The CCA volumes report the fraction of households who state that they buy a given brand in a given year.
24 An alternative hypothesis is that the aging process makes working memory decline more than long-term memory (Carpenter and Yoon, 2011), as does processing of information. Both aging effects favor relying on fewer considered options (John and Cole, 1986) and engaging in fewer product comparisons (Lambert-Pandraud et al., 2005). These factors are thought to contribute to persistence, or at least less flexibility, of purchasing patterns among aging consumers (see also Drolet et al., 2010).

The authors conclude that “since the stock of past experiences has remained constant across the move, while the supply-side environment has changed, we infer that approximately 40 percent of
the geographic variation in market shares is attributable to persistent brand preferences, with the rest driven by contemporaneous supply-side variables.” In terms of our model in Eq. (12), approximately 40% of consumers’ expected conditional indirect utility derives from $F_j(H_t^h; \Lambda^h)$ and 60% from $\mu_j(X_t^h; \Theta^h)$. Consistent with these findings, Atkin (2013) reports a similar long-term habit formation for nutrient sources from different foods. Bronnenberg et al. (2012) formulate a simple model of habit formation (e.g., Pollak, 1970; Becker and Murphy, 1988) in which individual households’ brand choices depend on current marketing and prices, as well as a stock of past consumption experiences. Assuming (1) that consumers did not move across state lines motivated by their preferences for CPG brands and (2) that a brand’s past local market share is on average equal to the share today, they determine that the effects of past consumption are highly persistent and depreciate at a rate of only 2.5% per year. Thus, they find that the half-life of brand capital is 26.5 years. In sum, a large body of empirical work has documented patterns of persistence in brand preferences and choices. This persistence has been documented both at a high frequency, from “shopping trip to shopping trip,” and at a much lower frequency spanning decades and even entire lifetimes. If consumers do indeed form strong attachments to brands, then understanding the mechanisms through which these attachments are formed will likely point to some of the important drivers of the industrial organization of consumer goods markets.
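The link between a depreciation rate and a half-life can be checked with a short calculation. This is a sketch that assumes simple geometric depreciation, which may not match the exact specification estimated by Bronnenberg et al. (2012); the function names are illustrative. Under that assumption a stock depreciating at rate $\delta$ per year halves after $\ln(0.5)/\ln(1-\delta)$ years.

```python
import math


def half_life(depreciation_rate):
    """Years until a geometrically depreciating stock falls to half its value."""
    return math.log(0.5) / math.log(1.0 - depreciation_rate)


def implied_depreciation(half_life_years):
    """Annual geometric depreciation rate consistent with a given half-life."""
    return 1.0 - 0.5 ** (1.0 / half_life_years)


print(round(half_life(0.025), 1))
print(round(implied_depreciation(26.5), 4))
```

Under this assumption, a rate of exactly 2.5% implies a half-life of roughly 27.4 years, while a half-life of 26.5 years corresponds to a rate of roughly 2.6%. The two reported figures are therefore consistent up to rounding of the depreciation rate.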

3.3 Brand choice inertia, switching costs, and loyalty

Switching costs constitute one of the simplest theories of brand loyalty:

“A product exhibits classic switching costs if a buyer will purchase it repeatedly and find it costly to switch from one seller to another.” Klemperer (2006)

Switching costs can be financial, such as the early termination fee for a mobile phone service contract, temporal, such as the time required to learn how to use a new product, or psychological, such as the cognitive hassle of changing one’s habit.25 Switching costs introduce frictions that can deter a consumer from switching to different brands and, hence, can lead to repeat-purchase behavior.

25 At least since Mittelstaedt (1969), consumer psychologists have studied the role of psychological switching costs in explaining repeat-purchase and inertia in brand choice. For an extensive review of this literature see Muthukrishnan (2015).

In the extreme case where switching costs are infinite, a consumer’s initial choice would determine her entire future brand choice sequence and the impact of $\mu_j(X_t^h; \Theta^h)$ would be zero. Consequently, switching costs can create brand loyalty even in the absence of any brand value other than the brand name’s role in identifying a specific supplier. This behavior points to a simple theory of branding whereby the identifying features of a supplier’s product (i.e., the “mark”) can be sufficient to create loyalty in
consumer shopping behavior as long as consumers form shopping habits. This type of loyalty is also detectable in shorter panels spanning one or two years. The empirical brand choice literature typically allows for a brand switching cost to influence purchase decisions in consumer goods categories (e.g., Jeuland, 1979; Guadagni and Little, 1983; Jones and Landwehr, 1988; Roy et al., 1996; Keane, 1997; Seetharaman et al., 1999; Shum, 2004; Osborne, 2008; Dubé et al., 2010). Suppose we define a household $h$’s indirect utility net of switching cost to be $\mu_j^h(X_t) = \gamma_j^h + \theta^h x_{jt} + \varepsilon_{jt}^h$. Following the convention in the brand-choice literature, we assume that consumers obtain a utility premium from repeat-buying the brand chosen previously: $F_j^h(H_t^h; \Lambda^h) = \lambda^h I_{\{H_t^h = j\}}$, where $H_t^h \in \{1, \ldots, J\}$ is household $h$’s loyalty state and $I_{\{H_t^h = j\}}$ indicates whether the previous brand purchased was brand $j$. The conditional indirect utility on trip $t$ is then

$$\begin{aligned} v_{jt}^h &= \mu_j^h(X_t^h; \Theta^h) + F_j^h(H_t^h; \Lambda^h), \quad j = 1, \ldots, J \\ &= \gamma_j^h + \theta^h x_{jt} + \lambda^h I_{\{H_t^h = j\}} + \varepsilon_{jt}^h. \end{aligned} \tag{13}$$
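To see how the loyalty term in Eq. (13) shifts choice probabilities, the following minimal sketch assumes that the shocks $\varepsilon$ are i.i.d. type I extreme value, so that choices follow a multinomial logit. That distributional assumption, the function name, and all numerical values are illustrative and are not taken from the studies cited.

```python
import math


def choice_probabilities(gamma, theta, x, lam, last_brand):
    """Logit probabilities for v_j = gamma_j + theta * x_j + lam * 1{last_brand == j}."""
    v = [
        g + theta * xj + (lam if j == last_brand else 0.0)
        for j, (g, xj) in enumerate(zip(gamma, x))
    ]
    v_max = max(v)  # subtract the maximum for numerical stability
    exp_v = [math.exp(vj - v_max) for vj in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]


gamma = [1.0, 0.5, 0.0]   # hypothetical brand intercepts
prices = [2.0, 1.8, 1.5]  # hypothetical prices, playing the role of x_jt
p_static = choice_probabilities(gamma, -1.0, prices, lam=0.0, last_brand=1)
p_loyal = choice_probabilities(gamma, -1.0, prices, lam=0.8, last_brand=1)
print(p_static)
print(p_loyal)
```

With `lam=0.0` the probabilities do not depend on the previous purchase, which is the static model. A positive `lam` raises the probability of repurchasing the last brand at the expense of the others.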

This formulation nests the basic static model from Section 2.1 with the baseline brand utility, $\gamma^h$. The additional parameter $\lambda^h$ allows for inertia in brand choices across time. As discussed above, the structural interpretation of $\lambda^h$ is typically analogous to a psychological switching cost. The following null hypothesis constitutes a test for choice inertia:

$$H_0: E(\lambda^h) = 0, \tag{14}$$

where $E(\lambda^h) > 0$ implies positive inertia in brand choice (such as loyalty) and $E(\lambda^h) < 0$ implies negative inertia (such as variety-seeking). In practice, the researcher can adopt a more general specification that relaxes the linearity assumption and allows for higher-order choice behavior with a loyalty state that reflects the entire choice history. For instance, Keane (1997) and Guadagni and Little (1983) use a stock variable constructed as an exponentially smoothed weighted average of past choices. While most studies of brand loyalty assume consumers are myopic, a rational forward-looking consumer would plan her future loyalty, much like the rational addiction models fit to tobacco products (e.g., Becker and Murphy, 1988; Gordon and Sun, 2015).

26 When the researcher does not observe consumers’ initial choices, an “initial conditions” bias can also arise from the endogeneity in consumers’ initial observed (to the researcher) states. Handel (2013) avoids this problem in his analysis of health plan choices. He exploits an intervention by an employer that changed the set of available health plans and forced employees to make a new choice from this changed menu.

Since most of the brand choice literature pools choice sequences across households, a concern is that state dependence captured by $\lambda^h$ may be spuriously identified by unobserved heterogeneity in tastes between households (Heckman, 1981).26 Even
after rich controls for persistent unobserved heterogeneity and serial dependence in εjht , Keane (1997) and Dubé et al. (2010) find statistically and economically significant state dependence in choices. However, the magnitude of the state dependence, λ, is considerably smaller after the inclusion of controls for heterogeneity, falling on average by more than 50%. For instance, in a case study of refrigerated orange juice purchases, Dubé et al. (2010) estimate switching costs that, on average, are 21% of the magnitude of prices. Without controls for heterogeneity, these costs are inferred to be more than double.27 In addition to controlling for heterogeneity, Dubé et al. (2010) also test between several alternative sources of structural state dependence, such as price search and learning. Intuitively, state dependence through consumer learning should dissipate over time as consumers learn through their purchase and consumption decisions. In contrast, switching costs create a persistent form of inertia in choices. We discuss the mechanism through which product quality uncertainty and consumer learning can generate inertia in consumer brand choices below in Section 3.4. Dubé et al. (2010) conclude that the inferred brand switching costs are robust to these alternative specifications and that the estimated values of λ reflect true brand loyalty.28 Similarly, imperfect information about prices or availability could also create state dependence in the purchase of a known brand. In-store merchandizing, such as displays, should offset the costs of determining a brand’s price in which case inertia for a given brand would be offset by a display for a competing brand. Dubé et al. (2010) again find that their estimates of switching costs are robust to controls for search costs.29 Interestingly, Keane (1997) and Dubé et al. (2010) estimate economically large and heterogeneous brand intercepts, γj . 
On average, the persistent differences in households’ brand tastes appear to be much more predictive of choices than the loyalty arising through $\lambda$. In a case study of 16 oz tubs of margarine purchases, Dubé et al. (2010) find the importance weights for loyalty ($\lambda$), price ($\theta$), and brand ($\gamma$) are 6.4%, 53.6%, and 40% respectively.30 Therefore, switching costs alone do not seem to explain the persistent consumer brand preferences typically inferred through CPG shopping panels. In sum, while there is a component of consumer switching that reflects dynamics related to loyalty, a large portion of consumers’ brand choices seem to reflect a far more persistent brand taste that is invariant over the time horizons of 1-2 years typically used in the literature.

A limitation of this literature is the lack of a deeper test of the underlying mechanism creating this persistence in choices. As early as Brown (1952, p. 256), scholars have questioned whether inertia in brand choice “is a ‘brand’ loyalty rather than a store, price or convenience loyalty.” The subsequent access to point-of-sale data allows researchers to control for prices and other causal factors at the point of purchase. But, within a store, a buying habit could merely reflect loyalty to a position on the shelf or another incidental factor that happens to be associated with a specific brand. In addition to unobserved sources of loyalty at the point of sale, the persistent brand tastes may also contain additional information about longer-term forms of brand loyalty, such as evolving brand capital stock (e.g., Bronnenberg et al., 2012), that are not detectable over one- or two-year horizons. While these distinctions may not matter for predicting choices over a short-run horizon, they have important implications for a firm’s willingness to invest in branding or consumer-related marketing to cultivate the shopping inertia.

27 Dubé et al. (2018) find even larger magnitudes of switching costs when they control more formally for endogenous initial conditions (i.e., the initial loyalty state for each household).
28 Using a structural model of consumer learning, Osborne (2011) also finds empirical evidence for both learning and switching costs.
29 Using a structural model of search and switching costs, Honka (2014) also finds empirical evidence for both search and switching costs. However, in her case study of auto insurance, search costs are found to have a larger effect on choices than switching costs.
30 Following the convention in the literature on conjoint analysis, an importance weight approximately describes the percentage of utility deriving from a given component. The model in Eq. (13) has three components to utility: brand, marketing variables, and loyalty, with respective part-worths (or marginal utilities) $PW^{brand}(\text{brand} = j) = \gamma_j - \min(0, \{\gamma_k\}_{k=1}^J)$, $PW^{marketing}(X_{jt} = x) = \theta(x - \min(x))$, and $PW^{loyalty}(s_{jt} = j) = \lambda$. Dubé et al. (2010) then assign an importance weight to each of these components, scaled to sum to one, as follows: $IW_{brand} = \frac{\max(PW^{brand})}{\max PW^{brand} + \max PW^{price} + \max PW^{loyalty}}$,

3.4 Learning from experience A more complex form of state-dependence in brand choices arises when a consumer faces uncertainty about aspects of product quality that are associated with the brand and that are learned over time. Following Nelson (1970), we modify the characteristics approach to demand by distinguishing between “search characteristics,” which can be determined prior to purchase, and “experience characteristics,” which are determined after the purchase through trial and consumption. The classification of brand as a search versus experiential characteristic is complicated. On the one hand, the brand name as an identifying mark acts like a search characteristic since it is likely verifiable prior to purchase through its presence on the packaging. On the other hand, intangible aspects of the product that are associated with the brand constitute experience characteristics that are learned over time through consumption (Meyer and Sathi, 1985) and informative advertising (Keller, 2012, Chapter 2). This view is consistent with the product-based associations that constitute part of a consumer’s brand knowledge (Keller, 2012, Chapter 2). We focus herein on rational models of consumers using Bayes’ rule to update their beliefs about products over time and to learn.31

$IW_{price} = \frac{\max(PW^{price})}{\max PW^{brand} + \max PW^{price} + \max PW^{loyalty}}$, and $IW_{loyalty} = \frac{\max(PW^{loyalty})}{\max PW^{brand} + \max PW^{price} + \max PW^{loyalty}}$.
31 The Bayesian learning model predicts that consumers eventually become fully-informed about a brand.
However, lab evidence suggests that “blocking” may prevent a consumer from learning about objective product characteristics. If a consumer initially learns to use the brand name to predict an outcome (e.g.


Suppose the consumer is uncertain about the intrinsic brand quality in any period $t$. At the start of each period, the consumer has a prior belief about brand quality, $f_{jt}(\gamma)$. At the end of each period, she potentially receives a costless, exogenous, unbiased, and noisy signal about brand $j$, $s_{jt} \sim g_j(\cdot|\gamma)$. For example, the signal might reflect a free sample, word-of-mouth (Roberts and Urban, 1988), observational learning from peers’ choices (Zhang, 2010), or an advertising message (Erdem and Keane, 1996). The consumer then uses the signal to update her beliefs about the brand’s quality using Bayes’ Rule:

$$f_{j(t+1)}(\gamma) \equiv f_{jt}(\gamma|s_{jt}) \propto g_j(\cdot|\gamma)\, f_{jt}(\gamma).$$

In this case, the state variable tracking consumer brand experiences, $H_t$, consists of her prior beliefs about each of the brand qualities, $\gamma_j$: $H_t = (f_{1t}(\gamma), ..., f_{Jt}(\gamma))$.32

We use a discrete-choice formulation of demand, as in Section 2.1. If the consumer’s brand choice is made prior to receiving the signal, her expected indirect utility from choosing brand $j$ at time $t$ is

$$E\left[u_{jt}|H_t;\theta\right] = \int \left(\lambda\gamma - \rho\gamma^2 + x_j\beta - \alpha p_{jt} + \varepsilon_{jt}\right) g_j(s|\gamma)\, f_{jt}(\gamma)\, d(s,\gamma) \qquad (15)$$

where $\rho > 0$ captures risk aversion. As discussed in Erdem and Keane (1996) and Crawford and Shum (2005), risk aversion is essential for predicting inertia in consumers’ choices for familiar brands since a consumer may be reluctant to purchase a new brand with uncertain quality. The vector $\theta$ contains all the model parameters, including those characterizing the consumer’s beliefs.

We can augment the model in (15) to allow the consumer to learn over time through her own endogenous brand choices (Erdem and Keane, 1996). Suppose each time the consumer purchases brand $j$, her corresponding consumption experience generates an unbiased, noisy signal about the quality of brand $j$: $s_{jt} \sim g_j(\cdot|\gamma)$. Let $D_{jt}$ indicate whether the consumer purchased brand $j$ at time $t$.
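A minimal sketch of this learning mechanism, assuming the conjugate Normal prior and Normal signal that much of this literature adopts. It shows one Bayes' rule update and the quality part of the expected utility in (15), using the fact that $E[\gamma^2]$ equals the squared mean plus the variance under a Normal belief. The names `NormalBelief`, `update`, and `expected_quality_utility`, and all parameter values, are hypothetical and chosen only for illustration; the terms $x_j\beta - \alpha p_{jt} + \varepsilon_{jt}$ are omitted because they do not depend on $\gamma$.

```python
from dataclasses import dataclass


@dataclass
class NormalBelief:
    """Consumer's Normal belief about one brand's quality (illustrative)."""
    mean: float
    var: float


def update(belief: NormalBelief, signal: float, signal_var: float) -> NormalBelief:
    """Conjugate Normal update after one unbiased signal with variance signal_var."""
    precision = 1.0 / belief.var + 1.0 / signal_var
    post_var = 1.0 / precision
    post_mean = post_var * (belief.mean / belief.var + signal / signal_var)
    return NormalBelief(post_mean, post_var)


def expected_quality_utility(belief: NormalBelief, lam: float, rho: float) -> float:
    """E[lam * gamma - rho * gamma**2] under the belief, with E[gamma**2] = mean**2 + var."""
    return lam * belief.mean - rho * (belief.mean ** 2 + belief.var)


# Assumed values: the signal equals the prior mean, so only the variance changes.
prior = NormalBelief(mean=1.0, var=1.0)
posterior = update(prior, signal=1.0, signal_var=1.0)

u_unfamiliar = expected_quality_utility(prior, lam=1.0, rho=0.2)
u_familiar = expected_quality_utility(posterior, lam=1.0, rho=0.2)
```

In this example the posterior mean is unchanged while the posterior variance is halved, so with $\rho > 0$ the experienced brand yields higher expected utility than an otherwise identical untried brand. This is the sense in which risk aversion generates inertia toward familiar brands.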
To simplify, we follow the convention in most of the literature and assume the consumer’s initial period prior is $f_{j0}(\gamma) = N(\gamma_{j0}, \sigma_{j0}^2)$ and that her consumption signal in a given period is $s_{jt} \sim N(\gamma_j, \sigma_s^2)$. The advantage of this Normal Bayesian Learning model is that the consumer’s state now consists of the beginning-of-period prior means and variances for each of the $J$ brands, $H_t = (\gamma_{1t}, ..., \gamma_{Jt}, \sigma_{1t}^2, ..., \sigma_{Jt}^2)$, rather than each of the $J$ Normal prior distributions, $f_{jt}(\gamma) = N(\gamma_{jt}, \sigma_{jt}^2)$. In addition, under Normal Bayesian learning the consumer’s period $t$ prior mean and variance for brand

[...] and $F_\psi(\psi)$ is bounded above. The latter assumption ensures that as brand quality levels increase, the incremental costs to raise quality do not become arbitrarily large.

In the third stage, firms play a Bertrand pricing game conditional on the perceived product attributes and marginal costs $c(Q;\ )$, where $Q$ is the total quantity sold by the firm. If we further assume that $c(Q;\ ) < \bar{y} < \max y_h$, where $\bar{y}$ is an upper bound on costs, then “the increase in unit variable cost is strictly less than the marginal valuation of the richest consumer” (Shaked and Sutton, 1987, p. 136). Accordingly, the model bounds above how quickly unit variable costs can increase in the level of quality being supplied.39 At the same time, on the demand side there will always be some consumers willing to pay for arbitrarily large brand quality levels. In other words, costs increase more slowly than the marginal valuation of the “highest-income” consumer.

Seminal work by Shaked and Sutton (1987) and Sutton (1991) derived the theoretical mechanisms through which the way brands differentiate goods, $u_\psi$ and $u_d$, and the convexity of the advertising cost function, $F(\psi)$, ultimately determine industrial market structure. The following propositions are proved in Shaked and Sutton (1987):

Proposition 1. If $u_\psi = 0$ (i.e., no vertical differentiation), then for any $\varepsilon > 0$, there exists a market size $S^*$ such that for any $S > S^*$, every firm has an equilibrium market share less than $\varepsilon$.

Essentially, in a purely horizontally-differentiated market, the limiting concentration is zero as market size increases. The intuition for this result is that as the market size increases, we observe a proliferation of products along the horizontal dimension until, in the limit, the entire continuum is served and all firms earn arbitrarily small shares.

Proposition 2.
When $u_\psi > 0$, there exists an $\varepsilon > 0$ such that at equilibrium, at least one firm has a market share larger than $\varepsilon$, irrespective of the market size.

As market size increases for industries in which firms can make fixed and sunk investments in brand quality (i.e., vertical attributes), we do not see an escalation in entry. Instead, we see a competitive escalation in advertising spending to increase the perceived quality of products. The intuition is that a firm perceived to be higher-quality can undercut its “lower-quality” rivals. Hence, the firm perceived to be the highest-quality will always be able to garner market share and earn positive economic profits. At the same time, only a finite number of firms will be able to sustain such high levels of advertising profitably, which dampens entry even in the limit.

39 Note that we are abstracting away from the richer setting where a firm can invest in marketing over time to build and maintain a depreciating goodwill stock, as in Section 4.2.2 above.

These two results indicate that product differentiation per se is insufficient to explain concentration. Concentration arises from competitive investments in vertical product differentiation. When firms cannot build vertically-differentiated brands (by advertising) we expect markets to fragment as market size grows. In contrast, when firms can invest to build vertically-differentiated brands, we do not expect to see market fragmentation, but rather an escalation in the amount of advertising and the perseverance of a concentrated market structure. The crucial assumption is that the burden of advertising falls more on fixed than variable costs. This assumption ensures that costs do not become arbitrarily large (i.e., prohibitively large) as quality increases. Consequently, it is always possible to outspend rivals on advertising and still impact demand. This seems like a reasonable assumption for the CPG markets in which advertising decisions are made in advance of realized sales. It is unlikely that advertising spending would have a large influence on marginal (production) costs of a branded good.

Extending the ESC theory to a setting in which firms make their entry and investment decisions sequentially strengthens the barriers to entry by endowing the early entrant with a first-mover advantage. With vertical differentiation and endogenous sunk advertising costs, an early entrant can build an even larger brand that pre-empts future investment by later entrants (e.g., Lane, 1980; Moorthy, 1988; Sutton, 1991). Using cross-category variation, Sutton (1991) provides detailed and extensive cross-country case studies that empirically confirm the central prediction of ESC theory in food industries, finding a lower bound on concentration in advertising-intense industries but not in industries where advertising is minimal or absent. Bronnenberg et al. (2011) find similar evidence for US CPG industries, looking across US cities of differing size, with a lower bound on concentration in advertising-intense CPG industries and a fragmentation of non-advertising-intense CPG industries in the larger US cities.
Order of entry is also found to play an important role in market structure, with early entrants sustaining higher market shares even a century after they launched their brands (e.g., Bronnenberg et al., 2009). Illustrating the important role of the convexity of the marketing cost function, Berry and Waldfogel (2010) show that the market fragments and the range of qualities offered rises in larger markets in the restaurant industry, where quality investments comprise mainly variable costs. In contrast, for the newspaper industry, where quality investments comprise mainly fixed costs, they observe average quality rising with market size without fragmentation. Sutton (1991) even provides anecdotal evidence linking historical differences in advertising across countries to national market structure and demand for branded goods. For instance, relative to the United States, TV advertising used to be restricted in the United Kingdom. Consistent with advertising as an intangible attribute, Sutton (1991) notes that the market share of private labels is much higher in the UK than in the US.

Interestingly, the theory does not imply that the market be served exclusively by branded goods. When the consumer population consists of those who value the vertical brand attribute and those who do not, it is possible to sustain entry by advertising and non-advertising firms. The latter will serve the consumers with no willingness-to-pay a brand premium. However, as the market size grows, the unbranded sub-segment of the market fragments. In their cross-industry analysis, Bronnenberg et al. (2011) observe an escalation in the number of non-advertising CPG products in a given category in larger US cities, with concentration converging towards 0.
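The fragmentation of a segment without vertical brand investment can be illustrated with a short numerical sketch. It does not implement the Shaked and Sutton model; as a stand-in it assumes the textbook circular-city (Salop) model of pure horizontal differentiation with an exogenous fixed entry cost, in which $n$ symmetric firms each earn a markup of `transport_cost / n` on `market_size / n` consumers. The function name and all parameter values are hypothetical.

```python
import math
from typing import Tuple


def salop_free_entry(market_size: float, transport_cost: float, fixed_cost: float) -> Tuple[int, float]:
    """Free-entry number of symmetric firms and each firm's market share.

    Per-firm profit with n firms is market_size * transport_cost / n**2 - fixed_cost,
    so entry continues until n reaches floor(sqrt(market_size * transport_cost / fixed_cost)).
    """
    n = max(1, math.floor(math.sqrt(market_size * transport_cost / fixed_cost)))
    return n, 1.0 / n


# Assumed values: the fixed cost stays constant while the market grows.
results = {
    size: salop_free_entry(size, transport_cost=1.0, fixed_cost=1.0)
    for size in (100, 10_000, 1_000_000)
}
```

Because the fixed cost here does not escalate with market size, the number of firms grows with the square root of market size and each firm's share shrinks toward zero, in the spirit of Proposition 1. Reproducing the lower bound in Proposition 2 would require letting firms choose sunk quality investments, which this sketch deliberately leaves out.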

5.2 Brands and reputation

Traditionally, the term brand was associated with the identity of the firm manufacturing a specific commodity. As a brand developed a reputation for reliability or high quality, consumers would potentially pay a price premium for the branded good. A central idea in markets for experience goods and services is that a firm’s reputation matters when consumers have incomplete information about product quality and fit prior to consuming the product (Nelson, 1970, 1974). In such markets, inexperienced consumers can be taken advantage of by firms selling low quality at high prices. The longer it takes for consumers to discover true product quality, the more beneficial it is for firms to sell low quality at a high price. Perhaps the most straightforward role of brands in such a setting is that they allow consumers to connect one purchase to the next. This connection provides the basis for holding firms accountable for their actions even without a third-party (e.g., government) enforcing contracts. In turn, it provides the basis for “reputations”—how they arise and why they are relevant to consumers.40 In common parlance, reputation is a firm’s track-record of delivering high quality; in theoretical models, it is consumers’ beliefs about product quality.

There is a large literature that ties the provision of product quality to seller identification or lack thereof. On the negative side, Akerlof (1970) shows that non-identifiability of firms in experience goods markets leads to deterioration of quality because low quality firms cannot be punished, and high quality firms rewarded, for their actions.41 On the positive side, Klein and Leffler (1981, p. 616) observe that branding enables reputations to form and to be sustained:

“...economists also have long considered ‘reputations’ and brand names to be private devices which provide incentives that assure contract performance in the absence of any third-party enforcer (Hayek, 1948, p. 97; Marshall, 1949, vol. 4, p. xi). This private-contract enforcement mechanism relies upon the value to the firm of repeat sales to satisfied customers as a means of preventing nonperformance.”

40 We focus herein on the role of brand reputations, referring readers looking for a broader discussion of the literature on reputation to the survey in Bar-Isaac and Tadelis (2008).

41 “The purchaser’s problem, of course, is to identify quality. The presence of people in the market who are willing to offer inferior goods tends to drive the market out of existence – as in the case of our automobile ‘lemons.’ ” (Akerlof, 1970, p. 495).

In a competitive market, the identification role of a brand can benefit a firm because “a firm which has a good reputation owns a valuable asset” (Shapiro, 1983, p. 659). For instance, Png and Reitman (1995) find that branded retail gasoline stations are more likely to carry products and services with important experiential characteristics that could be verifiable through a reputation for quality, such as premium gasoline and repair services. Similarly, Ingram and Baum (1997) report that chain-affiliated hotels in Manhattan had a lower failure rate than independent hotels. In case studies of jeans and juices, Erdem and Swait (1998) find that consumer demand responds to self-reported survey measures of brand credibility.

Klein and Leffler (1981) and Shapiro (1983) examine the incentives firms need to maintain reputations. A simple model illustrates their arguments. Suppose a monopolist firm can offer high ($H$) or low quality ($L$) every period, with $H$ costlier to produce than $L$: $c_h > c_l$. Suppose also that there are $N$ consumers in the market, each looking to buy at most one unit of the product. Assume for simplicity that all consumers buy at the same time or, equivalently, that there is instantaneous word-of-mouth from one consumer to all. Consumers’ reservation prices are $v_h$ and $v_l$ for products known to be high-quality and low-quality, respectively, with $v_h > v_l$. Assume $v_h - c_h > v_l - c_l$, i.e., the firm would prefer to offer high quality if consumers were perfectly informed about quality. The product, however, is an experience good and consumers only observe price before purchase. They observe quality after purchase.

We now analyze the firm’s behavior in a “fulfilled expectations” equilibrium, i.e., an equilibrium in which consumers’ expectations about firm behavior match the firm’s actual behavior. In one such equilibrium, consumers expect low quality in every period and the firm delivers low quality every period. The more interesting equilibrium, however, is one in which the firm delivers a high quality product in every period when consumers expect it to do so.
Such an equilibrium is sustained by consumer beliefs that punish the firm for reneging on its “reputation for high quality.” Specifically, consumers’ beliefs are that the firm offers H unless proven otherwise, in which case, their future expectations are that the firm will deliver L.42 The existence of this equilibrium requires the firm to have no incentive to deviate from H in any period and, hence, never to offer L in any period. Given a discount factor ρ, the payoff along the equilibrium path is

$$\pi_h = \frac{\rho}{1-\rho}(p_h - c_h)N$$

whereas (assuming $v_l > c_l$) the payoff along the deviation path of making a low quality product, but selling it as a high quality product once, is

$$\pi_l = \rho(p_h - c_l)N + \frac{\rho^2}{1-\rho}\max\{0, (v_l - c_l)\}N.$$

The reputation equilibrium is sustained, therefore, if and only if

$$p_h \geq p_h^* = \frac{1}{\rho}(c_h - c_l) + \max\{c_l, v_l\}.$$

Since we also need $v_h \geq p_h$, this equilibrium is feasible if and only if

$$v_h \geq p_h^* \qquad (22)$$

42 Such a punishment may appear draconian—and we will have more to say about whether real-world firms get punished this way—but for now the theoretical point is that it is the possibility of punishment that provides firms the incentive to maintain reputations.
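The payoffs and the threshold price in this simple model can be checked numerically. The sketch below is only an illustration: the function names and the parameter values ($c_h = 4$, $c_l = 2$, $v_l = 3$, $v_h = 10$) are assumptions chosen to satisfy $v_h - c_h > v_l - c_l$, not values taken from the literature.

```python
def payoff_high(p_h: float, c_h: float, rho: float, n: float = 1.0) -> float:
    """Discounted payoff from delivering high quality in every period."""
    return rho / (1.0 - rho) * (p_h - c_h) * n


def payoff_deviate(p_h: float, c_l: float, v_l: float, rho: float, n: float = 1.0) -> float:
    """Discounted payoff from selling low quality as high once, then being known as low quality."""
    return rho * (p_h - c_l) * n + rho ** 2 / (1.0 - rho) * max(0.0, v_l - c_l) * n


def min_reputation_price(c_h: float, c_l: float, v_l: float, rho: float) -> float:
    """Threshold price p_h* at which the two payoffs are equal."""
    return (c_h - c_l) / rho + max(c_l, v_l)


def reputation_feasible(v_h: float, c_h: float, c_l: float, v_l: float, rho: float) -> bool:
    """Condition (22): consumers' valuation of high quality covers the threshold price."""
    return v_h >= min_reputation_price(c_h, c_l, v_l, rho)


# Assumed illustrative parameters.
c_h, c_l, v_l, v_h = 4.0, 2.0, 3.0, 10.0

p_star_patient = min_reputation_price(c_h, c_l, v_l, rho=0.9)
feasible_patient = reputation_feasible(v_h, c_h, c_l, v_l, rho=0.9)
feasible_impatient = reputation_feasible(v_h, c_h, c_l, v_l, rho=0.2)
```

With these assumed values, a patient firm ($\rho = 0.9$) faces a threshold price of about 5.22, which lies below $v_h$, while an impatient firm ($\rho = 0.2$) would need a price of 13, which exceeds $v_h$. At exactly $p_h^*$ the two payoff functions coincide, and any higher price makes maintaining the reputation strictly more profitable than deviating.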

Obviously, if $\rho$, the discount factor, is small enough, this condition cannot hold. On the other hand, if $\rho$ is large enough and $c_h - c_l$ small enough, it is possible to find a $p_h \in (p_h^*, v_h]$. In short, if the firm has a sufficient stake in the future, and consumers are willing to pay a sufficient premium for high quality, then the firm is willing to maintain its reputation for high quality by offering high quality, foregoing the short-term incentives to “harvest” its reputation.

Shapiro (1982) extends a model like this to continuous time, and assumes that both quality and price are continuously adjustable. In addition, he allows for general reputation functions—i.e., reputation functions that do not instantaneously adjust to the last quality provided (as in the model above). The main result is that, with reputation adjustment lags, the firm will only be able to sustain less than perfect-information quality in a fulfilled-expectations equilibrium. Essentially, the firm has to pay a price for consumers’ imperfect monitoring technology when it is coupled with consumer rationality. Board and Meyer-ter-Vehn (2013) extend this framework to long-lived investment decisions that affect quality and consider a variety of consumer learning processes. They find that when signals about quality are more likely to convey “good news” than “bad news,” a high-reputation firm has the incentive to run down its quality and reputation, while a low-reputation firm keeps investing to increase the possibility of good news. Conversely, when signals about quality are more likely to convey bad news, a low-reputation firm has weak incentives to invest, while a high-reputation firm keeps investing to protect its reputation. In practice, the extent to which reputation incentives discipline firms to deliver persistently high quality is an interesting direction for future empirical research.
The recent experience of brands like Samsung, Tylenol, and Toyota, which rebounded quickly from product crises, suggests that consumers might be forgiving of occasional lapses in quality, even major ones.43

43 See, for example, “Samsung rebounds with strong Galaxy S8 pre-orders after Note 7 disaster,” New York Post, April 13, 2017. https://nypost.com/2017/04/13/samsung-rebounds-with-strong-galaxy-s8-preorders-after-note-7-disaster/.

A limitation of the reputation literature is the assumption that firms are consumer-facing and can be held accountable for their actions. For this reason, Tadelis (1999) notes that brands are natural candidates for reputations because they are observable, even when the firms that own them are not. For example, a consumer can hold a restaurant accountable for its performance across unobserved (to the consumer) changes in the establishment’s ownership as long as the restaurant’s name remains the same. Luca and Zervas (2016) find that restaurants that are part of an established branded chain are considerably less likely to commit review fraud on Yelp. They also find that independent restaurants are more likely to post fake reviews on Yelp when
they are young and have weak ratings. By the same token, a new brand from an existing firm starts with a clean slate; thus a firm can undo the bad reputation of its existing brand by creating a new brand. An illustration of this point appears in a case study of residential plumbers. McDevitt (2011) finds that firms with a high track record of customer complaints typically changed their names. However, changing names is not costless: as we have noted above, besides the direct costs of doing so—researching names, registering the new name, etc.—the more significant expense is the cost of developing awareness of the new brand. Perhaps for this reason, companies whose corporate brands permeate their entire, large, product line—companies such as Ford, Sony, and Samsung—inevitably create sub-brands (a) to establish independent identities in multiple product categories, and (b) to insulate the corporate brand at least partially from the individual transgressions of one product. Examples include Mustang for Ford, Bravia for Sony, and Galaxy for Samsung. The idea that brands serve as repositories for reputation, and provide the right incentives for firms to maintain quality, is perhaps the most fundamental of all the ideas that the economics literature contributes to branding. Its power and empirical relevance is illustrated in a field experiment run on the Chinese retail watermelon market by Bai (2017). She randomly assigns markets either to a control condition, a traditional sticker on each watermelon identifying sellers, which is, however, frequently counterfeited, or to a laser-cut label, which is more expensive to implement and, hence, less likely to be counterfeited. Over time, Bai finds that sellers assigned to the laser-cut label start selling higher quality melons (based on sweetness) and earned a 30-40% profit increment due to higher prices and higher sales.44 These findings are consistent with the predictions of the reputational models above. 
In the domain of consumer goods, retailers have created brand images for their stores and chain through the assortment of manufacturer brands they carry: “Retailers use manufacturer brands to generate consumer interest, patronage, and loyalty in a store. Manufacturer brands operate almost as ‘ingredient brands’ that wield significant consumer pull, often more than the retailer brand does.” (Ailawadi and Keller, 2004, p. 2)

44 One year after the experiment, once the laser branding was removed, the market reverted back to its original baseline outcome that was indistinguishable from a market with no labels at all.

In some cases, retailers also use exclusive store brands or private labels to enhance the reputation of their stores and chain, by differentiating themselves through these exclusive offerings (Ailawadi and Keller, 2004). Dhar and Hoch (1997) report that a chain’s reputation correlates positively with the breadth and extent of its private label program. To shift their reputation from merely providing value, more recently, retailers have expanded their private label offerings into a full line of quality tiers, including premium private labels that compete head-on with national brands (see, e.g., Geyskens et al., 2010). Recent work suggests that private labels have closed the quality gap and have become vehicles for reputation themselves. For instance, Steenkamp
et al. (2010) report that as private label programs mature and close the perceived quality gap, consumers’ willingness to pay premia for national brands decreases. The reputation literature underscores the role of time in establishing reputations. It is over time that a reputation develops, as the firm provides repeated evidence of fulfilling consumers’ expectations. A new brand coming into a market may therefore face a “start-up problem”—how to get going on the reputation journey when consumers are reluctant to try it even once. Possible solutions to this range from “introductory low prices,” to offering money-back guarantees, to “renting the reputation of an intermediary” (Chu and Chu, 1994; Choi and Peitz, 2018; Dawar and Sarvary, 1997; Moorthy and Srinivasan, 1995). Brand name reputation has value if and only if quality is not directly observable (Bronnenberg et al., 2015). With the advent of the Internet, independent websites providing direct information about quality have proliferated. As consumers increasingly rely on such websites for product information, the value of brand name reputation is bound to go down. Waldfogel and Chen (2006) noted this as early as 2006. They observed that consumers using information intermediaries such as BizRate.com substantially increased their shopping at “unbranded” retailers such as “Brands For Less,” at the expense of branded retailers such as Amazon.com. More recently, Hollenbeck (2017) and Hollenbeck (2018) have examined the revenue premium enjoyed by chain hotels over independent hotels and observed that it has shrunk over the period 2000-2015, just as online review sites such as TripAdvisor have increased in popularity.

5.3 Branding as a signal

Much of what the industry refers to as “branding” activities would appear to the economist as “uninformative advertising,” i.e., advertising that is devoid of credible product quality information. In a series of seminal papers, Nelson (1970) argued that seemingly uninformative advertising for experience goods may nevertheless convey information if there exists a correlation between quality and advertising spending. Assuming consumers can perceive this correlation, it would be rational for them to respond to such advertising. Then the “money-burning” aspect of advertising will have signaling value and a brand value can be established through the mere act of spending money on advertising associated with the brand.

A small literature has emerged that attempts to formalize Nelson’s ideas. Among these efforts are Kihlstrom and Riordan (1984), Milgrom and Roberts (1986), Hertzendorf (1993), and Horstmann and MacDonald (2003). In all of these papers, a key necessary condition for Nelson’s argument to work is the existence of a positive correlation between quality and advertising spending in equilibrium. This condition requires that the returns to advertising be greater for a high quality manufacturer than for a low quality manufacturer even after accounting for the latter’s potential incentive to copy the former’s advertising strategy (and thus fool consumers into thinking that its quality is higher than it actually is). In general, this condition is difficult to establish, as illustrated by both Kihlstrom and Riordan (1984) and Milgrom and
Roberts (1986). The former works in a free-entry framework, with firms behaving as price-takers, and living for two periods. Firms decide whether to advertise or not at the beginning of the first period. In doing so they trade off the advertising benefit of being perceived by consumers as a high quality firm, which fetches higher prices, and the financial cost of advertising spending. As Kihlstrom and Riordan’s analysis demonstrates, it is possible to sustain an advertising equilibrium of the kind Nelson envisaged only under unrealistic cost assumptions or unrealistic information-transmission assumptions. For instance, if consumers learn true quality in the long run—the second period, in Kihlstrom and Riordan’s framework—then marginal costs cannot be lower for the lower-quality product (for if they were lower, then the lower-quality firm may also be tempted to advertise). On the other hand, if marginal costs are assumed to be lower for the lower-quality product, then it must be assumed that high-quality manufacturers will never be discovered to be high quality (if they do not advertise). This condition rules out, for example, consumers spreading the word about “bargains”—high quality products sold at low prices in the first period because they were mistakenly identified as low quality products due to their lack of advertising.

Milgrom and Roberts’s (1986) monopoly model shows that additional issues arise when prices are chosen by the firm. If advertising signals quality, then it is likely that the higher quality firm would also want to choose a higher price. But if prices also vary with quality, consumers can infer quality from the price rather than the advertising. It is unclear why, in a static model, a firm would need to burn money on advertising if it can signal quality through its prices.
However, in a dynamic model with complete information about quality in the second period—akin to the first set of information-transmission assumptions in Kihlstrom and Riordan’s framework above—advertising may be needed to signal quality, but only if marginal costs increase in quality (in contrast with Kihlstrom and Riordan’s conditions). But even if marginal costs do increase in quality, the necessity of advertising to signal quality is not guaranteed. In Milgrom and Roberts’s own words: “advertising may signal quality, but price signaling will also typically occur, and the extent to which each is used depends in a rather complicated way, inter alia, on the difference in costs across qualities.”

Given the theoretical difficulties in establishing a signaling role for uninformative advertising, perhaps it is not surprising that empirical attempts to find a correlation between advertising and quality have turned out to be inconclusive. Several empirical studies have relied on the correlation between advertising spending and consumers’ perception of product quality using laboratory studies (e.g., Kirmani and Wright, 1989) and transaction histories (e.g., Erdem et al., 2008).45 However, the correlation one seeks is between “objective quality”—the quality actually built into the product, the sort of quality that might impact production costs—and advertising spending, and not “perceived quality” and advertising spending. As Moorthy and Hawkins (2005) have noted, a correlation between consumers’ perceptions of quality and advertising spending can occur through a variety of mechanisms, not necessarily Nelson’s mechanism.

45 Ackerberg (2001) is able to identify an informative role of advertising, separately from its consumption role—what he calls “the prestige effects of advertising”—by contrasting the purchase behavior of “new consumers” and “experienced consumers.” However, as he notes, he can’t identify how advertising is informative: “There are a number of different types of information advertising can provide: explicit information on product existence or observable characteristics, or signaling information on experience characteristics. It would be optimal to write down and estimate a consumer model including all these possible informative effects. Unfortunately, such a model would likely be computationally intractable, and more importantly, these separate informative effects would be hard, if not impossible, to empirically distinguish given my data set.”

Turning now to the studies examining objective quality-advertising spending correlations, Rotfeld and Rotzoll (1976) find a positive correlation between advertising and quality (as reported in Consumer Reports and Consumers Bulletin) across all brands in their study of 12 product categories, but not within the subset of advertised brands. In a more comprehensive study, using a sample frame of 196 product categories evaluated by Consumer Reports, Caves and Greene (1996b) find generally low correlation between advertising spending and objective quality. They conclude: “These results suggest that quality-signalling is not the function of most advertising of consumers goods.”

More recently, in a case study of the residential plumbing industry, McDevitt (2014) documents a novel use of branding as a signal of product quality. He finds that plumbers with names beginning with an A or a number, placing them at the top of alphabetical directories, “receive more than five times as many complaints with the Better Business Bureau, on average, and more than three times as many complaints per employee” (McDevitt, 2014, p. 910). This result is shown to be consistent with a signaling theory with heterogeneous consumer types in addition to firms with heterogeneous qualities. In equilibrium, low-quality firms use easy-to-find names that cater to low-frequency customers with low potential for repeat business and who will not find it beneficial to engage in costly search to locate the best firms. High-quality firms are less interested in such customers, focusing instead on customers with extensive needs who will devote more effort to searching for a good firm with which they can continue to engage in the future. These results corroborate Bagwell and Ramey’s (1993) observation that cheap talk in advertising can serve the function of matching sellers to buyers.

With the rise of online marketplaces with well-established customer feedback mechanisms, it may be interesting to study whether the informational role of brands on consumer choices begins to erode in online markets. For instance, Li et al. (2016) discuss how Taobao’s “Rebate-for-Feedback” feature46 creates a similar equilibrium quality signal as in Nelson’s (1970) money-burning theory of advertising.

46 Sellers have the option to pay consumers to leave feedback about the seller, where the payment is based on a Taobao algorithm that determines whether feedback is objectively informative.

Of course, signaling is not the only framework in which to interpret uninformative advertising. For instance, in the marketing literature, it is widely believed that such advertising is useful to create brand associations that help differentiate the brand in the consumer’s mind (see Keller, 2012, Chapter 2 for a survey). More recently, the economics literature has also recognized such a role for advertising via Becker and Murphy’s (1993) notion of “advertising as a good.”

5.4 Umbrella branding

5.4.1 Empirical evidence

Many new products are brand extensions that leverage the reputation and/or goodwill associated with an established brand, a practice often termed “umbrella branding” or “brand stretching.” Examples abound including Arm & Hammer, originally a baking soda, which has been extended to toothpaste, detergent, and cat litter; and Sony, a brand name created for a transistor radio in 1955, which has been extended to televisions, computers, cameras, and many other categories. According to Aaker (1990), forty percent of the new products launched in US supermarkets between 1977 and 1984 were brand extensions. Among 7,000 new products launched in supermarkets during the 1970s, only 93 grossed over $15 million and two thirds of these were brand extensions. The entire institution of business-format franchising relies on umbrella branding for its consumer-side effects. In spite of the high incidence of umbrella branding, the empirical evidence for spillovers in consumer quality beliefs is limited. Erdem and Winer (1999) fit a structural model of demand to consumer purchase panel data for toothbrushes and toothpaste. The parameter estimates imply correlation in how consumers perceive a brand across categories. Using the same data, Erdem (1998) fits a structural model of demand with Normal Bayesian learning about product qualities in the two categories. Her parameter estimates imply that consumers’ prior beliefs about brand qualities are correlated between the two categories, which would allow for learning spillovers. Erdem and Sun (2002) extend the model to allow for learning effects from marketing decisions like advertising and promotion. The parameter estimates imply that advertising and promotion not only reduce uncertainty about product quality, but that these effects can also spill over across the two categories.
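The learning-spillover mechanism described above can be illustrated with a minimal sketch of one Normal Bayesian update over two correlated quality beliefs. This is a textbook conjugate-normal update, not the estimated model of Erdem (1998); the function name `update_beliefs` and all numerical values are hypothetical and chosen only for illustration.

```python
def update_beliefs(mean, cov, signal, noise_var, observed=0):
    """One Normal Bayesian update of beliefs about two brand qualities.

    mean: prior means [category 0, category 1] (hypothetical values)
    cov: 2x2 prior covariance; the off-diagonal term drives the spillover
    signal: noisy quality experience in the `observed` category
    noise_var: variance of the experience noise (an assumption)
    """
    denom = cov[observed][observed] + noise_var
    # Gain applied to each category's belief; non-zero for the unobserved
    # category only if the prior covariance is non-zero.
    gain = [cov[i][observed] / denom for i in range(2)]
    surprise = signal - mean[observed]
    new_mean = [mean[i] + gain[i] * surprise for i in range(2)]
    new_cov = [[cov[i][j] - gain[i] * cov[observed][j] for j in range(2)]
               for i in range(2)]
    return new_mean, new_cov


# A good experience in category 0 (say, toothpaste) also raises the belief
# about category 1 (say, toothbrushes) when priors are positively correlated.
correlated = update_beliefs([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], 2.0, 1.0)
independent = update_beliefs([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 2.0, 1.0)
print(correlated[0], independent[0])
```

With a zero prior covariance the belief about the second category does not move, which is why correlated priors are what allow for cross-category learning in this kind of model.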
Any failures or negative associations with the extension could harm the original brand’s “reputation” (Aaker, 1990).47 The reputational cost from extending a brand to a low-quality new product also potentially creates an implicit exit cost if the new product fails, damaging the reputation of the brand and any future profit opportunities from the brand including the sales of established products. Thomas (1996) conjectures that this exit cost creates a credible entry-deterring motive for brand extensions. The empirical evidence is mixed. In case studies of the US beer, coffee, and soft drink categories, Thomas (1995) finds that firms with established brand leaders are typically first to enter new sub-markets. In contrast, in a comprehensive analysis of 95 brands across 11 CPG categories, Sullivan (1992) finds that new brands typically enter new product markets earlier than brand extensions. Nevertheless, brand extensions that enter later are more likely to succeed in the long run and typically exhibit above-average market shares after controlling for order-of-entry and advertising. In the next two sub-sections, we analyze theoretically the reputational effects of umbrella branding on a firm’s incentives to supply high quality, and the signaling benefits of umbrella branding relative to the creation of entirely new brands.

47 Consumer psychologists have found mixed evidence for such spillovers. For instance, a poor experience with a new brand extension may be attributed to the extension component and not to the original brand, limiting feedback in the original category (Van Osselaer, 2008). Alternatively, the match in the specific associations evoked by a brand may also affect the success of an extension. Broniarczyk and Alba (1994) give the example of different brands of toothpaste, some of which carry the association of being superior in tooth cleaning, others of freshness. Even though a tooth cleaning brand may be liked better in the original category (toothpaste), the brand associated with freshness can be evaluated more positively when extended into the mouthwash category.

5.4.2 Umbrella branding and reputation

We first discuss how a brand reputation incentivizes a firm to supply high quality to all the products under the common brand umbrella. The idea of holding brand reputation as hostage in return for quality assurance acquires even more power when the same brand is applied to several products. Now, when brand reputation suffers, many products suffer,48 not just one (Cabral, 2009; Rasmusen, 2016; Hakenes and Peitz, 2008). In this case, the incentive for a firm to maintain quality is even higher than in the previous section. To illustrate, consider the following variation on the model presented in Section 5.2 above. Suppose the firm controls not just one product, but two products, denoted 1 and 2. The notation above carries over wholesale, with superscripts denoting product-specific quantities. Suppose inequality (22) is satisfied for product 1, but not for product 2, i.e., $v_h^1 \ge (1/\rho)(c_h^1 - c_l^1) + \max\{c_l^1, v_l^1\}$, but $v_h^2 < (1/\rho)(c_h^2 - c_l^2) + \max\{c_l^2, v_l^2\}$. In short, if these products were separately branded, the monopolist firm would offer only product 1 in high quality, pricing it at $v_h^1$. Product 2 would be offered at low quality only because it is impossible to find a price for this product that simultaneously is low enough to appeal to consumers and high enough to provide incentives for the firm not to renege on a high-quality reputation. Umbrella-branding changes the quality incentives for the firm. Brand reputation is now a two-dimensional object. Accordingly, punishments, too, can be two-dimensional. As Cabral (2009) notes, several punishment regimens are available, ranging from lenient to draconian, depending on whether consumers punish one or both products for the indiscretions of one or both. The most draconian punishment consists of assigning low quality expectations forever to both products, even after only one low-quality deviation by one of the products. Such a draconian punishment regimen also means that any deviation will be “all-or-nothing”: if both products will be punished for a single deviation, the optimal deviation is to deviate on both forever.

48 The empirical evidence for such spillovers comes from Sullivan (1990).


The payoff in a potential equilibrium involving high quality on both products is

$$\frac{\rho}{1-\rho}(v_h^1 - c_h^1)N + \frac{\rho}{1-\rho}(p_h^2 - c_h^2)N,$$

whereas the payoff in a deviation is

$$\rho(v_h^1 - c_l^1)N + \frac{\rho^2}{1-\rho}\max\{0, (v_l^1 - c_l^1)\}N + \rho(p_h^2 - c_l^2)N + \frac{\rho^2}{1-\rho}\max\{0, (v_l^2 - c_l^2)\}N.$$

A reputation-for-high-quality-on-both-products is sustained, therefore, if and only if

$$p_h^2 \ge \frac{1}{\rho}(c_h^2 - c_l^2) + \max\{c_l^2, v_l^2\} - \left[v_h^1 - \frac{1}{\rho}(c_h^1 - c_l^1) - \max\{c_l^1, v_l^1\}\right].$$

If the term in square brackets is strictly positive, then the

price-threshold for product 2 is reduced to below $(1/\rho)(c_h^2 - c_l^2) + \max\{c_l^2, v_l^2\}$, making it possible for $v_h^2$ to exceed the threshold. Note that this construction works only with the most stringent punishment regimen. Anything more lenient, such as punishing only when both products have low quality, or punishing only the product which has low quality, would make it impossible to sustain a reputation for high quality on both products.

Cabral (2009) and Hakenes and Peitz (2008) explore how imperfect monitoring affects umbrella branding incentives. The idea consists of distinguishing between the firm’s inputs and how the product performs. Only the former is controllable by the firm. Consumers observe product performance, but cannot observe the firm’s inputs (its “intent”). In this context, poor product performance could be the accidental bad outcome of “high quality intent” or the intentional bad outcome of “low quality intent.” Now it is possible for punishments to be “too strong.”49 Instead of encouraging high quality, punishments could have the opposite effect: the firm, recognizing that it is unable to prevent punishments even with the best intentions, sees no point in trying to sustain a high-quality reputation. The fault, however, is not with umbrella branding, but rather with the punishment regimen. It is still better to be umbrella-branded rather than not, because umbrella branding allows more flexible punishments. Cabral (2009), Hakenes and Peitz (2008), and Rasmusen (2016) only consider cases where consumers hold the brand accountable for product failures. But, as Choi’s (1998b) work suggests, the brand may also be held accountable for bad extensions. In other words, product failure may hurt brand reputation not only because the brand cannot be trusted to deliver high quality, but also because the brand cannot be trusted to be extended to other high quality products. Under those circumstances, subsequent extensions may not be purchased.

While this literature is helpful in suggesting that umbrella branding strengthens the incentive to maintain reputations, it does not provide any guidance on what brand extensions a firm ought to pursue. A large literature in consumer psychology finds that fit between the parent brand and the extension category is important to the success of a brand extension (e.g., Aaker and Keller, 1990; Broniarczyk and Alba, 1994). This literature finds that perceived similarity of the category or usage situation, and the relevance of parent-brand associations, make a reputable brand (e.g., a toothpaste brand) more effective at extending its reputation into a closely related category (e.g., the mouthwash category) than into a more distant category like shaving cream.

49 In practice, punishments are rarely very strong. For instance, both Lexus (Woodyard, 2012) and Samsung (Jeong, 2016) have had serious product recalls involving billion dollar write-offs, but nevertheless their respective brand reputations seem to have rebounded fairly quickly.
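The two-product reputation condition of Section 5.4.2 can be checked numerically. The sketch below codes the single-product price threshold from inequality (22), the equilibrium and deviation payoffs under the most draconian punishment, and the reduced threshold for product 2 under umbrella branding. The function names and all parameter values are hypothetical and serve only to illustrate the algebra.

```python
def single_threshold(rho, c_h, c_l, v_l):
    """Lowest price sustaining high quality for a separately branded product."""
    return (c_h - c_l) / rho + max(c_l, v_l)


def equilibrium_payoff(rho, N, prices, products):
    """Payoff from supplying high quality on every product forever."""
    return sum(rho / (1 - rho) * (p - c_h) * N
               for p, (c_h, c_l, v_l) in zip(prices, products))


def deviation_payoff(rho, N, prices, products):
    """Payoff from cutting quality on every product, then being punished forever."""
    return sum(rho * (p - c_l) * N + rho ** 2 / (1 - rho) * max(0.0, v_l - c_l) * N
               for p, (c_h, c_l, v_l) in zip(prices, products))


def umbrella_threshold_2(rho, p1, product1, product2):
    """Price threshold for product 2 once product 1's slack is pooled in."""
    slack_1 = p1 - single_threshold(rho, *product1)
    return single_threshold(rho, *product2) - slack_1


# Hypothetical parameters: each product is (c_h, c_l, v_l); v_h = 5 for both.
rho, N, v_h = 0.5, 1.0, 5.0
product1, product2 = (2.0, 1.0, 1.5), (3.0, 1.0, 1.5)

print(single_threshold(rho, *product1))   # product 1 alone
print(single_threshold(rho, *product2))   # product 2 alone
print(umbrella_threshold_2(rho, v_h, product1, product2))
```

In this parameterization product 2 cannot sustain high quality on its own because its threshold exceeds v_h, yet the pooled threshold falls below v_h, so the slack in product 1's incentive constraint is what carries product 2.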

5.4.3 Umbrella branding and product quality signaling

In the previous section, we analyzed the reputational effects of umbrella branding on product quality provision. We now analyze the case where a firm with a product line can use umbrella branding to signal quality. In contrast with the previous reputational role of brands, where brands merely carried a reputation, now the firm decides whether or not to umbrella brand. Of interest is whether a firm with high quality products can use umbrella branding to signal that a new product is also “high quality.” In a pioneering paper, Wernerfelt (1988) conceptualizes the problem as follows. Suppose a monopolist is endowed with two products, “old” and “new,” each of which, independently, can be “high” (h) or “low” quality (ℓ). Product performance is assumed to provide only an imperfect indication of quality: high quality products always work and low quality products work (w) or fail (f), with probabilities θ and 1 − θ, respectively. Consequently, consumers do not observe product quality before or, potentially, after purchasing the product. When the new product is introduced in period 0, consumers observe the old product’s performance (w or f), but not its quality. The firm does not observe the old product’s performance and decides whether or not to umbrella brand (B) or to create a new brand (N). Umbrella branding costs β more than new branding. After the new product is introduced and purchased, its performance is observed (period 1), and consumers must then decide whether to buy the old product again. If consumers have beliefs that product quality is high with probability P, then their purchase decisions generate a revenue stream of x(P), where x(·) is an increasing function. Suppose consumers have an initial prior that each of the two products is high quality with probability η ∈ (0, 1). Thus, the prior probability of (h, h) is η², the prior probability of (h, ℓ) is η(1 − η), and so on.
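To make the information structure concrete, the following sketch computes the joint prior over the two products’ qualities and the Bayesian posterior that a single product is high quality after its performance is observed. It applies Bayes’ rule to the performance technology stated above and deliberately ignores any inference from the branding decision; the function names are hypothetical.

```python
def joint_prior(eta):
    """Prior over (new, old) qualities when each is high with probability eta."""
    return {
        ("h", "h"): eta * eta,
        ("h", "l"): eta * (1 - eta),
        ("l", "h"): (1 - eta) * eta,
        ("l", "l"): (1 - eta) * (1 - eta),
    }


def posterior_high(eta, theta, performance):
    """P(high quality | performance), with performance 'w' (works) or 'f' (fails).

    High quality always works; low quality works with probability theta.
    """
    if performance == "f":
        return 0.0  # only a low-quality product can fail
    return eta / (eta + (1 - eta) * theta)


# Hypothetical values: a working product is only an imperfect signal of quality.
print(joint_prior(0.5))
print(posterior_high(0.5, 0.5, "w"))
```

Because a working product leaves the posterior strictly below one whenever θ > 0, performance alone cannot fully reveal quality, which is the room that a branding signal has to fill.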
Of interest is whether umbrella branding can generate quality beliefs of (h, h) with probability 1 in equilibrium. Wernerfelt (1988) shows that such beliefs can be sustained if the relative cost of umbrella branding, β, is not too large. If β is small enough, there exists a separating equilibrium in which only the firm with two high-quality products umbrella brands, while all other firm types choose to use a new brand. The following equations characterize the separating equilibrium (here the notation π(B; n, o) (resp. π(N; n, o)), n, o ∈ {h, ℓ}, refers to the firm’s profits when it umbrella brands (resp. introduces a new brand) when its new product is of quality n and its old product is of quality o):

$$\pi(B; h, h) = x(1) + [x(1)] - \beta \ge x(P^n|\{w, N\}) + x(P^o|\{w, N, w\}) = \pi(N; h, h) \tag{23}$$

$$\pi(B; h, \ell) = x(1) + [\theta x(1) + (1-\theta)x(\varphi)] - \beta \le x(P^n|\{w, N\}) + [\theta x(P^o|\{w, N, w\}) + (1-\theta)x(P^o|\{w, N, f\})] = \pi(N; h, \ell) \tag{24}$$

$$\pi(B; \ell, h) = \theta x(1) + (1-\theta)x(\psi) + [\theta x(1) + (1-\theta)x(\varphi')] - \beta \le \theta x(P^n|\{w, N\}) + (1-\theta)x(P^n|\{f, N\}) + \theta x(P^o|\{w, N, w\}) = \pi(N; \ell, h) \tag{25}$$

$$\pi(B; \ell, \ell) = \theta x(1) + (1-\theta)x(\psi) + \theta[\theta x(1) + (1-\theta)x(\varphi)] + (1-\theta)[\theta x(\varphi') + (1-\theta)x(\varphi'')] - \beta \le \theta x(P^n|\{w, N\}) + (1-\theta)x(P^n|\{f, N\}) + \theta[\theta x(P^o|\{w, N, w\}) + (1-\theta)x(P^o|\{w, N, f\})] = \pi(N; \ell, \ell) \tag{26}$$

These equations ensure that umbrella branding will only signal that both new and old products are high quality if firms with (h, h) prefer to umbrella brand (23), but other firm types do not (24)-(26). It is easy to see that if umbrella branding does not cost too much, (23) will be satisfied by virtue of “signaling expectations.” The challenge consists of verifying that similar expectations would not tempt other firm types to choose umbrella branding. The key for separation lies in the off-equilibrium beliefs: consumers’ beliefs when they observe one or both of the umbrella branded products failing, which should not arise under (h, h). In the equations above, these are: (1) if the old product has failed in period 0, then consumers believe that the new product is good with probability ψ and the old product is good with probabilities ϕ′ and ϕ″, depending on whether the new product works or fails, respectively, and (2) if the old product works in period 0 but the new product fails in period 1, then consumers believe that the old product is good with probability ϕ. In a perfect Bayesian equilibrium, off-equilibrium beliefs are a free parameter. Wernerfelt (1988) sets these beliefs to be very “negative”: ϕ = ϕ′ = ϕ″ = 0. The assumption ϕ = 0 is particularly draconian: even if the old product has worked in period 0, consumers are asked to view the old product as low quality with probability 1 based on the new product’s failure under umbrella branding. This is analogous to the “collective punishment” regimen we saw in Section 5.4.2 above in the reputations literature. Several subsequent writers have objected to Wernerfelt’s (1988) assumption that umbrella branding costs more than creating a new brand. After all, one of the leading arguments for umbrella branding is that one doesn’t have to undertake the expense of creating a new brand. In a variant of Wernerfelt’s (1988) model, with cost-neutrality

instead of umbrella branding costing more, Cabral (2000) shows that if old and new products are of the same quality and quality is continuous, then brand extension can still have some signaling value. However, since the signal is binary, and quality lies on a continuum, it can only separate qualities into regions: a “higher quality band” chooses to umbrella brand while a “lower quality band” chooses not to. More recently, Miklós-Thal (2012), also assuming cost-neutrality, but with ex ante independence between the two products, finds that Wernerfelt’s (1988) equilibrium cannot exist in her model. In an extensive critique of Wernerfelt’s (1988) model, Moorthy (2012) argues that signaling quality is a weak basis for justifying brand extensions. Besides the problematic assumption that brand extension costs more than creating a new brand, he shows that the signaling equilibrium relies on off-equilibrium beliefs that are poorly motivated in the model. For instance, Wernerfelt (1988) assumes that consumers must penalize the old product if the new product does not perform well, even if they have observed the old product perform well and they believe that the products’ qualities are uncorrelated (indeed, even negatively correlated). No empirical evidence supports such beliefs.50 These collateral-damage considerations are essential to the argument, but, in most real-world brand extension situations, the old product’s quality is rarely in doubt. In fact, brand extensions are only undertaken after old products have solidified the reputation of the brand performing well (Sullivan, 1992).51 This suggests that Choi’s (1998b) model provides a better basis for finding a signaling role for brand extensions. First, in his model brand extension works in conjunction with price to signal new product quality. The old product’s quality plays no role. Second, brand extension is not necessary to signal the new product’s quality. 
Price could have done so, too, but, by itself, it would have to bear a heavier signaling burden, with an attendant loss of profit. Brand extension reduces the signaling burden, similar to advertising (e.g., Milgrom and Roberts, 1986); but unlike advertising, it may not involve out-of-pocket costs. Third, brand extension’s signaling role is sustained by consumers’ implicit threat that if a firm “cheated” (extended a brand to a low quality product), then consumers would no longer take any signal from the brand. Future brand extensions would be treated like new brands. Price would then have to bear the entire signaling burden, reducing profits. Choi (1998b) thus provides a reputational basis for why consumers might want to assume that a new product bearing an existing brand name provides the same good quality that old products under the same brand have been known to provide.

50 Not even Sullivan (1990). Her documentation of “image spillovers” from the Audi 5000 to other Audi models, following a 60 Minutes story in 1986 alleging sudden-acceleration issues with the Audi 5000, does not negate this statement because the Audi situation was patently one in which consumers would be well-justified to assume a positive correlation in quality among all Audi models. Consumer psychologists also report mixed evidence of spillovers because to them spillovers are mediated by “fit.” By “fit” they mean whether the brand in its original application evokes thoughts that are consonant with the thoughts evoked by the new application (Broniarczyk and Alba, 1994). For example, the brand Exxon extended into chocolate milk would show poor fit, whereas the brand Häagen-Dazs doing the same extension would be well-regarded. When there is poor fit, the feedback effects are likely to be minimal (Van Osselaer, 2008).

51 For example, only successful movies are sequeled, not movies that fail.
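The separating conditions (23)-(26) of Wernerfelt’s (1988) model, as stated earlier in this section, can be evaluated mechanically once beliefs are specified. The sketch below is only an illustration under loud assumptions: revenue is taken to be linear, x(P) = P, and the beliefs after a new brand is observed are supplied as hypothetical placeholder inputs rather than derived from Bayes’ rule. The function name and all numbers are invented for the example.

```python
def separating_conditions(theta, beta, n_beliefs, psi=0.0, phi=0.0,
                          phi1=0.0, phi2=0.0, x=lambda p: p):
    """Return {equation number: whether the inequality holds}.

    n_beliefs holds placeholder beliefs after a new brand (N):
      Pn_w, Pn_f   -> P^n | {w, N} and P^n | {f, N}
      Po_ww, Po_wf -> P^o | {w, N, w} and P^o | {w, N, f}
    psi, phi, phi1, phi2 are the off-equilibrium beliefs (phi1, phi2 stand
    for phi-prime and phi-double-prime); all default to the "negative" value 0.
    """
    t = theta
    pn_w, pn_f = n_beliefs["Pn_w"], n_beliefs["Pn_f"]
    po_ww, po_wf = n_beliefs["Po_ww"], n_beliefs["Po_wf"]

    b_hh = x(1) + x(1) - beta
    n_hh = x(pn_w) + x(po_ww)

    b_hl = x(1) + (t * x(1) + (1 - t) * x(phi)) - beta
    n_hl = x(pn_w) + (t * x(po_ww) + (1 - t) * x(po_wf))

    b_lh = t * x(1) + (1 - t) * x(psi) + (t * x(1) + (1 - t) * x(phi1)) - beta
    n_lh = t * x(pn_w) + (1 - t) * x(pn_f) + t * x(po_ww)

    b_ll = (t * x(1) + (1 - t) * x(psi)
            + t * (t * x(1) + (1 - t) * x(phi))
            + (1 - t) * (t * x(phi1) + (1 - t) * x(phi2)) - beta)
    n_ll = (t * x(pn_w) + (1 - t) * x(pn_f)
            + t * (t * x(po_ww) + (1 - t) * x(po_wf)))

    return {23: b_hh >= n_hh, 24: b_hl <= n_hl,
            25: b_lh <= n_lh, 26: b_ll <= n_ll}


# Hypothetical beliefs after a new brand is observed.
beliefs = {"Pn_w": 0.4, "Pn_f": 0.0, "Po_ww": 0.4, "Po_wf": 0.0}
print(separating_conditions(theta=0.5, beta=1.0, n_beliefs=beliefs))
print(separating_conditions(theta=0.5, beta=2.0, n_beliefs=beliefs))
```

Passing non-zero values for `phi`, `phi1`, or `phi2` is a simple way to explore how much the separation result leans on the harsh off-equilibrium beliefs that Moorthy (2012) criticizes.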

5.5 Brand loyalty and equilibrium pricing

Section 3.1 discussed the empirical role of psychological switching costs as a source of brand loyalty. Brand loyalty from switching costs introduces two countervailing incentives for firms’ pricing decisions. The persistence in demand motivates firms (1) to prospect for new customers (the “investment” motive), and (2) to exploit the loyalty of existing customers (the “harvesting” motive). Therefore, the net effect of switching costs on the equilibrium prices in a market is not clear a priori. Klemperer (1995) and Farrell and Klemperer (2007) conclude that there is a “strong presumption” that switching costs soften price competition, leading to higher equilibrium prices. Accordingly, the conventional wisdom asserts that brand loyalty leads to higher prices. The conventional wisdom is based on an early theoretical literature that studied stylized analytical models of dynamic oligopoly in which the harvesting motive outweighs the investment motive. In a two-period model, Klemperer (1987) derives the well-known “bargain then rip-off” result whereby firms compete for market share in the first period and then raise their prices in the second, terminal period. In an infinite-horizon model of homogeneous goods and overlapping consumer generations, switching costs allow firms to raise their prices above marginal costs and to derive supra-normal economic profits (Farrell and Shapiro, 1988; Padilla, 1995; Anderson et al., 2004). Similarly, in an infinite-horizon model with differentiated products and perfect lock-in (i.e. infinite switching costs), equilibrium prices are also higher than in the absence of switching costs (Beggs and Klemperer, 1992).
Most of the key assumptions in this literature are unlikely to hold in consumer markets where products are differentiated, price competition is not subject to a “terminal period,” and consumer loyalty is imperfect in the sense that consumers switch brands over time even in the absence of any price adjustments. Using empirically estimated demand for branded consumer goods, Dubé et al. (2009) compute the corresponding steady-state equilibrium prices under different magnitudes of switching costs. At the empirically-estimated magnitudes of switching costs, they find that equilibrium prices would be lower in the presence of switching costs. Moreover, they find that switching costs would need to be at least four times their empirically-estimated magnitudes before they would lead to higher equilibrium prices. The extant literature has not analyzed the impact of brand loyalty on other marketing decisions. Early work by Brown (1952, p. 257) observed that “If brand loyalty is high, the advertiser has a good case for ‘investment’ expenditures where large amounts are expended over short periods of time to win new users in the knowledge that continued purchases after the advertising has been curtailed will ‘amortize’ the advertising investment.”

Shum (2004) quantifies the extent of advertising expenditures required to overcome loyalty. But an open direction for future research is to analyze how, in equilibrium, price and advertising competition are moderated by consumers who form brand loyalties or shopping habits.

5.6 Brand loyalty and early-mover advantage

In Section 3 we discussed the evidence for, and the motivations behind, persistence in consumers’ brand choices and the emergence of brand loyalty over medium-term horizons of several years. A separate literature has documented the persistence of market-share leadership by brands over much longer horizons spanning decades. A survey-based study in the business press found that for 25 large consumer products categories, 20 of the top brands in 1923 were still dominant in 1983, more than half a century later.52 All 25 brands were still ranked among the top 5 in the category in 1983.53 Persistence in dominance has also been documented for the initial entrants into a new product category, the “pioneering advantage” (Robinson and Fornell, 1985; Urban et al., 1986; Lambkin, 1988; Robinson, 1988; Parry and Bass, 1990; Kerin et al., 1992; Brown and Lattin, 1994). A similar persistence of dominance dating back to 1933 is reported for consumer brands in the UK (Keller, 2012, p. 21). Kalyanaram et al. (1995) provide a thorough survey of the literature along with empirical generalizations regarding the negative correlation between historic order of entry and current market share. While the exact definition of a “pioneer” is under debate (Golder and Tellis, 1993), the common finding across these studies is the evidence of persistence in the market shares for early movers that “survive” long-term. Since this work typically relies on a single time-series data set for any given industry, the results are subject to the usual identification concerns regarding the role of state-dependence (early-mover status) versus heterogeneity in managerial skills.54 More recently, Bronnenberg et al. (2009) use a panel approach to test for persistent, early-mover advantages in market shares versus heterogeneity with CPG data spanning multiple US cities. For each city, they obtain the exact date of entry for the leading brands in the category.
Since most of the brands launched during the late 1800s, long before marketing and distributional technology existed to coordinate a national launch, the key identifying assumption of exogenous entry timing across cities may not be problematic. In six case studies, they find that the historic order-of-entry (often a century earlier) among survivor brands in a geographic market predicts the current rank-order of market shares in that market. These findings are visualized in Fig. 2, which plots the geographic distribution of brand shares for the Ground Coffee category across US cities. The diameter of each circle is proportional to a brand’s market share in that city, and shading indicates the earlier entrant. Historic order-of-entry in a geographic market also predicts the current rank order of brands’ perceived quality levels as measured by Young & Rubicam’s 2004 Brand Asset Valuator survey. For 49 of the top two national brands in 34 CPG categories, Bronnenberg et al. (2009) are able to identify the city-of-origin (although not the complete roll-out history). They find a strong correlation between a brand’s share in a given market and the Euclidean distance to its market of origin. In particular, a brand’s share is on average 20 percentage points higher in the market of origin than in a distant market more than 2,500 miles away. This finding is consistent with the historic diffusion of brands launched in the late 19th and early 20th centuries with entry in more distant markets occurring relatively later.55 Collectively, the persistent, early-mover effects for brands suggest an important role for branding in the shaping of the market structure of consumer goods industries.

FIGURE 2 The geographic distribution of Ground Coffee brand market shares in the US. Source: Bronnenberg et al. (2009).

52 Advertising Age (1983), “Study: Majority of 25 Leaders in 1923 Still On Top” (September 19), 32.

53 Golder (2000) extended this analysis to 100 categories, using more reliable 1997 market share data. He finds that only 23% of the dominant firms in 1923 remain dominant in 1997, although nearly 50% remain in the top 5.

54 One exception is Brown and Lattin (1994), who use a cross-section of markets with no within-market variation. Unfortunately, in their data the first entrant is the same in 37 of the 40 studied markets.

55 See, for instance, Bartels (1976) and Tedlow (1990) for detailed discussions of how entrepreneurs in the late 19th century with new consumer brands gradually rolled them out across the US.

A number of mechanisms may yield first-mover advantages, and not all of them are brand loyalty-related. For example, Robinson and Fornell (1985) and Robinson (1988) note that first movers can benefit from a lower-cost position, which they achieve by riding further along the learning curve, or simply through scale and scope economies. On the demand-side, first mover advantages can arise because the first mover, having first use of the product space, is able to choose the most desirable product position (Lane, 1980; Moorthy, 1988; Sutton, 1991), or several such positions, effectively pre-empting the product space through product proliferation (Schmalensee, 1978, 1982; Judd, 1985). The first mover can also gain an advantage by using its additional time on the market to “perfect” its product—more generally, its entire marketing mix (Robinson and Fornell, 1985; Robinson, 1988)—or create more awareness and/or marketing-based goodwill (Doraszelski and Markovich, 2007), or simply because consumers, having experienced the first product in the market, are reluctant to try a new product whose experience attributes they are uncertain about (Schmalensee, 1982). In industrial goods, the cost-related mechanism may prove decisive (Robinson, 1988), whereas among consumer goods, the demand-related mechanisms may prove decisive.

6 Conclusions

Brands and branding are central to the understanding of the market structure of consumer goods industries. On the demand side, we have discussed three potential effects of branding. First, brands may enhance the consumption utility of branded goods relative to unbranded substitutes. Second, brands may reduce search costs and stimulate consumer consideration. Finally, brands may signal product quality and affect consumer demand through reputation. These different influences of branding on consumers generate several striking patterns in purchase behavior including brand loyalty and inertia, a form of switching costs, and longer-term persistence in brand choice as consumers learn and change their beliefs about quality of branded goods. An interesting direction for future research consists of testing the exact mechanisms through which brands ease the consumer search and consideration process. The literature has yet to parse the extent to which estimated brand value to consumers reflects genuine preferences as opposed to the facilitation of search. A related direction for future research on the demand side consists of modeling how consideration sets are formed over time and distinguishing between how a consumer becomes aware of brands and how this awareness influences consideration during a given purchase occasion. Of particular interest is the extent to which long-term industrial market structure is shaped by firms’ marketing investments to build consumer awareness and consideration for their brands. Future research may benefit from digging more into the psychological roots of consumer memory and the persistent effects memory creates for brand preferences. Moreover, memory may be an important moderator through which branding-related expenditures on the supply side become valuable brand assets in the long-term.


On the supply side, there still is no consensus regarding the role of brand assets to firms and the mechanism through which these assets are built. Attempts to test the traditional signaling, or “money-burning,” theory of advertising have mostly been inconclusive. Some progress has been made on testing reputation theories. It is possible that brands can emerge under different circumstances, supporting a co-existence of each of these theories. Cross-industry studies will likely provide a fruitful direction for future research to determine which institutional factors support a specific branding theory within a market. The value of brands to firms manifests itself mostly through price-premia and brand-equity. Although there is a strong consensus among scholars and policy makers that brand names are valuable, intangible assets, there is no agreement on how to measure the value of brands. The Generally Accepted Accounting Principles (GAAP) offer no explicit guidance to firms about how to value brand capital that is internally created. Current business practice disregards internally created brand value as an asset. At the same time, and seemingly inconsistently, the capitalization of externally acquired brands on the balance sheet is commonly accepted. This practice likely misrepresents many marketing investments as expenses. A recent literature has proposed a more rigorous structural approach to measure brand value that defines the incremental profits of brands through counterfactuals. Nevertheless, more research is needed to tie the value of brands back to these demand components. Another direction for future research consists of analyzing how firms invest in building brands and how such “branding” will likely evolve.
The reduced barriers to entry in the digital era have led to a rapid influx of new consumer products with a more customer-centric image and focus (Islam, 2018), potentially enhancing the role of a brand as a reputation signal and changing the manner in which firms build the brand’s image. Similarly, the digital era has expanded the set of channels through which consumers can buy brands and through which firms can actively invest in branding. While most of the extant branding management literature has focused on advertising investment, another interesting direction may consist of the role of distribution and availability on brand performance.56 Most important, the literature has mostly been silent on how branding and brand investments moderate product market competition. Another important direction for future research will be the analysis of whether branding softens or toughens price competition. In addition, more work is required to understand how competitive branding underlies the industrial organization of consumer goods markets and the extent to which it leads to persistent, concentrated market structures as suggested by recent empirical research.
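The idea of defining brand value as incremental profit under a counterfactual can be sketched in a few lines. The example below is only an illustration under strong assumptions that are not taken from any specific paper: a single-product firm with no strategic competitors, binary logit demand against an outside option, a constant marginal cost, and a counterfactual in which the brand intercept is set to zero while the firm re-optimizes its price on a coarse grid. Every function name, parameter, and value is hypothetical.

```python
import math


def share(price, brand_intercept, price_coef=1.0):
    """Binary logit share against an outside option with utility 0 (assumed demand)."""
    v = brand_intercept - price_coef * price
    return math.exp(v) / (1.0 + math.exp(v))


def profit(price, brand_intercept, cost=1.0, market_size=1000.0):
    """Per-period profit at a given price; cost and market size are hypothetical."""
    return (price - cost) * market_size * share(price, brand_intercept)


def best_profit(brand_intercept, cost=1.0):
    """Maximize profit over a coarse price grid (a simple stand-in for an equilibrium solver)."""
    grid = [cost + 0.01 * k for k in range(1, 1001)]
    return max(profit(p, brand_intercept, cost) for p in grid)


def brand_value(brand_intercept):
    """Incremental profit: factual profit minus counterfactual profit without the brand intercept."""
    return best_profit(brand_intercept) - best_profit(0.0)


for intercept in (0.0, 1.0, 2.0, 3.0):
    print(f"brand intercept {intercept:.1f}: brand value {brand_value(intercept):.1f}")
```

Re-optimizing the price in the counterfactual matters: holding the factual price fixed would typically overstate the loss from removing the brand. A richer version would let rivals' prices adjust as well, which this sketch omits.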

56 In a large-scale study of over 200 new product launches, Ataman et al. (2008) find reduced-form evidence that distribution may be more highly associated with a new brand’s success than other marketing variables, such as promotions and advertising.

References

Aaker, D.A., 1990. Brand extensions: the good, the bad, and the ugly. Sloan Management Review 31, 47–56.
Aaker, D.A., Keller, K.L., 1990. Consumer evaluations of brand extensions. The Journal of Marketing, 27–41.
Abaluck, J., Adams, A., 2017. What Do Consumers Consider Before They Choose? Identification from Asymmetric Demand Responses. Working Paper.
Ackerberg, D.A., 2001. Empirically distinguishing informative and prestige effects of advertising. The Rand Journal of Economics 32 (2), 316–333.
Ackerberg, D.A., 2003. Advertising, learning, and consumer choice in experience good markets: an empirical examination. International Economic Review 44 (3), 1007–1040.
Adams, W.J., 2006. Markets: beer in Germany and the United States. The Journal of Economic Perspectives 20 (1), 189–205.
Ailawadi, K.L., Keller, K.L., 2004. Understanding retail branding: conceptual insights and research priorities. Journal of Retailing 80, 331–342.
Ailawadi, K.L., Lehmann, D.R., Neslin, S.A., 2003. Revenue premium as an outcome measure of brand equity. Journal of Marketing 67 (4), 1–17.
Akerlof, G.A., 1970. The market for “lemons”: quality uncertainty and the market mechanism. The Quarterly Journal of Economics 84 (3), 488–500.
Alba, J.W., Hutchinson, W., Lynch, J.G., 1991. Memory and decision-making. In: Handbook of Consumer Behavior. Prentice, Englewood Cliffs, NJ, pp. 1–49.
Allen, J., Clark, C.R., Houde, J.-F., 2016. Search Frictions and Market Power in Negotiated Price Markets. Working Paper.
Allison, R.I., Uhl, K.P., 1964. Influence of beer brand identification on taste perception. Journal of Marketing Research 1 (3), 36–39.
Anderson, E.T., Kumar, N., Rajiv, S., 2004. A comment on: “Revisiting dynamic duopoly with consumer switching costs”. Journal of Economic Theory 116 (1), 177–186.
Anderson, S.T., Kellogg, R., Langer, A., Sallee, J.M., 2015. The intergenerational transmission of automobile brand preferences. Journal of Industrial Economics 63 (4), 763–793.
Arbatskaya, M., 2007. Ordered search. The Rand Journal of Economics 38 (1), 119–126.
Armstrong, M., Zhou, J., 2011. Paying for prominence. The Economic Journal 121 (556), F368–F395.
Ataman, M.B., Mela, C.F., Heerde, H.J.V., 2008. Building brands. Marketing Science 27 (6), 1036–1054.
Atkin, D., 2013. Trade, tastes, and nutrition in India. The American Economic Review 103 (5), 1629–1663.
Axelrod, J.N., 1968. Attitude measures that predict purchase. Journal of Advertising Research 8 (1), 3–17.
Bagwell, K., 2007. The economic analysis of advertising. In: Armstrong, M., Porter, R. (Eds.), Handbook of Industrial Organization, vol. 3. Elsevier, pp. 1701–1844.
Bagwell, K., Ramey, G., 1993. Advertising as information: matching products to buyers. Journal of Economics and Management Strategy 2 (2), 199–243.
Bai, J., 2017. Melons as Lemons: Asymmetric Information, Consumer Learning and Quality Provision. NBER Working Paper 191772.
Bain, J.S., 1956. Barriers to New Competition. Harvard University Press, Cambridge.
Bar-Isaac, H., Tadelis, S., 2008. Seller reputation. In: Foundations and Trends in Microeconomics, vol. 4. Now Publishers, pp. 273–351.
Bartels, R., 1976. The History of Marketing Thought. Grid Publishers, Columbus, OH.
Barth, M.E., Clement, M.B., Foster, G., Kasznik, R., 1998. Brand values and capital market valuation. Review of Accounting Studies 3, 41–68.
Bass, F.M., Givon, M.M., Kalwani, M.U., Reibstein, D., Wright, G.P., 1984. An investigation into the order of the brand choice process. Marketing Science 3 (4), 267–287.
Baumol, W.J., 1967. Calculation of optimal product and retailer characteristics: the abstract product. Journal of Political Economy 75 (5), 674–685.
Becker, G.S., Murphy, K.M., 1988. A theory of rational addiction. Journal of Political Economy 96 (4), 675–700.


Becker, G.S., Murphy, K.M., 1993. A simple theory of advertising as a good or bad. The Quarterly Journal of Economics 108 (4), 941–964.
Beggs, A., Klemperer, P., 1992. Multi-period competition with switching costs. Econometrica 60, 651–666.
Belo, F., Lin, X., Vitorino, M.A., 2014. Brand capital and firm value. Review of Economic Dynamics 17 (1), 150–169.
Berkman, H.W., Lindquist, J.D., Sirgy, M.J., 1997. Consumer Behavior: Concepts and Marketing Strategy. NTC Business Books.
Berry, S., Waldfogel, J., 2010. Product quality and market size. Journal of Industrial Economics LVIII (1), 1–31.
Bettman, J.R., 1979. An Information Processing Theory of Consumer Choice. Addison-Wesley Publishing Co., Reading, MA.
Bettman, J.R., Park, C.W., 1980. Effects of prior knowledge and experience and phase of the choice process on consumer decision processes: a protocol analysis. Journal of Consumer Research 7 (3), 234–248.
Board, S., Meyer-ter Vehn, M., 2013. Reputation for quality. Econometrica 81 (6), 2381–2462.
Borkovsky, R.N., Goldfarb, A., Haviv, A.M., Moorthy, S., 2017. Measuring and understanding brand value in a dynamic model of brand management. Marketing Science 36 (4), 471–499.
Braithwaite, D., 1928. The economic effects of advertisement. The Economic Journal 38 (149), 16–37.
Broniarczyk, S.M., Alba, J.W., 1994. The importance of the brand in brand extension. Journal of Marketing Research 31 (2), 214–228.
Bronnenberg, B.J., Dhar, S.K., Dubé, J.-P., 2007. Consumer packaged goods in the United States: national brands, local branding. Journal of Marketing Research 44, 4–13.
Bronnenberg, B.J., Dhar, S.K., Dubé, J.-P., 2009. Brand history, geography, and the persistence of brand shares. Journal of Political Economy 117, 87–115.
Bronnenberg, B.J., Dhar, S.K., Dubé, J.-P., 2011. Endogenous sunk costs and the geographic differences in the market structures of CPG categories. Quantitative Marketing and Economics 9 (1), 1–23.
Bronnenberg, B.J., Dubé, J.-P., 2017. The formation of consumer brand preferences. Annual Review of Economics 9, 353–382.
Bronnenberg, B.J., Dubé, J.-P., Gentzkow, M., 2012. The evolution of brand preferences: evidence from consumer migration. The American Economic Review 102 (6), 2472–2508.
Bronnenberg, B.J., Dubé, J.-P., Gentzkow, M., Shapiro, J.M., 2015. Do pharmacists buy Bayer? Sophisticated shoppers and the brand premium. The Quarterly Journal of Economics 130.
Bronnenberg, B.J., Dubé, J.-P., Mela, C.F., 2010. Do digital video recorders influence sales? Journal of Marketing Research XLVII (December), 998–1010.
Bronnenberg, B.J., Kim, J.B., Mela, C.F., 2016. Zooming in on choice: how do consumers search for cameras online? Marketing Science 35 (5), 693–712.
Brown, C.L., Lattin, J.M., 1994. Investigating the relationship between time in market and pioneering advantage. Management Science 40 (10), 1361–1369.
Brown, G.H., 1952. Brand loyalty fact or fiction. Advertising Age 9, 53–55.
Brown, G.H., 1953. Brand loyalty-fact or fiction. The Trademark Reporter 43, 251–258.
Brynjolfsson, E., Smith, M.D., 2000. Frictionless commerce? A comparison of Internet and conventional retailers. Management Science 46 (4), 563–585.
Cabral, L.M., 2009. Umbrella branding with imperfect observability and moral hazard. International Journal of Industrial Organization 27 (2), 206–213.
Cabral, L.M.B., 2000. Stretching firm and brand reputation. The Rand Journal of Economics 31 (4), 658–673.
Campbell, B., 1969. The Existence of Evoked Set and Determinants of Its Magnitude in Brand Choice Behavior. Ph.D. thesis. Columbia Graduate School of Business, Columbia University.
Carlson, L., Grossbart, S., Walsh, A., 1990. Mothers’ communication orientation and consumer-socialization tendencies. Journal of Advertising Research 19 (3), 27–38.
Carpenter, G.S., Glazer, R., Nakamoto, K., 1994. Meaningful brands from meaningless differentiation: the dependence on irrelevant attributes. Journal of Marketing Research 31 (3), 339–350.


Carpenter, S.M., Yoon, C., 2011. Aging and consumer decision making. Annals of the New York Academy of Sciences 1235 (1), E1–E12.
Catalina Media, 2013. Engaging the Selective Shopper. Discussion Paper. Catalina Media.
Caves, R.E., Greene, D.P., 1996a. Brands’ quality levels, prices, and advertising outlays: empirical evidence on signals and information costs. International Journal of Industrial Organization 14 (1), 29–52.
Caves, R.E., Greene, D.P., 1996b. Brands’ quality levels, prices, and advertising outlays: empirical evidence on signals and information costs. International Journal of Industrial Organization 14 (1), 29–52.
Childers, T.L., Rao, A.R., 1992. The influence of familial and peer-based reference groups on consumer decisions. Journal of Consumer Research 19, 198–211.
Choi, J.P., 1998a. Brand extension as informational leverage. The Review of Economic Studies 65 (4), 655–669.
Choi, J.P., 1998b. Brand extension as informational leverage. The Review of Economic Studies 65 (4), 655–669.
Choi, J.P., Peitz, M., 2018. You are judged by the company you keep: reputation leverage in vertically related markets. International Journal of Industrial Organization 61, 351–379.
Chu, W., Chu, W., 1994. Signaling quality by selling through a reputable retailer: an example of renting the reputation of another agent. Marketing Science 13 (2), 177–189.
Crawford, G.S., Shum, M., 2005. Uncertainty and learning in pharmaceutical demand. Econometrica 73 (4), 1137–1173.
Dawar, N., Sarvary, M., 1997. The signaling impact of low introductory price on perceived quality and trial. Marketing Letters 8 (3), 251–259.
Day, G.S., Pratt, R.W., 1971. Stability of appliance brand awareness. Journal of Marketing Research 8 (1), 85–89.
DeGroot, M., 1970. Optimal Statistical Decisions. McGraw-Hill, New York.
Dekimpe, M.G., Steenkamp, J.-B.E., Mellens, M., Vanden Abeele, P., 1997. Decline and variability in brand loyalty. International Journal of Research in Marketing 14, 405–420.
Demsetz, H., 1962. The effect of consumer experience on brand loyalty and the structure of market demand. Econometrica 30 (1), 22–33.
Dhar, S.K., Hoch, S.J., 1997. Why store brand penetration varies by retailer. Marketing Science 16 (3), 208–227.
Dickstein, M.J., 2018. Efficient provision of experience goods: evidence from antidepressant choice. Manuscript.
Dong, X., Morozov, I., Seiler, S., Hou, L., 2017. Estimating Search Models with Panel Data: Identification and Re-Examination of Preference Heterogeneity. Working Paper. Stanford University.
Doraszelski, U., Markovich, S., 2007. Advertising dynamics and competitive advantage. The Rand Journal of Economics 38, 557–592.
Draganska, M., Klapper, D., 2011. Choice set heterogeneity and the role of advertising: an analysis with micro and macro data. Journal of Marketing Research 48 (4), 653–669.
Drolet, A., Schwarz, N., Yoon, C., 2010. The Aging Consumer: Perspectives from Psychology and Economics. Routledge, New York.
Dubé, J.-P., Hitsch, G., Rossi, P., 2018. Income and wealth effects on private label demand: evidence from the great recession. Marketing Science 37 (1), 22–53.
Dubé, J.-P., Hitsch, G., Rossi, P.E., Simonov, A., 2018. State-Dependent Demand Estimation with Initial Conditions’ Correction. Chicago Booth School of Business Working Paper.
Dubé, J.-P., Hitsch, G.J., Manchanda, P., 2005. An empirical model of advertising dynamics. Quantitative Marketing and Economics 3, 107–144.
Dubé, J.-P., Hitsch, G.J., Rossi, P.E., 2009. Do switching costs make markets less competitive? Journal of Marketing Research 46, 435–445.
Dubé, J.-P., Hitsch, G.J., Rossi, P.E., 2010. State dependence and alternative explanations for consumer inertia. The Rand Journal of Economics 41 (3), 417–445.
Ellison, G., Fudenberg, D., 1995. Word-of-mouth communication and social learning. The Quarterly Journal of Economics 110 (1), 93–125.


Erdem, T., 1998. An empirical analysis of umbrella branding. Journal of Marketing Research 35 (3), 339–351.
Erdem, T., Keane, M.P., 1996. Decision-making under uncertainty: capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science 15 (1), 1–20.
Erdem, T., Keane, M.P., Öncü, T.S., Strebel, J., 2005. Learning about computers: an analysis of information search and technology choice. Quantitative Marketing and Economics 3, 207–246.
Erdem, T., Keane, M.P., Sun, B., 2008. A dynamic model of brand choice when price and advertising signal quality. Marketing Science 27 (6), 1111–1125.
Erdem, T., Sun, B., 2002. An empirical investigation of the spillover effects of advertising and sales promotions in umbrella branding. Journal of Marketing Research 39 (4), 408–420.
Erdem, T., Swait, J., 1998. Brand equity as a signaling phenomenon. Journal of Consumer Psychology 7 (2), 131–157.
Erdem, T., Swait, J., 2004. Brand credibility, brand consideration, and choice. Journal of Consumer Research 31 (1), 191–198.
Erdem, T., Swait, J., 2014. Branding and brand equity models. In: The History of Marketing Science Handbook. World Scientific-Now Publishers, pp. 237–260.
Erdem, T., Swait, J., Louviere, J.J., 2002. The impact of brand credibility on consumer price sensitivity. International Journal of Research in Marketing 19 (1), 1–19.
Erdem, T., Winer, R.S., 1999. Econometric modeling of competition: a multi-category choice-based mapping approach. Journal of Econometrics 89, 159–175.
Fader, P.S., Hardie, B.G.S., 1996. Modeling consumer choice among SKUs. Journal of Marketing Research 33 (4), 442–452.
Farquar, P.H., 1989. Managing brand equity. Marketing Research (September), 24–33.
Farrell, J., Klemperer, P., 2007. Coordination and lock-in: competition with switching costs and network effects. In: Armstrong, M., Porter, R. (Eds.), Handbook of Industrial Organization, vol. 3. Elsevier, pp. 1967–2072.
Farrell, J., Shapiro, C., 1988. Dynamic competition with switching costs. The Rand Journal of Economics 19 (1), 123–137.
Frank, R.E., 1962. Brand choice as a probability process. The Journal of Business 35 (1), 43–56.
Friedman, M., 1962. Price Theory. Aldine Publishing Co., Chicago.
Geyskens, I., Gielens, K., Gijsbrechts, E., 2010. Proliferating private-label portfolios: how introducing economy and premium private labels influences brand choice. Journal of Marketing Research 47, 791–807.
Golder, P.N., 2000. Historical method in marketing research with new evidence on long-term market share stability. Journal of Marketing Research 37 (2), 156–172.
Golder, P.N., Tellis, G.J., 1993. Pioneer advantage: marketing logic or marketing legend? Journal of Marketing Research 30, 158–170.
Goldfarb, A., Lu, Q., Moorthy, S., 2008. Measuring brand value in an equilibrium framework. Marketing Science 28 (1), 69–86.
Gordon, B., Sun, B.-H., 2015. A dynamic model of rational addiction: evaluating cigarette taxes. Marketing Science, 452–470.
Green, P.E., Srinivasan, V., 1978. Conjoint analysis in consumer research: issues and outlook. Journal of Consumer Research 5 (September), 103–123.
Green, P.E., Wind, Y., 1975. New way to measure consumers’ judgments. Harvard Business Review 53 (4), 107–117.
Grubb, M.D., Osborne, M., 2015. Cellular service demand: biased beliefs, learning, and bill shock. The American Economic Review 105 (1), 234–271.
Guadagni, P.M., Little, J.D., 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 203–238.
Guest, L., 1955. Brand loyalty – twelve years later. Journal of Applied Psychology 39, 405–408.
Hakenes, H., Peitz, M., 2008. Umbrella branding and the provision of quality. International Journal of Industrial Organization 26 (2), 546–556.


Handel, B., 2013. Adverse selection and switching costs in health insurance markets: when nudging hurts. The American Economic Review 103, 2643–2682.
Hansen, K., Singh, V., 2015. Choice Concentration. UCSD Working Paper.
Hastings, J., Hortaçsu, A., Syverson, C., 2013. Advertising and Competition in Privatized Social Security: The Case of Mexico. National Bureau of Economic Research Working Paper 18881.
Hauser, J.R., 1978. Testing the accuracy, usefulness, and significance of probabilistic choice models: an information-theoretic approach. Operations Research 26 (3), 406–421.
Hauser, J.R., Wernerfelt, B., 1990. An evaluation cost model of consideration sets. Journal of Consumer Research 16 (March), 393–408.
Heckman, J.J., 1981. Statistical models for discrete panel data. In: Structural Analysis of Discrete Data and Econometric Applications. The MIT Press.
Heilman, C.M., Bowman, D., Wright, G.P., 2000. The evolution of brand preferences and choice behaviors of consumers new to a market. Journal of Marketing Research 37, 139–155.
Hertzendorf, M.N., 1993. I’m not a high-quality firm-but I play one on TV. The Rand Journal of Economics, 236–247.
Hicks, J., Allen, R., 1934. A reconsideration of the theory of value. Part I. Economica 1 (1), 52–76.
Hicks, J.R., 1939. The foundations of welfare economics. The Economic Journal 49 (196), 696–712.
Hitsch, G.J., 2006. An empirical model of optimal dynamic product launch and exit under demand uncertainty. Marketing Science 25 (1), 25–50.
Holbrook, M.B., 1992. Product quality, attributes, and brand name as determinants of price: the case of consumer electronics. Marketing Letters 3 (1), 71–83.
Hollenbeck, B., 2017. The economic advantages of chain organization. The Rand Journal of Economics 48 (4), 1103–1135.
Hollenbeck, B., 2018. Online reputation mechanisms and the decreasing value of brands. Journal of Marketing Research 55 (5), 636–654.
Honka, E., 2014. Quantifying search and switching costs in the U.S. auto insurance industry. The Rand Journal of Economics 45, 847–884.
Honka, E., Hortaçsu, A., Vitorino, M.A., 2017. Advertising, consumer awareness, and choice: evidence from the US banking industry. The Rand Journal of Economics 48 (3), 611–646.
Horstmann, I., MacDonald, G., 2003. Is advertising a signal of product quality? Evidence from the compact disc player market, 1983–1992. International Journal of Industrial Organization 21 (3), 317–345.
Hortaçsu, A., Syverson, C., 2004. Product differentiation, search costs, and competition in the mutual fund industry: a case study of S&P 500 index funds. The Quarterly Journal of Economics 119 (2), 403–456.
Houthakker, H., 1953. La forme des courbes. Cahiers du Seminaire d’Econometrie 2, 59–66.
Howard, J.A., Sheth, J., 1969. The Theory of Buyer Behavior. Wiley, New York.
Hoyer, W.D., Brown, S.P., 1990. Effects of brand awareness on choice for a common, repeat-purchase product. Journal of Consumer Research, 141–148.
Husband, R.W., Godfrey, J., 1934. An experimental study of cigarette identification. Journal of Applied Psychology 18, 220–223.
Ingram, P., Baum, J.A.C., 1997. Chain affiliation and the failure of Manhattan hotels, 1898–1980. Administrative Science Quarterly 42 (1), 68–102.
Islam, S., 2018. What the rise of direct-to-consumer brands means for content marketing. NewsCred: Insights. Accessed on 5-10-2018 at https://insights.newscred.com/direct-to-consumer-brands-contentmarketing/amp/.
Jeong, E.-Y., 2016. Samsung to recall Galaxy Note 7 smartphone over reports of fires. The Wall Street Journal, 2 September 2016.
Jeuland, A., 1979. Brand choice inertia as one aspect of the notion of brand loyalty. Management Science 25, 671–682.
John, D.R., Cole, C.A., 1986. Age differences in information processing: understanding deficits in young and elderly consumers. Journal of Consumer Research 13 (3), 297–315.
Jones, J., Landwehr, T., 1988. Removing heterogeneity bias from logit model estimation. Marketing Science 7, 41–59.


Joo, J., 2018. Quantity Surcharged Larger Package Sales as Rationally Inattentive Consumers’ Choice. University of Texas at Dallas Working Paper.
Judd, K.L., 1985. Credible spatial preemption. The Rand Journal of Economics 16 (2), 153–166.
Kalyanaram, G., Robinson, W.T., Urban, G., 1995. Order of market entry: established empirical generalizations, emerging generalizations, and future research. Marketing Science 14, 212–221.
Kamakura, W.A., Russell, G.J., 1989. A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research 26, 379–390.
Kamakura, W.A., Russell, G.J., 1993. Measuring brand value with scanner data. International Journal of Research in Marketing 10, 9–22.
Kamenica, E., Naclerio, R., Malani, A., 2013. Advertisements impact the physiological efficacy of a branded drug. Proceedings of the National Academy of Sciences 110 (32), 12931–12935.
Keane, M.P., 1997. Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business and Economic Statistics 15 (3), 310–327.
Keller, K.L., 1993. Conceptualizing, measuring, and managing customer-based brand equity. Journal of Marketing 57 (1), 1–22.
Keller, K.L., 2012. Strategic Brand Management: Building, Measuring, and Managing Brand Equity, 4th ed. Pearson.
Keller, K.L., Lehmann, D.R., 2006. Brands and branding: research findings and future priorities. Marketing Science 25 (6), 740–759.
Kerin, R.A., Varadarajan, P.R., Peterson, R.A., 1992. First-mover advantage: a synthesis, conceptual framework, and research propositions. Journal of Marketing 56 (4), 33–52.
Kerschner, E.M., Geraghty, M., 2006. Brand Behemoths. Discussion Paper. Citigroup.
Kihlstrom, R.E., Riordan, M.H., 1984. Advertising as a signal. Journal of Political Economy 92 (3), 427–450.
Kirmani, A., Wright, P., 1989. Money talks: perceived advertising expense and expected product quality. Journal of Consumer Research 16 (3), 344–353.
Klein, B., Leffler, K.B., 1981. The role of market forces in assuring contractual performance. Journal of Political Economy 89 (4), 615–641.
Klemperer, P., 1987. The competitiveness of markets with switching costs. The Rand Journal of Economics 18 (1), 138–150.
Klemperer, P., 1995. Competition when consumers have switching costs: an overview with applications to industrial organization, macroeconomics, and international trade. The Review of Economic Studies 62, 515–539.
Klemperer, P., 2006. Network effects and switching costs: two short essays for the New Palgrave. Available at SSRN: https://ssrn.com/abstract=907502 or http://dx.doi.org/10.2139/ssrn.907502.
Lambert-Pandraud, R., Laurent, G., Lapersonne, E., 2005. Repeat purchasing of new automobiles by older consumers: empirical evidence and interpretations. Journal of Marketing 69 (2), 97–113.
Lambkin, M., 1988. Order of entry and performance in new markets. Strategic Management Journal 9, 127–140.
Lancaster, K., 1971. Consumer Demand: A New Approach. Columbia University Press, New York.
Lane, W., 1980. Product differentiation in a market with endogenous sequential entry. Bell Journal of Economics 11 (1), 237–260.
Laurent, G., Kapferer, J.-N., Roussel, F., 1995. The underlying structure of brand awareness scores. Marketing Science 14 (3), G170–G179.
Leone, R.P., 1995. Generalizing what is known about temporal aggregation and advertising carryover. Marketing Science 14 (3), G141–G150.
Li, L., Tadelis, S., Zhou, X., 2016. Buying Reputation as a Signal of Quality: Evidence from an Online Marketplace. NBER Working Paper 22584.
Liu, H., Chintagunta, P., Zhu, T., 2010. Complementarities and the demand for home broadband Internet services. Marketing Science 29, 701–720.
Louviere, J.J., Johnson, R., 1988. Measuring brand image with conjoint analysis and choice models. In: Defining, Measuring and Managing Brand Equity: A Conference Summary. Marketing Science Report 88-104.


Luca, M., Zervas, G., 2016. Fake it till you make it: reputation, competition, and Yelp review fraud. Management Science 62 (12), 3412–3427.
Lynch, J.G., Srull, T.K., 1982. Memory and attentional factors in consumer choice: concepts and research methods. Journal of Consumer Research 9 (1), 18–37.
Massy, W.F., 1966. Order and homogeneity of family specific brand-switching processes. Journal of Marketing Research 3 (1), 48–54.
Matejka, F., McKay, A., 2015. Rational inattention to discrete choices: a new foundation for the multinomial logit model. The American Economic Review 105 (1), 272–298.
McDevitt, R.C., 2011. Names and reputations: an empirical analysis. American Economic Journal: Microeconomics 3, 193–209.
McDevitt, R.C., 2014. “A” business by any other name: firm name choice as a signal of firm quality. Journal of Political Economy 122 (4), 909–944.
Mehta, N., Rajiv, S., Srinivasan, K., 2003. Price uncertainty and consumer search: a structural model of consideration set formation. Marketing Science 22, 58–84.
Meyer, R.J., Sathi, A., 1985. A multiattribute model of consumer choice during product learning. Marketing Science 4 (1), 41–61.
Miklós-Thal, J., 2012. Linking reputations through umbrella branding. Quantitative Marketing and Economics 10 (3), 335–374.
Milgrom, P., Roberts, J., 1986. Price and advertising signals of product quality. Journal of Political Economy 94 (4), 796–821.
Miller, G.A., 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review 101 (2), 343–352.
Mittelstaedt, R., 1969. A dissonance approach to repeat purchasing behavior. Journal of Marketing Research 6 (4), 444–446.
Moore, E.S., Wilkie, W.L., Lutz, R.J., 2002. Passing the torch: intergenerational influences as a source of brand equity. Journal of Marketing 66 (2), 17–37.
Moorthy, S., 1988. Product and price competition in a duopoly. Marketing Science 7 (2), 141–168.
Moorthy, S., 2012. Can brand extension signal product quality? Marketing Science 31 (5), 756–770.
Moorthy, S., Hawkins, S.A., 2005. Advertising repetition and quality perception. Journal of Business Research 58 (3), 354–360.
Moorthy, S., Ratchford, B.T., Talukdar, D., 1997. Consumer information search revisited: theory and empirical analysis. Journal of Consumer Research 23 (4), 263–277.
Moorthy, S., Srinivasan, K., 1995. Signaling quality with a money-back guarantee: the role of transaction costs. Marketing Science 14 (4), 442–466.
Moschis, G.P., 1985. The role of family communication in consumer socialization of children and adolescents. Journal of Consumer Research 11 (4), 898–913.
Moschis, G.P., Moore, R.L., 1979. Decision making among the young: a socialization perspective. Journal of Consumer Research 6 (2), 101–112.
Murray, J.A.H. (Ed.), 1887. A New English Dictionary Founded on Historical Principles. Clarendon Press, Oxford.
Muthukrishnan, A., 2015. Persistent Preferences in Market Place Choices: Brand Loyalty, Choice Inertia, and Something in Between. Now Publishers, Inc.
Narayana, C.L., Markin, R.J., 1975. Consumer behavior and product performance: an alternative conceptualization. Journal of Marketing 39, 1–6.
Nedungadi, P., 1990. Recall and consumer consideration sets: influencing choice without altering brand evaluations. Journal of Consumer Research 17 (3), 263–276.
Nedungadi, P., Hutchison, W., 1985. The prototypicality of brands: relationships with brand awareness, preferences and usage. Advances in Consumer Research 12, 498–503.
Nelson, P., 1970. Information and consumer behavior. Journal of Political Economy 78 (2), 311–329.
Nelson, P., 1974. Advertising as information. Journal of Political Economy 82 (4), 729–754.
Nerlove, M., Arrow, K.J., 1962. Optimal advertising policy under dynamic conditions. Economica 29 (114), 129–142.


Newman, J.W., Staelin, R., 1972. Prepurchase information seeking for new cars and major household appliances. Journal of Marketing Research, 249–257.
Osborne, M., 2008. Consumer Learning, Switching Costs, and Heterogeneity: A Structural Examination. Economic Analysis Group Discussion Paper.
Osborne, M., 2011. Consumer learning, switching costs, and heterogeneity: a structural examination. Quantitative Marketing and Economics 9, 25–70.
Padilla, A.J., 1995. Revisiting dynamic duopoly with consumer switching costs. Journal of Economic Theory 67, 520–530.
Pakes, A., McGuire, P., 1994. Computing Markov-perfect Nash equilibria: numerical implications of a dynamic differentiated product model. The Rand Journal of Economics 25 (4), 555–589.
Park, C.S., Srinivasan, V., 1994. A survey-based method for measuring and understanding brand equity and its extendibility. Journal of Marketing Research 31 (2), 271–288.
Parry, M., Bass, F.M., 1990. When to lead or follow? It depends. Marketing Letters 3 (1), 187–198.
Payne, J.W., 1976. Task complexity and contingent processing in decision making: an information search and protocol analysis. Organizational Behavior and Human Decision Processes 16, 366–387.
Payne, J.W., 1982. Contingent decision behavior. Psychological Bulletin 92, 382–402.
Peter, J., Olson, J., 1996. Consumer Behavior and Marketing Strategy, 4th ed. Irwin.
Png, I., Reitman, D., 1995. Why are some products branded and others not? The Journal of Law and Economics 38, 207–224.
Pollak, R.A., 1970. Habit formation and dynamic demand functions. Journal of Political Economy 78 (4), 745–763.
Punj, G., Staelin, R., 1983. A model of consumer information search behavior for new automobiles. Journal of Consumer Research 9 (4), 366–380.
Rasmusen, E.B., 2016. Leveraging of reputation through umbrella branding: the implications for market structure. Journal of Economics and Management Strategy 25 (2), 261–273.
Ratchford, B.T., 1975. The new economic theory of consumer behavior: an interpretive essay. Journal of Consumer Research 2 (2), 65–75.
Ratchford, B.T., 1980. The value of information for selected appliances. Journal of Consumer Research 17, 14–25.
Ratchford, B.T., Talukdar, D., Lee, M.-S., 2007. The impact of the Internet on consumers’ use of information sources for automobiles: a re-inquiry. Journal of Consumer Research 34 (1), 111–119.
Reisman, D., Roseborough, H., 1955. Careers and consumer behavior. In: Consumer Behavior, vol. II, The Life Cycle and Consumer Behavior. New York University Press.
Roberts, J., 1989. A grounded model of consideration set size and composition. Journal of Consumer Research 12, 492–497.
Roberts, J.H., Lattin, J.M., 1991. Development and testing of a model of consideration set composition. Journal of Marketing Research 28 (4), 429–440.
Roberts, J.H., Urban, G.L., 1988. Modeling multiattribute utility, risk, and belief dynamics for new consumer durable brand choice. Management Science 34 (2), 167–185.
Robinson, W.T., 1988. Sources of market pioneer advantages: the case of industrial goods industries. Journal of Marketing Research 25 (1), 87–94.
Robinson, W.T., Fornell, C., 1985. Sources of market pioneer advantages in consumer goods industries. Journal of Marketing Research 22 (3), 305–317.
Rosen, S., 1974. Hedonic prices and implicit markets: product differentiation in pure competition. Journal of Political Economy 82 (1), 34–55.
Rotfeld, H.J., Rotzoll, K.B., 1976. Advertising and product quality: are heavily advertised products better? The Journal of Consumer Affairs 10 (1), 33–47.
Roy, R., Chintagunta, P.K., Haldar, S., 1996. A framework for investigating habits, “the hand of the past,” and heterogeneity in dynamic brand choice. Marketing Science 15 (3), 280–299.
Rubin, J.A., 1998. Studies in Consumer Demand. Springer.
Sahni, N.S., 2012. Effect of temporal spacing between advertising exposures: evidence from online field experiments. Manuscript.

References

Schmalensee, R., 1978. Entry deterrence in the ready-to-eat breakfast cereal industry. Bell Journal of Economics 9 (2), 305–327. Schmalensee, R., 1982. Product differentiation advantages of pioneering brands. The American Economic Review 72, 349–365. Schmalensee, R., 1983. Advertising and entry deterrence: an exploratory model. Journal of Political Economy 91 (4), 636–653. Schmitt, B., 2012. The consumer psychology of brands. Journal of Consumer Psychology 22, 7–17. Seetharaman, P., Ainslie, A., Chintagunta, P., 1999. Investigating household state dependence effects across categories. Marketing Science 36, 488–500. Shaked, A., Sutton, J., 1987. Product differentiation and industrial structure. Journal of Industrial Economics 36 (2), 131–146. Shapiro, B.T., 2017. Positive spillovers and free riding in advertising of prescription pharmaceuticals: the case of antidepressants. Journal of Political Economy 126 (1), 381–437. Shapiro, C., 1982. Consumer information, product quality, and seller reputation. The Bell Journal of Economics 13 (1), 20–35. Shapiro, C., 1983. Premiums for high quality products as returns to reputations. The Quarterly Journal of Economics 98 (4), 659–679. Shin, S., Misra, S., Horsky, D., 2012. Disentangling preferences and learning in brand choice models. Marketing Science 31 (1), 115–137. Shocker, A.D., Ben-Akiva, M., Boccara, B., Nedungadi, P., 1991. Consideration set influences on consumer decision-making and choice: issues, models, and suggestions. Marketing Letters 2 (3), 181–197. Shugan, S., 1980. The cost of thinking. Journal of Consumer Research 17, 99–111. Shum, M., 2004. Does advertising overcome brand loyalty? Evidence from the breakfast-cereals market. Journal of Economics and Management Strategy 13 (2), 241–272. Simon, C.J., Sullivan, M.W., 1993. The measurement and determinants of brand equity: a financial approach. Marketing Science 12 (1), 28–52. Song, I., Chintagunta, P., 2007. 
A discrete-continuous model for multicategory purchase behavior of households. Journal of Marketing Research 44 (November), 595–612. Sonnier, G., Ainslie, A., Otter, T., 2007. Heterogeneity distributions of willingness-to-pay in choice models. Quantitative Marketing and Economics 5 (3), 313–331. Srinivasan, V., 1979. Network models for estimating brand-specific effects in multi-attribute marketing models. Management Science 25 (1), 11–21. Srinivasan, V., Park, C.S., Chang, D.R., 2005. An approach to the measurement, analysis, and prediction of brand equity and its sources. Management Science 51 (9), 1433–1448. Steenkamp, J.-B.E., van Heerde, H.J., Geyskens, I., 2010. What makes consumers willing to pay a price premium for national brands over private labels? Journal of Marketing Research 47 (6), 1011–1024. Stephens-Davidowitz, S., 2017. Everybody Lies: Big Data, New Data and What the Internet Can Tell Us About Who We Really Are. Harper Collins Publishers, New York, NY. Stigler, G.J., 1961. The economics of information. Journal of Political Economy 69 (3), 213–225. Sudhir, K., Tewari, I., 2015. Long Term Effects of ‘Prosperity in Youth’ on Consumption: Evidence from China. Cowles Foundation Discussion Paper No. 2025. Sullivan, M., 1990. Measuring image spillovers in umbrella-branded products. Journal of Business, 309–329. Sullivan, M.W., 1992. Brand extensions: when to use them. Management Science 38 (6), 793–806. Sutton, J., 1991. Sunk Costs and Market Structure: Price Competition, Advertising, and the Evolution of Concentration. MIT Press, Cambridge. Swait, J., Erdem, T., Louviere, J.J., Dubelaar, C., 1993. The equalization price: a measure of consumerperceived brand equity. International Journal of Research in Marketing 10, 23–45. Tadelis, S., 1999. What’s in a name? Reputation as a tradeable asset. The American Economic Review 89 (3), 548–563. Tedlow, R.S., 1990. New and Improved: The Story of Mass Marketing in America. Basic Books, Inc., New York, NY.

357

358

CHAPTER 6 The economics of brands and branding

The Nielsen Company, 2014. The State of Private Label Around the World. Discussion Paper. Accessed online 8/10/2016 from www.nielsen.com/us/en/insights/reports/2014/the-state-of-private-label-aroundthe-world.html. Thomas, L.A., 1995. Brand capital and incumbent firms’ positions in evolving markets. Review of Economics and Statistics 77 (3), 522–534. Thomas, L.A., 1996. Advertising sunk costs and credible spatial preemption. Strategic Management Journal 17 (6), 481–498. Thumin, F.J., 1962. Identification of cola beverages. Journal of Applied Psychology 46 (5), 358–360. Trajtenberg, M., 1989. The welfare analysis of product innovations, with an application to computed tomography scanners. Journal of Political Economy 97 (2), 444–479. Tuchman, A.E., Nair, H.S., Gardete, P.M., 2015. Complementarities in Consumption and the Consumer Demand for Advertising. Working Paper. Urban, G.L., 1975. Perceptor: a model for product positioning. Management Science 21, 858–871. Urban, G.L., Carter, T., Gaskin, S., Mucha, Z., 1986. Market share rewards to pioneering brands: an empirical analysis and strategic implications. Management Science 32 (6), 645–659. Urban, G.L., Hauser, J.R., 1993. Design and Marketing of New Products, 2nd ed. Prentice Hall, Englewood Cliffs, NJ. Van Osselaer, S.M.J., 2008. Associative learning and consumer decisions. In: Handbook of Consumer Psychology. Erlbaum, pp. 699–729. Vitorino, M.A., 2014. Understanding the effect of advertising on stock returns and firm value: theory and evidence from a structural model. Management Science 60 (1), 227–245. Waldfogel, J., Chen, L., 2006. Does information undermine brand? Information intermediary use and preference for branded web retailers. Journal of Industrial Economics 54 (4), 425–449. Ward, S., 1974. Consumer socialization. Journal of Consumer Research 1 (2), 1–14. Weitzman, M.L., 1979. Optimal search for the best alternative. Econometrica 47 (3), 641–654. Wernerfelt, B., 1988. 
Umbrella branding as a signal of new product quality: an example of signalling by posting a bond. The Rand Journal of Economics 19 (3), 458–466. Wiggins, S.N., Raboy, D.G., 1996. Price premia to name brands: an empirical analysis. Journal of Industrial Economics 44 (4), 377–388. Wilkie, W.L., Pessemier, E.A., 1974. Issues in marketing’s use of multi-attribute attitude models. Journal of Marketing Research 10 (4), 428–441. Woodyard, C., 2012. Toyota to pay $1.1B in ‘unintended acceleration’ cases. https://www.usatoday.com/ story/money/cars/2012/12/26/toyota-unintended-acceleration-runaway-cars/1792477/. Wright, P., Barbour, F., 1977. Phased decision strategies: sequels to initial screening. In: Multiple Criteria Decision Making. North Holland, Amsterdam, pp. 91–109. Zhang, J., 2010. The sound of silence: observational learning in the U.S. kidney market. Marketing Science 29 (2), 315–335.

CHAPTER 7

Diffusion and pricing over the product life cycle✩

Harikesh S. Nair Stanford Graduate School of Business, Stanford, CA, United States e-mail address: [email protected]

Contents
1 Introduction
2 The first wave: Models of new product diffusion as way to capture the PLC
   2.1 Models of “external” influence
   2.2 Models of “internal” influence
   2.3 Bass’s model
   2.4 What was missing in the first wave?
3 The second wave: Life cycle pricing with diffusion models
   3.1 Price paths under separable diffusion specifications
   3.2 Price paths under market potential specifications
   3.3 Extensions to individual-level models
   3.4 Discussion
   3.5 What was missing in the second wave?
4 The third wave: Life cycle pricing from micro-foundations of dynamic demand
   4.1 Dynamic life-cycle pricing problem overview
   4.2 Monopoly problem
   4.3 Oligopoly problem
   4.4 Discussion
   4.5 Additional considerations related to durability
      4.5.1 Commitments via binding contracts
      4.5.2 Availability and deadlines
      4.5.3 Second-hand markets
      4.5.4 Renting and leasing
      4.5.5 Complementary goods and network effects
   4.6 Summary
5 Goods with repeat purchase
   5.1 Theoretical motivations
   5.2 Empirical dynamic pricing
      5.2.1 State dependent utility
      5.2.2 Storable goods
      5.2.3 Consumer learning
   5.3 Summary
6 Open areas where more work will be welcome
   6.1 Life-cycle pricing while learning an unknown demand curve
   6.2 Joint price and advertising over the life-cycle
   6.3 Product introduction and exit
   6.4 Long term impact of marketing strategies on behavior
   6.5 Linking to micro-foundations
References

✩ Many thanks to the editors for putting together this volume, to three anonymous referees, and to Jean-Pierre Dubé and Caio Waisman in particular for helpful comments and suggestions. The views discussed here represent that of the author and not of Stanford University.
Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.05.001 Copyright © 2019 Elsevier B.V. All rights reserved.

1 Introduction

The “Product Life Cycle” or PLC is a prominent marketing concept in the introduction and management of new products (Cao and Folan, 2012). A typical pattern observed for sales over time is slow initial sales during launch, followed by rapid growth, then a plateau at maturity, and then decline. Accordingly, the PLC is often thought of as consisting of launch, growth, maturity, and decline stages.1

If products go through these distinct stages, it follows that marketing strategy ought to be tailored to the needs of these stages. At launch, awareness needs to be built among those most ready to buy (so-called “innovators”), and trial has to be facilitated. This often implies low prices, high advertising, and selective distribution. In the growth phase, later buyers (so-called “imitators” or laggards) start adopting, spurred on by positive word-of-mouth from innovators and by increased information available in the marketplace. Consequently, distribution needs to be increased to support the growth, and advertising needs to move from purely building awareness to developing branding and conversion. In the mature phase, competitors are entrenched, price competition is intense, and periodic sales and promotions are typical. Product-line extensions may be launched or the product repositioned to extend it to adjacent use-cases, or to capture additional customer segments. In decline, decisions need to be made as to what products to retire from the market and what to retain. For those that are retained, advertising may be stopped and price promotions may be limited to only the most loyal users.

Historically, there was no unified way of developing good marketing policies over the life-cycle, or of formally assessing how the life-cycle itself depended on the nature of marketing. With frustration, Levitt (1965) colorfully remarked that,

1 The notion that products have a life-cycle is preceded by a long history in the sciences. Theories of life-cycles postulating that growth consisted of periods of fast, slow, and then declining phases were developed in the previous centuries for biological populations (Malthus, 1798; Darwin, 1859), then later for ideas (LeBon, 1898), culture (Tarde, 1903), and firms (Chapman and Ashton, 1914), leading up to products (Clark, 1934; Boulding, 1950). The term “product life cycle” seems to have been first used in the managerially oriented article by Dean (1950), who described pricing strategies over the life-cycle, and the four stages of the product life-cycle seem to have been first described as such by Forrester (1959). See Osland (1991) for a detailed history.


“It has remained—as have so many fascinating theories in economics, physics, and sex—a remarkably durable but almost totally unemployed and seemingly unemployable piece of professional baggage whose presence in the rhetoric of professional discussions adds a much coveted but apparently unattainable legitimacy to the idea that marketing management is somehow a profession.”2

Levitt’s frustration was that the theory was mostly verbal and vaguely stated, with no explicit role for marketing. Consequently, it was not clear what exactly firms could do with the levers under their control (prices, advertising, distribution, quality) in order to improve their prospects over the life-cycle. The literature has improved considerably since then, with the work on pricing and advertising in particular becoming rich and deep. This chapter attempts to provide a review of the marketing literature on pricing over the life-cycle. For ease of exposition, I divide the research into what I call the first, second, and third waves as discussed below.

Three waves

A first wave of papers (mostly in the 1950s and 1960s) addressed the question of whether products indeed have a systematic PLC.3 There was interest in the question in both the marketing (e.g., Cox, 1967) and economics (e.g., Brockhoff, 1967) communities. With the limited data available, a guarded consensus emerged that there seems to be such a pattern in the sales growth of many products (e.g., Rink and Swan, 1979). For instance, Polli and Cook (1969) tested the existence of four distinct stages of the PLC using data in 140 categories of goods, and concluded that “while the overall performance of the model leaves some question as to its general applicability, it is clearly a good model of sales behavior in certain market situations – especially so in the case of different product forms competing for essentially the same market segment within a general class of products.” Individual-level data on the adoption of new products that became available from test-markets in the 1950s (discussed in Fourt and Woodlock, 1960; reviewed later in the chapter) showed that product penetration curves were S-shaped, lending credence to the idea that aggregate life-cycle adoption curves too followed similar patterns.

Motivated by this, researchers developed mathematical models of sales-paths that matched the shape of the PLC. The goal was better forecasting: given that there is a systematic pattern to sales, how can we predict it in advance of new product launch so as to make pre-launch planning better? The models of Bass (1969) and the new product diffusion literature that followed were developed primarily to solve this forecasting problem, and were meant to be used as decision support systems to aid managers. An important question for the literature was also what underlying phenomena could generate such S-shaped growth.
2 Levitt goes on to explain ways of using the concept effectively and of turning the knowledge of its existence into “a managerial instrument of competitive power.”
3 As Levitt wryly observed, “The fact is, most new products don’t have any sort of classical life-cycle curve at all. They have instead from the very outset an infinitely descending curve. The product not only doesn’t get off the ground; it goes quickly under ground—six feet under.”

The consensus explanation for the PLC was that it arose from a combination of market saturation and demand-side economies of scale such as word-of-mouth or other forms of contagion. As the product got adopted, market saturation implied that the number of potential adopters dwindled. But the existence of demand-side economies implied that the remaining potential adopters' probability of adoption increased. The combination generated S-shaped growth. The models in this first wave provided no role for marketing variables such as prices. They were specified at the industry or category-level, not at the product or brand-level, making it difficult to include marketing variables in a logical way and to accommodate a role for product competition. They also assumed a homogeneous population, which precluded understanding heterogeneity in marketing response across users and its impact on the product life-cycle.

A second wave of papers (starting roughly in the late 1970s, early 1980s) augmented diffusion models to include a role for prices. The approach was to take the diffusion process as exogenously given, and to find a way to include prices and other marketing variables into this exogenously postulated specification. A focus of the literature was to understand the conditions under which life-cycle pricing would exhibit skimming, a pattern of declining prices over time, or penetration, a pattern of increasing prices over time. The literature suggested that when demand-side economies of scale such as word-of-mouth were strong, life-cycle prices exhibited penetration; when they were weak, the existence of consumer heterogeneity and saturation generated skimming. And, depending on how the strength of these forces varied over the life-cycle and on their interaction with possible experience curve effects on the cost-side, the literature showed one could also obtain other patterns of optimal pricing over the PLC.
While the second wave of literature made substantial progress on the life-cycle pricing problem, one shortcoming was the rudimentary treatment of competition; there was limited formal treatment of strategic interaction and its implication for life-cycle strategies. A second issue was the absence of a role for consumer expectations. While firms were treated as forward-looking, consumers were typically treated as myopic, responding only to current prices while remaining uninfluenced by the trajectory of future prices. This reduced the empirical realism of the models, and precluded a role for strategies to address the time consistency problems that arose when forward-looking consumers could contemplate the future incentives of the seller, and respond to them by changing their current actions (e.g., Coase, 1972). A third issue was that most of the papers solved for open-loop policies, which outline a set of time-optimal actions the firm commits to in advance. Such open-loop policies are time inconsistent, and require commitment power. In the absence of commitment, open-loop policies are unrealistic in practical settings where managers are routinely called upon to revise strategies in response to changes in the market state.

The third wave of literature (starting roughly in the late 1990s, early 2000s) attempted to address these issues more directly. These papers interpreted diffusion through the lens of individual-level models of adoption while incorporating a formal role for consumer expectations and associated dynamics. While the previous waves of papers were primarily theoretical in orientation, these recent papers were also empirical in their orientation. The broad goal was to estimate the parameters of the models from
real data and to compute optimal policies for normative use via empirically realistic models. The new literature emphasized the role of prices and under-emphasized the role of contagion, with the latter aspect driven to a large extent by the challenges of identifying these with the variation typically available in observational adoption data. The new literature addressed the three missing pieces of the second wave. Competition across firms was accommodated by embedding life-cycle pricing policies within the framework of a dynamic game across firms. Consumer expectations about future prices were incorporated into intertemporal adoption choices made by consumers, which determined demand. The papers also reversed the open-loop emphasis of the previous wave, leveraging numerical dynamic programming based solutions, which deliver a closed-loop, state-contingent policy by construction. As the third wave of papers formally incorporated consumer expectations and associated dynamics into pricing, the issue of time consistency and the Coasean problem induced by time consistency in pricing took on an important focus. The issue of time consistency arises because the firm is unable to convince rational consumers it will not cut prices, given it faces a logical incentive to do so. Making binding commitments to not cut prices; offering best-price provisions that compensate consumers for future price reductions; renting rather than selling; curtailing production so the product may be unavailable in the future and the market cannot be flooded; and building in planned obsolescence over the PLC are some strategies to address these. The theory and empirical work in the third wave developed these ideas deeply and showed how they interact with optimal pricing over the PLC. 
This chapter emphasizes primarily the third wave, which has an empirical focus; the first two waves are touched upon, but not deeply, partly because good reviews exist already (e.g., Lilien et al., 1992; Eliashberg and Lilien, 1993).

Implication for the PLC: A new perspective

Following early prescriptions (Dean, 1950), the first and the second waves primarily thought of the product life-cycle as something that was exogenously given, against which life-cycle pricing and marketing had to be optimized. The main contribution of the third wave was to recognize that the product life-cycle was endogenous to the market, arising out of the interaction of preferences, expectations, costs, and competition. Rather than start with the diffusion process as the primitive, the third wave started with deeper constructs, such as consumers’ time and product preferences and firms’ cost and technological structures, as the primitives. The diffusion process became the outcome of the interaction of forward-looking consumers and firms who optimized against these primitives in a given market. This is a fundamental change in perspective. Early management critics had echoed some of these aspects when critiquing the rigid exogeneity of the PLC concept maintained by the community. Quoting Dhalla and Yuspeh (1976),

“Clearly, the PLC is a dependent variable which is determined by marketing actions; it is not an independent variable to which companies should adapt their marketing programs. Marketing management itself can alter the shape and duration of a brand’s life cycle.”


Similar criticisms are echoed in Enis et al. (1977) and Day (1981). Maintaining the view that the PLC is exogenous has consequences for management. Again, quoting Dhalla and Yuspeh (1976),

“Many strongly believe that brands follow a life cycle and are subject to inevitable death after a few years of promotion. . . []. . . Unfortunately, in numerous cases a brand is discontinued, not because of irreversible changes in consumer values or tastes, but because management, on the basis of the PLC theory, believes the brand has entered a dying stage. In effect, a self-fulfilling prophecy results. . . []. . . A major disservice of the PLC concept to marketing is that it has led top executives to overemphasize new product introduction. This route is perilous. Experience shows that nothing seems to take more time, cost more money, involve more pitfalls, or cause more anguish than new product programs. . . []. . . It is foolish for a corporation to invest millions of dollars to build goodwill for a brand, then walk away from it and spend additional millions all over again for a new brand with no consumer franchise. . . [instead management should focus on]. . . building up a large national franchise for a few key brands through heavy and intelligent marketing support.”

The third wave develops a more nuanced perspective on the life-cycle and endogenizes it within a framework derived from micro-foundations of behavior. By forging a clear connection between the belief and preference structures of consumers, the nature of the institutional environment, and firms’ choices of various strategies, this approach helps explain why a given marketing policy works in one situation and not in another. This leads to a clearer formulation of how marketing fits into the PLC and shapes it, and how actions should change over the life-time of the product. Finally, the empirical emphasis of the third wave helps customize these prescriptions to a particular environment and make them practically relevant. My review of the third wave of papers tries to outline some of the main economic forces and approaches underlying this new perspective.

An agenda for further research

While this progress has been significant, decades later, the empirical understanding of some fundamental questions regarding the source of the PLC (how the interaction of demand-side economies, cost-side economies, and forward-looking behavior affects the nature of diffusion, and the consequent path of prices) remains a work in progress. The main issue is one of measurement (discussed in more detail later in the chapter). On the issue of demand-side economies, we are still not sure whether the magnitude of word-of-mouth and other forms of contagion is quantitatively large enough to sustain penetration pricing, or any form of price increases over the PLC as postulated by much of the theory. As the work of Manski (1993) and the large literature on the identification of social interactions (e.g., Blume et al., 2011) make clear, the identification of such effects in a real market is not trivial, and the interpretation of effects reported in the earlier literature is clouded by the possibility of confounds.

On the issue of cost-side economies, marginal costs are typically not observed by researchers, and how to recover marginal costs by inverting markups when prices are set dynamically and equilibria are possibly not unique is an open econometric question. The issue of forward-looking behavior is perhaps the most challenging one for measurement. The third wave has shown that the expectations of market participants play a crucial role in determining the evolution of the PLC and the nature of eventual outcomes, but we are still far away from being able to measure them credibly. The profitability of intertemporal price discrimination with forward-looking consumers depends on the pattern of dependence between time and product preferences of consumers in the market, and the extent to which firms are more patient than consumers. But a spate of negative identification results has shown that observational choice data typically do not contain the variation required to estimate consumer discount factors (e.g., Rust, 1994). A voluminous body of work in the behavioral sciences (see Frederick et al., 2002) has also shown that the discounted utility model, which implies that all motives underlying intertemporal choice can be condensed into a single parameter (i.e., the discount factor), along with exponential discounting, may not be a good description of how humans trade off choices intertemporally. Individuals may discount future rewards “hyperbolically” (i.e., have a declining rate of time preference); may discount gains at a higher rate than losses; may discount small outcomes more than large ones; and may prefer improving sequences to declining sequences. Discount factors may be context-dependent, vary across products, and be different for short-run versus long-run decisions. And finally, on the firm-side, few empirical strategies are currently available to credibly assess the discount factors of firms or of pricing decision makers within firms.

While these issues have each been addressed in part, with varying degrees of success, progress could be characterized as piecemeal, focused on one or more pieces of the problem. Hence, in any one context, we still do not have a good empirical understanding of the relative quantitative significance of all of these various drivers of life-cycle pricing. We also do not have a good empirical understanding of how their relative quantitative significance differs from one context to another. An agenda for future research that can develop a coherent framework for pricing over the life-cycle would combine four pieces: (a) credible measurement, with (b) a theory of how all these various pieces interact and how they affect life-cycle diffusion and pricing, with (c) a well-posed role for dynamics, and (d) empirical evidence of how these vary as a function of various aspects of the market environment. It is hoped this chapter will be a step in spurring this research agenda forward.

Organization

A few comments on what this review does and does not do. First, while voluminous work in adjacent fields like Economics, Computer Science, Operations, Psychology, and Sociology has influenced and been influenced by the marketing literature on this topic, I review primarily papers in the marketing literature, and closely related papers in economics where appropriate. Second, almost all the papers I focus on in
this review pertain to the dynamics of pricing over the life-cycle. So, papers that deal with a purely cross-sectional allocation of prices or marketing instruments, without being concerned with their allocation over time, are not discussed in depth. Third, the demand-side of the market is a core component of the exposition, but for reasons of brevity I do not review “pure” demand-side papers in much detail. Instead, the focus is on papers that use the demand-side as a springboard to address the strategic question of how to set prices over the life-cycle. While theory is discussed, emphasis is placed on empirical work that uses the theory to build micro-foundations for how various aspects of the marketplace and the behavior of agents and their interactions are captured. Finally, the chapter has a bias towards more recent work; historical work is reviewed, but primarily as a way to provide context.

In terms of organization, much of the initial part of the chapter deals with durable goods. Early diffusion models focused on forecasting the sales of durables, subject to one-time purchases. Perhaps due to this focus, the literature on life-cycle strategies for durable goods has developed in a deeper way in the field compared to other types of goods. Goods subject to repeat purchase are discussed in Section 5, albeit briefly, given the existence of several comprehensive reviews on the topic. Some of the topics I cover include life-cycle pricing with durability; replacement; second-hand goods and resale; renting; perishability and deadlines; complementary goods; and network effects. For goods with repeat purchase, I discuss pricing with state-dependence in utility, with storability, and under consumer learning. Towards the end, I discuss peripherally how pricing interacts with advertising, product choice, introduction, entry and exit, and issues of distribution over the life-cycle. These literatures are not discussed in much detail compared to dynamic pricing. Linking to micro-foundations and accommodating forward-looking consumers and firms is emphasized. I do not discuss computational methods or econometrics.

2 The first wave: Models of new product diffusion as a way to capture the PLC

Interest in mathematical and data-driven modeling of the product life-cycle began with models of new product introduction. These models were built primarily as forecasting tools to help firms forecast the trajectory of sales growth for new products, so as to coordinate their new product development and launch functions better. As consumer product markets grew in the United States after World War II, demand grew from the marketing community to better understand consumer preferences for branded goods and to help firms better manage the process of developing new products. Fourt and Woodlock (1960) detail some of the early efforts and motivations in new product forecasting. Quoting them,

“A survey of 200 large packaged goods manufacturers reveals that four out of five new products placed on the market after 1945 failed.1 A reliable method for early selection of the most promising fraction of innovations would eliminate much of the loss now incurred on failures.”

1 New Product Introduction, U.S. Small Business Administration, Management Series No. 17, Government Printing Office, Washington, DC, 1955, p. 63.

Mathematical models that attempted to forecast new product sales initially started by postulating separate models of “external” and “internal” influences on consumer adoption, which then were merged into one framework. The idea was to write down specifications so that the shape of the predicted sales curve over time conformed to the PLC.

2.1 Models of “external” influence

These models postulated that consumer adoption was driven by a set of “environmental” or “external” factors specific to a product. The factors are referred to as external as they derive from sources outside the set of possible adopters. The “external” factors were not specified precisely in the early literature, however, but were captured by a parameter, call it $p$, which measured the strength of the external influence. Products with higher $p$ were adopted faster.

The specification of the way in which external factors influenced adoption was driven by empirical analysis of penetration and repeat purchase data from “test markets” (essentially real-world laboratories for product launches run by companies like AC Nielsen) in which consumer responses to experimental new products were tracked and assessed. Analyzing numerous annual cumulative penetration curves across many new product launches, Fourt and Woodlock (1960) report the main stylized facts that were observed in such data: “(1) successive increments in these curves decline, and [. . . ] (2) the cumulative curve seems to approach a limiting penetration less than 100 per cent of households frequently far less.” Others reported similar findings for both durables (for which only penetration applied), and for goods subject to significant repeat-purchase.

The literature described a simple model that can capture this (e.g., Dodd, 1955; Fourt and Woodlock, 1960; Kelly, 1967). Suppose there is a single product which has a potential market size of $M$. Let $f(t)$ be the probability of adoption at time $t$, and let $F(t) = \int_0^t f(\tau)\,d\tau$, so that the hazard $\sigma(t) = \frac{f(t)}{1 - F(t)}$ represents the probability of adoption in $t$ given that no purchase has occurred. Cumulative sales as of $t$ is $S(t) = M F(t)$, so sales in period $t$, $s(t)$, is

$$ s(t) = M f(t) = \sigma(t)\,[M - S(t)] \tag{1} $$

Suppose we set $\sigma(t) = p$, a constant. Mathematically, this implies a growth model with the property that increments in penetration (or sales) for equal time periods are proportional to the remaining distance to the maximum penetration (or potential market). In other words, it implies that sales in each period are proportional to the number of consumers who have not tried the product. The parameter $p$ represents the impact of (a constant set of) external factors, capturing how they influence consumers who have not tried the product yet. Reflecting this, these models are referred to in the diffusion literature as models of external influence.4
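The constant-hazard case can be illustrated with a short simulation. This is a minimal sketch, assuming a discrete-time reading of Eq. (1) with one step per period; the function names and the parameter values (p = 0.1, M = 1000) are hypothetical choices for illustration and do not come from the literature reviewed here.

```python
import math


def external_influence_path(p, M, periods):
    """Per-period sales from Eq. (1) with a constant hazard sigma(t) = p."""
    cumulative = 0.0
    sales = []
    for _ in range(periods):
        s_t = p * (M - cumulative)  # sales proportional to remaining non-adopters
        sales.append(s_t)
        cumulative += s_t
    return sales


def continuous_penetration(p, t):
    """Continuous-time counterpart: F(t) = 1 - exp(-p t) under a constant hazard p."""
    return 1.0 - math.exp(-p * t)


# Illustrative values only (assumed, not estimated from data).
sales = external_influence_path(p=0.1, M=1000.0, periods=10)
```

Each increment in `sales` is smaller than the last, and cumulative sales approach but never exceed `M`, which matches the two stylized facts that Fourt and Woodlock (1960) report.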

2.2 Models of “internal” influence

Simultaneously, research in economics and sociology appeared which stressed the importance of inter-agent influence in the adoption of innovations. Sociologists Katz and Lazarsfeld (1955) presented a “two-stage” theory of information diffusion. They postulated that mass media first reaches “opinion leaders,” people who are active media users and who then diffuse the ideas to other consumers in the marketplace who are not as actively exposed to these media. Katz and Lazarsfeld’s (1955) conclusions were based on field data they collected from voters in the 1940 U.S. presidential election. In their survey research, they found that most voters got their information about candidates from other people who read about campaigns in newspapers (see Lazarsfeld et al., 1944). Therefore, by implication, interpersonal communication or “word-of-mouth” was the dominant way by which the majority got their information.

In parallel, Rogers (1962) postulated that the diffusion of innovation happens in stages. Innovations are adopted first by a group of “innovators”, then by “early adopters”, “early and late majority”, and “laggards”, with later adopters imitating the behavior of early adopters.5

In economics, Mansfield (1961) emphasized the role of imitation by firms in the adoption of new technological innovations, noting that,

“In their discussions of technological change, economists have often cited the need for more research regarding the rate of imitation. Because an innovation will not have its full economic impact until the imitation process is well under way, the rate of imitation is of considerable importance. It is unfortunate that so little attention has been devoted to it and that consequently we know so little about the mechanism governing how rapidly firms come to use a new technique.”

4 Related models were also used in the life sciences. For example, von Bertalanffy (1957) showed how similar mathematical growth functions can be used to capture the observed S-shaped pattern in the growth in body weights of animals over their life-cycle.

5 Rogers’ theory was intuitive but verbal, and the empirics were mostly heuristic: for instance, he suggested that innovators comprise the first 2.5% of the adopting population. Interpersonal communication was also shown to be quantitatively important in diffusion in earlier work, e.g. by Bowers (1937) in his case-study on the diffusion of amateur radio in the US.

The emphasis on imitation behavior formed the basis of “internal” models of diffusion, in which new adoption was driven by factors internal to the adopting population (e.g., Fisher and Pry, 1971). To obtain a simple formulation for internal influence, this literature set $\sigma(t) = qF(t) = \frac{q}{M} S(t)$, so that the probability of adoption in each period is a function of the number of past adopters. The parameter $q$ represents the impact of internal influence, capturing how past adoption by others influences consumers who have not tried the product yet. This can derive from a host of phenomena including word-of-mouth, reduction in uncertainty, and improved reputation of the product. The idea also has an analogy to epidemiological models: once an innovation is introduced, it spreads like an epidemic based on contagion.
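As a worked step (a sketch, assuming a continuous-time reading of the model and a positive initial penetration $F_0$, neither of which is stated in the text), substituting the internal-influence hazard $\sigma(t) = qF(t)$ into Eq. (1) and dividing by $M$ gives

$$ \frac{dF}{dt} = q\,F(t)\,[1 - F(t)], \qquad F(0) = F_0 > 0, $$

whose solution is the logistic curve

$$ F(t) = \frac{1}{1 + \frac{1 - F_0}{F_0}\, e^{-qt}}. $$

Sales $s(t) = M\,dF/dt$ then peak when $F = 1/2$, which produces the S-shaped cumulative adoption path. Note that under these assumptions a pure internal-influence model needs a seed of initial adopters: with $F_0 = 0$ the hazard is zero and adoption never starts.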

2.3 Bass’s model

Bass (1969) combined both external and internal influence and presented a model for the life-cycle sales curve for durable goods that was motivated by the problem of forecasting new product sales paths. Combining internal and external influence, Bass assumed

$$ \sigma(t) = p + qF(t) = p + \frac{q}{M} S(t) \tag{2} $$

Plugging Eq. (2) into Eq. (1), he obtained

$$ s(t) = \left[ p + \frac{q}{M} S(t) \right] [M - S(t)] \tag{3} $$

$$ s(t) = pM + (q - p)\,S(t) - \frac{q}{M}\,[S(t)]^2 \tag{4} $$

Bass offered a behavioral interpretation of his model. Some components of sales occur independently of past sales. This is the “$pM$” term. Some occur in a nonlinear way as a function of past cumulative sales, $S(t)$, with the extent of dependence controlled by $q$ and $p$. If $q > 0$, the property of the life-cycle “diffusion curve” in (3) is that over time the second set of terms dominates, so the importance of sales occurring independently of the installed base $S(t)$ falls over time. Following Rogers’ (1962) terminology, Bass interpreted $p$ as representing innovators, a subset of the population that bought independently of others’ influence. He interpreted $q$ as representing imitators subject to contagion effects, which dovetailed well with models of interpersonal communication and word-of-mouth following Katz and Lazarsfeld (1955). Fitting this to life-cycle sales data of several durables, Bass showed that the model fits well, and therefore could be used for forecasting. The model has been influential and a large literature has followed utilizing it. See Mahajan et al. (1990) and the literature cited therein.
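The shape implied by Eqs. (2) and (3) can be sketched numerically. The block below is a minimal illustration, assuming the continuous-time version of the model with F(0) = 0, for which a closed-form solution is known; the function names and the parameter values are hypothetical and are not estimates reported in this chapter.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Closed-form F(t) solving dF/dt = (p + q F)(1 - F) with F(0) = 0."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_sales_rate(t, p, q, M):
    """Instantaneous sales s(t) = [p + (q/M) S(t)] [M - S(t)], with S(t) = M F(t)."""
    S = M * bass_cumulative_fraction(t, p, q)
    return (p + q * S / M) * (M - S)


def bass_peak_time(p, q):
    """Time at which s(t) peaks; an interior peak requires q > p."""
    return math.log(q / p) / (p + q)


# Illustrative values only (assumed for this sketch).
p, q, M = 0.03, 0.38, 1000.0
t_star = bass_peak_time(p, q)
peak_sales = bass_sales_rate(t_star, p, q, M)
```

Under these assumptions, sales rise while the internal-influence term grows, peak at `t_star` when cumulative penetration equals (q - p) / (2q), and fall afterwards as the pool of non-adopters shrinks.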

2.4 What was missing in the first wave?

The models as originally formulated were envisaged mainly as devices for pre-launch sales forecasting in an environment with limited data. While parsimonious and elegant, they provided no role for marketing variables such as prices or advertising. Consequently, they could not speak directly to the role of these variables in shaping the product life-cycle.

Second, the models were specified at the industry or category-level, not at the product or brand-level. Marketing allocation by firms is usually at the product or brand-level. This mismatch made it difficult to include marketing variables in a logical way and to accommodate a role for product competition.


Third, it was difficult to interpret the underlying parameters of the model as stable, time-invariant representations of the data generating process. Suppose we take it as given that marketing mix variables affect sales. Since marketing is a time-varying form of external influence, the assumption that $p$, the encapsulation of external influence, is constant and time-invariant is at odds with how marketing may affect adoption.6 Even if we allowed these parameters to be time-varying and fit them to the data, the pattern of time-variation inferred from the data will not hold under alternative marketing allocations. This made the models unsuitable for simulating counterfactual scenarios and for facilitating normative decision making in the development of new life-cycle marketing strategies.

Fourth, the models implicitly assume a homogeneous population, because all individuals in the population are equally “susceptible” to external or internal influence. Though Bass used the language of innovators and imitators, and the terminology persisted for some time, subsequent literature recognized the terminology was not appropriate. The right interpretation of the model is as representing, in a parametric way, the impact of external and internal influence, rather than as mapping to underlying dichotomous sub-populations of innovators or imitators (e.g., Lekvall and Wahlbin, 1973; Tanny and Derzko, 1988; Chatterjee and Eliashberg, 1990). The assumption of homogeneity precluded understanding heterogeneity in marketing response across users and its impact on the product life-cycle.

Finally, a fifth difficulty was that the early formulations were not explicitly derived from the solution to a well-posed problem of consumer adoption. They also did not consider explicitly the intertemporal adoption problem faced by consumers.
This made it harder to understand the belief and preference structures of consumers, and the supporting marketing environments that lead to firms’ choices of various strategies by affecting those beliefs and preferences. It was therefore harder to understand why one marketing strategy works and another does not. It also made it difficult to use these models for simulating counterfactual marketing scenarios, which is important for planning. The underlying concern is that the beliefs (and behaviors) of users can change systematically under alternative marketing strategies, and therefore the parameters encapsulating the diffusion equation would also change systematically.

Though the community did not have the data or the tools to formally attack some of the other issues laid out above until much later, it is clear that the question of linking adoption back to an underlying micro-model of behavior is something they thought about deeply. For example, in presenting a model for adoption based on an approximate formal consumer search and utility maximization, Haines (1964) writes,

“It should be noted that it may well be that the model developed below is entirely consistent with sophisticated maximizing behavior in short run situations with imperfect information and uncertainty on the consumer’s part about how the product will perform. However, there seems to be at present no obvious method for ascertaining whether this is so. This question is obviously an interesting one for further exploration.”

6 Putsis (1998) documents that allowing for time-varying parameters in diffusion models improves fit.

The next wave of research in diffusion models aimed to address some of these issues. A first step was to augment the framework to include marketing mix variables. A wave of papers on life-cycle pricing strategies took the diffusion process as given and thought about how marketing strategies should be formulated if demand evolves according to this diffusion process.

3 The second wave: Life cycle pricing with diffusion models

On the demand-side, diffusion models were augmented to accommodate prices in two main ways. One way was to allow prices to shift the sales equation multiplicatively by a non-negative function $g(\cdot)$ (e.g., Robinson and Lakhani, 1975; Dolan and Jeuland, 1981; Bass, 1980; Jeuland and Dolan, 1982; Dockner and Jorgensen, 1988),

$$ s(t) = \sigma(S(t))\,[M - S(t)] \times g(p) \tag{5} $$

where we write $\sigma(t) = \sigma(S(t))$ to make explicit the dependence of the hazard rate on past cumulative sales. For example, Robinson and Lakhani (1975) and Dolan and Jeuland (1981) set $g(p) = e^{-\beta p}$, so that $s(t) = \left[ p + \frac{q}{M} S(t) \right] \times [M - S(t)] \times e^{-\beta p(t)}$. (In this expression, $p$ inside the first bracket is the external-influence coefficient of Eq. (2), while $p(t)$ in the exponent is the price.) In these specifications, the effect of prices on current sales is separable from the effect of cumulative sales, so I refer to these as “separable diffusion” specifications.

Another way was to specify the potential market size, $M$, to be a function of prices (e.g., Feichtinger, 1982; Jørgensen, 1983; Kalish, 1985; Kalish and Lilien, 1986; Kamakura and Balasubramanian, 1987; Kamakura and Balasubramanian, 1988; Horsky, 1990),

$$ s(t) = \sigma(S(t))\,[M(p) - S(t)] \tag{6} $$

For ease of reference, I refer to them as “market potential” specifications. Theoretically optimal pricing over the PLC differs between these specifications because the optimal price path depends on how it interacts with the impact of cumulative sales on demand.

In addition, the literature also considered the implications of production on the cost-side (e.g., Bass, 1980; Jeuland and Dolan, 1982; Kalish, 1983). In particular, prices over the PLC could influence the extent to which cost-side economies of scale could be obtained, for example, when unit costs fall with more production via learning (sometimes referred to as “experience curve” effects, e.g. Alchian, 1958; Arrow, 1962). Kalish (1983) provides a comprehensive investigation of the life-cycle consequences of these various features, so I discuss their implications through the lens of his results. Kalish’s model features a forward-looking monopolist setting prices over the PLC under both separable diffusion and market potential specifications, and with possible cost-side economies of scale.


3.1 Price paths under separable diffusion specifications

Let’s consider the demand-side first. There are two effects at play. First, there is a “demand-side economies of scale effect” inherent in the model because Eq. (5) shows the likelihood of current purchase changes with past cumulative sales. Second, there is a “saturation effect” in the model given the existence of a maximum market potential, $M$. Because the good is durable, as more units are sold the remaining unfulfilled demand decreases. The resultant price path depends on the tradeoff between the two effects.

Suppose the effect of demand-side economies is positive, so that more sales stimulate more demand (for example, because the product is good and word-of-mouth is beneficial). There is an incentive to implement penetration pricing: start with low prices and increase them over time. This helps stimulate early adoption, which in turn will stimulate demand. At some point in the life-cycle, the demand-side economies may be overcome by saturation (e.g., after maturity or close to decline). Then, additional sales reduce future sales by removing would-be adopters from the market. Reflecting these forces, Kalish (1983) shows that the optimal price path over the PLC will increase until the point where the saturation effect overcomes the diffusion effect, and will decrease afterward. If the selling horizon, market size or other factors of the product are such that this point is never reached, the optimal price path will instead be monotonically increasing.

Now suppose that the demand-side economies for a product are not strong enough to ever overcome the saturation effect (e.g., word-of-mouth is limited), or they have an adverse effect (e.g., the product is bad). Then, reflecting the saturation effect, the optimal price path will be monotonically declining.

On the cost-side, the impact of cost-side economies of scale is to induce optimal price paths that decrease monotonically over the PLC, ceteris paribus. Intuitively, learning economies give the firm an incentive to reduce price and produce more in order to benefit from lower future production costs.7

When demand and cost-side economies are jointly present, the optimal price path will be determined by a complex interaction of the various factors. When all the factors reinforce each other, e.g., when demand-side economies for a product are adverse or not strong enough to overcome the saturation effect, both demand and cost sides imply price declines. Then prices will decline over the PLC. In other cases, the price path over the PLC will depend on how high the monopolist’s discount factor is, the rate of the cost decline, and the intensity of the demand-side economies of scale.

7 Below marginal cost pricing is possible. At the beginning of the life-cycle, a forward-looking firm may be willing to lose money in order to kick-start such economies.

3.2 Price paths under market potential specifications

In these specifications, we interpret $M(p)$ as the number of consumers who are willing to buy the product at price $p$. The model has an appealing link to a consumer choice problem. Each individual in the population is characterized by a reservation price for the product, which is the highest price he would be willing to pay. Given an actual price $p$, $M(p)$ represents the total number of users whose reservation price is above $p$ (e.g., Horsky, 1990). As the price decreases over time, more consumers find it acceptable, so there is an incentive to cream skim the market by selling at high prices initially to those with higher willingness-to-pay, and to reduce prices over time.

The hazard $\sigma(S(t))$ in the model represents the impact of past cumulative sales on the probability of adoption by the users in the potential market. If demand-side economies (as encapsulated in $\sigma(S(t))$) are weak or nonexistent, or if demand-side economies are adverse (e.g., the product is bad and generates bad word-of-mouth), the incentive to price discriminate by skimming users implies that we would get optimal price paths that are declining over the PLC. This holds whether or not there are cost-side economies. Because the impact of cost-side economies is to induce optimal price paths that decrease monotonically over time, the demand and cost-side incentives to cut prices over the PLC reinforce each other.

When demand-side economies are strong, it is possible to obtain increasing price paths (penetration), or price paths that increase and then decrease over the PLC (penetration and skimming). As above, what obtains will depend on how high the monopolist’s discount factor is, the rate of the cost decline, and the intensity of the demand-side economies of scale.

3.3 Extensions to individual-level models

Later papers that appeared in the eighties and early nineties ported the above frameworks to individual-level settings. These papers showed how to tie diffusion back formally to underlying consumer beliefs, and demonstrated how to aggregate models specified at the individual level so as to generate the S-shaped diffusion patterns that motivated the development of the macro-level literature. These papers were intertwined with parallel developments in economics.

Horsky (1990) extends Kalish’s (1983) analysis, deriving the reservation price a utility-maximizing individual will have for a new product as a function of the product’s benefits and his wage rate. The benefits of the product to the user are captured as a function of the time-savings provided by new durables. Using Becker’s (1965) household production function framework, Horsky shows the propensity to adopt will be a function of the user’s wages, as this determines the marginal cost of time. Assuming that the wage rate has an extreme value distribution across the population, he shows that the aggregate diffusion process will show an income-price dependent logistic adoption curve. This showed how S-shaped PLC curves could be obtained simply from the existence of heterogeneity in consumer valuations. Horsky augmented this model to incorporate demand-side economies of scale by allowing users’ awareness and uncertainty to be influenced by the number of previous adopters of the product. This generates a “market potential” model of the form in Eq. (6). Similar to Kalish (1983), Horsky shows that if demand-side economies of scale are weak, a price skimming strategy over the PLC is optimal for a monopolist and possibly for oligopolists.

A theme that developed in this later literature was to separate information flow from adoption. Diffusion processes in papers in the “first wave” applied directly to sales over the PLC. Diffusion processes in this later literature applied to the percolation of information into the population over the PLC. Following ideas in economics on the role of advertising as information (e.g., Ozga, 1960; Stigler, 1961; Gould, 1976), advertising was allowed to affect the rate of diffusion of information. This sets up a role for the price path to interact with advertising over the PLC.

In Kalish (1985) for instance, consumer awareness of the new product is modeled as an “epidemic” type diffusion process, while adoption given awareness occurs when the perceived risk-adjusted value of the product exceeds its price. This is closer to a standard micro-economic model (see also Jeuland, 1981). Given that consumers are risk averse, willingness to pay increases over time as information from early adopters mitigates uncertainty about the product, thereby linking the model more closely to models of how individuals make these kinds of choices, and presenting an explicit role for consumer beliefs. Price paths over the PLC are found to be declining unless the effectiveness of adopters in generating awareness or reducing uncertainty of later adopters is high. In the latter case, penetration pricing is possible.

In a similar vein, Roberts and Urban (1988) and Chatterjee and Eliashberg (1990) modeled consumers as risk-averse, myopic, Bayesian decision makers with diffuse priors about the quality of a new product. As information diffuses into the market, consumers update their priors in a Bayesian way, making static adoption choices consistent with their posterior beliefs.
The risk premium for newness falls over time as information from early adopters reduces uncertainty about the product, so these models also featured a demand-side economy of scale. The corresponding life-cycle pricing policies were, however, not worked out.

Overall, these papers deepened the diffusion literature’s link to microeconomic fundamentals and set the stage for deeper consideration of heterogeneity in valuations and consumer beliefs for life-cycle pricing.
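The mechanism by which information from early adopters lowers the risk premium can be sketched with a textbook normal-normal Bayesian update. This is only an illustration under assumptions that are not taken from the papers cited above: quality beliefs are normal, each earlier adopter contributes one independent noisy signal, and a consumer with constant absolute risk aversion values the product at its mean minus half the risk-aversion coefficient times the variance. A proper but wide prior stands in for a diffuse one, and all names and numbers are hypothetical.

```python
def posterior_beliefs(prior_mean, prior_var, signal_mean, signal_var, n_signals):
    """Normal-normal Bayesian update after n independent signals with a given mean."""
    precision = 1.0 / prior_var + n_signals / signal_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + n_signals * signal_mean / signal_var)
    return post_mean, post_var


def risk_adjusted_value(mean, var, risk_aversion):
    """Certainty equivalent under CARA utility with normal beliefs (an assumption)."""
    return mean - 0.5 * risk_aversion * var


# Willingness to pay as the number of informative earlier adopters grows.
values = []
for n in range(6):
    m, v = posterior_beliefs(prior_mean=5.0, prior_var=4.0,
                             signal_mean=5.0, signal_var=1.0, n_signals=n)
    values.append(risk_adjusted_value(m, v, risk_aversion=1.0))
```

In this sketch the mean belief never changes, yet `values` rises with the number of signals because the posterior variance shrinks. That rise in willingness to pay with past adoption is the demand-side economy of scale described above.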

3.4 Discussion

Which diffusion specification should be preferred? One consideration is that the separable diffusion specification in Eq. (5) is theoretically unappealing for durable goods. Because the model is multiplicatively separable, an implication of this formulation is that demand elasticity is independent of past sales. This assumption is particularly bothersome for durable goods, since in this case the potential population directly changes as a result of past sales (Jeuland, 1981; Kalish, 1983). In contrast, the market potential specification in Eq. (6) is more appealing, as it maps better to an adoption problem for durables and captures the intuitive notion that the set of consumers remaining in the market who are susceptible to adoption is a function of past sales. In the diffusion literature, the question of whether prices affect the likelihood of adoption or whether they affect the market potential, or both, is still unresolved.
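The elasticity point can be checked numerically. The sketch below assumes a Bass-type hazard, the exponential price response $g(p) = e^{-\beta p}$ for the separable case, and a hypothetical market potential function $M(p) = M_0 e^{-\beta p}$ for the other case; the functional form for $M(p)$, the function names, and all parameter values are illustrative assumptions. The external-influence coefficient is named `p_ext` to keep it distinct from the price.

```python
import math


def hazard(S, M, p_ext=0.03, q=0.38):
    """Bass-type hazard sigma(S) = p_ext + (q / M) * S."""
    return p_ext + q * S / M


def sales_separable(price, S, M=1000.0, beta=0.5):
    """Eq. (5) with g(price) = exp(-beta * price)."""
    return hazard(S, M) * (M - S) * math.exp(-beta * price)


def sales_market_potential(price, S, M0=1000.0, beta=0.5):
    """Eq. (6) with an assumed M(price) = M0 * exp(-beta * price)."""
    M_p = M0 * math.exp(-beta * price)
    return hazard(S, M0) * (M_p - S)


def price_elasticity(sales_fn, price, S, h=1e-6):
    """Numerical elasticity d ln s / d ln price via a central difference."""
    slope = (sales_fn(price + h, S) - sales_fn(price - h, S)) / (2.0 * h)
    return slope * price / sales_fn(price, S)


# Elasticities at the same price but different levels of past cumulative sales.
sep_low = price_elasticity(sales_separable, 1.0, S=100.0)
sep_high = price_elasticity(sales_separable, 1.0, S=400.0)
mp_low = price_elasticity(sales_market_potential, 1.0, S=100.0)
mp_high = price_elasticity(sales_market_potential, 1.0, S=400.0)
```

Under these assumptions the separable elasticity equals `-beta * price` at both levels of `S`, while the market potential elasticity grows in magnitude as cumulative sales approach `M(price)`.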


A small empirical literature that has attempted to answer this question has found mixed results. The strategy in these papers has been to use aggregate market-level time-series data on sales and prices across several durable goods, and test which specification fits the patterns best. Jain and Rao’s (1990) work suggests that price affects both adoption rates and the market potential, while the results of Kamakura and Balasubramanian (1987, 1988) and Bottomley and Fildes (1998) suggest that when an effect of prices on aggregate-level diffusion is detectable, it seems to operate primarily via an effect on the adoption probability. The aggregate nature of the data, which is at the category and not the product-level, makes it difficult to interpret the role of prices in such testing. Further, the possibility of price endogeneity makes it difficult to assess whether the observed effects measure the effect of prices, or whether they reflect unobserved confounding. As such, the results are fragile and inconclusive at this point.

An open issue is the quantitative significance of the various factors that determine the shape of the price paths over the PLC (cost-side economies, firm-side discounting, demand-side economies). The issue of demand-side economies of scale is particularly vexing. There have been some attempts to test for the quantitative significance of factors such as word-of-mouth in driving diffusion. Early researchers like Bass (1969) interpreted the good statistical fit of diffusion models to aggregate sales data as implying the existence of strong word-of-mouth effects. Horsky’s (1990) work suggested that word-of-mouth effects are weak relative to skimming incentives. His results, based on analysis of aggregate sales paths of several durables, also suggest that word-of-mouth effects large enough to generate penetration pricing over the PLC seem to exist only in certain product categories.
This seems to gel with casual empiricism: penetration pricing for durables and technology goods is rarely observed; skimming is the norm. Based on the modern work of Manski (1993) and the large literature on the identification of social interactions (e.g., Blume et al., 2011), we now realize that it is not trivial to identify “contagion effects” emphasized in the diffusion literature with the kind of variation typically available in observational adoption data, especially at the aggregate level. Understanding whether word-of-mouth and other demand-side economies are quantitatively large enough to sustain penetration pricing, or any form of price increases over the PLC would be an important agenda for future research. Therefore, understanding the role of information diffusion in the adoption of new products; its interaction with consumer beliefs and preferences; and its ultimate implications for pricing over the PLC; continues to be an open issue for empirical work.8

8 The modern literature on social interactions with micro-data is copious. Exemplar papers that have analyzed the role of social interactions in the adoption of technology goods specifically include Goolsbee and Klenow (2002); Ryan and Tucker (2012); Björkegren (2019). The implications for pricing over the PLC corresponding to such demand-side economies remain an open question in this literature.


3.5 What was missing in the second wave?

Competition

At a high level, a first order issue for the literature was the rudimentary treatment of competition in the models (see Jørgensen, 1986 for more detailed assessments). There was limited formal treatment of strategic interaction and its implication for life-cycle strategies. The use of game theoretic models, and in particular, of formulating life-cycle policies as optimal strategies of dynamic games was limited.

Consumer expectations

A second issue was the absence of a role for consumer expectations. Consumers were treated as myopic, responding only to current prices while remaining uninfluenced by the trajectory of future prices. The assumption that consumer behavior would remain the same irrespective of whether they expected prices to increase or decrease over the life-cycle was unrealistic. Thus, this omission reduced the empirical realism of the models.

More importantly, the microeconomic literature on durable good monopoly pricing following Coase (1972) showed that forward-looking behavior by consumers considerably affects the pricing policies of the monopolist.9 Anticipating future price cuts, forward-looking consumers may delay their adoption and purchase only at low prices later. Stokey (1979) and Sobel and Takahashi (1983) showed that in such a world, a firm that can credibly commit to a pricing strategy may, in most situations, choose not to cut prices over time, choosing the monopoly price in the first period and holding it fixed thereafter. Therefore, the inclusion of price expectations by consumers in the model led to dramatically different pricing policies over the PLC (in this case, moving from declining to flat prices).

Further, allowing forward-looking expectations by consumers altered the profit profile of price skimming strategies. The analyses of Stokey (1981) and Besanko and Winston (1990) showed that in the absence of credible commitment strategies, the durable-goods monopolist would intertemporally price discriminate by price skimming, but his profits are lower than if he could commit to a fixed price. This implied that models with forward-looking consumer expectations generated a role for commitment devices for firms and for marketing strategies that help signal to consumers the firm’s intent to rein in price cutting over the PLC (e.g., binding contracts with retailers, developing a reputation to not run clearance sales, committing to limited production capacity). Such strategies did not play an important role in pricing over the PLC in the absence of consumer expectations. Further, the literature pointed out that the extent of profitability of intertemporal price discrimination with durable goods is a function of the degree of consumer patience.

9 Coase conjectured that a monopolist selling a durable good to forward-looking consumers may lose all his market power due to his inability to commit to not cutting prices (or increasing production) in the future. Stokey (1981), Gul et al. (1986), and Ausubel and Deneckere (1989) showed that Coase’s conjecture arises as a limiting result in equilibrium models where the duration between the monopolist’s price changes shrinks to zero.


Even if consumers maintained expectations of possible price declines, price skimming was profitable when firms are more patient than consumers (e.g., Landsberger and Meilijson, 1985). And, to the extent that consumers with higher willingness-to-pay were also more impatient and less sensitive to future price changes, heterogeneity in consumer patience could form the basis of profitable price discrimination over time (e.g., Stokey, 1979). It is critical for the monopolist to assess the degree of consumer patience correctly. If a monopolist facing forward-looking consumers implements the pricing policy that would be optimal under the false assumption that consumers are myopic, profits are significantly lower than if the monopolist follows the equilibrium pricing policy for rational, forward-looking consumers (Besanko and Winston, 1990). This is because forward-looking consumers take into account the existence of a substitute good – the same product in the future – that myopic consumers do not, and therefore their demands are more price elastic than those of myopic consumers. A firm that follows the prescription of a myopic consumer model will therefore set prices that are too high, harming the profitability of its price skimming strategy. For marketing practitioners, these aspects provide motivation for investing in market research to understand consumer expectations, and to understand the extent of discounting and its heterogeneity in the population. These form inputs to the formulation of better life-cycle policies in practice. Properly embedding this in a life-cycle pricing framework requires a specification with a well-posed role for consumer expectations. Other issues relate to the role of marketing strategies in affecting the future value of units the firm sells today. If consumers are forward-looking, they would incorporate these future values into their decisions to buy today.
One example would be the prices they could obtain if they sold the product later in markets for used goods. Since the good is durable, such used goods markets are likely to develop. The problem for the monopolist is that the used good is a substitute for the new good, so the price of a used unit on the second-hand market constrains the price path he can induce for new units. This sets up a role for strategies such as “planned obsolescence”, in which the quality (or durability) of new goods is deliberately reduced. By reducing durability, the monopolist reduces the substitutability between new and used units, which allows him to sustain a higher price for new units. In the extreme, it may be profitable to reduce quality to an extent that the market for used goods is “killed off” (see Waldman, 2003 for a more comprehensive discussion). These strategies cannot be properly assessed without accommodating a clear role for consumer expectations in the model. Consumer expectations may also affect the monopolist’s choice of PLC length and his choice of selling mechanisms over the PLC. A monopolist facing rational, forward-looking consumers would prefer a shorter time horizon compared to a monopolist facing myopic consumers (Besanko and Winston, 1990). The reason is that as the selling horizon becomes shorter, the constraints imposed on pricing associated with


forward-looking consumers’ propensity to wait for future price declines weaken.10 This creates an incentive to shorten the PLC for the product. Strategies for doing this include planned obsolescence, introduction of substitute new products, or early withdrawals. The choice of how much capacity to develop and how much inventory to hold is also affected, because that affects the decision of how long to sell. This creates an interaction with yield management to address the costs of developing and holding inventory over the PLC. On the selling-mechanism side, renting may be preferred to avoid time consistency issues (e.g., Bulow, 1982), and the choice of whether to sell the product via posted prices versus auctions may be affected when impatient consumers dislike the uncertainty and search costs associated with buying via auction (e.g., Ziegler and Lazear, 2003). A final point concerned the use of demand systems for assessing counterfactual pricing policies as part of a marketing decision system. In a model with consumer expectations, demand is a function of expectations of future prices; when a new price path is considered, consumer expectations about future prices would change. Consequently, the demand profile over the entire planning horizon would change, reflecting consumer reactions to changes in both current and expected future prices. Accommodating this is important for comparing the profitability of various pricing policies over the PLC. Given the absence of a formal basis for incorporating consumer expectations, the models from the previous literature could not accommodate these kinds of counterfactual “what-if” situations as part of normative scenario planning by decision makers.

Open vs. closed-loop strategies

A third issue was that most of the papers solved for open-loop policies, which outline a set of time-optimal actions the firm should implement. In open-loop policies, optimal actions are functions of time only, with that dependence fixed at the outset. In contrast, closed-loop policies are functions of time and the state of the system as summarized by the current value of the relevant state variables. Put differently, open-loop policies are time inconsistent, while closed-loop policies are time consistent. This makes open-loop policies unrealistic in practical settings where managers are routinely called upon to revise strategies in response to changes in the market state. Paraphrasing Erickson’s (1992) critique of open-loop policies in terms of pricing, “The critical deficiency of open-loop strategies is precisely that, once determined, they are fixed; open-loop [price] levels may change across time, but the trajectory cannot be changed once the game has started. A marketing manager is not so likely to want to put [pricing] on such an automatic control; he/she would wish to monitor the market situation as it proceeds across time and modify [pricing], when needed, to correct the situation. Marketing managers need closed-loop equilibrium strategies.”

10 In the extreme, a selling horizon of one period provides the monopolist the ability to commit to one fixed price over his selling horizon, which helps profits.


The strategies prescribed by open vs. closed-loop policies are in general different (Jørgensen, 1986). This difference plays an important role when allowing for rational, forward-looking consumers in the model. The issue is one of time consistency: outcomes with commitment and outcomes without commitment differ. Open-loop policies make sense when the firm has the ability to credibly commit to a pre-determined sequence of actions over the planning horizon. Since the commitment is credible, it is plausible that consumers could be convinced the firm will stick to that policy over time. So the open-loop solution can form an equilibrium in which the pricing policy it prescribes is consistent with the expectations of rational consumers. Now notice that a policy that was optimal for the firm in the past may no longer be optimal when the future arrives and the state of the market changes from what was originally anticipated. Sticking with the past-optimal policy would not be time consistent because, once the future arrives, the firm no longer cares about its past profits and is tempted to deviate. In the absence of a way to credibly commit, it is unlikely that consumers could be convinced the firm will not deviate in those states. Therefore, in the absence of commitment, the open-loop solution would not form an equilibrium in which the pricing policy is consistent with the expectations of rational consumers. On the other hand, closed-loop policies solve for a pricing policy that is optimal for the firm in every state it encounters, explicitly incorporating such deviations. Hence, closed-loop policies are more relevant in a world where the firm cannot commit upfront to a sequence of prices.11 The third wave of literature attempted to address these issues more directly.

4 The third wave: Life cycle pricing from micro-foundations of dynamic demand

The next wave of papers interpreted diffusion through the lens of individual-level models of adoption while incorporating a formal role for consumer expectations and associated dynamics. While the previous waves of papers were primarily theoretical in orientation, these recent papers were more empirical, perhaps driven by the increasing availability of data. The broad goal was to estimate the parameters of the models from real data and to compute optimal policies for normative use via empirically realistic models. The focus on empirical realism led to more complex structure, precluding analytical solutions and necessitating numerical methods. The focus on empirical work also led to heightened sensitivity towards identification concerns. The new literature emphasized the role of prices and de-emphasized the role of contagion, with the latter aspect driven to a large extent by

11 Equivalently, in the context of a game, open-loop equilibria are not sub-game perfect in the sense that they do not represent an equilibrium for every sub-game that may start at a different point in time (e.g., Fershtman, 1987). In contrast, closed-loop equilibria are sub-game perfect.


the challenges of identifying such contagion with the variation typically available in observational adoption data. The new literature addressed the three missing pieces of the second wave. Competition across firms was accommodated by embedding life-cycle pricing policies within the framework of a dynamic game across firms. Consumer expectations about future prices were incorporated into intertemporal adoption choices made by consumers, which determined demand. An important consideration was to ensure that these expectations were consistent with firms’ actual pricing choices over the PLC and that firms’ pricing choices were, in turn, consistent with these expectations. The papers also reversed the open-loop emphasis of the previous wave. Freed of the limitations of analytical solutions, the literature embraced numerical dynamic programming. Dynamic programming delivers a closed-loop, state-contingent policy by construction. With competition, life-cycle policies were framed as optimal strategies in dynamic games. Sub-game perfect equilibria of these games are also closed-loop by construction. Given the empirical focus, a first challenge was to specify a demand model at the micro-level that formally encapsulates the intertemporal adoption problem faced by consumers. The idea was to allow for expectations that consumers may have about falling prices, improvements in quality, and other ways in which the adoption-relevant consumer states may evolve. Melnikov (2012) proposed a tractable empirical framework for handling the demand dynamics induced by the rapid price declines and quality improvements common in technology-driven markets. Building on Melnikov’s ideas, Song and Chintagunta (2003) developed a tractable dynamic model of demand for durables. In an important link back to the diffusion literature, Song and Chintagunta (2003) showed the model can induce S-shaped adoption curves.
Nair (2007), Goettler and Gordon (2011), and Conlon (2012) extended these demand models to consider the pricing of durable goods. In what follows, I first discuss Nair (2007), which features the life-cycle pricing problem of a durable good monopolist facing forward-looking consumers who sequentially exit the market after purchase. Then, I discuss Goettler and Gordon (2011), which extends the model to allow for replacement, competition, and innovation. The goal is to sketch out a framework so the reader can get a sense for how the main forces can be encapsulated in a reasonably realistic empirical model, and to help frame a discussion of several open issues in this literature. I do not discuss details of estimation or computation, and comment on them only briefly at the end of the section.

4.1 Dynamic life-cycle pricing problem overview

The PLC pricing problem for a durable good involves dynamic considerations. One source of dynamics is saturation, which derives from durability. The firm has to anticipate that the more it sells today, the more it lowers its demand tomorrow, because consumers who buy the durable good are out of the market for the product until they replace it. Therefore, the market potential faced by the firm is “endogenously shrinking” as a function of the chosen path of prices. Consumer behavior is also dynamic,


because consumers solve an intertemporal adoption problem, timing their adoption and upgrades based on their expectations of future prices and product quality, which are formed endogenously based on the policies chosen by the firm. The firm’s problem is to set a sequence of prices that incorporates these intertemporal effects and, at the same time, enables it to extract the most surplus from its consumer base. Since the valuation of any one consumer is unknown, perfect price discrimination is typically not a feasible strategy. Instead, the firm can use time to discriminate among consumers, setting high prices initially to sell to high valuation consumers, and cutting prices over time to appeal to the low valuation consumers remaining in the market. As emphasized above, a significant obstacle to “skimming” the market in this way is that rational consumers may anticipate lower future prices and delay their purchases. This incentive to delay is a function of consumer expectations of future prices, which are formed endogenously with the prices chosen by the firm. To obtain the optimal set of prices, therefore, we solve for a sequence of equilibrium prices and consumer expectations, such that the firm’s prices are optimal given consumer expectations, and such that the expectations are consistent with the firm’s pricing policy. Besanko and Winston (1990) present a stylized formulation of this setting. One way to think of the pricing problem is as a modified second-degree price discrimination problem in which time is used as a screening device. The firm’s problem is to design a set of time-dependent prices that consumers of different valuations will self-select into. The challenge for the firm is to design the sequence of time-dependent prices so that high valuation consumers self-select into the high, early prices and do not “trade down” to later, lower prices. Assume the monopolist is unable to make binding commitments about future prices.
Therefore, the equilibrium pricing policy must be sub-game perfect. This means that in any period, the monopolist’s intertemporal pricing policy from that period on must be optimal, given the state of the market and consumer expectations in that period.
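The self-selection logic described above can be illustrated with a deliberately stylized two-period, two-type example. This is a sketch under assumptions that are not in the text: valuations `v_high` and `v_low` are lifetime (present-value) ownership values, there are no utility shocks, period 2 is the last selling period, and only low types remain in period 2, so a firm without commitment charges `p2 = v_low`. A high type buys early only if `v_high - p1 >= delta_c * (v_high - p2)`. All function and parameter names are hypothetical.

```python
def skimming_prices(v_high, v_low, delta_c):
    """Two-period skimming prices without commitment (stylized sketch).

    Assumes two consumer types, no utility shocks, and a final period in
    which only low types remain, so the firm charges p2 = v_low.
    """
    p2 = v_low
    # Highest first-period price that keeps high types from waiting:
    #   v_high - p1 >= delta_c * (v_high - p2)
    p1 = v_high - delta_c * (v_high - p2)
    return p1, p2


def skimming_profit(n_high, n_low, v_high, v_low, delta_c, delta_f, cost=0.0):
    """Discounted profit from selling to high types first and low types second."""
    p1, p2 = skimming_prices(v_high, v_low, delta_c)
    return n_high * (p1 - cost) + delta_f * n_low * (p2 - cost)


if __name__ == "__main__":
    for delta_c in (0.0, 0.5, 0.9):
        p1, p2 = skimming_prices(v_high=10.0, v_low=4.0, delta_c=delta_c)
        profit = skimming_profit(100, 100, 10.0, 4.0, delta_c, delta_f=0.9)
        print(f"delta_c={delta_c:.1f}  p1={p1:.2f}  p2={p2:.2f}  profit={profit:.1f}")
```

With `delta_c = 0` the consumer is myopic and the first-period price equals `v_high`. As consumers become more patient, the early price must fall toward `v_low` to prevent high types from waiting, which is one way to see why the profitability of skimming depends on consumer patience.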

4.2 Monopoly problem

The firm’s objective is to set a sequence of prices $\{p_\tau\}_{\tau=0}^{\infty}$ that maximizes the expected present discounted value of profits from the product in each period $t$, $\sum_{\tau=t}^{\infty} \delta_f^{\tau-t}\, \pi(p_\tau, S_\tau)$, where $\delta_f$ is the firm’s discount factor, and $\pi(p_t, S_t)$ is the period-$t$ payoff to the firm from setting price $p_t$ when the state is $S_t$. The firm is assumed to observe the state vector, which contains all relevant information required to decide the optimal price, $p^*(S_t)$. The vector $S_t$ could potentially include time since introduction, and also include functions of the entire history of the game between the firm and consumers. For simplicity, let’s assume that a firm bases its current pricing decision only on “payoff relevant” historic information, i.e., functions of history that affect current profits, which evolve in a Markovian way. Further, we restrict the beliefs of all agents to evolve in a Markovian way. Thus, we focus on Markov pricing strategies and solve the model for a stationary Markov Perfect Equilibrium


(Maskin and Tirole, 2001; henceforth MPE). The resulting strategies are sub-game perfect conditional on the current state. For simplicity, assume that consumer heterogeneity is discrete. There is a potential market of size $M$ for the product composed of $R$ types of consumers, and at date $t = 0$, there are $M_{r0}$ consumers of type $r$ in the market, so that $M = \sum_{r=1}^{R} M_{r0}$. The state vector $S_t$ consists of the following payoff-relevant variables: the number of consumers of each type in the market at the beginning of period $t$, $M_{rt}$, $r = 1, \ldots, R$. The timing of the game between consumers and the firm is as follows. At the beginning of period $t$, the firm observes the sizes of each segment in the potential market, $M_{rt}$. The firm sets the price for that period conditioning on these state variables. Consumers then observe the price and, depending on their valuations, decide to buy the product or delay purchase. The consumer decision each period is a discrete choice. Based on these decisions, aggregate demand for that period is realized at the end of the period.

Consumer’s inter-temporal choice problem

A consumer $i$ of type $r$ decides to buy the product in period $t$ or wait until the next period. In deciding to buy, he faces an intertemporal tradeoff: purchasing at date $t$ involves paying $p_t$ now, giving him a stream of service flow from the product starting at $t$. Not purchasing implies waiting for $t + 1$, and foregoing consumption utility for one period. Waiting, however, presents the option to buy at a possibly lower price $p_{t+1}$ tomorrow. In this sense, purchasing in the next period is akin to exercising an option, and waiting for the next period has some option value in period $t$. The option value is influenced by the consumer’s expectations about future prices. Finally, once the consumer buys, he is out of the market for the product. So, the purchase decision is an optimal-stopping problem. Let $\delta_c$ denote the consumer discount factor and $a_r$ denote the utility that consumer type-$r$ derives from the use of the product per period of consumption. The consumer’s conditional indirect utility from buying the product in period $t$ is assumed to be:

$$u_{i1t} = a_r - \beta_r p_t + \epsilon_{i1t} \qquad (7)$$

where $\epsilon_{i1t}$ (and $\epsilon_{i0t}$ below) are consumer-specific private shocks that shift the consumer’s purchase utility, assumed to be iid over time and over consumers with CDF $F(\epsilon_{i0t}, \epsilon_{i1t})$; and $\beta_r$ is the price sensitivity. The indirect utility from the outside good – no purchase – is normalized to zero, i.e., $u_{i0t} = \epsilon_{i0t}$. Under the assumption of perfect durability (no depreciation, and no replacement), we can write the alternative-specific value function from buying for type-$r$, $W_{1r}(p_t, \epsilon_{i1t})$, in terms of the present discounted utility of the service flow from the product conditional on purchase, $\alpha_r = \frac{a_r}{1-\delta_c}$:

$$W_{1r}(p_t, \epsilon_{i1t}) = \alpha_r - \beta_r p_t + \epsilon_{i1t} \qquad (8)$$
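As a quick numerical check of the service-flow term in (8), the sketch below sums the discounted per-period utility $a_r$ over a long horizon and compares the result with the closed form $a_r/(1-\delta_c)$. The parameter values and function names are illustrative assumptions, not taken from the text.

```python
def pdv_service_flow(a_r, delta_c):
    """Closed-form present value of a perpetual per-period flow a_r."""
    return a_r / (1.0 - delta_c)


def pdv_by_summation(a_r, delta_c, horizon=2000):
    """Truncated sum of a_r * delta_c**k for k = 0, ..., horizon - 1."""
    return sum(a_r * delta_c**k for k in range(horizon))


if __name__ == "__main__":
    a_r, delta_c = 1.5, 0.95  # hypothetical values
    print(pdv_service_flow(a_r, delta_c), pdv_by_summation(a_r, delta_c))
```

The two numbers agree up to truncation and rounding error, which is why the flow utility $a_r$ in (7) can be replaced by the stock value $\alpha_r$ in (8) when the product never depreciates.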

Writing the deterministic component of the value as $W_{1r}(p_t)$, we can see that $W_{1r}(p_t, \epsilon_{i1t}) = W_{1r}(p_t) + \epsilon_{i1t}$. The alternative-specific value function associated


with no-purchase captures the option value of deferring purchase to a future period. We can write this as the discounted expected value of waiting until period $(t + 1)$, and choosing the best option then (purchase or wait again),12

$$W_{0r}(p_t, \epsilon_{i0t}) = \delta_c E_t\left[\max\{u_{i1,t+1},\, u_{i0,t+1}\}\right] + \epsilon_{i0t} \qquad (9)$$

The expectation in (9) is taken with respect to the distribution of future variables unknown to the consumer, conditional on the current information, i.e., with respect to $F(p_{t+1}, \epsilon_{i1,t+1}, \epsilon_{i0,t+1} \mid p_t, \epsilon_{i1t}, \epsilon_{i0t})$. Following Rust (1987), the additive private information terms can be integrated out, and the deterministic component of the consumer’s utility of waiting can be expressed in terms of an “alternative-specific” value function for waiting, $W_{0r}(p_t)$. This value function satisfies the functional equation,

$$W_{0r}(p_t) = \delta_c \int\!\!\int \max\{\alpha_r - \beta_r p_{t+1} + \epsilon_{i1t},\; W_{0r}(p_{t+1}) + \epsilon_{i0t}\}\, dF(\epsilon_{i0t}, \epsilon_{i1t})\, dF(p_{t+1} \mid p_t) \qquad (10)$$

To solve for the equilibrium path of prices, it is convenient to re-write the continuation values over the states $S_t$,

$$W_{0r}(S_t) = \delta_c \int\!\!\int \max\{\alpha_r - \beta_r\, p(S_{t+1}) + \epsilon_{i1t},\; W_{0r}(S_{t+1}) + \epsilon_{i0t}\}\, dF(\epsilon_{i0t}, \epsilon_{i1t})\, dF(S_{t+1} \mid S_t) \qquad (11)$$
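To make the functional equation for the value of waiting concrete, the following sketch solves a special case. It rests on two assumptions that the text does not impose: the shocks are i.i.d. standard type-1 extreme value, so the inner expectation has the closed form $\gamma + \log(e^{x} + e^{y})$ with $\gamma$ the Euler constant, and the consumer expects a constant future price, so the state drops out and the value of waiting is a scalar fixed point. The function names are hypothetical. The second function computes the benchmark in which the consumer must buy next period, which ignores the option to wait again.

```python
import math

EULER_GAMMA = 0.5772156649015329  # mean of a standard type-1 extreme value draw


def log_sum_exp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def value_of_waiting(alpha_r, beta_r, price, delta_c, tol=1e-12, max_iter=10_000):
    """Solve W0 = delta_c * (gamma + log(exp(alpha_r - beta_r * price) + exp(W0))).

    Assumes a constant expected price and type-1 extreme value shocks.
    The map is a contraction for delta_c < 1, so simple iteration converges.
    """
    v_buy = alpha_r - beta_r * price
    w0 = 0.0
    for _ in range(max_iter):
        w0_next = delta_c * (EULER_GAMMA + log_sum_exp(v_buy, w0))
        if abs(w0_next - w0) < tol:
            return w0_next
        w0 = w0_next
    raise RuntimeError("fixed-point iteration did not converge")


def value_if_forced_to_buy(alpha_r, beta_r, price, delta_c):
    """Benchmark: discounted expected utility of buying next period for sure."""
    return delta_c * (alpha_r - beta_r * price + EULER_GAMMA)


if __name__ == "__main__":
    w0 = value_of_waiting(alpha_r=20.0, beta_r=1.0, price=5.0, delta_c=0.9)
    w0_tilde = value_if_forced_to_buy(alpha_r=20.0, beta_r=1.0, price=5.0, delta_c=0.9)
    print(f"value of waiting: {w0:.4f}, forced-purchase benchmark: {w0_tilde:.4f}")
```

The gap between the two values is the option value of being able to wait again. A higher expected future price lowers the value of waiting, which is the channel through which the firm's pricing policy feeds back into current demand.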

The representation over state variables $S_t$ here is purely a device to represent the consumer’s beliefs about pricing. Intuitively, we represent the value function $W_{0r}(S_{t+1})$ as depending on factors that the consumer believes will drive future pricing. Implicitly, we also assume that both consumers and firms understand and share the same expectation of how the state of the market tomorrow, $S_{t+1}$, evolves given the state of the market today, $S_t$. This is a “rational” expectations assumption. Under this assumption, in equilibrium, consumers correctly anticipate that the firm, when facing the future state $S_{t+1}$, will set the price $p_{t+1} = p(S_{t+1})$. This enables the model to allow consumer expectations of future prices to be formed endogenously with the pricing policy chosen by the firm. Further, in equilibrium, the price expectations of each consumer will be consistent with the pricing policy chosen by the firm. Consumer type-$r$ will buy in period $t$ if his utility from purchase exceeds that of waiting:

$$\alpha_r - \beta_r p_t + \epsilon_{i1t} \geq W_{0r}(S_t) + \epsilon_{i0t}$$

12 One way to appreciate the option value from choosing to wait again in the next period is to see that the value of waiting in period $t$ is not the expected discounted utility of buying in the next period, i.e., $W_{0r}(p_t, \epsilon_{i0t}) \neq \tilde{W}_{0r}(p_t, \epsilon_{i0t}) = \delta_c E_t\left[u_{i1,t+1}\right] + \epsilon_{i0t}$. It is easy to see that $\delta_c E_t\left[\max\{u_{i1,t+1},\, u_{i0,t+1}\}\right] + \epsilon_{i0t} > \delta_c E_t\left[u_{i1,t+1}\right] + \epsilon_{i0t}$, so $W_{0r}(p_t, \epsilon_{i0t}) \geq \tilde{W}_{0r}(p_t, \epsilon_{i0t})$. So, ignoring the option value would cause one to understate the value of “inaction”, i.e., delaying purchase.


This decision rule implies probabilities $\sigma_r(p_t, S_t)$ of purchase each period:

$$\sigma_r(p_t, S_t) = \int I\left(\alpha_r - \beta_r p_t + \epsilon_{i1t} \geq W_{0r}(S_t) + \epsilon_{i0t}\right) dF(\epsilon_{i0t}, \epsilon_{i1t}) \qquad (12)$$
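If one is willing to assume that the shocks are i.i.d. type-1 extreme value (an assumption made here only for illustration), the integral in (12) reduces to a binary logit in the gap between the deterministic value of buying and the value of waiting. The sketch below takes the value of waiting `w0` as a given number and holds it fixed when computing the price elasticity, so it captures only the direct effect of the current price. Function names and parameter values are hypothetical.

```python
import math


def purchase_probability(alpha_r, beta_r, price, w0):
    """Logit form of (12) under i.i.d. type-1 extreme value shocks (an assumption)."""
    gap = (alpha_r - beta_r * price) - w0  # deterministic gain from buying now
    # Evaluate the logistic function in a numerically stable way.
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    e = math.exp(gap)
    return e / (1.0 + e)


def price_elasticity(alpha_r, beta_r, price, w0):
    """Elasticity of the purchase probability w.r.t. current price, holding w0 fixed."""
    sigma = purchase_probability(alpha_r, beta_r, price, w0)
    return -beta_r * price * (1.0 - sigma)


if __name__ == "__main__":
    for w0 in (0.0, 1.0, 2.0):
        s = purchase_probability(alpha_r=2.0, beta_r=1.0, price=1.0, w0=w0)
        e = price_elasticity(alpha_r=2.0, beta_r=1.0, price=1.0, w0=w0)
        print(f"w0={w0:.1f}  sigma={s:.3f}  elasticity={e:.3f}")
```

Setting `w0 = 0` corresponds to a myopic consumer who places no value on waiting. A larger `w0` lowers the purchase probability at any price and, in this logit form, makes demand more price elastic, in line with the earlier observation that forward-looking demand is more elastic than myopic demand.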

Summing over the consumer types, we can obtain the following expression for the aggregate demand for the product at state $S_t$,

$$Q(p_t, S_t) = \sum_{r=1}^{R} M_{rt}\, \sigma_r(p_t, S_t) \qquad (13)$$

Evolution of states

The demand system generates interdependence in demand over time. Dependence in demand arises from dependence in the mass of each consumer type remaining in the market each period. The mass of consumers of each type in the potential market, $M_{rt}$, is endogenous to the firm’s historic pricing behavior. Therefore, the dynamics of pricing introduce a dynamic in the evolution of $M_{rt}$. To derive the evolution of $M_{rt}$, note that each period, each of the $M_{rt}$ consumers of type $r$ purchases the product with probability $\sigma_r(p_t, S_t)$ and drops out of the market. This leaves $M_{rt}\left[1 - \sigma_r(p_t, S_t)\right]$ consumers in the market for the next period.13 Hence, the size of type $r$ in period $(t + 1)$ is,

$$M_{r,t+1} = M_{rt}\left[1 - \sigma_r(p_t, S_t)\right], \quad r = 1, \ldots, R \qquad (14)$$

Eq. (14) summarizes the effect of the history of prices on the demand side of the market. From Eqs. (12) and (14), one can see that demand, as well as the evolution of the state variables $M_{rt}$, are endogenously determined in equilibrium. The state variables, $M_{1t}, \ldots, M_{Rt}$, depend on the corresponding probability of purchase, $\sigma_r(p_t, S_t)$, which in turn depends on the evolution of the state variables through (14). Each consumer is assumed to be small relative to the entire market, so the consumer’s purchase decision does not affect the evolution of the states.
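The bookkeeping in (13) and (14) can be traced with a short simulation. This is only a sketch: the price path is supplied exogenously rather than solved for, and purchase probabilities use a myopic logit (value of waiting set to zero), both of which are simplifying assumptions made for illustration. All names and parameter values are hypothetical.

```python
import math


def logit_sigma(alpha_r, beta_r, price, w0=0.0):
    """Purchase probability under an assumed logit form; w0 = 0 means myopic."""
    return 1.0 / (1.0 + math.exp(-(alpha_r - beta_r * price - w0)))


def simulate_market(m0, alphas, betas, prices):
    """Iterate M_{r,t+1} = M_{rt} * (1 - sigma_r) along a given price path.

    Returns per-period sales Q_t, as in (13), and the path of segment sizes.
    """
    m = list(m0)
    sales, sizes = [], [tuple(m)]
    for p in prices:
        sig = [logit_sigma(a, b, p) for a, b in zip(alphas, betas)]
        sales.append(sum(m_r * s_r for m_r, s_r in zip(m, sig)))  # Eq. (13)
        m = [m_r * (1.0 - s_r) for m_r, s_r in zip(m, sig)]       # Eq. (14)
        sizes.append(tuple(m))
    return sales, sizes


if __name__ == "__main__":
    # Two types: a small high-valuation segment and a large low-valuation one.
    sales, sizes = simulate_market(
        m0=[100.0, 200.0], alphas=[3.0, 1.0], betas=[1.0, 1.0],
        prices=[3.0, 2.5, 2.0, 1.5, 1.0],
    )
    for t, (q, m) in enumerate(zip(sales, sizes)):
        print(f"t={t}  M_high={m[0]:7.2f}  M_low={m[1]:7.2f}  Q={q:6.2f}")
```

Because buyers leave the market, cumulative sales plus the remaining segment sizes always add up to the initial market size, and the mix of remaining consumers tilts toward the low-valuation type as high types buy earlier.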

Flow of profits and value function

To obtain the profit function for the firm, let $c$ denote the marginal cost of production of the product, assumed to be constant over time. The flow of profit for the firm can be written as:

$$\pi(p_t, S_t) = (p_t - c)\, Q(p_t, S_t) \qquad (15)$$

13 Entry of new consumers into the potential market for the product has to be handled separately. With new consumer entry, the optimal policy can feature price cycles (Conlisk et al., 1984; Narasimhan, 1989; Garrett, 2016). Price increases are possible if new consumer demand grows stronger over time (Board, 2008).

The firm is assumed to be risk-neutral and to set prices by maximizing the expected present discounted value of future profits, where future profits are discounted using the constant discount factor $\delta_f$. The solution to the pricing problem is represented by a value function, $V(S_t)$, which denotes the present discounted value of current and future profits when the firm is setting current and future prices optimally. The value function satisfies the Bellman equation,

$$V(S_t) = \max_{p \in \mathbb{R}_+} \left[\pi(p, S_t) + \delta_f V\left(S_{t+1}(p)\right)\right] \qquad (16)$$

The optimal pricing policy, $p^*(S_t)$, is stationary, and maximizes the right hand side of the value function,

$$p^*(S_t) = \arg\max_{p \in \mathbb{R}_+} V(p, S_t) \qquad (17)$$

where $V(p, S_t)$ denotes the term being maximized on the right hand side of (16).
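A minimal sketch of how a Bellman equation of the form (16) can be solved numerically is given below, under strong simplifying assumptions that are mine rather than the text's: there is a single consumer type ($R = 1$), the purchase probability is a logit, and consumers are treated as myopic (value of waiting fixed at zero), so the firm's problem is a single-agent dynamic program rather than an equilibrium. With one type, both flow profit and the transition (14) are proportional to the market size $M$, so the value function is linear, $V(M) = v \cdot M$, and (16) collapses to a scalar fixed point $v = \max_p \left[(p - c)\sigma(p) + \delta_f (1 - \sigma(p))\, v\right]$. Function names and the price grid are hypothetical.

```python
import math


def sigma(price, alpha, beta, w0=0.0):
    """Assumed logit purchase probability; w0 = 0 treats consumers as myopic."""
    return 1.0 / (1.0 + math.exp(-(alpha - beta * price - w0)))


def solve_firm(alpha, beta, cost, delta_f, p_grid, tol=1e-10, max_iter=100_000):
    """Value iteration on the per-consumer value v, where V(M) = v * M.

    Returns (v, p_star): the converged value and the maximizing grid price.
    """
    v = 0.0
    for _ in range(max_iter):
        best_val, best_p = -math.inf, None
        for p in p_grid:
            s = sigma(p, alpha, beta)
            # Profit today plus discounted value of consumers who did not buy.
            val = (p - cost) * s + delta_f * (1.0 - s) * v
            if val >= best_val:
                best_val, best_p = val, p
        if abs(best_val - v) < tol:
            return best_val, best_p
        v = best_val
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    grid = [i * 0.01 for i in range(1001)]  # prices from 0.00 to 10.00
    for delta_f in (0.0, 0.9):
        v, p_star = solve_firm(alpha=2.0, beta=1.0, cost=0.5, delta_f=delta_f, p_grid=grid)
        print(f"delta_f={delta_f:.1f}  v={v:.4f}  p*={p_star:.2f}")
```

A patient firm prices above the static optimum because a sale today removes a consumer who would otherwise be worth $\delta_f v$ tomorrow, which acts like an addition to marginal cost. With a single type the optimal price is constant over time; declining (skimming) price paths arise when there are several types and the mix of remaining consumers changes, which requires carrying the full state vector.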

Equilibrium

A stationary MPE in this model is defined by a set of (“demand-side”) conditional choice probabilities, $\sigma_r(p_t, S_t)$, $r = 1, \ldots, R$, and (“supply-side”) price functions, $p^*(S_t)$, such that Eqs. (11)-(16) are simultaneously satisfied at every $S_t$. The equilibrium is a fixed point of the game defined by (11)-(16). The equilibrium has three properties worth noting. First, Eqs. (16) and (17) imply that in equilibrium, when faced with state $S_t$, the firm’s pricing policy is a best response to consumer behavior at that state. At the same time, Eqs. (11) and (12) imply that when faced with a state $S_t$ and price $p_t = p(S_t)$, consumers make purchase decisions by maximizing intertemporal utility. Both consumers and the firm take into account the future states of the market through Eq. (14). Finally, Eq. (13) implies that the realized aggregate demand in state $S_t$ is consistent with optimal consumer purchase decisions at the corresponding optimal price $p^*(S_t)$. The fixed point that defines equilibrium thus requires each consumer type to maximize its expected utility subject to consistent perceptions of the likelihood of future states for the firm, and the firm to maximize expected payoffs based on consistent perceptions of the likelihood of future consumer states. The optimal pricing policy is essentially the solution to an $(R + 1)$-agent dynamic game between a monopolist and $R$ coalitions of consumer types.

Solution

The equilibrium has to be solved numerically. One solution strategy, used in Nair (2007), is parametric policy iteration (e.g., Rust, 1994), taking demand estimates as given. An alternative strategy for equilibrium computation would be to treat Eqs. (11)-(16) as a Mathematical Program with Equilibrium Constraints (MPEC; Su and Judd, 2012), and compute the equilibrium for a given set of demand and cost parameters (e.g., Chen et al., 2013; Rao, 2015). Convergence of the numerical procedure to a solution indicates that an equilibrium exists at those parameter values. There is no guarantee that the equilibrium is unique. For many plausible parameter values, the pricing policy exhibits skimming: a set of initial high prices followed by declining prices over time, reflecting intertemporal price discrimination across consumers of varying valuations.


4.3 Oligopoly problem

The pricing problem of a durable goods oligopolist is more complex than that of a durable goods monopolist. The oligopolist faces dynamic competition from other firms, and needs to incorporate how competitors react to prices currently and in the future. By lowering price and selling an additional unit today, the tradeoff to the durable-goods monopolist is between increasing his profit today versus decreasing profit in the future by facing a lower residual demand. Compared to the monopolist, the durable goods oligopolist faces an added incentive: his current price decrease lowers both the current and future demands of his competitors. This externality strengthens the incentive to lower price relative to a monopolist, because the decrease in future profits from saturation is “shared with one’s rivals” (Carlton and Gertner, 1989). When the benefit to the oligopolist from inducing his competitors to cut prices more in the future is higher than the cost to him of saturating his own market, he may cut prices aggressively. Unlike the monopolist’s pricing, the oligopolist’s pricing also depends on his competitors’ current and future prices. For instance, if competitors are expected to lower prices in the future, it is not valuable to set high prices today and “save” some of current demand for the future, because much of future demand will be lost to competing low-price substitutes. Again, this causes a durable good oligopolist to cut prices faster than a monopolist. Further, consumer behavior is also more complex because consumers now have expectations about the focal firm’s and its competitors’ products, all of which affect their incentives to buy or delay. So, aggregate demand and pricing have to reflect the intertemporal tradeoffs implied by these expectations. The model below shows one way to frame the dynamic pricing problem for a durable goods oligopoly.
To incorporate other variables in addition to prices in the model, we allow the oligopolist to pick both prices and quality over the PLC. Investing in improving quality allows the oligopolist to innovate. Because upgrades by consumers provide the oligopolist an incentive to innovate, the demand side is expanded to allow replacement.

Consumer’s inter-temporal choice problem14 There are J firms in the market indexed by j = 1, . . . , J each offering one product. For simplicity, assume there is no entry and exit. Denote the “quality” of the product currently owned by the consumer in period t as t . One way to think of t is as denoting the period-t service flow from usage of the previously purchased product. If no product is currently owned, “quality” is normalized to q¯0 . If the consumer does not buy in period t, he continues to enjoy the flow utility from consuming the currently owned product. This flow utility will be a function of t ; which we could model, for

14 This section is based on Goettler and Gordon (2011). I change their set-up to allow for discrete con-

sumer types and product depreciation which allows me to discuss the model using the same notation as before. Another advantage is that interested readers can see how adding these features changes the various moving pieces of the model.

4 The third wave: Life cycle pricing

instance, as, ui0t = αr t + i0t

(18)

where αr is the utility derived per unit of “quality” by consumer type-r, and i0t is an additive, consumer-specific private shock to utility as before. The “quality” of the previously currently owned deteriorates over time with per-period usage cr ; for instance, modeled as, t+1 = t − g (cr )

(19)

where $g(c_r)$ is a function of usage. Ensuring $\partial g(c_r)/\partial c_r \geq 0$ implies that “quality” depletes with higher usage.15 For example, in the durable goods case, $g(\cdot)$ could capture how the service flow depletes over time with usage due to satiation or product-quality depreciation. Depreciation of the quality of the currently owned option sets up an incentive for the consumer to purchase again or to upgrade. The incentive to buy is a function of the prices and quality of the other goods in the market. To handle the quality of other goods, assume that the quality of the product each firm $j$ offers in period $t$ is $q_{jt}$.16 Expand Eq. (7) so that the utility to consumer $i$ of type-$r$ from purchase of good $j$ reflects $q_{jt}$, for instance as,

$$u_{ijt} = \alpha_r q_{jt} - \beta_r p_{jt} + \epsilon_{ijt} \tag{20}$$

Here, $\epsilon_{ijt}$ is a consumer-specific shock that shifts the consumer’s utility from purchasing product $j$ in each period, assumed to be iid over time and over consumers; $\alpha_r$ is type-$r$’s taste for quality; $\beta_r$ is the price-sensitivity. Unlike the monopoly setup, the value of purchase does not simplify as in Eq. (8). When buying product $j$, consumer $i$ has to incorporate the value of possibly choosing to replace the product with a more attractive one tomorrow, or consuming it at its depreciated quality tomorrow. To simplify the dynamics, assume that if a consumer purchases product $j$ with quality $q_{jt}$ in period $t$, the quality of the currently owned product in the next period, $\tilde{q}_{t+1}$, is set to $q_{jt}$. If there is no purchase in $t$, $\tilde{q}_{t+1}$ evolves as in Eq. (19). This assumes that the newly purchased product replaces the old product fully in the consumer’s consumption and there is free disposal of the old product. (Also, assume there is no resale. Extensions to resale are discussed later in the chapter.) The deterministic part of the alternative specific value function for buying $j$ for

15 One can make this more micro-founded and tailored to the application by modeling the depreciation function $g(\cdot)$ more precisely based on the product and consumption context. For example, one could take a stand on how satiation occurs in utility by including some curvature in the shape of the underlying utility function for the consumer (e.g., Kim et al., 2002; Hartmann, 2006).

16 Goettler and Gordon (2011) are thinking of an application where firms are chip manufacturers (like Intel and AMD), or computer manufacturers like Dell and Apple. To abstract away from multi-product firms, they assume that consumers consider only the frontier product of each firm, i.e., the product with the highest quality. Each firm innovates every period to improve the quality of the flagship product offered in the marketplace, implying $j$’s quality is time varying.


CHAPTER 7 Diffusion and pricing over the product life cycle

consumer type $r$ is,

$$W_{rj}(p_t, q_t, \tilde{q}_t) = \alpha_r q_{jt} - \beta_r p_{jt} + \delta_c \int\!\!\int \max_{l=0,\ldots,J} \left\{ W_{rl}\left(p_{l,t+1}, q_{l,t+1}, \tilde{q}_{t+1}\right) + \epsilon_{ilt} \right\} dF(\epsilon_{i0t}, \epsilon_{it})\, dF(p_{t+1}, q_{t+1} \mid p_t, q_t) \tag{21}$$

Note that the maximization on the right hand side includes $l = 0$, which encapsulates the value of waiting in $t + 1$. Comparing this to the value of upgrading by buying one of the products $j = 1, \ldots, J$ in $t + 1$ enables choice of the best action tomorrow. The continuation value term on the right hand side of the Bellman equation encapsulates the expected value of the best action tomorrow, evaluated at the states expected to be reached tomorrow given the current state and choice of $j$ today, and discounted by one period. The state reached tomorrow, $\tilde{q}_{t+1}$, depends on which $j$ is purchased in $t$. As before, each consumer is assumed to be small relative to the entire market, so the consumer’s purchase decision does not affect the evolution of the vector of prices, qualities, and demand shocks. Hence, $F(p_{t+1}, q_{t+1} \mid p_t, q_t)$ is not conditioned on $i$’s decision to buy $j$ in $t$. Analogously, the deterministic part of the alternative specific value function for waiting in $t$ for consumer type $r$ is,

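$$W_{r0}(p_t, q_t, \tilde{q}_t) = \alpha_r \tilde{q}_t + \delta_c \int\!\!\int \max_{l=0,\ldots,J} \left\{ W_{rl}\left(p_{l,t+1}, q_{l,t+1}, \tilde{q}_{t+1}\right) + \epsilon_{ilt} \right\} dF(\epsilon_{i0t}, \epsilon_{it})\, dF(p_{t+1}, q_{t+1} \mid p_t, q_t) \tag{22}$$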

where we recognize that the quality of the currently owned product tomorrow, $\tilde{q}_{t+1}$, will depreciate as per Eq. (19) given the period-$t$ decision to not buy.

Characterizing the payoff-relevant aggregate state for the firm’s price and investment decision is more complex. We need to keep track of the distribution of the currently owned product’s quality for each consumer type $r$ in the population. Assume, as before, that there is a potential market of size $M$ composed of $R$ types of consumers, and that at date $t = 0$ there are $M_{r0}$ consumers of type $r$ in the market, so that $M = \sum_{r=1}^{R} M_{r0}$. Assume that the quality of products (including that of the currently owned product) can take $K + 1$ discrete levels: $q_{jt}, \tilde{q}_t \in \{\bar{q}_0, \bar{q}_1, \ldots, \bar{q}_K\}$, where $k = 0$ refers to a situation with no currently owned product (or inventory).17 Denote the number of type-$r$ consumers at currently owned quality level $k$ in date $t$ as $M_{rkt}$.18 Collect the $R(K + 1)$ variables corresponding to $M_{rkt}$ in a vector $S_t = \{M_{rkt}\}_{r=1,\ldots,R;\,k=0,\ldots,K}$. $S_t$ forms the payoff-relevant state vector for the firm’s dynamic price and investment problem.

17 This requires specifying $g(c_r)$ in Eq. (19) so that quality always lives on this discrete support.

18 To be clear, $\sum_{k=0}^{K} M_{rkt} = M_{rt}$. An assumption about $M_{rk0}$ is also required to complete the model; for instance, that all consumers start off with no product at $t = 0$, so that $M_{r00} = M_{r0}$ and $M_{rk0} = 0$ for $k = 1, \ldots, K$.
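The recursion in Eqs. (21) and (22) can be illustrated with a minimal numerical sketch. It rests on several simplifying assumptions that are not part of the model above: a single consumer type, prices and qualities held fixed over time (so the integral over future prices and qualities drops out), type-1 extreme value shocks (so the expected maximum has a log-sum-exp form), and a depreciation rule that moves owned quality down one grid level per period without purchase. The function name `solve_consumer` and all parameter values are hypothetical.

```python
import numpy as np

def solve_consumer(alpha, beta, delta, q_grid, prod_level, prices,
                   tol=1e-10, max_iter=10_000):
    """Value iteration for one consumer type with fixed prices and qualities.

    q_grid[k]     : quality level q-bar_k, k = 0..K (k = 0: nothing owned)
    prod_level[j] : grid index of the quality offered by product j
    prices[j]     : price of product j
    Returns W0 (K+1,), Wj (J,), and sigma (K+1, J+1); column 0 of sigma = wait.
    """
    n_levels = len(q_grid)
    # Assumed depreciation rule: no purchase moves owned quality down one level.
    next_k = np.maximum(np.arange(n_levels) - 1, 0)
    u_buy = alpha * q_grid[prod_level] - beta * prices

    def values(ev):
        w0 = alpha * q_grid + delta * ev[next_k]   # wait, as in Eq. (22)
        wj = u_buy + delta * ev[prod_level]        # buy j and own q_j tomorrow, as in Eq. (21)
        w = np.column_stack([w0, np.tile(wj, (n_levels, 1))])
        return w0, wj, w

    ev = np.zeros(n_levels)  # expected max over tomorrow's options, by owned level
    for _ in range(max_iter):
        _, _, w = values(ev)
        m = w.max(axis=1)
        # Closed-form expected maximum under the assumed extreme value shocks.
        ev_new = np.euler_gamma + m + np.log(np.exp(w - m[:, None]).sum(axis=1))
        converged = np.max(np.abs(ev_new - ev)) < tol
        ev = ev_new
        if converged:
            break

    w0, wj, w = values(ev)
    expw = np.exp(w - w.max(axis=1, keepdims=True))
    sigma = expw / expw.sum(axis=1, keepdims=True)
    return w0, wj, sigma

q_grid = np.array([0.0, 1.0, 2.0, 3.0])
W0, Wj, sigma = solve_consumer(alpha=1.0, beta=0.5, delta=0.9, q_grid=q_grid,
                               prod_level=np.array([2, 3]),
                               prices=np.array([2.0, 4.0]))
```

In this sketch the value of buying does not depend on the quality currently owned, while the value of waiting rises with it, so the probability of waiting in `sigma[:, 0]` increases along the quality grid.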


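As before, to solve for the equilibrium, it is convenient to re-write the continuation values for the consumer’s problem over the states $S_t$,

$$W_{rj}(p_{jt}, q_{jt}, \tilde{q}_t, S_t) = \alpha_r q_{jt} - \beta_r p_{jt} + \delta_c \int\!\!\int \max_{l=0,\ldots,J} \left\{ W_{rl}(S_{t+1}, \tilde{q}_{t+1}) + \epsilon_{ilt} \right\} dF(\epsilon_{i0t}, \epsilon_{it})\, dF(S_{t+1} \mid S_t) \tag{23}$$

$$W_{r0}(\tilde{q}_t, S_t) = \alpha_r \tilde{q}_t + \delta_c \int\!\!\int \max_{l=0,\ldots,J} \left\{ W_{rl}(S_{t+1}, \tilde{q}_{t+1}) + \epsilon_{ilt} \right\} dF(\epsilon_{i0t}, \epsilon_{it})\, dF(S_{t+1} \mid S_t) \tag{24}$$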

where we substituted $p_{j,t+1} = p_j(S_{t+1})$ and $q_{j,t+1} = q_j(S_{t+1})$ into the continuation values in Eqs. (21) and (22). As before, it is worth emphasizing that the representation over state variables $S_t$ here is a device to represent the consumer’s beliefs about pricing and quality. Under this assumption, in equilibrium, consumers correctly anticipate that firms, when facing the future state $S_{t+1}$, will set prices and qualities as $p_{j,t+1} = p_j(S_{t+1})$ and $q_{j,t+1} = q_j(S_{t+1})$. This enables the model to allow consumer expectations of future prices and qualities to be formed endogenously with the policies chosen by firms. Further, in equilibrium, the expectations of each consumer will be consistent with the policies chosen by firms, implying expectations are rational.

Consumer $i$ will buy $j$ in period-$t$ if $W_{rj}(p_{jt}, q_{jt}, \tilde{q}_t, S_t) + \epsilon_{ijt} \geq W_{r0}(\tilde{q}_t, S_t) + \epsilon_{i0t}$, implying probabilities each period of $\sigma_{rj}(p_t, q_t, \tilde{q}_t, S_t)$ and $\sigma_{r0}(p_t, q_t, \tilde{q}_t, S_t)$ of purchasing or not, conditional on currently owned quality $\tilde{q}_t$, for each type-$r$. Given these probabilities, we can obtain the aggregate demand for $j$ from each consumer type $r$ in the population by integrating over the distribution of the currently owned product’s quality for that type,

$$Q_{rj}(p_t, q_t, S_t) = \sum_{k=0}^{K} M_{rkt} \times \sigma_{rj}(p_t, q_t, \bar{q}_k, S_t), \quad j = 1, \ldots, J \tag{25}$$

where we use the fact that there are $M_{rkt}$ consumers with currently owned product quality $\tilde{q}_t = \bar{q}_k$ in the market at time $t$. Finally, summing over consumer-types, we can obtain the following expression for the aggregate demand for the product at state $S_t$,

$$Q_j(p_t, q_t, S_t) = \sum_{r=1}^{R} Q_{rj}(p_t, q_t, S_t), \quad j = 1, \ldots, J \tag{26}$$
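The aggregation in Eqs. (25) and (26) is a weighted sum of choice probabilities, which the following sketch makes concrete. The array layout (types by quality levels by options, with option 0 meaning no purchase), the function name `aggregate_demand`, and the randomly drawn choice probabilities are illustrative assumptions; in the model the probabilities come from the consumer’s dynamic problem.

```python
import numpy as np

def aggregate_demand(M, sigma):
    """Aggregate demand by type and in total.

    M     : (R, K+1) array, M[r, k] = mass of type-r consumers owning level k
    sigma : (R, K+1, J+1) array of choice probabilities; option 0 = no purchase
    Returns Q_rj with shape (R, J), as in Eq. (25), and Q_j with shape (J,), as in Eq. (26).
    """
    Q_rj = np.einsum("rk,rkj->rj", M, sigma[:, :, 1:])  # sum over owned levels k
    Q_j = Q_rj.sum(axis=0)                              # sum over types r
    return Q_rj, Q_j

# Hypothetical example: R = 2 types, K + 1 = 3 quality levels, J = 2 products.
rng = np.random.default_rng(0)
R, K, J = 2, 2, 2
M = np.array([[50.0, 30.0, 20.0],
              [10.0, 40.0, 50.0]])
sigma = rng.dirichlet(np.ones(J + 1), size=(R, K + 1))  # placeholder probabilities
Q_rj, Q_j = aggregate_demand(M, sigma)
```

Because the probabilities sum to one across options, total purchases plus total non-purchases equal the market size, which is a useful sanity check on any implementation.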

Evolution of states As in the monopoly case, the demand system generates interdependence in demand over time. Dependence in demand arises from dependence in the mass of each consumer type remaining in the market each period. The mass of consumers of each


type in the potential market, Mrt , is endogenous to the firm’s historic pricing behavior. Therefore, the dynamics of pricing introduce a dynamic in the evolution of Mrt . The complication relative to the previous case is the presence of many firms and the possibility of replacement. To derive the evolution of Mrkt , note that the number of consumers of type-r with currently owned product quality q¯k in period t + 1 is the sum total of two sets of consumers. One set corresponds to the period-t non-purchasers of type-r, whose currently owned product quality depreciated to q¯k after one period of consumption. The other set corresponds to the period-t purchasers of type-r whose newly purchased product is of quality q¯k .

Non-purchasers

To derive the contribution from non-purchasers, let index $z$ run over currently owned quality levels, i.e., $z = 0, \ldots, K$. Note that each period, $M_{rzt} \times \sigma_{r0}(p_t, q_t, \bar{q}_z, S_t)$ consumers of type-$r$ with currently owned quality $\tilde{q}_t = \bar{q}_z$ will decide not to purchase, where $\sigma_{r0}(p_t, q_t, \bar{q}_z, S_t)$ is the probability of no-purchase evaluated at $\tilde{q}_t = \bar{q}_z$. No purchase in $t$ implies that the currently owned quality depreciates for these consumers in period $t + 1$ according to Eq. (19). Depending on how we specify $g(c_r)$ and/or the support of $c_r$ in Eq. (19), a type-$r$ consumer with $\bar{q}_z$ will be depreciated to currently owned quality level $\tilde{q}_{t+1} = \bar{q}_k \leq \bar{q}_z$ in period $t + 1$. Let $I_r(z, k)$ be an indicator of whether consumer type-$r$’s currently owned quality depreciates from level $\bar{q}_z$ to level $\bar{q}_k$ after one period of consumption. With this notation, $M_{rzt} \times \sigma_{r0}(p_t, q_t, \bar{q}_z, S_t) \times I_r(z, k)$ type-$r$ period-$t$ non-purchasers with $\bar{q}_z$ will have currently owned quality level $\bar{q}_k$ in period $t + 1$. Summing this term over $z$ implies that the total number of type-$r$ period-$t$ non-purchasers who will have currently owned quality level $\bar{q}_k$ in period $t + 1$ is $\sum_{z=0}^{K} M_{rzt} \times \sigma_{r0}(p_t, q_t, \bar{q}_z, S_t) \times I_r(z, k)$.

Purchasers

To derive the contribution from purchasers, let $I_j(k)$ be an indicator of whether product $j$ is of quality level $k$. First, we note that in period $t$, $Q_{rj}(p_t, q_t, S_t)$ consumers of type $r$ will decide to purchase $j$. Therefore, summing across $j$, the number of type-$r$ period-$t$ purchasers of a product with quality level $\bar{q}_k$ is $\sum_{j=1}^{J} Q_{rj}(p_t, q_t, S_t) \times I_j(k)$. Substituting for $Q_{rj}(p_t, q_t, S_t)$ from Eq. (25), this is $\sum_{j=1}^{J} \left[ \sum_{z=0}^{K} M_{rzt} \times \sigma_{rj}(p_t, q_t, \bar{q}_z, S_t) \right] \times I_j(k)$.

Putting both together

Putting these two together, the evolution of the state $M_{rkt}$ is,

$$M_{rk,t+1} = \sum_{z=0}^{K} M_{rzt} \times \sigma_{r0}(p_t, q_t, \bar{q}_z, S_t) \times I_r(z, k) + \sum_{j=1}^{J} \left[ \sum_{z=0}^{K} M_{rzt} \times \sigma_{rj}(p_t, q_t, \bar{q}_z, S_t) \right] \times I_j(k) \tag{27}$$


Eq. (27) summarizes the effect of the history of prices, qualities, and demand shocks on the demand-side of the market. From Eqs. (26) and (27), one sees again that demand, as well as the evolution of the state variables, the $M_{rkt}$-s, are endogenously determined in equilibrium. The state variables, $S_t \equiv \{M_{rkt}\}_{r=1,\ldots,R;\,k=0,\ldots,K}$, depend on the corresponding conditional choice probabilities, $\{\sigma_{r0}(p_t, q_t, \bar{q}_k, S_t), \ldots, \sigma_{rJ}(p_t, q_t, \bar{q}_k, S_t)\}_{r=1,\ldots,R;\,k=0,\ldots,K}$, which in turn depend on the evolution of the state variables through (27).
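The law of motion in Eq. (27) can be written as two array contractions, one for non-purchasers and one for purchasers. The sketch below is illustrative only: the function name `next_state`, the array layout, the one-level-per-period depreciation rule used to fill the indicator $I_r(z, k)$, and the randomly drawn choice probabilities are all assumptions made for the example.

```python
import numpy as np

def next_state(M, sigma, dep, prod_level):
    """One step of the state transition in Eq. (27).

    M          : (R, K+1) masses M[r, z] by type and owned quality level
    sigma      : (R, K+1, J+1) choice probabilities; option 0 = no purchase
    dep        : (R, K+1, K+1) indicators, dep[r, z, k] = I_r(z, k)
    prod_level : (J,) grid index of each product's quality, so I_j(k) = (prod_level[j] == k)
    """
    n_levels = M.shape[1]
    # Non-purchasers: owned quality depreciates from level z to level k.
    stay = np.einsum("rz,rz,rzk->rk", M, sigma[:, :, 0], dep)
    # Purchasers: demand by type and product, then mapped to the purchased quality level.
    Q_rj = np.einsum("rz,rzj->rj", M, sigma[:, :, 1:])
    I_jk = (prod_level[:, None] == np.arange(n_levels)[None, :]).astype(float)
    return stay + Q_rj @ I_jk

# Hypothetical example: R = 2 types, K + 1 = 4 levels, J = 2 products.
R, K, J = 2, 3, 2
M = np.array([[100.0, 0.0, 0.0, 0.0],
              [60.0, 20.0, 10.0, 10.0]])
rng = np.random.default_rng(1)
sigma = rng.dirichlet(np.ones(J + 1), size=(R, K + 1))  # placeholder probabilities
dep = np.zeros((R, K + 1, K + 1))
for z in range(K + 1):
    dep[:, z, max(z - 1, 0)] = 1.0  # assumed rule: drop one level per period
prod_level = np.array([2, 3])
M_next = next_state(M, sigma, dep, prod_level)
```

Since every consumer either waits or buys exactly one product, the mass of each type is conserved from one period to the next in this sketch.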

Flow of profits and value function

The flow of profit for each firm is,

$$\pi_j(p_t, q_t, S_t) = \left[ p_{jt} - h(q_{jt}) \right] Q_j(p_t, q_t, S_t) \tag{28}$$

where $h(q_{jt})$ is the marginal cost to firm $j$ of producing quality $q_{jt}$. Assume firms are risk-neutral and set prices and qualities by maximizing the expected present discounted value of future profits, where future profits are discounted using the constant discount factor $\delta_f$. We solve for an MPE, looking for a set of policies that are subgame perfect at each state, $S_t$. In the MPE, each firm at each state sets its price and quality dynamically optimally, based on consistent perceptions of the strategies played by the other firms at that state, and finds it unprofitable to deviate from the chosen price and quality. The equilibrium is defined by a set of price and quality policy functions mapping the state $S_t$ to the optimal price and quality.

Denote the $J$ equilibrium-optimal price and quality policy functions as $p^*(S_t)$ and $q^*(S_t)$, and collect them in a strategy profile, $\zeta^*(S_t) \equiv \{p^*(S_t), q^*(S_t)\}$. The solution to the price and quality setting problem for firm $j$ is represented by a value function, $V_j(S_t \mid \zeta^*)$, which denotes the present discounted value of current and future profits to firm $j$ when $j$ sets current and future prices and qualities optimally, assuming all other firms (denoted by the index “$-j$”) play the strategy $\zeta^*_{-j}(S_t)$. The value function satisfies the Bellman equation,

$$V_j(S_t \mid \zeta^*) = \max_{p \in \mathbb{R}_+,\; q \in \{\bar{q}_0, \bar{q}_1, \ldots, \bar{q}_K\}} \left\{ \pi_j\left(p, q, p^*_{-j}(S_t), q^*_{-j}(S_t), S_t\right) + \delta_f V_j\left(S_{t+1}(p, q) \mid \zeta^*\right) \right\}, \quad j = 1, \ldots, J \tag{29}$$

where $S_{t+1}$ in the continuation value on the right hand side of the Bellman equation (29) is evaluated at $M_{rk,t+1}$ as outlined in Eq. (27), with prices and qualities in period $t$ set as $\{p_{jt} = p,\; q_{jt} = q,\; p_{-j,t} = p^*_{-j}(S_t),\; q_{-j,t} = q^*_{-j}(S_t)\}$. The optimal policies, $p^*(S_t), q^*(S_t)$, are stationary, and maximize the right hand side of the value function: i.e.,

$$\{p^*(S_t), q^*(S_t)\} = \arg\max_{p \in \mathbb{R}_+,\; q \in \{\bar{q}_0, \bar{q}_1, \ldots, \bar{q}_K\}} V_j\left(p, q, p^*_{-j}(S_t), q^*_{-j}(S_t), S_t \mid \zeta^*\right), \quad j = 1, \ldots, J \tag{30}$$


Equilibrium

A stationary MPE in prices in this model is defined by a set of (“demand-side”) conditional choice probabilities, $\{\boldsymbol{\sigma}_r(p_t, q_t, \bar{q}_k, S_t)\}_{r=1,\ldots,R;\,k=0,\ldots,K}$, and (“supply-side”) policy functions, $\zeta^*(S_t)$, such that Eqs. (24)-(29) are simultaneously satisfied at every $S_t$. The equilibrium is a fixed point of the game defined by (24)-(29). The equilibrium has three properties worth noting. First, Eqs. (29) and (30) imply that in equilibrium, when faced with state $S_t$, the firm’s pricing policy is a best response to consumer behavior at that state. At the same time, Eqs. (23) and (24) imply that when faced with a state $S_t$, prices $p_t = p^*(S_t)$, and qualities $q_t = q^*(S_t)$, consumers make purchase decisions by maximizing intertemporal utility. Both consumers and firms take into account the future states of the market through Eq. (27). Finally, Eq. (26) implies that the realized aggregate demand in state $S_t$ is consistent with optimal consumer purchase decisions at the corresponding optimal prices $p_t = p^*(S_t)$ and qualities $q_t = q^*(S_t)$. As before, the fixed point that defines equilibrium requires each consumer type to maximize their expected present discounted utility subject to consistent perceptions of the likelihood of future states for the firm, and firms to maximize expected present discounted payoffs based on consistent perceptions of the likelihood of future consumer states. In this set-up, the optimal price and quality provision policies are essentially the solution to an $(R + J)$-agent dynamic game among $J$ firms and $R$ coalitions of consumer types.

Solution

The equilibrium has to be solved numerically. The numerical solution belongs to the class of computational algorithms for computing equilibria for dynamic oligopolies developed in Ericson and Pakes (1995). Compared to Ericson and Pakes’ (1995) model, which features entry and exit but static demand with no demand-side dependencies, this setup features no entry and exit but dynamic demand with persistence and forward-looking consumers. Convergence of the numerical procedure to a solution indicates that an equilibrium exists at those parameter values. There are typically few guarantees that the equilibrium is unique. For a range of plausible parameter values, equilibrium prices are found to exhibit skimming.

4.4 Discussion

One way to view these models is that they endogenize the product life-cycle as an outcome of the intersection of dynamic demand and supply. Unlike the previous waves of papers, the shape of the induced diffusion curve is not exogenously specified. Instead, diffusion and price paths are jointly determined and are an outcome of the interaction of preferences, expectations, costs, and competition in the market. The dependence of outcomes on the beliefs of market participants plays a key role, as does strategic interaction between the agents. This provides a richer and more micro-founded perspective on the PLC and the ways in which firms can implement profitable pricing over it.


The empirical focus of the models opens up a range of issues when taking the models to data. I briefly discuss these next.

Inferring demand and cost parameters The empirical computation of the pricing policy is contingent on obtaining realistic demand parameters. Demand parameters can be estimated from observed price and sales data, or from individual-level product adoption data when available. Demand estimation does not require the assumption that observed prices in the data are set optimally by firms according to the dynamic models presented above. The estimation of demand for durable goods with forward-looking consumers is now a large literature. Approaches to demand estimation include, for instance, Melnikov (2012); Song and Chintagunta (2003); Nair (2007); Gordon (2009); Carranza (2010); Goettler and Gordon (2011); Conlon (2012); Gowrisankaran and Rysman (2012); Derdenger and Kumar (2013); Lee (2013); Shiller (2013); Ishihara and Ching (2018); Derdenger and Kumar (2018) and related authors who present approaches to estimate durable goods demand with forward-looking consumers with aggregate data; Erdem et al. (2005); Sriram et al. (2010); Schiraldi (2011); Liu et al. (2014); Bollinger (2015); Li (2018); Lin (2017); Huang (2018); Chou et al. (2019) who propose strategies with individual-level data; and Dubé et al. (2014); Rao (2015) who present conjoint survey based approaches. Ways for simplifying the dynamics for demand estimation are discussed in Arcidiacono and Ellickson (2011). These estimators may be used to estimate demand parameters in a first stage. An open issue in the literature is accounting for the econometric endogeneity of prices in estimation of such dynamic demand. Instrumental variable strategies are complicated by the intertemporal linkage in prices and demand. To perform well, the instrument would have to explain the time series variation in prices, and to be valid, the instrument would have to be correlated with current but not lagged prices. 
If it is correlated with lagged prices, and shocks to demand are persistent, it would be related to current shocks, and be invalid by construction. Given the persistence in prices, such instruments are difficult to find (see Rossi, 2014 for a related discussion in the context of storable goods). One could consider estimating demand and supply jointly by adding restrictions from the optimal equilibrium pricing strategy into the demand estimation procedure. This approach, while more efficient, is computationally burdensome since the equilibrium has to be repeatedly solved for every guess of the demand parameters. It is also not obvious how to deal with the multiplicity of equilibria. It also entails imposing the strong assumption of optimal pricing by firms in demand estimation, which has the potential to bias the estimated parameters if observed pricing is in fact not optimal. Further, in this approach, perfect firm rationality is imposed when estimating parameters; this leaves little room for testing whether observed outcomes are optimal or for making normative policy recommendations, which are part of the goals of marketing practice. A more agnostic strategy that does not impose optimality but still retains the spirit of Markov perfect pricing is to specify observed prices as a flexible function of the theoretical state variables (viz. the sizes of the segments and the shock to demand), and to model demand jointly with


this pseudo-policy function. Ching (2010a) has proposed this strategy in the context of a static demand system for experience goods, and this approach could be extended to durable goods. A strategy for joint estimation that frames the problem as an MPEC is presented in Conlon (2012). Marginal costs can be obtained from auxiliary cost data when available (examples of this include Nair, 2007; Goettler and Gordon, 2011; Conlon, 2012); without this, costs have to be estimated as well. A typical approach to estimate costs in the empirical literature has been to infer these from the first-order conditions of profit maximization. This approach has been implemented mostly in static settings. In the case of durable goods, prices are high not simply because costs are high, but because firms want to preserve future demand. So imposing a static first order condition will tend to overestimate marginal costs. Since the incentive to preserve future demand is highest for monopolists, this overestimation of costs will be greatest for concentrated markets (Goettler and Gordon, 2011). How to recover marginal costs by inverting markups when prices are set dynamically and equilibria are possibly not unique, is an open econometric question.19
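The direction of the bias from imposing a static first-order condition can be seen in a stylized two-period example. The sketch below is not the model of this chapter: it assumes a monopolist, logit demand, myopic consumers who buy at most once, and hypothetical parameter values. In this special case the dynamic problem in period 1 is equivalent to a static one in which marginal cost is inflated by the discounted profit the firm would earn tomorrow from a consumer who does not buy today.

```python
import numpy as np

# Hypothetical demand intercept, price coefficient, true marginal cost, firm discount factor.
a, beta, c, delta = 2.0, 1.0, 1.0, 0.9

def share(p):
    """Logit purchase probability at price p (outside option normalized to zero)."""
    v = np.exp(a - beta * p)
    return v / (1.0 + v)

p_grid = np.linspace(0.0, 10.0, 100_001)

# Period 2 (terminal): static profit per consumer still in the market.
profit2 = (p_grid - c) * share(p_grid)
pi2_star = profit2.max()
p_static = p_grid[profit2.argmax()]

# Period 1: a sale today removes that consumer from tomorrow's market.
total = (p_grid - c) * share(p_grid) + delta * (1.0 - share(p_grid)) * pi2_star
p1 = p_grid[total.argmax()]

# Marginal cost implied by wrongly imposing the static logit first-order condition,
# p - mc = 1 / (beta * (1 - s)), on the dynamically chosen period-1 price.
mc_hat = p1 - 1.0 / (beta * (1.0 - share(p1)))
```

In this example the period-1 price exceeds the static price, and the cost recovered from the static condition exceeds the true cost by approximately `delta * pi2_star`, the discounted value of preserving a consumer for the next period.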

Handling expectations

While allowing consumers to be forward-looking and incorporating the impact of that behavior on pricing was a core part of this wave of literature, perhaps the most fragile part of the entire exercise is finding a way to credibly specify the expectations of consumers. Expectations are probabilistic beliefs, and beliefs are notoriously hard to capture in empirical models. Because of this, strong assumptions like rationality or perfect foresight are imposed in many of the models reviewed above. One way of thinking of these is that they approximate heuristics that consumers have developed with experience. Nevertheless, it is possible that expectations are not formed this way, at least for some consumers. If some consumers are systematically biased, unaware or unable to incorporate price paths into decision making (due to search costs, cognitive costs, forgetting), it is conceivable that firms could exploit that, and heterogeneity in the expectations across consumers may form the basis of intertemporal price discrimination. There is little work that has systematically addressed these kinds of issues, and more research is warranted on measuring and understanding the implications of this heterogeneity.20

19 The Euler equations-based approach of Berry and Pakes (2000), which circumvents the need to solve the full model for estimation, may be one possibility under the assumption of rational expectations.

20 Surveys of price knowledge conducted on buyers of grocery goods have shown conflicting results. Dickson and Sawyer (1990) suggest price knowledge and recall of price paid of consumers is poor, while Krishna et al. (1991) suggest price (and promotion) awareness is high.

A reasonable way forward for empirical work would be to collect direct data on the expectations and beliefs of consumers and to incorporate that into the model and estimation (see Erdem et al., 2005 for an example of how such data have been collected and used in a durable goods adoption problem). This converts the unobservable variable into an observable, enabling direct understanding of the expectations formation process, which helps future work. While appealing, the main difficulties with the strategy are four-fold. First, such data are often unavailable or cost-prohibitive to collect. Second, even if we could collect them, following Manski (2004), eliciting expectations without bias is non-trivial. In particular, since we would need to model expectations as a probability distribution over future prices, we would be confronted with the complex problem of correctly eliciting counterfactual probability distributions from human beings in a survey setting. There is no guarantee that what we elicit coincides with the true beliefs. Third, expectations need to be collected not just at one point in time, but across all $t$ for which the data for an individual are tracked. It is hard to conceive of a measurement technology that can reliably collect high frequency data on the expectations of consumers in this way without annoying them, or without inducing non-random attrition. The advent of smart-phones and other tracking technology that have the ability to interact with agents and record and obtain data in real-time non-invasively holds out the possibility that this kind of belief information could eventually be collected by researchers in the future. But in the short term, it is possible we are forced to treat expectations as unobservables that are identified on the basis of the researcher’s assumptions. Fourth, an open issue is how the belief data should be treated even if we were able to collect it. Are beliefs “X”s or “Y”s? Beliefs about prices are likely an outcome of a search process and the precision of a user’s beliefs about prices is informative of his search costs and price sensitivity. This makes the collected information more like a “Y” and not an “X”. Simply plugging in the belief information as data to “integrate out” expectations in a dynamic model of consumer demand may lead to biases because the contribution of belief data to learning these parameters is ignored.
These issues are unresolved in the current literature and would be a fruitful area for research.

Discount factors A related issue that requires more research is to better understand the extent of heterogeneity in consumer patience in the population, and to understand the implications of that heterogeneity for the PLC. Stokey’s (1979) results show that even with commitment, intertemporal price discrimination via a schedule of declining prices is profitable when consumers are heterogeneous in both willingness-to-pay and their discount factors, and when those with higher reservation prices are also more impatient.21 With general patterns of dependence between time and product preferences, it is unclear what would be the nature of the optimal price path over the PLC and whether price discrimination is viable. Also important is how consumer patience relates to the patience of the firm. Price discrimination is profitable when firms are

21 In contrast, if consumers vary only in their willingness-to-pay and have the same discount factor, the optimal strategy with commitment is to forgo the opportunity to price discriminate and set the monopoly price in the first period and hold it fixed thereafter.


more patient than consumers. But few empirical strategies are currently available to assess the discount factors of firms or of pricing decision makers within firms. On the consumer side, there are a large number of negative identification results regarding the ability of the researcher to estimate discount factors based on observational choice data (e.g., Rust, 1994; Magnac and Thesmar, 2002). Broadly speaking, the discount factor, preferences, and beliefs are not jointly nonparametrically identified from choice data alone. There has been recent positive progress on this dimension. For example, one idea due to Yao et al. (2012) is to compare consumers in situations with and without dynamics. Broadly speaking, consumer actions in a world without dynamics help pin down preferences; and then, actions in a world with dynamics for the same consumers help pin down beliefs and discounting conditional on estimated preferences. The example in Yao et al. (2012) concerns cell phone consumption, where consumers are observed under a flat-rate plan with no dynamic incentives for consumption; and later, under a non-linear tariff where consumption responds to dynamic incentives due to changes in marginal prices with consumption. This strategy may be used in a variety of situations under two caveats. First, we should be able to observe the same set of individuals under a dynamic and a non-dynamic scenario.22 Second, the strategy presumes that consumers’ time preferences are category independent, so that behavior in one category can be used to identify preferences in another category. There is some emerging evidence that time preferences are category-dependent, so this has to be evaluated on a case-by-case basis (Frederick et al., 2002). A second strategy leverages access to an exclusion restriction, a variable that affects the continuation value but can be excluded from affecting per-period preferences (Abbring and Daljord, 2017; Daljord et al., 2018).
Responses in current actions to changes in this variable indicate that consumers respond to future states, suggesting forward-looking behavior. One way of thinking about this exclusion restriction is in the context of a finite horizon model. If we believe that per-period payoffs are time homogeneous, the duration to the terminal period serves implicitly as an exclusion restriction because it affects the continuation value, but not per-period payoffs (Bajari et al., 2016). This of course requires time homogeneity of per-period payoffs and the existence of a terminal condition; in some situations this may be natural (e.g., retirement is often treated as a terminal state in some kinds of labor/savings problems), and in some situations, it is testable.23 A third strategy involves exposing consumers in the same current state to sequences of different future states, and examining how they change their behavior. Changes in behavior in response to changes in future states holding current states

22 In the durable goods or storable good situations, it is not clear what such a scenario would be because consumers are always in a “dynamic” world.

23 Examples of specific applications of the exclusion strategy include Chung et al. (2013); Lee (2013); Bollinger (2015); Ching and Osborne (2017). Identification in a parametric linear-in-parameters model is considered in Komarova et al. (2018). Identification of time preferences in an auction with a buy-it-now feature is considered in Ackerberg et al. (2017).


fixed are informative about forward-looking behavior. Because this kind of variation is hard to isolate in a real-world setting, the progress on this dimension has occurred in the context of in-laboratory experiments (see Dubé et al., 2014; Rao, 2015). Recent evidence has suggested that the discounted utility model, which implies that all motives underlying intertemporal choice can be condensed into a single parameter (i.e., the discount factor), along with exponential discounting, may not be a good description of actual human behavior. Individuals may discount future rewards “hyperbolically” (i.e., have a declining rate of time preference); may discount gains at a higher rate than losses; discount small outcomes more than large ones; and have a preference for improving sequences to declining sequences (see Frederick et al., 2002 for a review). Discount factors may be context-dependent, vary across products, and be different for short-run versus long-run decisions. It is possible that firms, who may have access to more complete capital markets, are less susceptible to these compared to consumers, so this difference may be the source of sustaining profitable price discrimination. The measurement of generalized discounting functions of individuals along with their covariation with product and risk preferences, combined with an exploration of its implications for pricing policies over the PLC is still in its early stages and is an important direction for future research (Cohen et al., 2016).24 To summarize, the current literature can be seen as a forward-looking wave of papers that shows us how to accommodate expectations and discounting formally in models of life-cycle marketing activity in advance of our ability to measure that kind of forward-looking behavior precisely.
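The difference between exponential and hyperbolic discounting discussed above can be made concrete by comparing the per-period discount rates the two imply. The sketch below uses one common hyperbolic parametrization, $D(t) = 1/(1 + kt)$, as an assumption; the function names and the values of `delta` and `k` are hypothetical and chosen only for illustration.

```python
import numpy as np

def exponential_weights(t, delta):
    """Discount weight delta**t attached to a payoff t periods ahead."""
    return delta ** t

def hyperbolic_weights(t, k):
    """Discount weight 1 / (1 + k t), an assumed hyperbolic form."""
    return 1.0 / (1.0 + k * t)

t = np.arange(0, 11)
d_exp = exponential_weights(t, delta=0.9)
d_hyp = hyperbolic_weights(t, k=0.25)

# Implied one-period discount rate between horizons t and t + 1.
rate_exp = -np.log(d_exp[1:] / d_exp[:-1])
rate_hyp = -np.log(d_hyp[1:] / d_hyp[:-1])
```

Under exponential discounting `rate_exp` is the same at every horizon, so a single parameter summarizes time preference. Under the hyperbolic form `rate_hyp` falls with the horizon, which is the declining rate of time preference referred to in the text.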

24 Though not focused on identification of discount factors per se, Chevalier and Goolsbee (2009) document that consumers of textbooks behave as if they are forward-looking, with their demand becoming more sensitive to price when books cannot be resold. Li et al.’s (2014) work suggests heterogeneity in consumer patience in airline markets; and Aflaki et al. (2019) suggest that consumer time preferences may be formed endogenously with prices.

Large state spaces

A separate area where more progress will be welcome is in developing methods to handle very large state spaces in the context of dynamic games. Almost all the demand-side phenomena considered in this chapter generate persistent, high-dimensional states in the aggregate. Heterogeneity is key to the durable-goods pricing problem. However, in accommodating heterogeneity, the researcher is confronted with a large state space that describes the proportion of each type of consumer at each individual-state in the marketplace. Current computational methods do not scale well to such large state spaces, especially with full-solution dynamic programming methods. Modern approximate dynamic programming methods and strategies for efficient representation of state spaces will be useful in order to make more headway into these problems. The problem of large state spaces is also linked to assumptions about the information sets of firms. Because we assume that firms are unable to set prices at the individual-level, we have to set up aggregate demand and set up the life-cycle problem as responding to the aggregate market state. The aggregate market state in the presence of heterogeneity and persistent demand is a higher dimensional object. This situation could change with the advent of technology-driven platforms, in e-commerce for instance, on which firms can track data at the individual-level and personalize marketing interventions like advertising and pricing to each individual. In this situation, the relevant state is an appropriate summary metric of each individual’s history, which might make the problem of representation simpler and more feasible. Finally, the issue of large state spaces is linked to the issue of specifying expectations when there are a large number of products and/or quality is multi-dimensional. When there are a large number of products, we have to specify consumer expectations over all prices. It is not clear whether consumers actually maintain expectations over such a large space of products, or what reasonable empirical specifications would capture these realistically. (See Erdem et al. (2003) for one possibility, which models a vector serially-correlated process for the prices of the most popular product of each brand, assuming that within-brand price differentials between the other products and the popular product are IID over time.) When including quality in the model, consumer expectations have to be specified over both prices and over characteristics of the products, which can be high dimensional. Strategies for state space reduction then become important both for empirical tractability and for maintaining realism. It may be that consumers form a consideration set with a limited number of products and keep track of only prices and characteristics of products in this consideration set, or that they maintain a heuristic in their minds for how the state of the market evolves. More research is warranted on how these are formed in actual practice.
To the extent that the firm can track the state of the market better than consumers, corresponding pricing policies by the firm can exploit the heuristics used by consumers. Strategies for state-space reduction that build on the idea of Melnikov (2012), who proposed a heuristic based on discrete choice under the extreme value distribution, are reviewed in Aguirregabiria and Nevo (2013) and Derdenger and Kumar (2019). Finally, similar considerations for state-space reduction apply when the number of firms in the market is large. When there are many agents participating in the dynamic game and many states per agent, the curse of dimensionality makes dynamic programming intractable. Leveraging more tractable solution concepts such as oblivious equilibrium (Weintraub et al., 2010), in which agents play strategies that are functions only of an agent’s own state and the long run average system state, may help the literature make progress in the analysis of life-cycle pricing in such situations. With this brief discussion of estimation and computation, I now turn to a range of additional issues considered by the literature that impinged on and augmented the basic model of pricing presented in Sections 4.2 and 4.3.
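To give a rough sense of how quickly the aggregate state grows with heterogeneity, the following Python sketch counts grid points. It assumes, purely for illustration, that the aggregate state is the vector of population shares across (consumer type, individual state) cells, with each share discretized in equal steps; the function name and the example dimensions are hypothetical.

```python
from math import comb


def aggregate_state_count(n_types, n_individual_states, grid_steps):
    """Number of grid points needed to describe the aggregate market state.

    Assumption: the state is a vector of shares over
    n_types * n_individual_states cells, each share a multiple of
    1 / grid_steps, and the shares sum to one. Counting such vectors is a
    stars-and-bars problem.
    """
    cells = n_types * n_individual_states
    return comb(grid_steps + cells - 1, cells - 1)


# Hypothetical examples: (types, individual states, grid steps per share).
for dims in [(1, 2, 10), (2, 2, 10), (3, 4, 20), (5, 10, 20)]:
    print(dims, aggregate_state_count(*dims))
```

Even at a coarse resolution the count grows combinatorially in the number of cells, which is one way to see why full-solution dynamic programming over the aggregate state becomes infeasible and why approximation or state-space reduction is needed.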

4.5 Additional considerations related to durability

As consumer expectations and associated dynamics were formally incorporated into pricing, several aspects related to durability became the focus of the literature. The issue of time consistency and the Coasean problem induced by time consistency received particular attention. The issue of time consistency arises because the firm is unable to convince rational consumers that it will not cut prices, given it faces a logical incentive to do so. Making binding commitments to not cut prices; offering best-price provisions that compensate consumers for future price reductions; renting rather than selling; curtailing production so the product may be unavailable in the future and the market cannot be flooded; and building in planned obsolescence over the PLC are some strategies to address this problem.25 When employed well, these strategies improve pricing over the life-cycle. The variety of options available to firms has caused some to question the empirical relevance of the time-consistency problem in pricing. Quoting Waldman (2007), “[. . . ] Although I feel that Coase’s insight that a durable-goods monopolist faces a time inconsistency problem is one of the fundamental insights concerning durable-goods markets, I also feel that his and others’ focus on time inconsistency concerning output choice was somewhat misplaced [. . . because] contractual commitments concerning future outputs and prices is frequently feasible (and it is also frequently possible to establish a reputation concerning price). Hence, my feeling is that the specific time-inconsistency problem identified by Coase is infrequently observed in real-world markets, although knowledge of the problem can help us better understand real-world contractual provisions used to avoid the problem. In contrast, I believe there are other choice variables such as R&D expenditures and new-product introductions for which contracting is much more difficult, and thus time inconsistency concerning these variables is much more likely to be observed in real-world markets.”

A proper resolution of the issue of time consistency requires assessing the role of expectations and discounting. Also required are empirical assessments of how these strategies work in the real-world, and how they interact with consumer and firm behavior in various market settings. Several papers in the third wave addressed these issues in depth. I discuss some of the aspects considered in this literature emphasizing recent empirical work that has accommodated forward-looking behavior by the firm and/or consumers.
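The role of discounting in the time-consistency problem can be seen in a stylized two-period example, sketched below in Python. All modeling choices are assumptions made for illustration and are not taken from the papers discussed here: valuations are uniform on [0, 1], marginal cost is zero, a consumer with valuation v earns v - p1 from buying in period 1 or delta * (v - p2) from buying in period 2, and the firm uses the same discount factor delta.

```python
def no_commitment_outcome(delta, steps=10_000):
    """Best profit, p1 and p2 for a monopolist that cannot commit to p2.

    Searches over the period-1 cutoff type: consumers with valuation at or
    above the cutoff buy in period 1, the rest remain for period 2.
    """
    best = None
    for i in range(1, steps + 1):
        cutoff = i / steps
        # In period 2 the firm faces residual demand on [0, cutoff) and
        # maximizes p2 * (cutoff - p2), which gives p2 = cutoff / 2.
        p2 = cutoff / 2.0
        # The cutoff type is indifferent: cutoff - p1 = delta * (cutoff - p2).
        p1 = cutoff * (1.0 - delta / 2.0)
        profit = p1 * (1.0 - cutoff) + delta * p2 * (cutoff - p2)
        if best is None or profit > best[0]:
            best = (profit, p1, p2)
    return best


def commitment_profit():
    """Profit from committing to a price of 1/2 in both periods.

    Consumers with v >= 1/2 buy immediately and nobody gains from waiting.
    """
    return 0.5 * 0.5


for delta in [0.0, 0.5, 0.9]:
    profit, p1, p2 = no_commitment_outcome(delta)
    print(f"delta={delta:.1f}  p1={p1:.3f}  p2={p2:.3f}  "
          f"profit={profit:.4f}  with commitment={commitment_profit():.4f}")
```

In this sketch, when consumers are patient but not perfectly so, the firm that cannot commit cuts its price in period 2, consumers anticipate the cut, and the firm ends up with lower prices in both periods and lower profit than under the commitment policy. How large this gap is in practice depends on the expectations and discount factors that the empirical literature is trying to measure.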

25 The literature separately established that the extreme outcome conjectured by Coase for a durable good monopolist, that all prices would fall to marginal cost, is very sensitive to model assumptions, and would not obtain in discrete time (Stokey, 1981), with discrete consumers (Bagnoli et al., 1989), with upward-sloping marginal costs (Kahn et al., 1986), with depreciation and replacement (Bond and Samuelson, 1987), or with potential entry (Ausubel and Deneckere, 1987).

4.5.1 Commitments via binding contracts

A practical issue for firms in a real market is what would be a credible commitment device that can inform and convince consumers they will stick to a specific sequence of future prices. There is limited empirical work that has explored these in actual practice. One example is Daljord (2015) who explores the role played by vertical channels. Daljord’s (2015) point is that while a manufacturer cannot credibly commit to himself that he would keep prices steady (this is not time-consistent), he could enter into binding commitments with retailers or other parties in the market that he would keep retail prices steady. If this policy is well known to consumers and violations of the policy are penalized in court through costly, binding contracts, vertical control could serve as a price commitment device for the industry, inducing consumers in the market to buy rather than wait. It turns out that this is the environment in the book industry in many countries.26 Therefore Resale Price Maintenance (RPM) policies could serve as a commitment device.27 Daljord’s application is to the book market in Norway. All book manufacturers and book retailers in Norway are part of a trade association, which stipulates that the manufacturer would set the introductory retail price for the book, which retailers would implement for up to a year following introduction. The trade association’s contract stipulates that any of the other manufacturers or retailers can seek penalties in court for violations of this RPM agreement. Similar fixed book price agreements are in place in many European countries. A useful feature of the Norwegian market is that following an alignment of Norwegian and European Union competition law, the policy was changed in 2005 in response to public concerns that the vertical control was anti-competitive. This change was widely publicized and involved shortening the fixed price period. This regime change forms a natural experiment that helps identify the effect on demand of consumer expectations. A complication arises because RPM in this market helps to both obtain commitment (by mitigating intertemporal substitution) and facilitate dynamic channel co-ordination (by mitigating downstream retail substitution).
Channel coordination enters the picture because book retail in Norway is a four-chain oligopoly, and oligopolistic competition in the downstream distorts the pricing incentives of the retailer relative to the manufacturer. Retail competition in the downstream implies retailers will have too low an incentive to set prices as high as the manufacturer prefers. Dynamically optimal RPM will coordinate the channel when the manufacturer can recommend an optimal sequence of downstream prices that restores the incentives for retailers to price high. To separate the commitment versus channel coordination role of RPM, Daljord simulates a series of vertical contracts with various levels of price coordination. He first simulates a baseline dynamic contract where pricing is neither coordinated over time nor between retailers. As a contrast to this baseline, he then simulates the optimal dynamic pricing of a vertically integrated unit (i.e., all channel distortions are internalized), without commitment and with commitment. The profits of the first contract relative to the baseline measure the returns to coordinating pricing across the retailers. The profits of the second contract relative to the first measure the returns to RPM as a commitment device beyond coordinating retailer competition. Using these simulations, Daljord documents that RPM plays both postulated roles in this market.

Rather than directly committing to not cut prices, the monopolist could also commit indirectly and indemnify buyers if prices fall after they buy (e.g., best price provisions; see, for instance, Butz, 1990). Empirical work on the efficacy of such contracts in sustaining intertemporal price discrimination is limited. Broadly speaking, more empirical research on ways by which commitment can be obtained directly or indirectly and how they work in real market settings would be welcome.

26 More broadly, durability complicates how the vertical channel between manufacturers and retailers can be co-ordinated. See for example, Desai et al. (2004); Arya and Mittendorf (2006), and the references cited therein.

27 Many countries are moving towards sanctioning RPM arrangements (following the 2007 Leegin case, RPM is no longer per se illegal in the US, and is judged under a “rule of reason”; see, for instance, Klein, 2014).

4.5.2 Availability and deadlines

One reason for consumers to not delay purchases to low-price future periods is if they perceive a risk the product would not be available if they wait. This may cause some high value consumers to buy early, and those with lower valuations to wait, thus facilitating price discrimination. This brings up the issue of product availability. The issue of availability is linked to the issue of capacity and how it is managed by the firm along with prices over the life-cycle. The empirical literatures that have most closely addressed the issue have focused on perishable goods, which have to be consumed before a deadline. Since the good loses all value after the deadline, managing capacity becomes very salient. Examples include airline seats, event tickets, advertising inventory, and seasonal goods that go out of fashion at the end of the season. The life-cycle of such a product is the time from its introduction until the deadline. One consideration is that the set of consumers who arrive into the market early may be different from those that arrive late, i.e. time of first arrival is correlated with willingness-to-pay. In the case of event tickets for example, those that have high valuation for the event and/or are more risk averse may arrive early, so earlier buyers may have higher valuation on average than later buyers. In the case of airline tickets, business travelers, whose travel plans are likely not fixed in advance, tend to arrive later, while leisure travelers whose travel plans are more likely to be made in advance arrive early. Therefore, later buyers may have higher valuation on average than earlier buyers. Optimal dynamic life-cycle pricing policies adjust accordingly to reflect the changing willingness-to-pay of the mix of arriving and remaining consumers (e.g., Koenigsberg et al., 2008).
To appropriately model demand and pricing for perishable goods, one has to augment the models presented in Sections 4.2 and 4.3 to allow for new consumer arrival, and outline how the distribution of willingness-to-pay of arriving consumers varies with time. Because demand is stochastic, and the product has low salvage value if not sold over the horizon, Revenue Management is often employed to manage capacity. The dynamic tradeoff captured in Revenue Management is that, given the limited capacity, the firm must weigh allocating a unit to a user today versus to a user arriving tomorrow, who may have a possibly higher willingness-to-pay because the mix of arriving
users changes as it gets closer to the deadline.28 In most settings, a high price today weakly causes more buyers to be in the market tomorrow. Sweeting (2012) shows this implies the firms’ opportunity cost of selling a unit should decline as the deadline approaches, and that sellers in the market he analyzes (baseball tickets) behave as though it is. On the theory side, Dilme and Li (2018) solve for the optimal price path of a seller without commitment who wants to sell to forward-looking consumers by a deadline.29 The optimal price path features declining prices with periodic flash sales. The flash sales along the equilibrium price path serve to increase the scarcity of the goods, which affects the willingness to adopt. Prior to the deadline, forward-looking buyers anticipate that flash sales may be offered in the future, but incorporate that waiting for flash sales is risky because the goods may be purchased by others. To avoid such a risk, a buyer is willing to purchase a good early at prices higher than the flash-sale price, thereby facilitating price discrimination. With commitment, Board and Skrzypacz (2016) solve for the optimal selling strategy of a firm who wants to sell to forward-looking consumers by a deadline. The optimal mechanism is to set a series of prices via a cutoff strategy: in each period, sell units to buyers with valuations higher than a cutoff that depends on the inventory and time remaining until the deadline. In the continuous-time limit of their model, the optimal mechanism can be implemented via posted prices (plus, a last-minute auction in case the seller has units remaining to be sold when the deadline is reached). The empirical literature that develops demand with forward-looking consumers and assesses life-cycle price and inventory policies in a context of a real-world market is limited. One example is Moon et al. 
(2018), who solve for an optimal “markdown” policy, in which dynamic price and inventory policies are assessed empirically for fashion goods sold at an online website.30 Markdown pricing is commonly seen in apparel retail and involves high prices at the beginning of the season, followed by discrete price cuts in the middle of the season, ending with low clearance prices at the end of the season. Moon et al.’s (2018) results show that a pricing policy that randomizes the timing of markdowns over a limited sequence of markdown levels is more profitable than committing to deterministic markdown times. The reason is that randomized markdowns can induce search and increase the visitation to the website of consumers with lower monitoring costs. This emphasizes the interaction between price variation and search, and its implications for dynamic pricing over the PLC.

28 Broadly speaking, Revenue Management models specify an arrival process for consumers with a specified distribution of willingness-to-pay so as to empirically analyze demand using flexible statistical specifications (Gallego and van Ryzin, 1994; Bitran and Mondschein, 1997). Demand is typically not derived from micro-foundations, consumers are often treated as myopic, and firms are often treated as monopolists (though, see Shen and Su, 2007 for a discussion of the more recent literature that relaxes these assumptions). Even the simple model turns out to be surprisingly difficult to analyze because the resulting dynamic problem cannot be solved analytically. So emphasis in the literature has also been on finding fast-computable heuristics that approximate the optimal policy reasonably well in practical settings. Revenue Management is now a successful and visible science with demonstrated impact in several industries, including airlines, hospitality, and online advertising (e.g., Talluri and van Ryzin, 2004). For an overview of dynamic pricing in Revenue Management, see Elmaghraby and Keskinocak (2003). For an application in marketing, see Sanders (2017).

29 See also Hörner and Samuelson (2011); Oery (2016).

30 Soysal and Krishnamurthi (2012) develop a model of dynamic demand for seasonal, fashion apparel which is priced according to markdowns. The model treats consumers as solving an intertemporal adoption problem with expectations about product prices and availability. The analysis suggests that expectations about availability are important in influencing consumer adoption behavior.

Williams (2018) presents an assessment of intertemporal price discrimination in airlines. The “product” in this market is a ticket sold for a particular flight on a particular route, and the “life-cycle” is the time from when the ticket becomes available for sale until the date of travel. Williams is interested in empirically assessing the interaction between intertemporal price discrimination incentives (within-ticket price variation arising from a desire to exploit the higher willingness-to-pay of travelers who consider buying the ticket at various times), and the effect of demand uncertainty (within-ticket price variation responding to uncertain demand with limited capacity). Williams’ point is that the ability to dynamically adjust to demand shocks by changing capacity complements the ability to inter-temporally price discriminate, because saving some seat capacity until close to the date of travel enables the airline to sell to high willingness-to-pay business travelers who are likely to buy later. Williams estimates a static demand system for tickets and computes the optimal price path for a monopolist facing that demand. When the ability to dynamically adjust capacity is shut down in simulations, seats are found to be sold out too early, so profits from price discrimination fall.
Since accounting for dynamic adjustment to stochastic demand is important to capture the opportunity cost of selling a ticket at each date, accommodating dynamic adjustment is also found to be important to assess the welfare consequences of intertemporal price discrimination. A take-away for empirical work is that in perishable goods monopoly, revenue management is important for the viability of profitable intertemporal price discrimination and for assessing its impacts.

Chen (2018) presents a related analysis, extending Williams’ monopoly setting to include dynamic competition between airlines. Like Williams, Chen finds that dynamic adjustment complements price discrimination. Chen finds however that the value of dynamic capacity adjustment is mitigated in oligopoly. In his context, dynamic adjustment helps airlines manage their capacities more efficiently by expanding the market and selling to more consumers. But, allowing dynamic adjustment is also found to intensify price competition in early time-periods. This is because the extra seats the airlines can supply on account of revenue management are sold at low prices. So, overall, the positive effect of the increased sales is canceled out by the negative effect of intensified competition. Chen’s case-study accentuates the role of competition in assessments of intertemporal price discrimination in perishable goods with active revenue management.

Lazarev (2013) considers how intertemporal price discrimination interacts with other forms of price discrimination. His application is to ticket pricing in monopoly airline markets. The airline can inter-temporally price discriminate by setting price paths that vary as the deadline approaches. It can also offer products of different quality by imposing a cancellation fee on some tickets and offering restricted fares that are unattractive to business travelers with more uncertain schedules. Or it could consider offering different prices to consumer groups of different profiles (e.g. separate prices over time for leisure and business travelers). Understanding how these interact in a dynamic setting is important to assessing the welfare effects of price discrimination and to the practical problem of improving the profitability of pricing over the ticket life-cycle. Intertemporal price discrimination is found to be profitable with forward-looking consumers (under commitment). Without the ability to allow restricted seats, the monopolist’s equilibrium price path is found to become flatter, and prices higher. Leisure travelers benefit due to the increase in the quality of tickets but lose from the increase in prices; and overall social welfare slightly increases. Interestingly, comparing intertemporal and third-degree price discrimination, Lazarev reports that intertemporal price discrimination alone can capture more than 90% of the profit that the monopolist would receive if third-degree price discrimination were also possible. To the extent that third-degree price discrimination has higher informational requirements for firms, this result is interesting and showcases the value of dynamic pricing.

Waisman (2017) allows for forward-looking consumers and competition, and considers the question of how sellers choose how to sell their goods as part of a dynamic life-cycle problem for perishable goods. Waisman’s setting involves sales of NFL tickets by sellers on eBay. eBay serves as a marketplace where sellers can sell tickets for specific games to sports fans directly. An NFL ticket is a perishable good because the ticket loses its value after the game is played. Most sellers on eBay choose between auctions and posted prices to sell the tickets. As it gets closer to game day, the pressure on buyers to acquire a ticket and on sellers to sell that ticket increases.
This induces within-variation in the seller’s choice of selling mechanism. Closer to the deadline, sellers are seen to prefer posted prices over auctions, showing changing preference for mechanisms over the ticket’s life-cycle (a pattern also reported by Sweeting, 2012 for baseball tickets). Waisman builds an empirical model of dynamic seller mechanism choice over a product life-cycle. In his model, conditional on choice of the mechanism, sellers also make a dynamically optimal choice of prices: a reserve price in case of an auction, or a posted price otherwise. This endogenizes dynamic pricing for each mechanism. Seller preferences over different mechanisms are rationalized by allowing for heterogeneity in risk aversion, platform listing fees or outside options. At the estimated parameters, optimal prices are found to decline as the deadline approaches. A counterfactual is implemented in which auctions are eliminated. Both buyers and sellers are hurt when auctions are unavailable. Thus, simultaneously providing both mechanisms seems to be beneficial for the platform, an interesting result that highlights the role of selling formats in perishable goods monopoly pricing.

Collectively, the recent perishable goods pricing literature has showcased the important role of managing product availability and capacity and its interaction with pricing over the PLC. While efficient capacity allocation is helpful for responding to shocks, the risk that the product would be unavailable also helps the seller to convince forward-looking consumers to buy early, mitigating Coasean outcomes.
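The Revenue Management tradeoff discussed in this section, selling a unit today versus holding it for a later arrival, can be sketched as a small finite-horizon dynamic program in Python. The setup is a deliberately simple illustration in the spirit of the basic models described in footnote 28, and every modeling choice is an assumption: at most one consumer arrives per period, willingness-to-pay is uniform on [0, 1] and does not change over time, consumers are myopic, and unsold units are worthless at the deadline. The function and variable names are hypothetical.

```python
def solve_revenue_management(periods, capacity, arrival_prob, prices):
    """Expected revenue and pricing policy by (periods to go, units left).

    value[t][c] is the expected revenue with t periods to go and c units
    left; policy[t][c] is the price posted in that state.
    """
    value = [[0.0] * (capacity + 1) for _ in range(periods + 1)]
    policy = [[None] * (capacity + 1) for _ in range(periods + 1)]
    for t in range(1, periods + 1):
        for c in range(1, capacity + 1):
            # Opportunity cost of a sale: future value lost by having
            # one unit fewer in the remaining t - 1 periods.
            opportunity_cost = value[t - 1][c] - value[t - 1][c - 1]
            best_gain, best_price = 0.0, None
            for p in prices:
                sale_prob = arrival_prob * max(0.0, 1.0 - p)
                gain = sale_prob * (p - opportunity_cost)
                if gain > best_gain:
                    best_gain, best_price = gain, p
            value[t][c] = value[t - 1][c] + best_gain
            policy[t][c] = best_price
    return value, policy


price_grid = [i / 100 for i in range(1, 100)]
value, policy = solve_revenue_management(
    periods=50, capacity=5, arrival_prob=0.5, prices=price_grid
)


def opportunity_cost(t, c):
    """Marginal value of the c-th unit with t periods to go."""
    return value[t][c] - value[t][c - 1]


for t in [50, 25, 10, 1]:
    print(f"periods to go={t:2d}  opportunity cost={opportunity_cost(t, 5):.3f}  "
          f"price={policy[t][5]}")
```

In this stationary sketch the opportunity cost of selling a unit falls as the deadline approaches for any given inventory level, which is the property that Sweeting (2012) tests in the market for baseball tickets. The sketch omits the features emphasized in the papers above, such as a changing mix of arriving consumers, forward-looking buyers, and competition.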

4.5.3 Second-hand markets

Profitable pricing over the PLC also requires management of second-hand markets. Because the good is durable, it has value when it is sold in a second-hand market. The volume of trade in a resale market depends on the extent of durability. If a good is perfectly durable, it can be resold. If a good depreciates immediately after first-use, there is no possibility of resale. Therefore, the extent of trade in a second-hand market is implicitly under the control of the firm, through its choice of the product’s durability. A full analysis of pricing with second-hand markets for durable goods thus requires endogenizing the durability of the product. This leads to models in which both durability and prices are chosen jointly over the life-cycle. The classical work on the topic suggested that the monopolist would not want to degrade product durability (e.g., Swan, 1972). In Swan’s model, consumers are homogeneous and can obtain the same service flow as a new good by buying multiple units of used goods. Choosing the optimal durability then requires providing a target level of service flow at lowest cost. With constant returns to scale, the cost-minimizing durability is independent of the target level, and the monopolist has no incentive to distort the level of durability from the socially efficient level. Later literature showed that when consumers are heterogeneous and new and used goods are not perfect substitutes, the monopolist may have an incentive to reduce durability (e.g., Bulow, 1986; Rust, 1986; Hendel and Lizzeri, 1999; Waldman, 2007). As Waldman (2003) notes, one way to think of the monopolist’s durability choice in this situation is to think of it as producing a product line over time, where new units are the “high-quality” products and used units are “lower-quality” products. Reducing durability can be thought of as reducing the quality of used units.
Reducing the quality of the used units helps set the efficient level of prices for the consumers who buy the new units. This may be profitable. This parallels the findings in the product-line pricing literature that a monopolist reduces the quality sold to low-valuation consumers (e.g., Mussa and Rosen, 1978; Moorthy, 1984).31

31 However, the durability problem has a special structure compared to a typical product line problem, in that the costs of producing the two goods are related: producing high quality used goods also raises the cost of producing the new goods (Hendel and Lizzeri, 1999).

How these play out in a given market depends on the nature of consumer expectations, the amount of depreciation, the magnitude of transactions costs and informational asymmetries, and the extent of heterogeneity, among other factors. When forward-looking consumers anticipate they will be able to trade in a second-hand market, their valuation of, and demand for, new goods increase. The more durable the product, the less it depreciates, and the stronger this effect. Depreciation can also occur in utility (from satiation or novelty wear-off). The firm can rein in the extent of depreciation by its choice of quality and durability, and thus improve the demand for new goods. The opposing force is competition: the more durable the good, the more it is a substitute for the new good, and the stronger the competition to new goods from the second-hand goods market. Therefore, the increase in new goods demand
due to higher willingness-to-pay has to be balanced against the substitution induced by competition. Transaction costs of participating in the second-hand goods market and informational asymmetries (or “lemons” problems) reduce the volume of trade in resale.32 When consumers are forward-looking, they anticipate this “selling friction” associated with later trade, making them value the option of selling the used good less; this reduces their valuation of the new good. Finally, when consumers are heterogeneous in their valuations, the existence of resale markets can help sort between consumers of various types. Now, the seller of a durable good is implicitly a multi-product firm (of new and used versions). The two types of goods help facilitate price discrimination, which can improve profitability if the right kind of sorting obtains. How these various forces play out in any given market setting is an empirical question. A small empirical literature considers the impact of secondary markets in durable goods for firms’ profits.33 Chevalier and Goolsbee (2009) present an empirical analysis of college textbooks. To the extent that forward-looking consumers incorporate the value of future resale opportunities into their willingness-to-pay for a new textbook, they find that publishers will not profit by reducing the volume of trade of second-hand textbooks (by introducing faster revision cycles for textbooks). Hodgson (2016) explores the role of trade-ins in the market for used jets. One role of trade-ins is to reduce transaction costs to consumers when they upgrade to new goods. This increases new goods sales relative to used goods. If firms resell the traded-in goods, this can increase the supply of used goods, augmenting used goods sales. The value of offering trade-ins to a firm is thus an empirical question. 
Simulations using a static model fit to the data show that firms’ revenues are 8% higher with trade-ins, relative to without, suggesting that firms in this market gain from offering trade-ins, due to their positive effect on inducing more new goods sales. Esteban and Shum (2007) construct a dynamic oligopoly model of a vertically differentiated durable goods market with trade in secondary markets. Their model captures forward-looking consumers and oligopolistic automobile firms competing in a quantity-setting game with a secondary market. The model is stylized and abstracts from transaction costs and adverse selection issues, and accommodates limited consumer heterogeneity. Calibrating their data on US automobiles prices (both new and used) and quantities over 1971-1990, they find that eliminating the secondary market in a year would lead automobile firms to increase production (by about 12%). 32 The firm could endogenize the extent of transaction costs for example, by limiting the transferability of

warranties when resold (Hendel and Lizzeri, 1999). Trade-ins can help manage “lemons” problems with used-goods (Rao et al., 2009). 33 Rust (1985); Adda and Cooper (2000); Stolyarov (2002); Schiraldi (2011); Gavazza et al. (2014) develop models of consumer durable good replacement and scrappage with forward-looking consumers and second-hand markets. Issues related to second-hand markets, including for non-durable markets, are reviewed in Sweeting (2019).


Chen et al. (2013) extend this work to allow for transaction costs and more consumer heterogeneity. Calibrating their model to US automobile data, they find that when sellers cannot commit to production in advance, opening the secondary market reduces primary market seller profits by as much as 35%. In contrast, when sellers can commit to production in advance, profits increase by 52% from opening secondary markets. The reason is that when firms cannot commit, the time consistent solution implies more output and thus a larger used goods stock, than the solution under full commitment. This generates sufficient substitution of demand away from new goods that profits fall. In contrast, under commitment, output is lower and the used good stock is smaller so that the allocative advantage of sorting consumer heterogeneity dominates the substitution. Consistent with this, in a simulation in which persistent heterogeneity is removed – which reduces the allocative value of the secondary market – opening the secondary market decreases profits by 11% even under commitment. Apart from providing a computational framework for empirical work on dynamic durable goods oligopoly with secondary markets, Chen et al.’s (2013) results also showcase the importance of the role of time consistency in the assessment of secondary goods markets. Firms selling digital goods with “durable-like” features can shut down secondary goods markets directly by using Digital Rights Management (DRM) – electronic codes that are inserted into the digital goods – that prevent them from being resold and used by another customer.34 Shiller (2013) analyzes the role of DRM in the market for US console video games in a model with dynamic pricing and competition with used goods. Shiller’s model allows for video games to depreciate in consumer utility due to satiation with use. 
Since used goods are supplied to the resale market by users who have tired of them, the equilibrium price in the used goods market will be lower than for new goods. The used goods are of the same physical quality as new ones, but available at lower prices. Therefore, a forward-looking consumer faces an incentive to wait for a new video game to become available in the used goods market. DRM eliminates this incentive. Comparing a simulated equilibrium with no second-hand goods market to the observed data, Shiller finds that adding DRM raises producer profits substantially, sometimes by more than 100%. This work emphasizes the complex ways in which product design, consumer behavior, and pricing are affected by the interaction with secondary goods markets.

Ishihara and Ching (2018) analyze the role of used goods in the markets for new and used video games in Japan. Compared to Shiller (2013), who allows for usage-based depreciation, they allow the good to depreciate in consumer utility via novelty effects, i.e., where game quality declines to both owners and non-owners of games over time. Ishihara and Ching (2018) estimate a dynamic model of consumer demand and supply for new and used goods, and simulate a monopoly equilibrium with no used goods market at their estimated parameters. Like Shiller (2013), they find that preventing resale is beneficial. Allowing for prices to adjust dynamically in the no-resale counterfactual is shown to be important. Without allowing for price adjustment, the aggregate profit increase with no resale across all games is 2%, with about half of games experiencing a decrease in profits, and others experiencing an increase in profits. When prices are allowed to adjust dynamically in response to no-resale, aggregate profits increase by about 78%.

Collectively, this empirical literature has shown how the existence of secondary goods markets interacts with dynamic pricing and profitability in markets for durable goods. The dynamics induced by forward-looking agents are seen to be a key aspect of the problem; so is the impact of commitment. An open issue in the literature is to figure out a way to endogenize the firm’s choice of durability in a tractable empirical model, and to better understand how this choice interacts with dynamic pricing in a competitive environment. This will enable assessing the empirical consequences of the rich theory on the topic. Also welcome will be more empirical work assessing asymmetric information in second-hand markets, and the role it plays in pricing over the life-cycle. Finally, using micro-data to enrich our understanding of heterogeneous consumers’ propensity to buy, trade, and scrap goods will be key to the formulation of better life-cycle pricing and the proactive management of used goods.

34 For non-digital goods, resale markets are hard to shut down directly because the First Sale Doctrine in the United States allows consumers to freely resell an originally purchased new good even if it is copyrighted, as long as they do not make copies of it (e.g., Katz, 2014). However, the legal logic changes when it comes to digital goods like music, movies, video-games, and other types of entertainment goods. For these kinds of products, the First Sale Doctrine only applies to the new good because transferring the same product to another user involves making a copy of the originally bought new good.

4.5.4 Renting and leasing

The durable goods monopolist is better off renting his product than selling it (Bulow, 1982). One reason is to resolve the issue of time inconsistency. The time consistency problem for the monopolist arises because the firm’s demand curve for the durable good becomes more elastic over time, causing it to best-respond by lowering prices. Renting converts the market for the durable good into a market for the current service from the good which is, by construction, non-durable. This removes the dynamic inconsistency because the firm faces the same demand curve every period, thereby removing the price-cutting incentive.

Renting the good can also benefit the monopolist in the presence of a second-hand market (Desai and Purohit, 1998, 1999; Hendel and Lizzeri, 2002; Waldman, 2007). Renting allows the firm to gain market power in the used market and ensure the price of used goods does not drop too far below the price of new goods. This reduces substitution with new goods and also helps manage allocation in the presence of heterogeneous consumers. By giving the firm a way to manage the quantity and quality of used goods entering the second-hand market, renting can also be a way of controlling the volume of trade in used goods (reducing substitution) and of controlling the extent of adverse selection. Renting also interacts with the choice of durability: when the good becomes more durable, the importance of second-hand markets increases, thereby increasing the importance of renting as a way to address time consistency, allocative efficiency, substitution, and adverse selection issues. For all these reasons, a durable-good monopolist may in general prefer to rent rather than sell his product. Some mitigating concerns with renting are transaction costs or moral hazard arising from the fact that renting consumers may face a lower incentive to maintain the quality of goods they rent rather than own.

Empirical work on assessing dynamic pricing of durable goods with forward-looking agents and rental markets is limited. One example is Rao (2015), who presents an empirical analysis of the interaction of dynamic pricing with renting and selling strategies. Her focus is on digital entertainment goods markets such as iTunes or Amazon movies, where goods can be rented or sold at prices that vary over time. When consumers are heterogeneous, simultaneously offering both rental and purchase markets can serve as a means to price discriminate across consumers. Given heterogeneity in usage-based satiation in entertainment goods, this helps segmentation because those who get high utility from repeat consumption will have an incentive to buy rather than to rent. This allocative benefit has to be traded off with the fact that some consumers, who may have bought early at high prices, may now have to wait for lower rental prices. In an empirical application to digital movies, Rao finds that it is profitable to offer both purchase and rental options. Even though high-valuation forward-looking consumers may “trade-down” from buying at high prices to renting at low prices, she finds the offering firm has the ability to capture their residual repeat-consumption value by either a re-rent or a purchase in the future, making the simultaneous strategy profitable.

The issue of understanding renting versus selling under competition has received limited attention. Bucovetsky and Chilton (1986) show that the threat of entry may make selling more attractive, while Desai and Purohit (1999) show that in competition, a rent-only strategy may imply new good prices that are too high, creating an incentive for firms to sell and lease simultaneously.
They present evidence that leases are more likely to be offered for automobiles in more competitive segments. The decision to sell or rent is also affected by the existence and relevance of complementary goods, because the choice to sell or rent affects the volume of trade of the firm’s product, and thereby, the potential market for such complementary goods (Bhaskaran and Gilbert, 2005). More empirical work on how these aspects play out in real market settings will be important to deepen our understanding of how the choice of selling mechanism interacts with life-cycle pricing.
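The time-inconsistency argument for renting made earlier in this subsection can be illustrated with a small numerical sketch. The setup below is a standard textbook two-period example rather than anything estimated in the papers cited: per-period valuations are assumed uniform on [0, 1], marginal cost is zero, and the firm and consumers share a discount factor `delta`. The function names are hypothetical.

```python
# Two-period durable-goods monopoly: selling without commitment vs. renting.
# Illustrative assumptions (not from the chapter): unit mass of consumers with
# per-period valuations v ~ U[0, 1], zero marginal cost, common discount factor.

def sell_without_commitment(delta=1.0, grid=10_000):
    """Grid-search the period-1 cutoff v1 (consumers with v >= v1 buy early)."""
    best = None
    for i in range(1, grid):
        v1 = i / grid
        # Period 2: the firm faces residual buyers on [0, v1) and cannot
        # commit, so it charges the static monopoly price on that segment.
        p2 = v1 / 2
        # Period 1: the cutoff type v1 must be indifferent between buying now
        # (two periods of service) and waiting to buy at p2.
        p1 = v1 * (1 + delta) - delta * (v1 - p2)
        profit = p1 * (1 - v1) + delta * p2 * (v1 - p2)
        if best is None or profit > best["profit"]:
            best = {"v1": v1, "p1": p1, "p2": p2, "profit": profit}
    return best

def rent(delta=1.0):
    """Renting: the firm faces demand 1 - r every period, so r = 1/2 each time."""
    r = 0.5
    per_period_profit = r * (1 - r)
    return {"rental_price": r, "profit": per_period_profit * (1 + delta)}

sell = sell_without_commitment()
lease = rent()
print(sell)
print(lease)
```

Under these assumptions and `delta = 1`, selling yields a discounted profit of about 0.45 (with the price falling from roughly 0.9 to 0.3), while renting yields 0.50. The gap reflects the price-cutting incentive that forward-looking buyers anticipate when the good is sold.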

4.5.5 Complementary goods and network effects

There is a long history of research starting with Cournot (1838) on the joint pricing by a firm of complementary goods. Consider the problem for a monopolist. How it should price a system of complementary goods depends on how the goods are consumed. When the goods are consumed in fixed proportions, the configurations demanded by different consumers are similar, and the monopolist can price the system as a whole as though it were a single product. When the goods are consumed in variable proportions, the monopolist can leverage differences in valuations for varying configurations by implementing schemes for intra-product price discrimination.


Tying and bundling are examples of such strategies.35 In a dynamic world, where goods are bought at different points in time, there is also the possibility of price variation over time. When prices for these products are optimized over the life-cycle, the intra-product discrimination incentives must be balanced together with the intertemporal incentives across products, so that the policies are dynamically optimal.

Consider a “hardware-software” system common in technology markets. The hardware is typically a durable good for which the software is a complement. Software may be bought in later periods after the hardware is adopted. Therefore, consumer adoption of the hardware is a function not just of expectations about the future price of the hardware, but also of expectations about the future prices, quality, and availability of software. Consumers may rationally anticipate that the monopolist may be tempted to raise software prices to the monopoly level by exploiting their lock-in to the hardware/software system. Therefore, in pricing the system, the monopolist faces a time consistency problem associated with both durability and lock-in. One possibility for the monopolist is to make binding commitments to keep future software prices low and to provide a sufficient variety of software.36 In the absence of commitment, another possibility is to “open” up the system so that software is supplied by independent vendors. When vendors produce more variety at lower prices when more of the hardware is sold, this assures consumers about future software supply. For instance, when software is supplied by many independent vendors in monopolistic competition, the larger the market size for the software, the larger is the variety and the lower are the prices of software supplied (Church and Gandal, 1992; Chou and Shy, 1996).
Since the market size for the software is the installed base of the hardware (the total number of units sold), this sets up an incentive for penetration pricing on the hardware. When the increased software provision leads to more product adoption, which in turn induces more software provision, and so on, it can start off a self-reinforcing virtuous cycle or “positive feedback,” which is beneficial.

When there are multiple hardware/software systems competing in the market, price competition occurs in the context of a “standards war”. The standards war may be fierce due to the possibility of winner-take-all outcomes. The latter is driven by two forces. First, an initial installed-base advantage for one standard may get accentuated by positive feedback and lead to large long-run advantages for that standard. Second, beliefs of agents in the market can be self-fulfilling and there can be multiple equilibria. Market behavior is such that consumers have an incentive to adopt hardware, and software vendors an incentive to produce software, for the standard that is likely to be more popular. If a change in the environment causes market participants to believe more strongly that a particular standard will emerge victorious, this change can in and of itself cause that outcome to be realized because agents act in a way that conforms to their beliefs, making their beliefs true. Because of this possibility, small early changes in beliefs can have large long-run effects on outcomes (David, 1985). This is another force towards winner-take-all outcomes and possible monopolization.

The implication for marketing strategy is that early interventions in the life-cycle, including low pricing, high advertising, and facilitating software availability, as well as strategies that could affect market participants’ beliefs about standards (like announcements about marquee compatible software availability), become important for success. The forward-looking expectations of consumers, hardware firms, and software vendors about prices and quality are an important driver of these strategies. Since a consumer’s adoption decision is indirectly influenced by another’s due to the influence the other’s adoption can have on future software provision and prices, these markets are said to feature an indirect network effect (Katz and Shapiro, 1994; Shapiro and Varian, 1998).

While the theory literature is well developed, the empirical work that has addressed dynamic pricing over the product life-cycle with forward-looking agents and complementary goods has been sparse.37 One example is Li (2018), who analyzes life-cycle pricing for e-readers and e-books implemented by a monopolist (Amazon).

35 In bundling, a set of products is offered together for sale with prices that feature a volume discount across products: when a bundle comprising various products is bought as a whole, one gets a lower price per product (e.g., Microsoft Office). In “pure” bundling only the bundles are offered for sale; in “mixed” bundling, both the bundles and the component products are offered for sale. Tying is similar to bundling, except one has to buy a primary good (e.g., Amazon Kindle) in order to use the secondary good (e.g., e-books). Tying is different from pure bundling. In pure bundling, the primary and secondary goods can only be bought in the fixed proportions offered. In tying, any quantity of the secondary good can be bought, so purchases are possible in flexible proportions. Tying can be seen as a restricted form of mixed bundling, wherein customers are offered prices for products A and B together, or B alone, but never A without B (Nalebuff, 2008).

36 Renting the hardware can also help by giving consumers an easy out if the hardware is not supported in the future or if the software prices become too high (Katz and Shapiro, 1994).
Li’s (2018) approach is to estimate a dynamic demand system with forward-looking consumers in a first stage, and to compute an MPE in prices at those estimated parameters in a second stage. The MPE she computes incorporates dynamically optimizing consumers and a dynamically optimizing two-product monopolist who incorporates the complementarities between the two products. Li points out that a firm that sells complementary goods can consider two pricing strategies: skimming, by which it sets high prices and lowers them over time to cream-skim the market; and penetration, by which it sets low prices initially to penetrate the market so that it can earn later from subsequent software sales. Whether the firm chooses to skim or penetrate on either product depends on the mix of demand and costs, and has to be empirically determined. In general, it is optimal to penetrate in the product with higher relative demand elasticities and skim on the product with lower relative demand elasticities.38 Li’s estimates reveal two consumer types in the population: “avid readers,” who she finds are more price elastic for e-books than for e-readers; and “general readers,” who are more price elastic for e-readers than for e-books. The optimal pricing policy therefore suggests that when e-readers are priced high initially, e-books should be cheaper, so as to attract avid readers; once e-reader prices fall and the mix of buyers shifts to more casual readers, the e-book price should increase. Li’s model shows the importance of accounting for consumer heterogeneity and forward-looking expectations in demand in the dynamic pricing of durables with complements.39

Life-cycle pricing of hardware with indirect network effects is considered in Liu’s (2010) exploration of the video game industry. Liu explores skimming incentives to inter-temporally price discriminate over the distribution of heterogeneity versus penetration incentives arising from indirect network effects. One feature of Liu’s model is that he considers cost-side experience curve effects in the production of the hardware, which are known to be important in many technology goods. This sets up an added incentive for falling prices. Liu’s (2010) model features dynamic duopolistic competition, but abstracts away from forward-looking consumer behavior. The joint impact of experience curve cost effects, network effects, and intertemporal price discrimination in this industry implies declining price paths over the life-cycle, though Liu’s estimates imply that firms’ margins rise.

Dubé et al. (2010b) analyze life-cycle hardware pricing in markets with forward-looking agents and indirect network effects. Dubé et al.’s (2010b) application is to the market for console video games, which features both durability (a video game console is typically subject to one-time purchases within a generation) and indirect network effects (the dependence of a user’s demand for a video game console on the behavior of another through his influence on video game availability).

37 Hartmann and Nair (2010); Sriram et al. (2010); Derdenger and Kumar (2013); Liu et al. (2014); Lin (2017); Huang (2018) analyze the demand for bundled or tied goods with forward-looking consumers.

38 The intuition is that profitable price discrimination requires that the higher price (relative to marginal cost) be charged to the less elastic demander.
They specify a dynamic, aggregate logit model of demand for video-game consoles in which console demand for forward-looking consumers is driven by console prices and console-specific game availability; and a stylized model of video-game supply in which video game entry into the market is modeled as an entry problem under monopolistic competition. In the MPE they compute, life-cycle pricing is seen to display an increasing path, implying penetration pricing. In simulations, below-marginal-cost pricing emerges as an equilibrium strategy, and early price cuts become more aggressive, as indirect network effects are made stronger (by increasing the weight given to software in consumer hardware utility). The simulations also accentuate the importance of consumer expectations and patience in driving long-run market outcomes. If firms are made ex ante symmetric, and neither firm has an initial installed base advantage, simulated long-run equilibria feature the dominance of a single standard (“tipping”) only when the effect of software in consumer utility is high or when consumers are very patient. In other situations, the long-run equilibria frequently feature firms splitting the market. Interestingly, simulations suggest that the winner of the standards war between Sony and Nintendo is also very sensitive to the complementarity in utility and forward-looking nature of consumers. Nintendo had a lower console production cost compared to Sony, but Sony had more software titles, and won the standards war in this generation. When the importance of software and/or the discount factors are sufficiently reduced, simulations show Nintendo’s cost advantage, which results in lower equilibrium prices, can lead to Nintendo instead winning over Sony.40

39 Dynamic pricing of complements is also considered in Huang (2018), who analyzes the pricing implications of his dynamic demand-side model in steady-state without solving for time-varying pricing policies.
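The tipping logic discussed in this subsection, where a small early lead is amplified only when the network effect is strong enough, can be illustrated with a deliberately simple sketch. This is not the model of Dubé et al. (2010b) or of any other paper cited here: it is a hypothetical reduced form in which each cohort of adopters chooses between two ex ante symmetric standards with a logit probability that depends on the previous cohort's adoption shares, used as a stand-in for installed base and the software variety it attracts. The parameter `network_weight` and the function name are assumptions made for illustration.

```python
import math

def simulate_share(network_weight, initial_share=0.52, periods=200):
    """Iterate the share of adopters choosing standard A.

    Hypothetical reduced form: adopters pick A with logit probability
    1 / (1 + exp(-network_weight * (share_A - share_B))), where the shares
    are last period's adoption shares. There are no prices, no costs, and
    no forward-looking behavior in this sketch.
    """
    share = initial_share
    for _ in range(periods):
        gap = share - (1 - share)  # A's lead over B
        share = 1 / (1 + math.exp(-network_weight * gap))
    return share

weak = simulate_share(network_weight=1.0)    # small lead washes out
strong = simulate_share(network_weight=4.0)  # small lead is amplified

print(round(weak, 3), round(strong, 3))
```

In this sketch the even split is stable when `network_weight` is below 2 and unstable above it, so the same two-point initial lead either disappears or grows into near-dominance. The empirical models reviewed above add prices, costs, and expectations to this basic feedback mechanism.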

4.6 Summary

This concludes the discussion on durable-goods. It should be clear that the dynamic pricing of such goods is now a mature empirical literature with several contributions on the main issue of pricing along with its interaction with related aspects such as product quality, selling-mechanisms, and contractual arrangements. The interest in durable-goods pricing was driven in part by the historical focus of the new product diffusion literature on technology goods that typically feature one-time purchases, and by the focus in the microeconomic theoretical literature on issues related to time consistency and durability. The literature on non-durable goods subject to frequent repeat purchase is equally copious. The next section considers some key themes in life-cycle pricing in this stream.

5 Goods with repeat purchase

This section is included primarily for completeness of the chapter and is kept brief given that comprehensive reviews of the literature already exist (Klemperer, 1995; Chen and Hitt, 2006; Farrell and Klemperer, 2007; Fudenberg and Villas-Boas, 2007; Villas-Boas, 2015; Klemperer, 2016). I briefly motivate the theory, and focus primarily on recent empirical work that has assessed dynamic pricing while accommodating forward-looking agents, referring the reader to these other reviews for more detailed overviews.

5.1 Theoretical motivations

When goods are re-purchased, a primary issue for pricing is how consumer repurchase behavior is affected by current pricing. One source of dependence is the presence of switching costs. Switching costs arise when there is a cost to the consumer from changing his currently owned good. Some of these costs are contractual

40 Lee (2013) analyzes indirect network effects for a later generation of video games. Lee’s model features a richer demand-side for the software, accommodating the fact that consumers may care about the availability, quality, and prices of some games more than others, which complicates the dynamics. Lee uses the model to assess the impact of the extent of compatibility of games with consoles, but abstracts from life-cycle pricing issues. Joint estimation of demand and supply with indirect network effects and forward-looking consumers and firms is considered in Zhou (2017).


(e.g., the fee from termination of a cell-phone service); while others derive from brand-loyalty (e.g., the disutility from adopting a brand the consumer is not used to); sunk investments (loyalty points in reward programs; inventory of a storable good); from economies of scope arising from complementarities in purchases of products from the same firm; from search and transaction costs of identifying and evaluating new options; or learning costs of using new features in adopted goods. Switching costs give the consumer an incentive to continue buying from the firm from which he has previously purchased. Therefore, they generate lock-in, and produce future market power for firms as a function of their current market share. Price competition for this market share can be fierce, inducing penetration pricing, early discounting, and introductory offering strategies into the life-cycle (Klemperer, 2016). How the switching costs affect the price level again depends on consumer expectations. The following is an argument from Farrell and Klemperer (2007). With switching costs, a forward-looking consumer recognizes that he is more likely to buy tomorrow the product he bought today. The forward-looking consumer cares about his future self’s utility, and recognizes that his future self is more likely to prefer the product his current self picks today. In response to a current price cut, if consumer expectations are the price cut will be maintained in the future, he will be more sensitive to current price the higher his switching costs. Therefore, higher switching costs will induce lower equilibrium prices. On the other hand, if in response to a current price cut, consumer expectations are that the price cut will not be maintained in the future or that current price cuts will be followed by price increases, as in “bargain, then rip-off” pricing, he will be less sensitive to current prices the higher his switching costs. 
Then, higher switching costs will increase equilibrium prices. Therefore, accommodating forward-looking behavior and a role for expectations is critical to the assessment of pricing with switching costs. In pricing, firms face a tension between “harvesting” and “investing” incentives. When a consumer faces a cost to switch away from a firm, that firm has some monopoly power over that consumer. It is then optimal for the firm to charge higher prices to that consumer to extract more of his surplus given the monopoly power. This is the harvesting incentive. On the other hand, recognizing that they have an opportunity to harvest in the future, firms have an incentive to invest to get consumers to try their product and become future-loyals. For this they may lower prices. This is the investing incentive. Optimal pricing strategy balances both these incentives. The harvesting versus investing incentive is a function of the size of a firm’s customer base. All things equal, harvesting incentives may dominate for firms that enjoy a large proportion of the market as its customers. This is the “fat-cat” effect, which segments the market, facilitates new entry by small-scale competitors who can target the less loyal users, and possibly leads to cyclical shares when such entry occurs (see Farrell and Klemperer, 2007). The extent of harvesting and investing is a function of the entry of new users. When new consumers enter the market over time, there can be substantial competition to make those consumers loyal, leading to reduced harvesting, and periods of very low prices. How the profile of prices would look will depend on the strength of


the switching cost and the distribution of heterogeneity among new and existing consumers in the marketplace. In some situations (when the switching cost is high and product differentiation is low), competition to induce consumers to try their product to become loyal can become intense enough that firms may be worse off with higher levels of brand loyalty (e.g., Villas-Boas, 2004; Cabral and Villas-Boas, 2005). Price cycles can also be obtained when firms can discriminate between existing customers and new customers. This is the “customer recognition” literature. With recognition, forward-looking consumers can rationally anticipate the seller can exploit the information it obtains on consumers’ purchase behavior for pricing. Similar to the canonical model for durable-goods with commitment, a seller with commitment power in this situation, may choose to not condition pricing on purchase history and forego the opportunity to discriminate (see Acquisti and Varian (2005) and the literature cited therein). For a review of pricing with switching costs and customer recognition, see Fudenberg and Villas-Boas (2007).
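The tension between harvesting and investing, and the “bargain, then rip-off” pattern mentioned above, can be made concrete with a stylized two-period calculation. The setting below is a textbook simplification assumed for illustration, not a result taken from the papers cited: two identical firms sell a homogeneous good at marginal cost `cost`, every consumer is new in period 1 and faces a switching cost `switching_cost` in period 2, and firms cannot commit to future prices. The function and variable names are hypothetical.

```python
def bargain_then_ripoff(cost, switching_cost, delta=1.0):
    """Stylized two-period prices under the assumptions stated above."""
    # Harvesting: a locked-in buyer will pay up to the rival's lowest
    # profitable price (cost) plus the switching cost before switching.
    p2 = cost + switching_cost
    # Investing: each customer won today is worth delta * switching_cost in
    # future margin, and competition for customers passes that rent back
    # as an introductory discount.
    p1 = cost - delta * switching_cost
    lifetime_price = p1 + delta * p2
    return p1, p2, lifetime_price

p1, p2, lifetime = bargain_then_ripoff(cost=10.0, switching_cost=3.0)
print(p1, p2, lifetime)
```

In this simplified case the introductory discount exactly offsets the later markup, so the discounted lifetime price equals `cost * (1 + delta)`. With product differentiation, new cohorts of consumers, or imperfect lock-in, the two incentives no longer cancel, which is why the net effect of switching costs on prices is treated as an empirical question in the work reviewed next.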

5.2 Empirical dynamic pricing

Recent empirical work that has considered pricing with switching costs and forward-looking agents has considered specific situations with state dependent utility, storable goods, and experience goods. Rather than solve the pricing problem over the full life-cycle of the product, much of the literature has considered pricing in a stationary steady state. One could loosely interpret this as developing a set of pricing strategies for the “mature” stage of the PLC.

5.2.1 State dependent utility

Consumers have state dependence in utility when past actions by the agent have a direct effect on his current utility. In common parlance in marketing, the term is typically used to refer to a model of consumer behavior in which past purchase of a product directly affects the current utility from purchases (though other forms of dependence are also possible). State dependence in utility is a form of psychological switching cost because consumers face a disutility from purchasing a brand they are not loyal to. A substantial empirical literature has established these switching costs are statistically and economically meaningful in consumer packaged goods markets.41 The literature on brand loyalty is reviewed elsewhere in this handbook (Bronnenberg et al., 2019). While the demand-side of the literature is rich, the corresponding problem of optimal life-cycle pricing of products with state-dependent demand is relatively sparse. The significant challenge in solving for optimal firm-side strategies with state dependent utility is the need to track an aggregate state that adequately captures the persistence induced across time in individual-level demand, a complex issue going back at least to Givon and Horsky (1978).

Dubé et al. (2009) empirically analyze dynamic pricing with state dependent utility in the context of a dynamic game with competition. Consumers in their model are modeled as myopic and allowed to exhibit first-order dependence in brand loyalty. Fitting demand to scanner data on orange juice brands, the authors find evidence for inertia in brand choices with implied switching costs in the range of 15 to 60% of purchase prices. At the estimated parameters, prices and profits are found to be lower with switching costs, compared to without, implying switching costs make outcomes worse for firms. This result is driven by two aspects. First, the model features imperfect lock-in, which implies that consumers may switch away from products they have previously purchased; and second, consumers have time-varying tastes (a random utility component), which has effects similar to “new users” entering the market. Both create an incentive to cut prices to retain own loyal customers and attract competitors’ loyal customers, dampening the harvesting incentive and increasing the investing incentive, reducing prices. To the extent that these aspects apply to other markets, this finding may apply more broadly.

Che et al. (2007) is another paper that explores dynamic pricing under state-dependent demand. They specify state-dependence over the attributes of the product bought previously, as opposed to the brand of the product bought previously. Consumers are modeled as myopic. Rather than aggregate up the state space across types as above, they use aggregate brand market shares as the market state.

41 Following Heckman’s early warnings (Heckman, 1991), researchers typically have been reluctant to accommodate a role for state-dependence in demand unless they sort it out separately from the role of unobserved heterogeneity. After more than two decades of empirical work in which controls for unobserved heterogeneity have progressively become more sophisticated, state-dependence has now been established as a robust feature of consumer demand, especially for branded Fast Moving Consumer Goods (FMCG) (Kahn et al., 1986; Erdem, 1996; Roy et al., 1996; Keane, 1997; Seetharaman et al., 1999; Chintagunta et al., 2001; Erdem and Sun, 2001; Seetharaman, 2004; Shum, 2004; Horsky et al., 2006; Dubé et al., 2009, 2010a; Cosguner et al., 2018; Tuchman, 2018). Most marketing scholars now think of this persistence as a fundamental feature of consumer demand.
This is an approximation to the true state. At their estimated parameters, they report that dynamic weekly pricing by the firm with a 1-week horizon (modeled as a one-period lookahead) fits the observed pricing patterns as well as with a two or three week horizon, suggesting that retailer in practice may not be optimizing over long time-horizons. Cosguner et al.’s (2018) interest is to study how a retailer intermediary would affect dynamic price, so they model a vertical channel relationship between the brands and a retailer selling the brands. They point out an interesting role for the retailer in that, the retailer who cares about the category as a whole, may have little to no incentive to lock in customers to a particular brand by lowering its current retail price. Anticipating that the retailer may not cooperate in passing through lowered wholesale prices on their brands to end consumers, brands may therefore end up doing more harvesting over investing in setting their wholesale prices. In their approach, consumers are treated as myopic, so demand is static. Using a discrete-segment model of heterogeneity, demand is estimated to exhibit inertia. Life-cycle pricing is modeled as the outcome of a competitive game with switching costs. At their estimated demandside and cost parameters, Cosguner et al. (2018) find that while manufacturers have both harvesting and investing incentives, the retailer tends to be driven primarily by a harvesting incentive. It seems the retailer effectively free rides on the manufacturers’

5 Goods with repeat purchase

efforts by taking a lion’s share of the additional profits that accrue to the channel from the existence of inertial demand. While the empirical literature has made substantial progress, an open question that remains is to assess the implications for pricing with switching costs in a market with both forward-looking firms and consumers. Additionally, most of the demand estimation literature on the topic has treated consumers as myopic.42 It is an open question whether consumers recognize that their preferences change in the future conditional on their decision to buy a particular brand today. One can envisage that “naive” myopic consumers do not; while “sophisticates” do. Sophisticates are forward-looking. This is analogous to the debate in the literature on boundedly rational, present-biased consumers on whether consumers recognize that they will be present-biased in the future (e.g., Frederick et al., 2002). The typical assumption is that consumers are “naive” and unaware of their future self’s preferences. The challenge in distinguishing between naives versus sophisticates involves finding the right variation in the data that can distinguish between these two types of heterogeneity (see Mahajan and Tarozzi, 2011 for one approach based on sophisticates’ demand for commitment devices).
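The state dependence discussed above is often written as a loyalty term in a discrete-choice utility. The following is a minimal sketch under assumed notation; the symbols are illustrative and are not taken from any of the papers cited here. Consumer $i$ obtains utility from brand $j$ in period $t$ as

```latex
u_{ijt} = \alpha_{ij} - \beta_i \, p_{jt} + \gamma_i \, \mathbf{1}\{ s_{it} = j \} + \varepsilon_{ijt}
```

where $p_{jt}$ is price, $s_{it}$ is the brand the consumer bought on the previous purchase occasion, and $\varepsilon_{ijt}$ is a random taste shock. Under these assumptions, $\gamma_i > 0$ captures inertia, and the ratio $\gamma_i / \beta_i$ expresses the switching cost in monetary units, which is the sense in which a switching cost can be stated as a share of the purchase price. A myopic consumer maximizes $u_{ijt}$ period by period; a forward-looking "sophisticate" would also account for how today's choice changes $s_{i,t+1}$.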

5.2.2 Storable goods

A storable good is one that does not depreciate immediately and therefore is subject to infrequent repeat purchase. In that sense, storable goods are similar to durable goods with replacement. Under storability, past purchases affect current demand through the impact of inventory. If the consumer has a lot of inventory at home, he is unlikely to buy today even if prices are low. Therefore, when inventory costs are sunk, one can think of storability as inducing a switching cost. Storable goods abound in retail. For instance, most FMCG products like household cleaning goods, boxed food, canned and frozen goods, and many others found in grocery retail stores are storable.

Storability along with frequent retail sales induces a propensity for consumers to dynamically time their purchases, taking advantage of promotions to stockpile. The propensity to stockpile is a function of the storability of the product and the extent to which consumers are able to anticipate price promotions. When consumers have strong price knowledge and have well-informed expectations about the timing and depth of price promotions, they can stockpile during periods of low prices, and consume out of the stockpiled inventory in high-price periods. If this dynamic becomes strong, promotions become profitless give-aways, serving only to inter-temporally shift demand from a possible high-price future to a low-price present, with little incentivization of “primary” or category-expanding demand. This is sometimes referred to as “purchase acceleration” (Blattberg and Neslin, 1990). Normatively, accounting for the dynamics induced on demand through the impact of expectations is important for a proper auditing of the efficacy of promotions.

42 Demand side studies that have accommodated state dependent utility with forward-looking consumers include Hartmann (2006); Hartmann and Nair (2010); Osborne (2011).


A feature of storable good demand for pricing is that the value of not buying is endogenous because it depends on past purchases. This feature generates a dynamic trade-off for pricing: selling more in the current period reduces demand in future periods for the firm because recent buyers are unlikely to buy again in the near future. The ownership distribution of products across consumers in the market is thus relevant for characterizing aggregate demand. This distribution becomes the state variable that determines pricing. Many advances have been made in capturing the dynamics of consumer demand with inventory accumulation and storability.43 More limited progress has been made in the field on computing optimal prices of firms when facing dynamically optimizing consumers under storability. Hendel and Nevo (2013) present an approach to solve for optimal prices in a storable goods setting, albeit using a restricted demand system. To facilitate computation of optimal prices, their aggregate demand system requires that consumer inventory depletes completely in a finite number of periods after purchase; that consumers have perfect foresight about prices (rational expectations and no uncertainty); and that consumers know their demand profile perfectly over the inventory depletion period, so they can optimize perfectly for their needs. Under realistic parameters, the optimal pricing policy from the model features sales, which enables intertemporal price discrimination between consumers who can store the good and those who cannot. Fitting the model to data from soft-drinks, consumers who can store are found to be more price-sensitive than consumers who cannot, so there are gains to price discriminating by targeting lower prices to storers. Third degree price discrimination would target separate prices to storers and non-storers, but is practically infeasible. 
Nevertheless, intertemporal price discrimination via sales is found to allow the manufacturer to recover about a quarter of the profit gap between third degree discrimination and no discrimination, and about half of the quantity increase, suggesting that “Hi-Lo” pricing is profitable even with forward-looking consumers who anticipate such pricing.
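The purchase-acceleration logic described in this section can be illustrated with a small simulation. This is a minimal sketch under strong assumptions that are not taken from any cited paper: a single consumer with fixed consumption of one unit per period, a fixed pantry capacity, and a hypothetical rule of thumb (`simulate_purchases`) that fills the pantry whenever the price is below the regular price.

```python
def simulate_purchases(prices, consumption=1, capacity=4, regular_price=2.0):
    """Units bought each period under a simple, assumed stockpiling rule.

    On promotion (price below regular_price) the consumer fills the pantry
    to capacity; otherwise she buys only what this period's consumption needs.
    """
    inventory = 0
    purchases = []
    for price in prices:
        if price < regular_price:
            buy = capacity - inventory                # stockpile on deal
        else:
            buy = max(0, consumption - inventory)     # top up only if short
        inventory += buy
        inventory -= consumption                      # consume one unit
        purchases.append(buy)
    return purchases


def revenue(prices, purchases):
    """Seller revenue from this consumer over the horizon."""
    return sum(p * q for p, q in zip(prices, purchases))


no_promo = [2.0] * 6
with_promo = [2.0, 2.0, 1.5, 2.0, 2.0, 2.0]   # one promotion in period 3

base = simulate_purchases(no_promo)
promo = simulate_purchases(with_promo)

print(base, revenue(no_promo, base))
print(promo, revenue(with_promo, promo))
```

In this stylized case the promotion moves purchases forward in time without changing total units bought, so revenue falls. With category expansion or with consumers who cannot store, the comparison could of course go the other way.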

5.2.3 Consumer learning

State dependence in consumer behavior can also arise from persistence in beliefs (Erdem and Keane, 1996; Moshkin and Shachar, 2002). When consumers are uncertain about the quality or match-value with a product, consumption can provide a signal to update their prior beliefs. There is an extensive empirical literature on consumer learning under uncertainty for such experience goods.44 Generally speaking, this literature treats consumers as forward-looking Bayesians who are endowed with prior beliefs about the quality of a brand. Upon consumption of the brand, consumers receive a noisy signal about the quality of the brand, which they use to update their prior beliefs according to Bayes’ rule. Purchase behavior is modeled as a best response to current prices and product availability given the most updated beliefs. The incentive to switch products is high when the consumer is new to the product-market and is uncertain about product quality, so that trial has informational value. State dependence arises in this model because once a consumer has learned his match-value, he sticks with his best matching product.

43 Assessments of demand with forward-looking consumers and storable goods include Meyer and Assunção (1990); Krishna (1992); Assunção and Meyer (1993); Gönül and Srinivasan (1996); Erdem et al. (2003); Sun (2005); Hendel and Nevo (2006); Chan et al. (2008); Hartmann and Nair (2010); Seiler (2013); Gordon and Sun (2015); Akca and Otter (2015); Ching and Osborne (2017); Haviv (2019). Joint estimation of demand and supply with forward-looking consumers and firms is developed in Osborne (2012).

44 See Eckstein et al. (1988); Erdem and Keane (1996); Ackerberg (2003); Akçura et al. (2004); Mehta et al. (2004); Crawford and Shum (2005); Chan and Hamilton (2006); Israel (2005); Narayanan and Manchanda (2009); Zhang (2010); Goettler and Clay (2011); Osborne (2011); Shin et al. (2012); Grubb and Osborne (2015); Dickstein (2018), and Ching et al. (2013) for a survey.

On the theory side, dynamic life-cycle pricing corresponding to consumer learning has been considered by Bergemann and Välimäki (2006). They consider life-cycle pricing of experience goods by a forward-looking monopolist seller facing forward-looking buyers who are initially uninformed about product quality, and learn from consumption. The key force in the model is the management of information to uncertain consumers. Since the consumers are forward-looking, the value of information to them depends on their expectations of the price path. If future prices are too high, making a current purchase to resolve uncertainty has low value. If future prices are too low, it makes sense to wait irrespective of the informative value of a current purchase. Therefore, the price-path serves to determine the rate at which information diffuses into the market. Bergemann and Välimäki (2006) show that in a mass market, optimal prices exhibit skimming, while in a niche market, they exhibit penetration. They define a market as mass or niche based on the extent to which the willingness-to-pay of the uninformed consumers exceeds the static monopoly price.
A setting is considered a mass market when uninformed consumers are willing to pay more than the monopoly price; which loosely speaking, may characterize new products that substantially improve on existing products.45 Intuitively, in a mass market, managing information is not critical because uninformed consumers are willing to buy at the monopoly price. So the monopolist skims, extracting the surplus of the uninformed consumers. In a niche market, uninformed consumers are unwilling to pay monopoly prices. So the monopolist practices penetration, sacrificing current profits in order to find new future buyers.46 On the empirical side, one paper that has considered dynamic life-cycle pricing with consumer learning is Ching (2010b), who develops an empirical model of oligopoly pricing by generic pharmaceuticals after patent expiry. In Ching’s model, generics are treated as experience goods. Patients, physicians, and firms are not fully informed of generic quality and have to learn it from experience. Each period, physicians obtain feedback from past patients about their experience with the generic, use

45 Specifically, denote the static monopoly price for informed buyers as $\hat{p}$, and by $\hat{w}$ the willingness-to-pay of an uninformed forward-looking buyer who expects future prices to stay at $\hat{p}$. A mass market is one where $\hat{w} \geq \hat{p}$.

46 It could also be that consumers and firms are simultaneously learning. Dynamic pricing with two-sided learning is considered, for example, in Bergemann and Välimäki (1996, 1997).


it to update their beliefs about generic drug quality, and prescribe drugs for their patients based on these beliefs. Ching assumes that physicians behave as aggregators of information about patient experience, and that moments of the physicians’ beliefs (mean, variance) are made available to the public, including firms. This describes the state of the market. Firms condition on this state and set prices in an oligopolistic, dynamic game. Ching’s goal is partly to explain the observed phenomenon that brand-name firms raise their prices after patent expiration. Ching shows that as patients learn generic drug quality, the set of patients who switch away from branded to generic drugs comprises the more price-sensitive slice of the population, and the price increase by the branded firm can be rationalized as a best response to this change in its residual demand.
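The belief updating that underlies the learning models in this section can be sketched in a few lines. The example below assumes a normal prior over quality and normally distributed consumption signals with known variance, a convenient conjugate case; the function name `update_belief` and all numbers are hypothetical and are not taken from any of the cited papers.

```python
def update_belief(prior_mean, prior_var, signal, signal_var):
    """Posterior mean and variance after one noisy normal signal.

    Assumes quality ~ Normal(prior_mean, prior_var) and
    signal = quality + noise, with noise ~ Normal(0, signal_var).
    """
    weight = prior_var / (prior_var + signal_var)    # weight on the new signal
    post_mean = prior_mean + weight * (signal - prior_mean)
    post_var = (1.0 - weight) * prior_var            # uncertainty shrinks
    return post_mean, post_var


# A consumer starts with a diffuse belief and has three consumption experiences.
mean, var = 0.0, 1.0
for experience in [1.0, 1.0, 1.0]:
    mean, var = update_belief(mean, var, experience, signal_var=1.0)

print(mean, var)
```

Each experience moves the mean toward the observed signals and reduces the variance, so later signals carry less weight. This is one way to see why the informational value of trial, and hence the incentive to experiment with new products, declines as experience accumulates.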

5.3 Summary

This concludes the discussion on goods with repeat purchase. It should be clear that the dynamic pricing of such goods shares many themes with the durable goods pricing literature, especially in terms of the importance of managing consumer expectations and the effect this has on the firms’ life-cycle prices. A key force that drives the price dynamics is the way current prices affect consumers’ future repeat purchases, by changing the consumers’ state (information sets, beliefs, inventory, preference shifters, etc.). When pricing can move consumers to states that are more favorable for the firm, this generates a role for managing expectations, controlling the flow of information, facilitating trial, etc., and possibly sacrificing some current profits, so as to maximize profits over the long-run. The fact that rational consumers may anticipate that future prices will respond to their changed states is key to the problem. Though the theory and the demand-side of the empirical work are rich, the part of the empirical literature that has developed life-cycle pricing policies is thinner compared to the durable goods literature. It is hoped that this brief survey will spur further research in this area.

6 Open areas where more work will be welcome

I outline some areas where more work will be welcome. I focus on areas where a nascent empirical literature is still emerging, and where more empirical work will be valuable. Finally, as a conclusion, I offer some commentary on the emphasis on micro-foundations.

6.1 Life-cycle pricing while learning an unknown demand curve

Overall, an area where more work would be welcome would be to allow for seller learning about underlying demand. Generally speaking, the empirical literature has assumed that firms know the true model generating demand.47 This assumption may not be unreasonable in data-rich situations in stable markets where firms have access to rich historical information with which to understand demand. However, in data-poor situations, for new products, for contexts with new consumers, or in environments that are very different from what occurred historically, it seems unreasonable to assume that firms understand all aspects of demand structure. As Trefler (1993) wrote, “Economics lacks a good theory of the pricing and output decisions of a firm that does not know its demand. We always assume the firm has complete demand information or has exact knowledge of the stochastic process generating demand. Yet no explanation is given for how the firm comes by this information.”

When the structure of demand is uncertain, the way payoffs respond to actions the firm can take to exploit demand is uncertain. This sets the stage for an “exploration vs. exploitation” tradeoff in setting optimal price policies. The problem of life-cycle pricing becomes a problem of optimally managing the acquisition and leverage of information. Rothschild (1974); Grossman et al. (1977); McLennan (1984); Easley and Kiefer (1988); Aghion et al. (1991); Trefler (1993); Rustichini and Wolinsky (1995); Keller and Rady (1999); Harrison et al. (2012) present theoretical models of optimal price experimentation by firms facing an unknown demand curve. Broadly speaking, the main tension in life-cycle pricing is that the price helps the seller to both learn about demand and to profit from it. Information acquisition dynamically affects the pricing policy in many ways. To see this, imagine a seller with a single unit of a good who faces consumers arriving sequentially. The seller has a prior belief about the valuations of consumers. Observing consumer reaction at the current price forms a signal to the seller: no purchase by a consumer at a given price shows the seller his prior was too optimistic. This allows him to revise his beliefs. A forward-looking seller recognizes how his beliefs would be updated in setting his current prices. Another aspect is that uncertainty the seller has about the various levels consumer valuations can take is uneven. The uncertainty about a particular level of valuation is better resolved if the seller sets prices close to those levels. So, the seller has an incentive to move his price in the direction that is most informative. Finally, the seller recognizes that not selling the unit today provides an opportunity to learn more about demand tomorrow. So, the prospect of continuation acts as an opportunity cost to a current sale, incentivizing the seller to set a higher price today. 
As more information is acquired over time, and the value of additional information declines, this incentive weakens. Optimal pricing encapsulates these incentives. Competition complicates the problem substantially. When product qualities across firms are related, and market outcomes are publicly observed, information

47 Montgomery and Bradlow (1999) advocate making demand specifications more flexible to account for the uncertainty the decision maker may have about the structure of demand.


becomes a public good. In competition, incentives for experimentation must balance the possibility of reduced experimentation due to competitor free-riding on the information generated by a focal firm’s experimentation, versus the fact that the focal firm may be encouraged to experiment more when it can bring forward the time at which the information generated by the experimentation of others becomes available (Bolton and Harris, 1999).

Empirical investigation of life-cycle policies with seller learning is limited. Hitsch (2006) and Huang et al. (2018) are two notable examples. Accommodating seller learning allows Hitsch (2006) and Huang et al. (2018) to explain declining advertising and price paths respectively over the life-cycle of the products they consider (newly launched cereal brands, and cars offered for sale on CarMax.com). A challenge for empirical work is the complexity of the learning problem. In a realistic situation, seller uncertainty about the arrival process, the extent of heterogeneity, the ability to price discriminate, and competition complicate the analysis substantially. Another difficulty for empirical work is that the state space for the firm’s problem is the space of beliefs, which is very high dimensional, and updating the beliefs based on the observed outcome in the market is complex except for a small class of distributions for which the prior and the signal are conjugate. Given these challenges, Hitsch (2006) and Huang et al. (2018) work with a parametric model and normally distributed signals to reduce some of the computational complexity.

A literature in reinforcement learning has developed algorithms that have best-in-class performance for non-parametrically specified reward functions and transitions (e.g., Lai and Robbins, 1985). These algorithms can help firms set prices while dynamically resolving their uncertainty about demand. Misra et al. (2017) is an example. More work along these lines, especially allowing prices to have long-lived effects, would be welcome.
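To make the exploration versus exploitation tradeoff concrete, the sketch below applies the standard UCB1 index rule, a bandit algorithm in the tradition of Lai and Robbins (1985), to a seller choosing among a small grid of prices. It is an illustration under assumptions: the price grid, the linear purchase probability in `true_demand`, and the function `ucb1_pricing` are hypothetical, prices have no long-lived effects, and this is not presented as the algorithm used in any of the papers cited above.

```python
import math
import random


def ucb1_pricing(prices, buy_prob, rounds, seed=0):
    """Post one price per arriving consumer using the UCB1 index rule.

    Returns how many times each price was posted.
    """
    rng = random.Random(seed)
    counts = [0] * len(prices)
    reward_sums = [0.0] * len(prices)
    max_price = max(prices)            # scales per-consumer revenue into [0, 1]

    for t in range(1, rounds + 1):
        if t <= len(prices):
            k = t - 1                  # post every price once to initialize
        else:
            # Index = average scaled revenue + bonus for rarely tried prices.
            k = max(
                range(len(prices)),
                key=lambda i: reward_sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        sale = rng.random() < buy_prob(prices[k])
        counts[k] += 1
        reward_sums[k] += prices[k] / max_price if sale else 0.0
    return counts


def true_demand(price):
    """Purchase probability, unknown to the seller (assumed linear here)."""
    return max(0.0, 1.0 - price / 10.0)


prices = [1.0, 5.0, 9.0]               # expected revenues: 0.9, 2.5, 0.9
counts = ucb1_pricing(prices, true_demand, rounds=20000)
print(dict(zip(prices, counts)))
```

The seller never estimates a demand curve. It keeps only an average revenue and a count per price, and the exploration bonus shrinks as a price is tried more often, so experimentation at low-revenue prices tapers off over time, in line with the weakening incentive to acquire information described above.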

6.2 Joint price and advertising over the life-cycle

The empirical literature on advertising in marketing and adjacent fields is enormous. I do not attempt to discuss the voluminous empirical work that has explored the effect of advertising on demand in this chapter. Instead, I refer the reader to comprehensive reviews by Bagwell (2007) and by others in this volume. Broadly speaking, after about half a century worth of advertising research, the marketing community would generally agree on the following aspects of the effect of advertising on demand.

First and foremost, advertising is not a static problem. The effect of current advertising depends on the history of past advertising. What matters is the stock of advertising exposure, not just the flow. Both the frequency and distribution of past exposures matter. There is also a lot of descriptive support for S-shaped response (Rao and Miller, 1975; Lambin, 1976; Simon and Arndt, 1980; Bemmaor, 1984; Hanssens et al., 2001; Vakratsas et al., 2004; Dubé et al., 2005). S-shaped response implies that a marginal ad-exposure induces high incremental response when the number of past advertising exposures is small. Typically, there is a lower threshold level of exposures below which the incremental response is zero. Further, when

the stock of past exposures is high, the marginal ad-exposure induces a smaller response and, beyond a certain threshold, possibly a negative incremental response. The high incremental response may derive from awareness, reminder effects, or complex processes of psychological activation that play out directly in utility; and the concavity in response may be because of annoyance or heightened perceptions about invasion of privacy when exposed to many ads. The possible mechanisms are many. Irrespective of the plethora of possible underlying mechanisms, S-shaped response is now considered a robust feature of advertising. Consequently, life-cycle advertising generally has to respond in a dynamic way to this kind of response pattern at the individual-level.

Second, most researchers would broadly agree that advertising affects consumer demand in many complex ways that are not yet fully understood empirically. At a broad level of abstraction, advertising may affect consumer beliefs by serving as information; or affect preferences by providing direct utility in and of itself, or by changing the utility from consumption of the product. While theoretical treatments of micro-foundations have been rich, empirical testing and clear understanding of the specific ways in which these various mechanisms affect consumer behavior is still a work in progress. One reason is the difficulty of addressing the endogeneity of advertising exposure, which confounds causality. The other is that high frequency, individual-level data on advertising exposure and outcomes have only recently become available to empirical marketing researchers. The lack of individual-level data makes assessing micro-phenomena difficult. The recent availability of high frequency data from digital platforms holds the promise of more progress on this dimension.

Third, going back all the way to Dorfman and Steiner (1954), it is recognized that advertising and pricing are not separate problems. Advertising affects the elasticity of demand, and therefore life-cycle advertising and pricing interact and need to be set jointly (the same statement obviously holds for all aspects of marketing strategy over the life-cycle, not just price and advertising).

The challenge for computing optimal life-cycle policies is to adequately measure, in a parsimonious way, the cause and effect of a sequence of advertising exposures on demand; and to then compute optimal life-cycle advertising policies in a competitive dynamic equilibrium that recognizes long lived effects of advertising. Modeling the entire history of past exposures as a state-vector that determines current advertising is too high dimensional. So, the literature has typically worked with the so-called “goodwill” model, which provides parsimony. This approach adapts the capital-stock approach of Nerlove and Arrow (1962) to the ad-case. In this model, we assume that advertising creates some goodwill in the marketplace for the firm. One can think of goodwill as long lasting “Brand Capital” (e.g., Keller, 2002). Consumer demand is assumed to myopically respond to goodwill. Goodwill is assumed to depreciate over time. Firms invest in advertising to augment the goodwill in the marketplace, and to impact consumer demand. Goodwill is therefore a stock variable that is augmented each period t by period-t advertising. The carryover effect of advertising captured by the goodwill model is sometimes referred to as “wear-in”, while the depreciation of goodwill over time is sometimes referred to as “wear-out”. Since future goodwill is


a function of current goodwill and current advertising, advertising has long lived effects with wear-in and wear-out. So, the problem of dynamic advertising is a problem of dynamic investment over the life-cycle, where investment has long lived effects. There is no reason to believe that prices do not interact with goodwill to affect demand, so the life-cycle problems of pricing and advertising are linked dynamically.

The basic goodwill model provides a parsimonious way of handling the history-dependence of advertising effects on past advertising activity, but obscures the complexity of actual ad-campaigns and their effects over time. One issue is with the treatment of the campaign as a unit. In practice, an advertising campaign concurrently consists of several themes (for example, price and product advertisements) or ad-copies; and wear-in and wear-out can occur over the life-cycle of the campaign at the level of the theme or ad-copy. The carryover and depreciation of goodwill is therefore determined by the complex interaction of the carryover and depreciation of underlying ad-content and themes incorporated into the campaign. There is emerging evidence of these complex interactions. Naik et al. (1998) and Bass et al. (2007) present descriptive evidence of wear-in and wear-out over underlying themes of advertising. Naik et al. (1998) document that brand awareness is affected by wear-out at the level of ad-copy and wear-out due to repeated exposure to ads. Bass et al. (2007) describe similar patterns on the evolution of goodwill and advertising effectiveness over time for multiple ad-themes that comprise a campaign. In a competitive marketplace, imitation and clutter of ad-copies and strategies by competing firms can also lead to wear-out at the level of ad-copy and to increased clutter, reducing ad-response (e.g., Corkindale and Newall, 1978; Axelrod, 1980). These could result in a form of prisoner’s dilemma in advertising, which plays out in complex ways over the life-cycle. These results could be different for new goods versus established goods, and for new consumers versus mature consumers in the category.

More generally speaking, we are still in the early stages of developing an empirically-tested, theoretically-grounded, and practically-relevant body of work on the dynamics of advertising content on demand, which can form the basis of computing corresponding life-cycle policies. Also missing in the literature is a proper empirical accounting of the dynamics induced by intermediaries (such as search engines) and modern tech-driven advertising institutions (such as ad-exchanges and marketing data management platforms) in life-cycle advertising policies. (Yao and Mela (2011) is a notable exception.) Incorporating these aspects into parsimonious models for life-cycle advertising and campaign strategies in a dynamically competitive environment remains a work in progress.

More work would also be welcome in the empirics of advertising dynamics over the PLC. Credible empirical work that speaks directly to life-cycle policies is still in its early stages (Hitsch, 2006 is a notable example). The key issue here is that advertising is not a static problem. The effect of advertising accumulates over time and outcomes are realized through repeated exposure over a long period. The intensity of exposure as well as the organization of exposures within a particular sequence matters critically for the implied effects of advertising. This implies that the interesting treatment for measuring the causal effect of advertising is not a single ad-exposure, but

a sequence of ad-exposures. The modern literature on advertising effects has rightly leveraged the power of randomized controlled trials to account for the endogeneity of advertising exposure and to measure credible effects of ads. What is now needed are marketplace experiments in which consumers are randomized into sequences of ad-exposures so as to understand important stylized facts on the demand-side, which can then form an input into the creation of optimal advertising strategies over the life-cycle on the supply-side. This may be especially relevant for new products or for new consumers that are just entering the marketplace and seeing ads for the first time, where the early periods of adjustment are precisely the phenomena of interest to be studied. Experiments in which the treatment is a one-time static exposure that is implemented cross-sectionally are useful for cross-sectional testing of theory, and for comparing heterogeneity in response across consumers or creatives. But these experiments do not capture the interesting dynamics of sequential exposure and their impact on life-cycle policies.48
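The goodwill model discussed earlier in this section can be written as a one-line recursion. The sketch below uses a discrete-time analogue of the Nerlove and Arrow (1962) capital-stock formulation, g_t = (1 - delta) * g_{t-1} + a_t; the timing convention, the depreciation rate, and the function name `goodwill_path` are assumptions made for illustration.

```python
def goodwill_path(advertising, depreciation=0.2, initial_goodwill=0.0):
    """Goodwill stock after each period's advertising.

    Each period, last period's goodwill depreciates at rate `depreciation`
    (wear-out) and is augmented by current advertising (carryover).
    """
    goodwill = initial_goodwill
    path = []
    for ad_spend in advertising:
        goodwill = (1.0 - depreciation) * goodwill + ad_spend
        path.append(goodwill)
    return path


# A single burst of advertising followed by silence: goodwill decays geometrically.
burst = goodwill_path([10.0, 0.0, 0.0, 0.0], depreciation=0.5)
print(burst)

# Constant advertising: goodwill converges to ad_spend / depreciation.
steady = goodwill_path([1.0] * 50, depreciation=0.5)[-1]
print(steady)
```

Because the entire advertising history is summarized by a single number, the firm's dynamic problem has a one-dimensional state per brand. That is the parsimony the text refers to, and also the reason the model cannot distinguish between different sequences of exposures that produce the same stock.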

48 Experimental analysis of the effect of ad-distribution is developed in Sahni (2015); Sahni et al. (2016). An appealing feature of these studies is that the advertising sequences shown to consumers are randomized at the individual-level and outcomes tracked. This enables them to study the effect of a sequence of ad-exposures on outcomes and to investigate how the properties of the sequence affect outcomes at the micro-level.

6.3 Product introduction and exit

More work will be welcome in understanding the problem of how and what products to introduce and withdraw from the market over time. This is a problem of product entry and exit with competition where the attributes of the portfolio of products brought into and taken out of the market, as well as their prices and advertising, are endogenized. This is obviously a very challenging problem; see Crawford (2012) for an overview. On the empirical side, limited progress has been made in some situations. For example, Iyer and Seetharaman (2003); Nosko (2014); Eizenberg (2014); Berry et al. (2016); Wollmann (2018); Fan and Yang (2018) analyze product introduction in competition; Ellickson and Misra (2011); Fan (2013); Sweeting (2013); Jeziorski (2014) analyze how firms reposition themselves in the market by adjusting product characteristics.

The related problem of obtaining distribution of the product in the market over time is understudied. Analyzing data on 200+ FMCG new product introductions in France, Ataman et al. (2008) report that obtaining high distribution is most closely associated with new product success, underscoring the importance of the problem (see also Friberg and Sanctuary, 2017 for quasi-experimental evidence on the effect of distribution). Optimal dynamic “market-rollout” of products across stores in a competitive environment, and its link to life-cycle outcomes, is not well understood in the literature (see Bronnenberg and Mela, 2004, and the literature cited therein).

Empirical progress on the problem of retail assortment selection (a product-choice problem for the retailer); the choice by retailers of whether and with which manufacturer to adopt category captaincy (a product-choice problem for a retailer);


and the choice of which market to open ethanol-compatible gasoline stations in (a location-choice problem for the gasoline retailer) have been made in static setups by Draganska et al. (2009); Hristakeva (2018), Viswanathan (2016); and Shriver (2015) respectively. Progress on empirical analysis of the dynamics of product entry and exit has been made in Hitsch (2006); Shen (2014); Gallant et al. (2018), albeit with no role for product choice. A related problem is the strategy of releasing versions over time. This can be thought of as a strategy of simultaneous life-cycle pricing, product introduction, and exit. Recent progress has been made on the demand-side of this problem. One example is Brecko (2017) who focus on consumer adoption of versions of various durable software goods, introduced sequentially into the market over time. Her analysis model takes into account that consumers may rationally time their purchases and upgrades based of forward-looking expectations on the entry timing, pricing, and quality of future versions (without computing the optimal versioning policy). Because she has access to individual-level data, Brecko can handle consumer heterogeneity in a direct way (key to versioning incentives). Extending some of this work to solve the life-cycle problems of product introduction and exit in competition would be a challenging but significant step forward for this literature.

6.4 Long term impact of marketing strategies on behavior

A final pitch is for more empirical work on life-cycle policies that better considers the consequences of long-run marketing strategies for market outcomes. An important literature has described how consumers' behavior changes in the long run due to sustained exposure to firms' life-cycle policies. For instance, tracking consumer behavior in FMCG categories for about 8 years, Mela et al. (1997) document that consumers' exposure to frequent sales and price variation in grocery is associated with their becoming more price sensitive over time. Elberg et al. (2019) document similar phenomena using a randomized controlled trial in retail stores in Chile. Using Dutch FMCG data, Nijs et al. (2001) document that increasing the frequency of price promotions is associated with higher category demand only in the short run. These positive associations dissipate in the long run, which is consistent with the same phenomena. In other work, Anderson and Simester (2010) document in a randomized field experiment that buyers from a retailer who later observe the same firm selling the product for less react by making fewer subsequent purchases from it, suggesting a cost to excessive price variation. Overall, it seems that long-run exposure to marketing strategies has the potential to produce non-trivial shifts in the behavior of the consumers targeted by those strategies. This needs to be better reflected in extant models of optimal life-cycle planning.

6.5 Linking to micro-foundations

I conclude the chapter with a discussion of the value of linking the life-cycle to micro-founded models of consumer and firm behavior, which has been a key characteristic of the third wave. It is not unreasonable to ask why we need this link, especially if it comes with costs such as additional parametric functional-form assumptions.


A purely practical response to this is that unless we have a clearly specified model for exactly what causes persistence in demand over time, and how marketing strategy impacts that persistence, we cannot articulate exactly what the relevant state is in order to construct the corresponding strategy for firms facing such demand. In almost all the dynamic frameworks discussed in this chapter, the relevant state for the firm’s problem became clear when we had a model built from the individual-level problem, and allowed individuals to be heterogeneous. If we allow current demand to depend nonparametrically on past prices, advertising and marketing, and past actions, the entire history of past marketing exposures and actions becomes part of the state space for the firm’s problem. This is too high dimensional to handle. Even if we adopt the perspective that we will learn this dependence via experimentation, the amount of experimentation required to understand this dependence may be too massive to be realistic in practical applications. So, having a direct link to the phenomenon that induces the dependence helps impose the needed structure and parsimony on the life-cycle problem. As computing power and the granularity of data improve, we can seek to relax these assumptions and handle more flexible and nonparametric frameworks.

Separately, if we are completely nonparametric about the dependence, we learn very little about why marketing strategy induces this dependence and generates long-term effects. This makes it difficult to port insights from one case to another because we do not have links to a theory for why something works well in one context and does not work well in another context. If we cannot accumulate knowledge in this fashion, it is hard to formulate a programmatic agenda for moving marketing science forward.
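The dimensionality argument made earlier in this section can be illustrated with a small counting exercise. The following is a minimal sketch under purely illustrative assumptions that are not taken from the chapter: the firm picks one of `n_price_levels` prices each week, and a structured model is assumed to compress the whole history into a single stock discretized on `n_grid_points` points. The function names are hypothetical.

```python
# Illustrative only: the number of price levels, weeks, and grid points
# below are assumptions chosen for the example, not values from the chapter.

def history_state_count(n_actions: int, history_length: int) -> int:
    """Count distinct marketing histories when demand may depend on every past action."""
    return n_actions ** history_length


def summary_state_count(n_grid_points: int) -> int:
    """Count states when a model compresses history into one discretized stock."""
    return n_grid_points


if __name__ == "__main__":
    n_price_levels = 10   # assumed number of price points available each week
    n_grid_points = 100   # assumed grid size for a single summary state
    for weeks in (4, 13, 52):
        full = history_state_count(n_price_levels, weeks)
        compact = summary_state_count(n_grid_points)
        print(f"{weeks:>2} weeks: {full:.3e} histories vs {compact} summary states")
```

The unrestricted state space grows exponentially with the length of the history, while the summary state does not grow with it at all. This is the sense in which a model of what induces the dependence may keep the firm's problem tractable.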
Finally, having a tight link to micro-foundations helps researchers working on life-cycle strategies to benefit from complementary investments made by other researchers in understanding the micro-structure of consumer behavior. By linking life-cycle models closely to the behavior uncovered in those complementary literatures, we can contemplate how optimal marketing strategies would respond to the nuances of consumer behavior continually being discovered in related areas of social science. This interdisciplinary connection is valuable for both academia and practice.

References

Abbring, J., Daljord, Ø., 2017. Identifying the Discount Factor in Dynamic Discrete Choice Models. Working Paper. Booth School of Business.
Ackerberg, D., Hirano, K., Shahriar, Q., 2017. Identification of time and risk preferences in buy price auctions. Quantitative Economics 8, 809–849.
Ackerberg, D.A., 2003. Advertising, learning, and consumer choice in experience good markets: an empirical examination. International Economic Review 44, 1007–1040.
Acquisti, A., Varian, H.R., 2005. Conditioning prices on purchase history. Marketing Science 24, 367–381.
Adda, J., Cooper, R., 2000. Balladurette and juppette: a discrete analysis of scrapping subsidies. Journal of Political Economy 108, 778–806.
Aflaki, A., Feldman, P., Swinney, R., 2019. Becoming Strategic: The Endogenous Determination of Time Preferences and Its Implications for Multiperiod Pricing. Working Paper. Fuqua School of Business.


Aghion, P., Bolton, P., Harris, C., Jullien, B., 1991. Optimal learning by experimentation. The Review of Economic Studies 58, 621–654.
Aguirregabiria, V., Nevo, A., 2013. Recent Developments in Empirical IO: Dynamic Demand and Dynamic Games. Econometric Society Monographs, vol. 3. Cambridge University Press, pp. 53–122.
Akca, S., Otter, T., 2015. Identifying the discount factor of forward looking consumers based on consumption from inventory. https://doi.org/10.2139/ssrn.2440681.
Akçura, M.T., Gönül, F.F., Petrova, E., 2004. Consumer learning and brand valuation: an application on over-the-counter drugs. Marketing Science 23, 156–169.
Alchian, A.A., 1958. Costs and Outputs. RAND Corporation.
Anderson, E.T., Simester, D.I., 2010. Price stickiness and customer antagonism. The Quarterly Journal of Economics 125, 729–765.
Arcidiacono, P., Ellickson, P.B., 2011. Practical methods for estimation of dynamic discrete choice models. Annual Review of Economics 3, 363–394.
Arrow, K.J., 1962. The economic implications of learning by doing. The Review of Economic Studies 29, 155–173.
Arya, A., Mittendorf, B., 2006. Benefits of channel discord in the sale of durable goods. Marketing Science 25, 91–96.
Assunção, J.L., Meyer, R.J., 1993. The rational effect of price promotions on sales and consumption. Management Science 39, 517–535.
Ataman, M.B., Mela, C.F., van Heerde, H.J., 2008. Building brands. Marketing Science 27, 1036–1054.
Ausubel, L.M., Deneckere, R.J., 1987. One is almost enough for monopoly. The Rand Journal of Economics 18, 255–274.
Ausubel, L.M., Deneckere, R.J., 1989. Reputation in bargaining and durable goods monopoly. Econometrica 57, 511–531.
Axelrod, J., 1980. Advertising wearout. Journal of Advertising Research 20, 65–74.
Bagnoli, M., Salant, S.W., Swierzbinski, J.E., 1989. Durable-goods monopoly with discrete demand. Journal of Political Economy 97, 1459–1478.
Bagwell, K., 2007. The Economic Analysis of Advertising. Handbook of Industrial Organization, vol. 3. Elsevier, pp. 1701–1844.
Bajari, P., Chu, C.S., Nekipelov, D., Park, M., 2016. Identification and semiparametric estimation of a finite horizon dynamic discrete choice model with a terminating action. Quantitative Marketing and Economics 14, 271–323.
Bass, F.M., 1969. A new product growth for model consumer durables. Management Science 15, 215–227.
Bass, F.M., 1980. The relationship between diffusion rates, experience curves, and demand elasticities for consumer durable technological innovations. The Journal of Business 53, S51–S67.
Bass, F.M., Bruce, N., Majumdar, S., Murthi, B.P.S., 2007. Wearout effects of different advertising themes: a dynamic Bayesian model of the advertising-sales relationship. Marketing Science 26, 179–195.
Becker, G.S., 1965. A theory of the allocation of time. The Economic Journal 75, 493–517.
Bemmaor, A.C., 1984. Testing alternative econometric models on the existence of advertising threshold effect. Journal of Marketing Research 21, 298–308.
Bergemann, D., Välimäki, J., 1996. Learning and strategic pricing. Econometrica 64, 1125–1149.
Bergemann, D., Välimäki, J., 1997. Market diffusion with two-sided learning. The Rand Journal of Economics 28, 773–795.
Bergemann, D., Välimäki, J., 2006. Dynamic pricing of new experience goods. Journal of Political Economy 114, 713–743.
Berry, S., Eizenberg, A., Waldfogel, J., 2016. Optimal product variety in radio markets. The Rand Journal of Economics 47, 463–497.
Berry, S., Pakes, A., 2000. Estimation from the Optimality Conditions for Dynamic Controls. Working Paper. Harvard University.
Besanko, D., Winston, W.L., 1990. Optimal price skimming by a monopolist facing rational consumers. Management Science 36, 555–567.
Bhaskaran, S.R., Gilbert, S.M., 2005. Selling and leasing strategies for durable goods with complementary products. Management Science 51, 1278–1290.


Bitran, G.R., Mondschein, S.V., 1997. Periodic pricing of seasonal products in retailing. Management Science 43, 64–79.
Björkegren, D., 2019. The adoption of network goods: evidence from the spread of mobile phones in Rwanda. The Review of Economic Studies 86 (3), 1033–1060.
Blattberg, R.C., Neslin, S.A., 1990. Sales Promotion: Concepts, Methods, and Strategies. Prentice Hall, Englewood Cliffs, NJ.
Blume, L.E., Brock, W.A., Durlauf, S.N., Ioannides, Y.M., 2011. Identification of social interactions. In: Handbook of Social Economics, vol. 1, pp. 853–964.
Board, S., 2008. Durable-goods monopoly with varying demand. The Review of Economic Studies 75, 391–413.
Board, S., Skrzypacz, A., 2016. Revenue management with forward-looking buyers. Journal of Political Economy 124, 1046–1087.
Bollinger, B., 2015. Green technology adoption: an empirical study of the Southern California garment cleaning industry. Quantitative Marketing and Economics 13, 319–358.
Bolton, P., Harris, C., 1999. Strategic experimentation. Econometrica 67, 349–374.
Bond, E.W., Samuelson, L., 1987. The Coase conjecture need not hold for durable good monopolies with depreciation. Economics Letters 24, 93–97.
Bottomley, P.A., Fildes, R., 1998. The role of prices in models of innovation diffusion. Journal of Forecasting 17, 539–555.
Boulding, K.E., 1950. A Reconstruction of Economics. Wiley and Sons, New York.
Bowers, R.V., 1937. The direction of intra-societal diffusion. American Sociological Review 2, 826–836.
Brecko, K., 2017. New Features Free of Charge? Using Price to Sort Consumers Among Legacy Software Versions. Working Paper. Simon School of Business.
Brockhoff, K., 1967. A test for the product life cycle. Econometrica 35, 472–484.
Bronnenberg, B.J., Dubé, J.-P., Moorthy, S., 2019. The economics of brands and branding. In: Dubé, J.-P., Rossi, P.E. (Eds.), Handbook of the Economics of Marketing, vol. 1. Elsevier, pp. 291–358.
Bronnenberg, B.J., Mela, C.F., 2004. Market roll-out and retailer adoption for new brands. Marketing Science 23, 500–518.
Bucovetsky, S., Chilton, J., 1986. Concurrent renting and selling in a durable-goods monopoly under threat of entry. The Rand Journal of Economics 17, 261–275.
Bulow, J., 1986. An economic theory of planned obsolescence. The Quarterly Journal of Economics 101, 729–750.
Bulow, J.I., 1982. Durable-goods monopolists. Journal of Political Economy 90, 314–332.
Butz, D.A., 1990. Durable-good monopoly and best-price provisions. The American Economic Review 80, 1062–1076.
Cabral, L.M.B., Villas-Boas, M., 2005. Bertrand supertraps. Management Science 51, 599–613.
Cao, H., Folan, P., 2012. Product life cycle: the evolution of a paradigm and literature review from 1950–2009. Production Planning & Control 23, 641–662.
Carlton, D.W., Gertner, R., 1989. Market power and mergers in durable-good industries. The Journal of Law and Economics 32, S203–S226.
Carranza, J.E., 2010. Product innovation and adoption in market equilibrium: the case of digital cameras. International Journal of Industrial Organization 28, 604–618.
Chan, T., Narasimhan, C., Zhang, Q., 2008. Decomposing promotional effects with a dynamic structural model of flexible consumption. Journal of Marketing Research 45, 487–498.
Chan, T.Y., Hamilton, B.H., 2006. Learning, private information, and the economic evaluation of randomized experiments. Journal of Political Economy 114, 997–1040.
Chapman, S., Ashton, T., 1914. The sizes of businesses, mainly in the textile industries. Journal of the Royal Statistical Society 77, 507–516.
Chatterjee, R.A., Eliashberg, J., 1990. The innovation diffusion process in a heterogeneous population: a micromodeling approach. Management Science 36, 1057–1079.
Che, H., Sudhir, K., Seetharaman, P., 2007. Bounded rationality in pricing under state-dependent demand: do firms look ahead, and if so, how far? Journal of Marketing Research 44, 434–449.


Chen, J., Esteban, S., Shum, M., 2013. When do secondary markets harm firms? The American Economic Review 103, 2911–2934.
Chen, N., 2018. Perishable Good Dynamic Pricing Under Competition: An Empirical Study in the Airline Markets. Working Paper. National University of Singapore – School of Computing.
Chen, P., Hitt, L., 2006. Information technology and switching costs. In: Handbook on Economics and Information Systems. Elsevier, Amsterdam.
Chevalier, J., Goolsbee, A., 2009. Are durable goods consumers forward-looking? Evidence from college textbooks. The Quarterly Journal of Economics 124, 1853–1884.
Ching, A., Osborne, M., 2017. Identification and Estimation of Forward-Looking Behavior: The Case of Consumer Stockpiling. Working Paper. Rotman School of Business.
Ching, A.T., 2010a. Consumer learning and heterogeneity: dynamics of demand for prescription drugs after patent expiration. International Journal of Industrial Organization 28, 619–638.
Ching, A.T., 2010b. A dynamic oligopoly structural model for the prescription drug market after patent expiration. International Economic Review 51, 1175–1207.
Ching, A.T., Erdem, T., Keane, M.P., 2013. Invited paper—Learning models: an assessment of progress, challenges, and new developments. Marketing Science 32, 913–938.
Chintagunta, P., Kyriazidou, E., Perktold, J., 2001. Panel data analysis of household brand choices. Journal of Econometrics 103, 111–153. Studies in estimation and testing.
Chou, C., Derdenger, T., Kumar, V., 2019. Linear estimation of aggregate dynamic discrete demand for durable goods without the curse of dimensionality. Marketing Science. Forthcoming.
Chou, C.-F., Shy, O., 1996. Do consumers gain or lose when more people buy the same brand. European Journal of Political Economy 12, 309–330. The economics of standardization.
Chung, D.J., Steenburgh, T., Sudhir, K., 2013. Do bonuses enhance sales productivity? A dynamic structural analysis of bonus-based compensation plans. Marketing Science 33, 165–187.
Church, J., Gandal, N., 1992. Network effects, software provision, and standardization. Journal of Industrial Economics 40, 85–103.
Clark, J.M., 1934. Strategic Factors in Business Cycles. Augustus M. Kelley, Inc., New York.
Coase, R.H., 1972. Durability and monopoly. The Journal of Law and Economics 15, 143–149.
Cohen, J., Ericson, K., Laibson, D., White, J., 2016. Measuring time preferences. Journal of Economic Literature. Forthcoming. https://www.aeaweb.org/articles?id=10.1257/jel.20191074&&from=f.
Conlisk, J., Gerstner, E., Sobel, J., 1984. Cyclic pricing by a durable goods monopolist. The Quarterly Journal of Economics 99, 489.
Conlon, C., 2012. A Dynamic Model of Prices and Margins in the LCD TV Industry. Working Paper. New York University.
Corkindale, D., Newall, J., 1978. Advertising Threshold and Wearout. MCB Publications, Bradford, UK.
Cosguner, K., Chan, T.Y., Seetharaman, P.B.S., 2018. Dynamic pricing in a distribution channel in the presence of switching costs. Management Science 64, 1212–1229.
Cournot, A., 1838. Researches into the Mathematical Theory of Wealth. The Macmillan Company, New York. Translated by Nathaniel Bacon, 1897.
Cox, W.E., 1967. Product life cycles as marketing models. The Journal of Business 40, 375–384.
Crawford, G., 2012. Accommodating endogenous product choices: a progress report. International Journal of Industrial Organization 30, 315–320.
Crawford, G.S., Shum, M., 2005. Uncertainty and learning in pharmaceutical demand. Econometrica 73, 1137–1173.
Daljord, Ø., 2015. Commitment, Vertical Restraints and Dynamic Pricing of Durable Goods. Working Paper. University of Chicago Booth School of Business.
Daljord, Ø., Nekipelov, D., Park, M., 2018. A Simple and Robust Estimator for Discount Factors in Optimal Stopping Dynamic Discrete Choice Models. Working Paper. Booth School of Business.
Darwin, C., 1859. The Origin of Species. Harvard University Press, Cambridge, MA. Reprint 1964.
David, P.A., 1985. Clio and the economics of QWERTY. The American Economic Review 75, 332–337.
Day, G.S., 1981. The product life cycle: analysis and applications issues. Journal of Marketing 45, 60–67.
Dean, J., 1950. Pricing policies for new products. Harvard Business Review 28, 45–53.


Derdenger, T., Kumar, V., 2013. The dynamic effects of bundling as a product strategy. Marketing Science 32, 827–859.
Derdenger, T., Kumar, V., 2018. A CCP Estimator for Dynamic Discrete Choice Models with Aggregate Data. Working Paper. Yale SOM.
Derdenger, T., Kumar, V., 2019. Estimating Dynamic Discrete Choice Models with Aggregate Data: Properties of the Inclusive Value Approximation. Working Paper. Tepper School of Business.
Desai, P., Koenigsberg, O., Purohit, D., 2004. Strategic decentralization and channel coordination. Quantitative Marketing and Economics 2, 5–22.
Desai, P., Purohit, D., 1998. Leasing and selling: optimal marketing strategies for a durable goods firm. Management Science 44, S19–S34.
Desai, P.S., Purohit, D., 1999. Competition in durable goods markets: the strategic consequences of leasing and selling. Marketing Science 18, 42–58.
Dhalla, N.K., Yuspeh, S., 1976. Forget the product life cycle concept! Harvard Business Review (January).
Dickson, P.R., Sawyer, A.G., 1990. The price knowledge and search of supermarket shoppers. Journal of Marketing 54, 42–53.
Dickstein, M., 2018. Efficient Provision of Experience Goods: Evidence from Antidepressant Choice. Working Paper. NYU Stern Business School.
Dilme, F., Li, F., 2018. Revenue management without commitment: dynamic pricing and periodic flash sales. The Review of Economic Studies. Forthcoming. https://www.restud.com/paper/revenuemanagement-without-commitment-dynamic-pricing-and-periodic-flash-sales/.
Dockner, E., Jorgensen, S., 1988. Optimal advertising policies for diffusion models of new product innovations in monopolistic situations. Management Science 34, 119–130.
Dodd, S.C., 1955. Diffusion is predictable: testing probability models for laws of interaction. American Sociological Review 20, 392–401.
Dolan, R.J., Jeuland, A.P., 1981. Experience curves and dynamic demand models: implications for optimal pricing strategies. Journal of Marketing 45, 52–62.
Dorfman, R., Steiner, P.O., 1954. Optimal advertising and optimal quality. The American Economic Review 44, 826–836.
Draganska, M., Mazzeo, M., Seim, K., 2009. Beyond plain vanilla: modeling joint product assortment and pricing decisions. Quantitative Marketing and Economics 7, 105–146.
Dubé, J., Hitsch, G., Rossi, P., 2010a. State dependence and alternative explanations for consumer inertia. The Rand Journal of Economics 41, 417–445.
Dubé, J.-P., Hitsch, G., Jindal, P., 2014. The joint identification of utility and discount functions from stated choice data: an application to durable goods adoption. Quantitative Marketing and Economics 12, 331–377.
Dubé, J.-P., Hitsch, G., Manchanda, P., 2005. An empirical model of advertising dynamics. Quantitative Marketing and Economics 3, 107–144.
Dubé, J.-P., Hitsch, G., Rossi, P., 2009. Do switching costs make markets less competitive? Journal of Marketing Research 46, 435–445.
Dubé, J.-P.H., Hitsch, G.J., Chintagunta, P.K., 2010b. Tipping and concentration in markets with indirect network effects. Marketing Science 29, 216–249.
Easley, D., Kiefer, N.M., 1988. Controlling a stochastic process with unknown parameters. Econometrica 56, 1045–1064.
Eckstein, Z., Horsky, D., Raban, Y., 1988. An Empirical Dynamic Model of Optimal Brand Choice. Working Paper 88. University of Rochester.
Eizenberg, A., 2014. Upstream innovation and product variety in the U.S. home PC market. The Review of Economic Studies 81, 1003–1045.
Elberg, A., Gardete, P., Macera, R., Noton, C., 2019. Dynamic effects of price promotions: field evidence, consumer search, and supply-side implications. Quantitative Marketing and Economics 17, 1–58.
Eliashberg, J., Lilien, G.L., 1993. Mathematical marketing models: some historical perspectives and future projections. In: Marketing. In: Handbooks in Operations Research and Management Science, vol. 5. Elsevier, pp. 3–23.
Ellickson, P.B., Misra, S., 2011. Estimating discrete games. Marketing Science 30, 997–1010.


Elmaghraby, W., Keskinocak, P., 2003. Dynamic pricing in the presence of inventory considerations: research overview, current practices, and future directions. Management Science 49, 1287–1309.
Enis, B.M., Garce, R.L., Prell, A.E., 1977. Extending the product life cycle. Business Horizons 20, 46–56.
Erdem, T., 1996. A dynamic analysis of market structure based on panel data. Marketing Science 15, 359–378.
Erdem, T., Imai, S., Keane, M.P., 2003. Brand and quantity choice dynamics under price uncertainty. Quantitative Marketing and Economics 1, 5–64.
Erdem, T., Keane, M.P., 1996. Decision-making under uncertainty: capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science 15, 1–20.
Erdem, T., Keane, M.P., Öncü, T.S., Strebel, J., 2005. Learning about computers: an analysis of information search and technology choice. Quantitative Marketing and Economics 3, 207–247.
Erdem, T., Sun, B., 2001. Testing for choice dynamics in panel data. Journal of Business and Economic Statistics 19, 142–152.
Erickson, G.M., 1992. Empirical analysis of closed-loop duopoly advertising strategies. Management Science 38, 1732–1749.
Ericson, R., Pakes, A., 1995. Markov-perfect industry dynamics: a framework for empirical work. The Review of Economic Studies 62, 53–82.
Esteban, S., Shum, M., 2007. Durable-goods oligopoly with secondary markets: the case of automobiles. The Rand Journal of Economics 38, 332–354.
Fan, Y., 2013. Ownership consolidation and product characteristics: a study of the US daily newspaper market. The American Economic Review 103, 1598–1628.
Fan, Y., Yang, C., 2018. Competition, Product Proliferation and Welfare: A Study of the U.S. Smartphone Market. CEPR Discussion Paper DP11423.
Farrell, J., Klemperer, P., 2007. Coordination and lock-in: competition with switching costs and network effects. In: Handbook of Industrial Organization, vol. 3. Elsevier, pp. 1967–2072.
Feichtinger, G., 1982. Optimal pricing in a diffusion model with concave price-dependent market potential. Operations Research Letters 1, 236–240.
Fershtman, C., 1987. Alternative Approaches to Dynamic Games. Palgrave Macmillan UK, London, pp. 43–65.
Fisher, J., Pry, R., 1971. A simple substitution model of technological change. Technological Forecasting and Social Change 3, 75–88.
Forrester, J.W., 1959. Advertising: a problem in industrial dynamics. Harvard Business Review 37, 100–111.
Fourt, L.A., Woodlock, J.W., 1960. Early prediction of market success for new grocery products. Journal of Marketing 25, 31–38.
Frederick, S., Loewenstein, G., O’Donoghue, T., 2002. Time discounting and time preference: a critical review. Journal of Economic Literature 40, 351–401.
Friberg, R., Sanctuary, M., 2017. The effect of retail distribution on sales of alcoholic beverages. Marketing Science 36, 626–641.
Fudenberg, D., Villas-Boas, J.M., 2007. Behavior-Based Price Discrimination and Customer Recognition. Elsevier Science, Oxford.
Gallant, A.R., Hong, H., Khwaja, A., 2018. The dynamic spillovers of entry: an application to the generic drug industry. Management Science 64, 1189–1211.
Gallego, G., van Ryzin, G., 1994. Optimal dynamic pricing of inventories with stochastic demand over finite horizons. Management Science 40, 999–1020.
Garrett, D.F., 2016. Intertemporal price discrimination: dynamic arrivals and changing values. The American Economic Review 106, 3275–3299.
Gavazza, A., Lizzeri, A., Roketskiy, N., 2014. A quantitative analysis of the used-car market. The American Economic Review 104, 3668–3700.
Givon, M., Horsky, D., 1978. Market share models as approximators of aggregated heterogeneous brand choice behavior. Management Science 24, 1404–1416.
Goettler, R., Clay, K., 2011. Tariff choice with consumer learning and switching costs. Journal of Marketing Research 48, 633–652.


Goettler, R.L., Gordon, B.R., 2011. Does AMD spur Intel to innovate more? Journal of Political Economy 119, 1141–1200.
Gönül, F., Srinivasan, K., 1996. Estimating the impact of consumer expectations of coupons on purchase behavior: a dynamic structural model. Marketing Science 15, 262–279.
Goolsbee, A., Klenow, P.J., 2002. Evidence on learning and network externalities in the diffusion of home computers. The Journal of Law and Economics 45, 317–343.
Gordon, B.R., 2009. A dynamic model of consumer replacement cycles in the PC processor industry. Marketing Science 28, 846–867.
Gordon, B.R., Sun, B., 2015. A dynamic model of rational addiction: evaluating cigarette taxes. Marketing Science 34, 452–470.
Gould, J.P., 1976. Diffusion Processes and Optimal Advertising Policy. Springer, Berlin, Heidelberg, pp. 169–174.
Gowrisankaran, G., Rysman, M., 2012. Dynamics of consumer demand for new durable goods. Journal of Political Economy 120, 1173–1219.
Grossman, S.J., Kihlstrom, R.E., Mirman, L.J., 1977. A Bayesian approach to the production of information and learning by doing. The Review of Economic Studies 44, 533–547.
Grubb, M.D., Osborne, M., 2015. Cellular service demand: biased beliefs, learning, and bill shock. The American Economic Review 105, 234–271.
Gul, F., Sonnenschein, H., Wilson, R., 1986. Foundations of dynamic monopoly and the Coase conjecture. Journal of Economic Theory 39, 155–190.
Haines, G.H., 1964. A theory of market behavior after innovation. Management Science 10, 634–658.
Hanssens, D., Parsons, L.J., Schultz, R.L., 2001. Market Response Models: Econometric and Time Series Analysis. Kluwer Academic Press, Boston, MA.
Harrison, J.M., Keskin, N.B., Zeevi, A., 2012. Bayesian dynamic pricing policies: learning and earning under a binary prior distribution. Management Science 58, 570–586.
Hartmann, W., 2006. Intertemporal effects of consumption and their implications for demand elasticity estimates. Quantitative Marketing and Economics 4, 325–349.
Hartmann, W.R., Nair, H.S., 2010. Retail competition and the dynamics of demand for tied goods. Marketing Science 29, 366–386.
Haviv, A., 2019. Consumer Search, Price Promotions, and Counter-Cyclic Pricing. Working Paper. Simon School of Business.
Heckman, J., 1991. Identifying the hand of past: distinguishing state dependence from heterogeneity. The American Economic Review 81, 75–79.
Hendel, I., Lizzeri, A., 1999. Interfering with secondary markets. The Rand Journal of Economics 30, 1–21.
Hendel, I., Lizzeri, A., 2002. The role of leasing under adverse selection. Journal of Political Economy 110, 113–143.
Hendel, I., Nevo, A., 2006. Measuring the implications of sales and consumer inventory behavior. Econometrica 74, 1637–1673.
Hendel, I., Nevo, A., 2013. Intertemporal price discrimination in storable goods markets. The American Economic Review 103, 2722–2751.
Hitsch, G.J., 2006. An empirical model of optimal dynamic product launch and exit under demand uncertainty. Marketing Science 25, 25–50.
Hodgson, C., 2016. Trade-ins and Transaction Costs in the Market for Used Business Jets. Working Paper. Stanford University.
Hörner, J., Samuelson, L., 2011. Managing strategic buyers. Journal of Political Economy 119, 379–425.
Horsky, D., 1990. A diffusion model incorporating product benefits, price, income and information. Marketing Science 9, 342–365.
Horsky, D., Misra, S., Nelson, P., 2006. Observed and unobserved preference heterogeneity in brand-choice models. Marketing Science 25, 322–335.
Hristakeva, S., 2018. Vertical Contracts and Endogenous Product Selections: An Empirical Analysis of Vendor Allowance Contracts. Working Paper. UCLA Anderson School of Management.


Huang, G., Luo, H., Xia, J., 2018. Invest in information or wing it? A model of dynamic pricing with seller learning. Management Science. Forthcoming.
Huang, Y., 2018. The Value of Compatibility to a Tied-Good Market. Working Paper. Simon School of Business.
Ishihara, M., Ching, A., 2018. Dynamic demand for new and used durable goods without physical depreciation: the case of Japanese video games. Marketing Science. Forthcoming. https://pubsonline.informs.org/doi/abs/10.1287/mksc.2018.1142.
Israel, M., 2005. Services as experience goods: an empirical examination of consumer learning in automobile insurance. The American Economic Review 95, 1444–1463.
Iyer, G., Seetharaman, P., 2003. To price discriminate or not: product choice and the selection bias problem. Quantitative Marketing and Economics 1, 155–178.
Jain, D.C., Rao, R.C., 1990. Effect of price on the demand for durables: modeling, estimation, and findings. Journal of Business and Economic Statistics 8, 163–170.
Jeuland, A., 1981. Parsimonious Models of Diffusion of Innovation. Part B, Incorporating the Variable of Price. Working Paper. Booth School of Business.
Jeuland, A.P., Dolan, R.J., 1982. An aspect of new product planning dynamic pricing. TIMS Studies in Management Science 18, 1–21.
Jeziorski, P., 2014. Effects of mergers in two-sided markets: the US radio industry. American Economic Journal: Microeconomics 6, 35–73.
Jørgensen, S., 1986. Optimal dynamic pricing in an oligopolistic market: a survey. In: Başar, T. (Ed.), Dynamic Games and Applications in Economics. Springer, Berlin, Heidelberg, pp. 179–237.
Jørgensen, S., 1983. Optimal control of a diffusion model of new product acceptance with price-dependent total market potential. Optimal Control Applications and Methods 4, 269–276.
Kahn, B.E., Kalwani, M.U., Morrison, D.G., 1986. Measuring variety-seeking and reinforcement behaviors using panel data. Journal of Marketing Research 23, 89–100.
Kalish, S., 1983. Monopolist pricing with dynamic demand and production cost. Marketing Science 2, 135–159.
Kalish, S., 1985. A new product adoption model with price, advertising, and uncertainty. Management Science 31, 1569–1585.
Kalish, S., Lilien, G.L., 1986. A market entry timing model for new technologies. Management Science 32, 194–205.
Kamakura, W.A., Balasubramanian, S.K., 1988. Long-term view of the diffusion of durables: a study of the role of price and adoption influence processes via tests of nested models. International Journal of Research in Marketing 5, 1–13.
Kamakura, W.A., Balasubramanian, S.K., 1987. Long-term forecasting with innovation diffusion models: the impact of replacement purchases. Journal of Forecasting 6, 1–19.
Katz, A., 2014. The first sale doctrine and the economics of post sale restraints. BYU Law Review 2014.
Katz, E., Lazarsfeld, P.F., 1955. Personal Influence: The Part Played by People in the Flow of Mass Communications. ISBN 1-4128-0507-4.
Katz, M.L., Shapiro, C., 1994. Systems competition and network effects. The Journal of Economic Perspectives 8, 93–115.
Keane, M.P., 1997. Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business and Economic Statistics 15, 310–327.
Keller, G., Rady, S., 1999. Optimal experimentation in a changing environment. The Review of Economic Studies 66, 475–507.
Keller, K.L., 2002. Branding and brand equity. In: Handbook of Marketing, pp. 151–178.
Kelly, R.F., 1967. Estimating ultimate performance levels of new retail outlets. Journal of Marketing Research 4, 13–19.
Kim, J., Allenby, G.M., Rossi, P.E., 2002. Modeling consumer demand for variety. Marketing Science 21, 229–250.
Klein, B., 2014. The evolving law and economics of resale price maintenance. The Journal of Law and Economics 57, S161–S179.

Klemperer, P., 1995. Competition when consumers have switching costs: an overview with applications to industrial organization, macroeconomics, and international trade. The Review of Economic Studies 62, 515–539.
Klemperer, P., 2016. Switching Costs. Palgrave Macmillan UK, London, pp. 1–5.
Koenigsberg, O., Muller, E., Vilcassim, N.J., 2008. easyJet® pricing strategy: should low-fare airlines offer last-minute deals? Quantitative Marketing and Economics 6, 279–297.
Komarova, T., Sanches, F., Silva Junior, D., Srisuma, S., 2018. Joint analysis of the discount factor and payoff parameters in dynamic discrete choice models. Quantitative Economics 9, 1153–1194.
Krishna, A., 1992. The normative impact of consumer price expectations for multiple brands on consumer purchase behavior. Marketing Science 11, 266–286.
Krishna, A., Currim, I.S., Shoemaker, R.W., 1991. Consumer perceptions of promotional activity. Journal of Marketing 55, 4–16.
Lai, T., Robbins, H., 1985. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22.
Lambin, J.-J., 1976. Advertising, Competition, and Market Conduct in Oligopoly Over Time. American Elsevier Publishing Company, Inc., New York.
Landsberger, M., Meilijson, I., 1985. Intertemporal price discrimination and sales strategy under incomplete information. The Rand Journal of Economics 16, 424–430.
Lazarev, J., 2013. The Welfare Effects of Intertemporal Price Discrimination: An Empirical Analysis of Airline Pricing in U.S. Monopoly Markets. Working Paper. NYU Stern.
Lazarsfeld, P.F., Berelson, B., Gaudet, H., 1944. The People’s Choice. How the Voter Makes Up His Mind in a Presidential Campaign. Columbia University Press, New York, NY.
LeBon, G., 1898. The Psychology of Peoples: Its Influence on Their Evolution. G.E. Stechert and Co., New York. Reprint 1924.
Lee, R.S., 2013. Vertical integration and exclusivity in platform and two-sided markets. The American Economic Review 103, 2960–3000.
Lekvall, P., Wahlbin, C., 1973. A study of some assumptions underlying innovation diffusion functions. The Swedish Journal of Economics 75, 362–377.
Levitt, T., 1965. Exploit the product life-cycle. Harvard Business Review (November).
Li, H., 2018. Intertemporal price discrimination with complementary products: E-books and E-readers. Management Science. Forthcoming. https://pubsonline.informs.org/doi/abs/10.1287/mnsc.2018.3083?journalCode=mnsc.
Li, J., Granados, N., Netessine, S., 2014. Are consumers strategic? Structural estimation from the air-travel industry. Management Science 60, 2114–2137.
Lilien, G., Kotler, P., Moorthy, K.S., 1992. Marketing Models. Prentice Hall, Englewood Cliffs, NJ.
Lin, X., 2017. Disaggregate network effects on two-sided platforms. Available at SSRN: https://ssrn.com/abstract=2971184.
Liu, H., 2010. Dynamics of pricing in the video game console market: skimming or penetration? Journal of Marketing Research 47, 428–443.
Liu, X., Derdenger, T., Sun, B., 2014. An Empirical Analysis of Consumer Purchase Behavior of Base Products and Add-ons Given Compatibility Constraint. Working Paper. Tepper School of Business.
Magnac, T., Thesmar, D., 2002. Identifying dynamic discrete decision processes. Econometrica 70, 801–816.
Mahajan, A., Tarozzi, A., 2011. Time Inconsistency, Expectations and Technology Adoption: The Case of Insecticide Treated Nets. Working Paper. UC Berkeley.
Mahajan, V., Muller, E., Bass, F.M., 1990. New product diffusion models in marketing: a review and directions for research. Journal of Marketing 54, 1–26.
Malthus, T.R., 1798. An Essay on the Principle of Population. MacMillan and Co., London. Reprint 1926.
Mansfield, E., 1961. Technical change and the rate of imitation. Econometrica 29, 741–766.
Manski, C.F., 1993. Identification of endogenous social effects: the reflection problem. The Review of Economic Studies 60, 531–542.
Manski, C.F., 2004. Measuring expectations. Econometrica 72, 1329–1376.

Maskin, E., Tirole, J., 2001. Markov perfect equilibrium: I. Observable actions. Journal of Economic Theory 100, 191–219.
McLennan, A., 1984. Price dispersion and incomplete learning in the long run. Journal of Economic Dynamics and Control 7, 331–347.
Mehta, N., Rajiv, S., Srinivasan, K., 2004. Role of forgetting in memory-based choice decisions: a structural model. Quantitative Marketing and Economics 2, 107–140.
Mela, C.F., Gupta, S., Lehmann, D.R., 1997. The long-term impact of promotion and advertising on consumer brand choice. Journal of Marketing Research 34, 248–261.
Melnikov, O., 2012. Demand for differentiated durable products: the case of the U.S. computer printer market. Economic Inquiry 51, 1277–1298.
Meyer, R.J., Assunção, J., 1990. The optimality of consumer stockpiling strategies. Marketing Science 9, 18–41.
Misra, K., Schwartz, E., Abernethy, J., 2017. Dynamic online pricing with incomplete information using multi-armed bandit experiments. Marketing Science. Forthcoming. https://pubsonline.informs.org/doi/abs/10.1287/mksc.2018.1129?journalCode=mksc.
Montgomery, A.L., Bradlow, E.T., 1999. Why analyst overconfidence about the functional form of demand models can lead to overpricing. Marketing Science 18, 569–583.
Moon, K., Bimpikis, K., Mendelson, H., 2018. Randomized markdowns and online monitoring. Management Science 64, 1271–1290.
Moorthy, K.S., 1984. Market segmentation, self-selection, and product line design. Marketing Science 3, 288–307.
Moshkin, N.V., Shachar, R., 2002. The asymmetric information model of state dependence. Marketing Science 21, 435–454.
Mussa, M., Rosen, S., 1978. Monopoly and product quality. Journal of Economic Theory 18, 301–317.
Naik, P.A., Mantrala, M.K., Sawyer, A.G., 1998. Planning media schedules in the presence of dynamic advertising quality. Marketing Science 17, 214–235.
Nair, H., 2007. Intertemporal price discrimination with forward-looking consumers: application to the US market for console video-games. Quantitative Marketing and Economics 5, 239–292.
Nalebuff, B., 2008. Bundling and tying. In: Durlauf, S., Blume, L. (Eds.), The New Palgrave Dictionary of Economics, 2nd edition.
Narasimhan, C., 1989. Incorporating consumer price expectations in diffusion models. Marketing Science 8, 343–357.
Narayanan, S., Manchanda, P., 2009. Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science 28, 424–441.
Nerlove, M., Arrow, K.J., 1962. Optimal advertising policy under dynamic conditions. Economica 29, 129–142.
Nijs, V.R., Dekimpe, M.G., Steenkamp, J.-B.E., Hanssens, D.M., 2001. The category-demand effects of price promotions. Marketing Science 20, 1–22.
Nosko, C., 2014. Competition and Quality Choice in the CPU Market. Working Paper. University of Chicago.
Oery, A., 2016. Consumers on a Leash: Advertised Sales and Intertemporal Price Discrimination. Cowles Foundation Discussion Paper No. 2047.
Osborne, M., 2011. Consumer learning, switching costs, and heterogeneity: a structural examination. Quantitative Marketing and Economics 9, 25–70.
Osborne, M., 2012. Dynamic Demand and Dynamic Supply in a Storable Goods Market. Working Paper. Rotman School of Business.
Osland, G.E., 1991. Origins and development of the product life cycle concept. In: Proceedings of the Fifth Conference on Historical Research in Marketing and Marketing Thought, pp. 68–84.
Ozga, S.A., 1960. Imperfect markets through lack of knowledge. The Quarterly Journal of Economics 74, 29–52.
Polli, R., Cook, V., 1969. Validity of the product life cycle. The Journal of Business 42, 385–400.
Putsis, W., 1998. Parameter variation and new product diffusion. Journal of Forecasting 17.
Rao, A., 2015. Online content pricing: purchase and rental markets. Marketing Science 34, 430–451.

Rao, A., Miller, P.B., 1975. Advertising/sales response functions. Journal of Advertising Research 15, 7–15.
Rao, R.S., Narasimhan, O., John, G., 2009. Understanding the role of trade-ins in durable goods markets: theory and evidence. Marketing Science 28, 950–967.
Rink, D.R., Swan, J.E., 1979. Product life cycle research: a literature review. Journal of Business Research 7, 219–242.
Roberts, J.H., Urban, G.L., 1988. Modeling multiattribute utility, risk, and belief dynamics for new consumer durable brand choice. Management Science 34, 167–185.
Robinson, B., Lakhani, C., 1975. Dynamic price models for new-product planning. Management Science 21, 1113–1122.
Rogers, E., 1962. Diffusion of Innovations, 1st ed. Free Press of Glencoe, New York.
Rossi, P.E., 2014. Invited paper: Even the rich can make themselves poor: a critical examination of IV methods in marketing applications. Marketing Science 33, 655–672.
Rothschild, M., 1974. A two-armed bandit theory of market pricing. Journal of Economic Theory 9, 185–202.
Roy, R., Chintagunta, P.K., Haldar, S., 1996. A framework for investigating habits, “the hand of the past”, and heterogeneity in dynamic brand choice. Marketing Science 15, 280–299.
Rust, J., 1985. Stationary equilibrium in a market for durable assets. Econometrica 53, 783–805.
Rust, J., 1986. When is it optimal to kill off the market for used durable goods? Econometrica 54, 65–86.
Rust, J., 1987. Optimal replacement of GMC bus engines: an empirical model of Harold Zurcher. Econometrica 55, 999–1013.
Rust, J., 1994. Estimation of dynamic structural models, problems and prospects: discrete decision processes. In: Sims, C. (Ed.), Advances in Econometrics: Sixth World Congress, vol. 2. Cambridge University Press, pp. 119–170 (Chap. 4).
Rustichini, A., Wolinsky, A., 1995. Learning about variable demand in the long run. Journal of Economic Dynamics and Control 19, 1283–1292.
Ryan, S.P., Tucker, C., 2012. Heterogeneity and the dynamics of technology adoption. Quantitative Marketing and Economics 10, 63–109.
Sahni, N., 2015. Effect of temporal spacing between advertising exposures: evidence from online field experiments. Quantitative Marketing and Economics 13, 203–247.
Sahni, N., Narayanan, S., Kalyanam, K., 2016. An Experimental Investigation of the Effects of Retargeted Advertising: The Role of Frequency and Timing. Working Paper. Stanford GSB.
Sanders, R., 2017. Reducing Retailer Food Waste Through Revenue Management. Working Paper. Rady School of Business.
Schiraldi, P., 2011. Automobile replacement: a dynamic structural approach. The Rand Journal of Economics 42, 266–291.
Seetharaman, P.B., 2004. Modeling multiple sources of state dependence in random utility models: a distributed lag approach. Marketing Science 23, 263–271.
Seetharaman, P.B., Ainslie, A., Chintagunta, P.K., 1999. Investigating household state dependence effects across categories. Journal of Marketing Research 36, 488–500.
Seiler, S., 2013. The impact of search costs on consumer behavior: a dynamic approach. Quantitative Marketing and Economics 11, 155–203.
Shapiro, C., Varian, H., 1998. Information Rules: A Strategic Guide to the Network Economy. Harvard Business Press. ISBN 978-0875848631.
Shen, Q., 2014. A dynamic model of entry and exit in a growing industry. Marketing Science 33, 712–724.
Shen, Z.-J.M., Su, X., 2007. Customer behavior modeling in revenue management and auctions: a review and new research opportunities. Production and Operations Management 16, 713–728.
Shiller, B.R., 2013. Digital distribution and the prohibition of resale markets for information goods. Quantitative Marketing and Economics 11, 403–435.
Shin, S., Misra, S., Horsky, D., 2012. Disentangling preferences and learning in brand choice models. Marketing Science 31, 115–137.
Shriver, S.K., 2015. Network effects in alternative fuel adoption: empirical analysis of the market for ethanol. Marketing Science 34, 78–97.

Shum, M., 2004. Does advertising overcome brand loyalty? Evidence from the breakfast-cereals market. Journal of Economics & Management Strategy 13, 241–272.
Simon, J., Arndt, J., 1980. The shape of the advertising response function. Journal of Advertising Research 20, 11–28.
Sobel, J., Takahashi, I., 1983. A multistage model of bargaining. The Review of Economic Studies 50, 411–426.
Song, I., Chintagunta, P.K., 2003. A micromodel of new product adoption with heterogeneous and forward-looking consumers: application to the digital camera category. Quantitative Marketing and Economics 1, 371–407.
Soysal, G.P., Krishnamurthi, L., 2012. Demand dynamics in the seasonal goods industry: an empirical analysis. Marketing Science 31, 293–316.
Sriram, S., Chintagunta, P.K., Agarwal, M.K., 2010. Investigating consumer purchase behavior in related technology product categories. Marketing Science 29, 291–314.
Stigler, G.J., 1961. The economics of information. Journal of Political Economy 69, 213–225.
Stokey, N.L., 1979. Intertemporal price discrimination. The Quarterly Journal of Economics 93, 355–371.
Stokey, N.L., 1981. Rational expectations and durable goods pricing. The Quarterly Journal of Economics 12, 112–128.
Stolyarov, D., 2002. Turnover of used durables in a stationary equilibrium: are older goods traded more? Journal of Political Economy 110, 1390–1413.
Su, C., Judd, K.L., 2012. Constrained optimization approaches to estimation of structural models. Econometrica 80, 2213–2230.
Sun, B., 2005. Promotion effect on endogenous consumption. Marketing Science 24, 430–443.
Swan, P.L., 1972. Optimum durability, second-hand markets, and planned obsolescence. Journal of Political Economy 80, 575–585.
Sweeting, A., 2012. Dynamic pricing behavior in perishable goods markets: evidence from secondary markets for major league baseball tickets. Journal of Political Economy 120, 1133–1172.
Sweeting, A., 2013. Dynamic product positioning in differentiated product markets: the effect of fees for musical performance rights on the commercial radio industry. Econometrica 81, 1763–1803.
Sweeting, A., 2019. Secondary markets. In: The New Palgrave Dictionary of Economics.
Talluri, K., van Ryzin, G., 2004. The Theory and Practice of Revenue Management. International Series in Operations Research and Management Science. Springer.
Tanny, S.M., Derzko, N.A., 1988. Innovators and imitators in innovation diffusion modelling. Journal of Forecasting 7, 225–234.
Tarde, G., 1903. Laws of Imitation. Peter Smith, Gloucester, MA. Reprint 1962.
Trefler, D., 1993. The ignorant monopolist: optimal learning with endogenous information. International Economic Review 34, 565–581.
Tuchman, A., 2018. Advertising and Demand for Addictive Goods: The Effects of E-Cigarette Advertising. Working Paper. Kellogg School of Management.
Vakratsas, D., Feinberg, F.M., Bass, F.M., Kalyanaram, G., 2004. The shape of advertising response functions revisited: a model of dynamic probabilistic thresholds. Marketing Science 23, 109–119.
Villas-Boas, J.M., 2004. Consumer learning, brand loyalty, and competition. Marketing Science 23, 134–145.
Villas-Boas, J.M., 2015. A short survey on switching costs and dynamic competition. International Journal of Research in Marketing 32, 219–222.
Viswanathan, M., 2016. Economic Impact of Category Captaincy: An Examination of Assortments and Prices. Working Paper. Eller School of Business.
von Bertalanffy, L., 1957. Quantitative laws in metabolism and growth. The Quarterly Review of Biology 32, 217–231. PMID: 13485376.
Waisman, C., 2017. Selling Mechanisms for Perishable Goods: An Empirical Analysis of an Online Resale Market for Event Tickets. Working Paper. Stanford University.
Waldman, M., 2003. Durable goods theory for real world markets. The Journal of Economic Perspectives 17, 131–154.

Waldman, M., 2007. Antitrust perspectives for durable-goods markets. In: Recent Developments in Antitrust: Theory and Evidence. MIT Press, pp. 1–37.
Weintraub, G.Y., Benkard, C.L., Van Roy, B., 2010. Computational methods for oblivious equilibrium. Operations Research 58, 1247–1265.
Williams, K., 2018. Dynamic Airline Pricing and Seat Availability. Cowles Foundation Discussion Paper 3003-U.
Wollmann, T.G., 2018. Trucks without bailouts: equilibrium product characteristics for commercial vehicles. The American Economic Review 108, 1364–1406.
Yao, S., Mela, C.F., 2011. A dynamic model of sponsored search advertising. Marketing Science 30, 447–468.
Yao, S., Mela, C.F., Chiang, J., Chen, Y., 2012. Determining consumers’ discount rates with field studies. Journal of Marketing Research 49, 822–841.
Zhang, J., 2010. The sound of silence: observational learning in the U.S. kidney market. Marketing Science 29, 315–335.
Zhou, Y., 2017. Bayesian estimation of a dynamic model of two-sided markets: application to the U.S. video game industry. Management Science 63, 3874–3894.
Ziegler, A., Lazear, E.P., 2003. The Dominance of Retail Stores. Working Paper 9795. National Bureau of Economic Research.


CHAPTER 8

Selling and sales management✩

Sanjog Misra
University of Chicago Booth School of Business, Chicago, IL, United States

Contents
1 Selling, marketing, and economics
  1.1 Selling and the economy
  1.2 What exactly is selling?
  1.3 Isn’t selling the same as advertising?
  1.4 The role of selling in economic models
  1.5 What this chapter is and is not
  1.6 Organization of the chapter
2 Selling effort
  2.1 Characterizing selling effort
    2.1.1 Selling effort is a decision variable
    2.1.2 Selling effort is unobserved
    2.1.3 Selling effort is multidimensional
    2.1.4 Selling effort has dynamic implications
    2.1.5 Selling effort interacts with other firm decisions
3 Estimating demand using proxies for effort
  3.1 Salesforce size as effort
    3.1.1 Recruitment as selling: Prospecting for customers
    3.1.2 Discussion: Salesforce size and effort
  3.2 Calls, visits, and detailing as selling effort
    3.2.1 Does detailing work?
    3.2.2 How does detailing work?
    3.2.3 Is detailing = effort?
4 Models of effort
  4.1 Effort and compensation
  4.2 Effort and nonlinear contracts
  4.3 Structural models
    4.3.1 Effort and demand
    4.3.2 The supply of effort
  4.4 Remarks
5 Selling and marketing
  5.1 Product
  5.2 Pricing
  5.3 Advertising and promotions
6 Topics in salesforce management
  6.1 Understanding salespeople
  6.2 Organizing the salesforce
    6.2.1 Territory decisions
    6.2.2 Salesforce structure
    6.2.3 Decision rights
  6.3 Compensating and motivating the salesforce
    6.3.1 Contract elements
    6.3.2 Contract shape and form
    6.3.3 Dynamics
    6.3.4 Other issues
7 Some other thoughts
  7.1 Regulation and selling
  7.2 Selling in the new world
  7.3 Concluding remarks
References

✩ The author would like to thank Harikesh Nair, Brad Shapiro, Peter E. Rossi, Jean-Pierre Dube, and two reviewers for their helpful comments and suggestions. Thanks are also due to Olivia Natan for her careful reading of early drafts and useful suggestions. The author acknowledges research support from the Neubauer Family Foundation.

Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.07.001
Copyright © 2019 Elsevier B.V. All rights reserved.

Everyone lives by selling something. Robert Louis Stevenson (Across the Plains)

1 Selling, marketing, and economics

1.1 Selling and the economy

In a large number of industries, the mechanism by which consumers are informed, persuaded, and induced to make purchases involves the effort of an agent acting on behalf of the firm. These “sales” agents are often referred to as salespeople, sales representatives, or simply “reps”. The prototypical salesperson exerts “selling effort” that is ultimately aimed at eliciting some desired outcome, typically a sale. This effort has various components such as prospecting (finding leads or potential customers), qualifying or screening these leads, presenting the offer to the customers, and ultimately closing the sale. Given that these individuals are the front line of a firm’s interaction with its customers, issues relating to the management of these agents are of fundamental importance to firms.

The institution of selling holds a significant place in the economy. Over 14.5 million workers in the US fall in the category of “Sales and Related Occupations” as defined by the Bureau of Labor Statistics (BLS).¹ This represents close to 10% of the total US workforce and is the second largest occupation category in the US (second only to “Office and Administrative Support Occupations”). The BLS also reports that the average annual wage for sales-related occupations stands at about $40,681 and ranges from $16,980 to $162,740. To put these numbers in perspective, the total sales compensation component of US businesses, based on BLS estimates, stands at about $590 billion, or about 3.2% of US GDP. These numbers are conservative at best, with some estimates closer to the one trillion dollar mark (Zoltners and Sinha, 2005), or about 5% of GDP. Compare this to total advertising expenditure in the US, which has been estimated at around $200 billion.²

To get a better idea of the magnitude of selling within the context of a single firm, consider the pharmaceuticals sector as a case study. As of 2012, approximately 72,000 pharmaceutical sales representatives were employed in the United States (Rockoff, 2012). A Congressional Budget Office report (Campbell, 2009) states that the estimated promotional spend in the industry was over $20B, of which about $12B was spent on “detailing,” which reflects the visits that medical (sales) representatives made to physicians, nurse practitioners, and physicians’ assistants. In contrast, traditional broadcast advertising (known as direct-to-consumer or DTC advertising) accounts for less than $5B. A more recent report³ from IQVIA (formerly IMS) suggests that globally, a similar proportion (65%) of promotion budgets in the pharmaceutical sector is allocated to selling activities. The large allocation of investment to selling is driven in large part by the wages of the salespeople. These numbers, while staggering, are not surprising given the scale of selling operations in this sector. A typical large pharmaceutical manufacturer will have about 7500 reps in the US (and about double that worldwide), with each rep earning about $150,000 per year. This amounts to $1.125B in compensation costs alone for a single firm. With other expenses (travel, entertainment, etc.) the selling budget for this firm would be as high as $2B.⁴ Either way, when compared to any other form of promotion, selling emerges as the largest expense a pharmaceutical company can make.

The pharmaceuticals sector is one of many industries where the presence of salespeople is critical to the operations of the firm. Retail is dominated by selling activities, with about 50% of employees being salespeople. The financial services sector spends $7.5B⁵ on advertising, but BLS estimates of sales-related wages are in the $30B–$70B range.⁶ Similar investments in selling can be seen across numerous other sectors, including automobiles, banking, consulting, and technology. These large numbers do not directly speak to the degree or manner in which selling creates value for the firm, or to any other aspect of the economic role that selling plays in firms. They do, however, show that firms choose to make these large investments in selling activities, and therefore, by a revealed preference argument, we must conclude that selling is both relevant and important to their operations.

¹ See https://www.bls.gov/oes/current/oes410000.htm.
² See https://www.statista.com/statistics/273736/advertising-expenditure-in-the-worlds-largest-ad-markets/.
³ https://www.iqvia.com/library/publications/channeldynamics-global-reference-2017-edition.
⁴ http://fortune.com/2016/06/23/medical-sales-reps-pay-gap/.
⁵ See http://www.statista.com/statistics/275506/top-advertising-categories-in-the-us/.
⁶ http://www.bls.gov/oes/current/oes_nat.htm#41-0000.
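The back-of-the-envelope figures in this section can be reproduced from the numbers quoted in the text. The short sketch below is illustrative only: the variable names are hypothetical, and the GDP level it prints is simply the value implied by the quoted 3.2% share, not an independently sourced figure.

```python
# Back-of-the-envelope check of the compensation figures quoted in Section 1.1.
# All inputs are numbers cited in the text; variable names are illustrative.

sales_workers = 14_500_000   # BLS "Sales and Related Occupations" headcount
mean_annual_wage = 40_681    # BLS average annual wage, in dollars

# Economy-wide sales compensation: headcount times average wage.
total_sales_comp = sales_workers * mean_annual_wage
print(f"Economy-wide sales compensation: ${total_sales_comp / 1e9:,.0f}B")

# GDP level implied by the text's "about 3.2% of US GDP" (derived, not sourced).
implied_gdp = total_sales_comp / 0.032
print(f"Implied US GDP: ${implied_gdp / 1e12:.1f}T")

# A single large pharmaceutical manufacturer: 7,500 US reps at about $150,000 each.
reps = 7_500
pay_per_rep = 150_000
firm_comp = reps * pay_per_rep
print(f"Firm-level compensation: ${firm_comp / 1e9:.3f}B")
```

The two products come to roughly $590 billion and exactly $1.125 billion, matching the figures given in the text.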

1.2 What exactly is selling?

Standard textbooks on selling define selling as a person-to-person communication process that results in some form of transaction. The classic example of this is the salesperson who knocks on the door to convince you to buy whatever it is they are selling. A critical aspect of selling is that it features some form of personal communication. In the past, this usually meant that there was an in-person, face-to-face element to the process. While this aspect is still in vogue in many selling engagements, the advent of new forms of communication has changed the process in significant ways. Even though the interaction may not always be in-person anymore, selling remains at its core a personal communication process between two individuals.

Since the process requires two participating individuals, a starting point for selling is the act of finding target customers. The set of activities that a salesperson engages in that result in potential customers or “leads” is often described as “prospecting” for obvious reasons. Once such prospects or leads are found, the process moves to the “approach” stage, wherein the salesperson reaches out to the prospect, usually to assess the potential for engagement. If the approach is successful, there is contact and communication. This allows the salesperson to begin the next sub-process, that of “qualifying” the prospect. The salesperson does an expected value calculation and compares the expected cost of engaging and selling to the prospect with the expected return. As the process continues, it naturally moves towards an outcome. This outcome could be a sale, a deferment, or a non-transaction. The salesperson’s efforts are obviously invested in increasing the possibility of the first outcome. If a sale is created, the process terminates with some possibility of follow-up of some form, usually described as “servicing the account.” This last element is more likely in environments that afford repeat sales.

The broad notion of selling can be thought of as the set of activities, investments, and processes that seek to identify, manage, and ultimately realize demand opportunities. There is the age-old debate of whether selling can create demand, but that is not the focus of this chapter. In this chapter, I will use the term “selling” to narrowly represent the set of costly (to the salesperson) activities that move a customer from being a prospect to a final sale.
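The qualifying step described in this section amounts to a simple expected-value comparison. A minimal sketch, assuming the salesperson can attach a closing probability, a payoff from a sale, and a cost of pursuit to each prospect; the function name, parameters, and numbers are hypothetical and are not taken from the chapter:

```python
def worth_pursuing(p_close: float, payoff_if_sold: float, pursuit_cost: float) -> bool:
    """Return True when the expected return from a prospect exceeds the cost of pursuit."""
    if not 0.0 <= p_close <= 1.0:
        raise ValueError("p_close must be a probability in [0, 1]")
    expected_return = p_close * payoff_if_sold
    return expected_return > pursuit_cost


# Hypothetical prospects: (closing probability, payoff if sold, cost of pursuit).
prospects = {
    "lead_a": (0.20, 1_000.0, 150.0),
    "lead_b": (0.10, 1_000.0, 150.0),
}

# Keep only the leads whose expected return covers the cost of engaging them.
qualified = [name for name, args in prospects.items() if worth_pursuing(*args)]
print(qualified)
```

In this toy example only the first lead qualifies, since its expected return (200) exceeds the pursuit cost (150) while the second lead's (100) does not. A richer treatment would let the probability and cost depend on effort, which is the subject of later sections.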

1.3 Isn’t selling the same as advertising?

In a word, no. Selling is often portrayed as a form of advertising (see e.g. Rizzo, 1999; Leffler, 1981), but that is a misconception created by a superficial treatment of the two constructs. While there certainly are elements of commonality across the two marketing instruments (e.g. both are forms of strategic communication with the objective of influencing the receiving party), there are numerous differences.

First, advertising is non-personal and non-interactive, while selling involves personal interaction with the customer. Consider the interactions between a customer looking to buy an automobile and a salesperson for a particular auto manufacturer. Now, contrast that with the customer seeing a television advertisement from that same manufacturer. These are very different channels that deliver seemingly overlapping content but have distinct roles to play in the efforts made by the firm to realize a sale. Similarly, one could contrast selling and advertising by comparing the interactions that a physician has with a pharmaceutical salesperson to the act of the physician seeing an ad for the focal drug in a medical journal. In one case there is a human element to the message and in the other there is not. In other words, selling is an interactive process while advertising is more or less a passive encounter.

Second, as a corollary to the first distinction, selling is by construction a personalized form of communication while advertising adopts more of a broadcast format. When interacting with a salesperson, each customer gets a different message which is, in part, driven by the interaction between the salesperson and the customer. Put differently, selling interactions are malleable in their content and afford the firm’s agent an opportunity for real-time customization of the message being delivered. In contrast, advertising has fixed messages that are broadcast to some pre-defined aggregated audience. With the advent of targeted advertising, there is now increased scope for personalization in advertising, but such personalizations are far from those delivered in a selling encounter. To fully appreciate this, consider the fact that in a selling encounter the personalization is dynamic. That is, the message and content of the sales pitch shift dynamically (and in real time) as new information is made available to the salesperson regarding the preferences of the customer.
Since information flows both ways in the conversation, there is a completely different element of customization in selling that advertising simply cannot match.

Third, selling and advertising efforts are controlled by different economic agents. In the context of advertising, the firm has complete control over the message and the manner in which the message is delivered to the potential customer. While the interpretation of the message is done by the consumer, in the context of messages from other competing firms, the delivery of the advertising message is fully controlled by the firm. Selling, in contrast, is a delegated function wherein the firm has, at best, limited control over the message itself and the way it is delivered. Since the interaction between the salesperson and the customer cannot be monitored (not perfectly anyway), there will be departures between the firm’s optimal messaging approach and those employed by its sales agents. Moreover, the message and delivery mechanism in selling are a function of the characteristics, ability, and incentives of the focal agent. In other words, there is considerable heterogeneity in messaging across individual salespeople that doesn’t occur in the advertising context.

Fourth, the human element in selling creates a host of agency issues for the firm that are absent (or at least quite different) in advertising. Risk aversion, moral hazard, and adverse selection emerge as relevant issues when dealing with salespeople but are of limited concern when it comes to advertising decisions. To be fair, there may be similar economic forces at play when outsourcing advertising decisions to agencies or media houses, but those are unrelated to the interaction between the customer and the firm and have limited bearing on the discussion of topics we are interested in. More generally, sales management also pertains to the management of the people that constitute the salesforce. In other words, managing the selling function necessitates thinking about the organization of the firm, the allocation of decision rights, tasks, and territories, as well as tasks related to the recruitment, compensation, motivation, and retention of salespeople. In contrast, advertising functions are often outsourced to external parties including advertising agencies and media houses.

Finally, and unsurprisingly, there are significant cost differences between the two forms of communication. Typically, advertising has broader reach and, on a per-capita basis, is cheaper than the more targeted selling effort of sales agents. Since the effectiveness of advertising vis-a-vis selling varies across industries and contexts, it is difficult to ascertain which of the two has a better return on investment. The few studies that have been done (particularly in the pharmaceutical industry; see e.g. Neslin, 2001) suggest that selling offers a better bang for each buck.

The differences between selling and advertising have implications for how we incorporate selling into economic and marketing models. The extant literature has either treated them as interchangeable or has considered the difference but ignored the human element that underlies the generation of selling effort.

1.4 The role of selling in economic models

Selling and the management of salespeople offer a natural laboratory to study a plethora of economic constructs, since the decisions related to selling collect a variety of economic ideas and problems under a single umbrella. I would include Organizational Economics, Personnel Economics, Industrial Organization, and obviously Marketing as being relevant to understanding selling and salespeople.

To begin, let’s revisit the rationale for why selling exists as a function. Our earlier discussion implies that the role of a salesperson is to inform and persuade potential customers to purchase the firm’s product. It follows that selling operates by influencing the utility that a consumer expects to derive from the purchase of the product. The manner in which such influence operates has been of interest to the Marketing literature for a while. Aggregating these effects across customers suggests that selling is an important ingredient that potentially shifts the demand function faced by the firm. As such, selling should be of fundamental relevance to anyone studying IO topics in industries where selling is prevalent. Consider the automobile sector, where an estimated one to two million salespeople are employed and the vast majority of sales transactions are influenced, at least in part, by the salesperson. I would argue that the salesforce is the primary tool with which auto dealerships compete. This is also true in other industries such as Pharmaceuticals and Real Estate where direct sales are the primary channel, and less obviously, but still true, in the other industries discussed in the introduction such as Financial Services, Banking, and Technology. What’s more, in a number of these industries, the critical measures we aim to estimate and understand (such as price elasticities) are a function of selling effort. In some cases this happens because the salesperson acts as the bargaining agent (Real Estate, Automobiles), in other cases the salesperson acts as the bidding agent in auctions (Construction, Technology, Consulting Services), while in many others salespeople have been delegated decision rights with respect to prices or discounts. Understanding the complex interplay between selling and other economic decisions within and across firms remains an open area of research for marketers and economists alike.

Once we accept the fact that selling has a role to play in the construction and realization of demand, a subsequent set of questions emerges that has to do with the firm’s ability to control this instrument. As discussed earlier, unlike advertising or pricing, the provision of selling effort is delegated to the salesperson. As with any other agency problem, the goals of the agent (the salesperson) and the principal (the firm) do not automatically align. As one would expect, any agency issues that arise need to be addressed via contracts between the firm and the agent. What makes selling unique as an economic construct is that it ties together the “front-end”, consumer-facing aspects of demand and the “back-end” problems in organizational and personnel economics. The forces that govern the generation of productive effort, and the indirect control of such effort via contracts and organizational structure, are of interest to economists and marketers alike. The selling function is a unique setting where the contracts are mostly explicit, the production function is relatively straightforward, and data is widely available (to the firm). As such, it offers a unique opportunity to study topics related to agency theory, personnel economics, and organizational economics more broadly. Topics such as the design of compensation plans, the reporting structure of the sales organization, and recruitment, promotion, and retention policies are as relevant to Economics as they are to Marketing.
At the most strategic level, decisions relating to selling begin with a simple question: Should a firm employ salespeople? This question relates directly to Organizational Economics, the boundaries of the firm, the allocation of decision rights, and other agency issues. While an in-depth treatment of this question is beyond the scope of this chapter, I will point the reader to some excellent work by Erin Anderson (Anderson and Schmittlein, 1984; Anderson, 1988), who examined this question using Williamson’s (1979) transaction cost framework. The broad idea articulated in this line of research is that the choice of having an employee-based salesforce depends, at its core, on whether there are specific assets at play (e.g. complex products that require customized training to demonstrate) and on the difficulty of evaluating performance. In a seminal paper, Holmstrom and Milgrom (1994) examine similar strategic questions from an agency-theoretic perspective, and coincidentally use the selling function and sales organizations as the continuing example in their analysis. Their analysis has direct implications for a number of decisions managers make with respect to salesforces, including task assignment, monitoring, and compensation.

Given the range and abundance of academic questions contained in the selling function, and the magnitude of selling investments made by firms, there would seem to be ample reason for selling to be a well-researched topic in the Marketing and Economics literature. While there is a somewhat large body of theoretical work on the topic (mostly in Marketing), there is very limited empirical research, at least when contrasted with traditional advertising. One potential explanation for this dearth of research is the limited availability of data. Firms are often reluctant to share data on their employees or about compensation practices, and this may have limited the amount of empirical research on the topic. I would argue that, for researchers in Marketing, this is possibly the main obstacle. It does not, however, explain the absence of research in Economics related to selling. A possible explanation is that selling hasn’t been thought of as an independently important economic construct. This argument is consistent with the earlier discussion suggesting that selling is often considered a variant of advertising. In some cases this confusion is explicit (Rizzo, 1999; Leffler, 1981), while in others it is reflected in the implicit choice to ignore the role of the salesperson (see the discussion on detailing later). The corollary to this argument is that selling has also not been seen as an economically relevant context. While the Labor and Personnel Economics literatures are replete with empirical research relating employee incentives to firm outcomes, selling is rarely the empirical context considered (farms and plantations are far more studied). Similarly, Industrial Organization and Marketing papers that estimate demand models often ignore the role of salespeople and selling, even when focusing on industries where selling is the primary channel (e.g. Pharmaceuticals, Automobiles). In later sections, I will discuss these issues in more detail.

1.5 What this chapter is and is not

The overarching goal of this chapter is to provide the reader with a framework for thinking about the economics of selling and, consequently, about new research ideas on this topic. The chapter will provide an overview of the role of selling in empirical models of demand, a discussion of the issues related to estimating such models, and a critical examination of the state of the related research. It will also contain a discussion of a selection of topics related to firm decisions with respect to the management of salespeople, and link these supply-side decisions to the demand-side elements of selling.

The chapter is not meant to be comprehensive or exhaustive in its treatment of the literature. Where and when appropriate, I will provide a discussion of certain papers but will not include an exhaustive list of all related papers. The choice of particular papers, and their mention in this chapter, is subjective and serves only as a means of providing illustrative examples of a research-relevant point. More often than not these discussions will be devoid of particular valence unless, of course, an evaluation is the objective of the discussion. Again, this chapter is not intended to be a literature review or an annotated bibliography. As such, I am fully aware that the chapter will be incomplete in its treatment of this vast topic and will have ignored what some might consider “important” topics and papers. I hope to make up for that incompleteness of coverage by bringing some depth and constructive criticism to the discussion.

To be clear, I will restrict my attention to empirical research on topics pertaining to selling and sales management and, further, to research that is based on observational data. The chapter will have little to say about experimental research, survey-based research, or other forms of qualitative research on the topic of selling and sales management. That is not to say that those are not relevant or important topics, just that this chapter has a specific and narrow focus. The chapter will also remain largely silent (with minor exceptions) on the large theory literature that has emerged on various elements of sales management such as recruitment, compensation, and structuring. There are some excellent reviews on the topic (see Coughlan and Sen, 1989; Coughlan, 1993; Coughlan and Joseph, 2012 for a discussion of topics related to salesforce compensation and Lilien et al., 1992; Mantrala et al., 2010; Mantrala, 2014 for a more general overview of the literature), and I refer the reader to those.

This chapter will also be narrow in its definition of selling and sales management. For example, franchisees, authorized dealers, manufacturers’ representatives, and other related vertical entities that may have a selling element to their function will largely be ignored. There are thoughts and ideas herein that also relate to them, but there will be limited discussion of other vertical structures and agencies.

To reiterate, the chapter aims to make the reader think about selling and salespeople and the outsized role they play in our economy. It endeavors to provide the reader with an understanding of the relevance selling and sales management have for the fields of Marketing, Industrial Organization, and, more broadly, Economics, and to offer some insight into the nuance that is needed to empirically model selling-related constructs, agents, entities, and actions.
Ultimately, this chapter reflects my own excitement with the topic and I am hoping that some of that rubs off, so as to spark a renewed interest in this important yet hopelessly under-appreciated area.

1.6 Organization of the chapter

The rest of this chapter is organized as follows. The first few sections focus on the role of selling in the context of its primary goal: influencing demand. Section 2 looks at the construct of selling effort, and Section 3 follows by considering empirical proxies for this effort construct in the context of estimating demand models. Section 4 examines the idea of treating selling effort as an endogenous construct, particularly in the context of structural models. Section 5 outlines the interaction between selling and marketing, with a particular focus on the broad elements of marketing, namely product, pricing, and advertising. Section 6 outlines thoughts on topics related to salesforce management. I conclude with a discussion of some ideas relating to regulation and the relevance of selling in the new data-rich environment.


2 Selling effort

Understanding and estimating the demand effects of investments made in selling activities is a necessary ingredient for the optimal allocation of such investments. Early work in the marketing literature focused on aggregate responses to salesforce activities, while more recent papers have shifted attention to understanding these effects at a micro level. Part of this shift in focus stems from the objective that researchers have sought to optimize, while part is on account of the early paucity of granular transaction-level data. As data has become more ubiquitous and computing power less expensive, the models used in the analysis of selling and its impact on demand have steadily improved, with stricter attention paid to the constructs and specifications being used, to identification issues, and to econometric challenges. In the discussion to follow, I will first aim at characterizing the defining features of selling effort, and then discuss the implications these have for the construction and estimation of economic models.

2.1 Characterizing selling effort

The primary focus of the empirical literature in Marketing and Economics (as it relates to selling) has been on estimating the responsiveness of firm outcomes such as sales and revenues to “selling effort”. The operationalization of this effort, however, has varied, and there is little consensus on, or even a clear articulation of, what selling effort connotes. As discussed earlier, the textbook description of selling activities includes all activities in the process that results in the outcome of interest. This process starts with lead generation and goes all the way to closing the sale and post-sale interactions with customers. Given the variety and complexity of tasks involved, it stands to reason that the way selling effort impacts a firm’s outcomes is inherently non-trivial. To provide the reader with an empirical context, I will formalize the relation between sales and effort, and for the remainder of this chapter will use the following construction,

$$Q_t = f(e_t, m_t, s_t, \xi_t, \varepsilon_t). \tag{1}$$

In the above equation and in what follows, $m$ denotes the set of marketing instruments (such as advertising and prices), $e$ is effort, while $s$ and $\xi$ are other exogenous variables, observed and unobserved respectively, that might impact demand. The $\varepsilon_t$ represent unobserved (typically i.i.d.) disturbances that impact sales outcomes. Not all models of demand that include a selling component include the complete set of constructs in the above equation: the typical specification includes some measure of effort, one or more of the other constructs, and some $f$ that is parametrically specified. While the specification is straightforward and resembles typical demand models, there are a number of aspects to this problem that render it unique. We discuss some of these in what follows.
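As an illustration of what a parametrically specified $f$ might look like, the sketch below uses a log-linear (multiplicative) form. This functional form and every parameter value are assumptions made purely for exposition; the chapter does not commit to any particular $f$.

```python
import math


def sales_response(e, m, s, xi, eps,
                   alpha=1.0, beta_e=0.3, beta_m=0.2, gamma=0.5):
    """One hypothetical parametric f for Eq. (1).

    log Q = alpha + beta_e*log(e) + beta_m*log(m) + gamma*s + xi + eps,
    so beta_e and beta_m are the effort and marketing elasticities.
    """
    log_q = (alpha + beta_e * math.log(e) + beta_m * math.log(m)
             + gamma * s + xi + eps)
    return math.exp(log_q)


# Doubling effort, holding everything else fixed, scales sales by 2**beta_e.
base = sales_response(e=1.0, m=2.0, s=0.0, xi=0.1, eps=0.0)
doubled = sales_response(e=2.0, m=2.0, s=0.0, xi=0.1, eps=0.0)
effort_ratio = doubled / base
```

The multiplicative form is convenient because the coefficients read directly as elasticities, but it also builds in the assumption that sales are zero whenever effort is zero, which may or may not suit a given empirical setting.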

2.1.1 Selling effort is a decision variable

Much like other endogenous constructs (prices, advertising) used in demand models, selling effort is a decision variable that results from some supply-side decision that trades off costs and benefits. Unlike the other constructs, this decision is not made by the firm directly. Instead, it is delegated to an agent (the salesperson) who makes effort allocations based on their own utility function. The extent to which the firm’s payoffs are aligned with the utility of the salesperson depends crucially on the nature of the contractual relationship between them. The timing of the decision making is relevant to the construction of demand models as well. One could imagine a context where the salesperson observes marketing choices (e.g. advertising) made by the firm and then chooses effort in response to those observations. In this case, effort would be a functional response with $e_t = e(m_t, s_t, \xi_t)$. Of course, one might instead assume that effort levels are independent of other factors, in which case effort could be a realization from some independent stochastic process or simply a constant. The decision of how to model the effort generation process is critical to how we subsequently model sales. We discuss this further in later sections.
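To see why the effort generation process matters for the sales model, consider a minimal simulation in which effort responds to advertising. The linear forms, the parameter values, and the helper `ols_slope` are all hypothetical and introduced only for illustration. When effort is left out of the regression, the advertising coefficient absorbs the effort response.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)
BETA_E = 2.0    # assumed effect of effort on sales
BETA_M = 1.0    # assumed direct effect of advertising on sales
RESPONSE = 0.5  # assumed response of effort to advertising, e = e(m)

m = [rng.random() for _ in range(20000)]
e = [1.0 + RESPONSE * mi + rng.gauss(0, 0.1) for mi in m]
q = [BETA_E * ei + BETA_M * mi + rng.gauss(0, 0.1) for ei, mi in zip(e, m)]

# Regressing sales on advertising alone recovers roughly
# BETA_M + BETA_E * RESPONSE rather than the direct effect BETA_M.
naive_slope = ols_slope(m, q)
```

Whether the combined coefficient or the direct effect is the object of interest depends on the question: the combined effect is what the firm gets from advertising under the current contract, while the direct effect is what would remain if the effort response changed.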

2.1.2 Selling effort is unobserved

Another important difference between demand models that contain some version of selling effort and those that don’t is that the key control variable in this model, effort, is unobserved. This “unobservability” is on the part of the firm as well as the researcher, and needs to be distinguished from the usual unobservables that might impact demand. Typically, we think of unobservables as being information unobserved from the point of view of the researcher, while the firm is assumed to have complete access to this information and possibly conditions on it. Consequently, we worry about the potential for endogeneity stemming from some form of omitted variable bias. In contrast, unobserved effort is a decision variable for the salesperson, who may or may not condition on other factors in making that decision. Given the unobservability of effort to the firm, it is less likely that the firm conditions on effort when deciding on other marketing investments.

An immediate consequence of the unobserved nature of effort is that the firm (and the researcher) cannot distinguish between effort and demand shocks ($\varepsilon$). Recall that the unobservable state variables ($\xi$) are unobserved only to the researcher, not to the firm. The inability to distinguish between effort and noise renders it impossible for the firm to write contracts that specify required levels of effort. In other words, effort is non-contractible. In addition, the literature (discussed later in this chapter) often assumes that there exist no other signals available to the firm that would allow it to “back out” effort, or to infer or verify ex post the level of effort exerted by the salesperson. Given this complete lack of information about effort, the empirical exercise of estimating the responsiveness of sales to effort is non-trivial. The extant literature has taken three routes to handling this problem.
The first, and possibly the most common, approach is to completely ignore the prevalence of selling effort. Consider the large literature on automobile demand. In spite of the fact that the vast majority of purchase transactions are facilitated by salespeople, the role of effort is conspicuously missing from these models. It might be tempting to think of the unobservables in BLP (Berry et al., 1995) as accounting for effort but, as discussed earlier, that thought process would be flawed if effort is a decision variable rather than an exogenous shock. One could claim that since prices and product characteristics are included, the model implicitly accounts for effort through the functional form assumptions made. This isn’t a compelling argument, since there may be systematic factors that impact effort that are omitted from the model (for example, salesperson incentives that vary by brand and time). Further, there may be boundary conditions (or marginal customers) where the sale would not have been possible without selling effort. Automobiles are not the only industry where selling effort matters and has been systematically ignored. Similar arguments could be made for pharmaceuticals (more on this later), financial services and banking, or any of the industries discussed in the introduction.

The second empirical approach adopted to account for selling effort is the use of proxies. The most common proxies include the number of salespeople and the number of interactions that a salesperson has with a potential customer. These proxies are informative about effort only in the minimal sense that one might consider them a censoring indicator. In other words, effort will be zero if there isn’t a salesperson available to sell to the customer, but the availability of the salesperson tells us nothing about the level of effort exerted. Similarly, effort can only be exerted if there is an interaction between the customer and the salesperson, but an indicator for the presence of the interaction doesn’t inform us about effort levels.
In both these examples, all we can say is that effort is potentially positive when the proxy construct is non-zero; we cannot consider these as measures of effort, except in special circumstances. One example where proxies work is that of constant effort. If effort is a constant, the number of salespeople or the number of interactions is indeed a measure of the amount of effort. Even if the scale of effort remains unknown, we can trace out the locus of sales responses using (hopefully exogenous) shifts in these proxies. The approach of using proxies for unobserved effort has dominated the sales and selling literature in Marketing, and to some extent in Economics as well. Perhaps the most visible example of this approach is the study of the impact that “detailing”, i.e. the number of interactions that medical salespeople have with physicians, has on prescribing behavior (sales).

Finally, and more recently, there has been emerging work that seeks to model effort from a structural viewpoint. The core idea is to consider the decision process of the salesperson in their allocation of effort, take into account factors that influence that process, and delineate an econometric model that allows these factors to then indirectly influence demand. Doing so brings to the model factors that are often omitted from traditional demand models in Marketing and Economics. We will discuss the proxy approach, as well as more structural models of effort, in much more detail later in this chapter.
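The constant-effort case in which proxies work can be contrasted with a case in which they fail using a small simulation. Everything here is a stylized assumption for illustration: sales are linear in total effort, the proxy is the number of calls, and in the second scenario the salesperson spreads a fixed effort budget across however many calls are made.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)
BETA = 2.0             # assumed sales response to total effort
EFFORT_PER_CALL = 0.5  # assumed constant effort per interaction
BUDGET = 3.0           # assumed fixed effort budget per period

calls = [rng.randint(1, 10) for _ in range(5000)]

# Case 1: constant effort per call, so total effort is proportional to calls.
sales_constant = [BETA * EFFORT_PER_CALL * c + rng.gauss(0, 0.5) for c in calls]

# Case 2: a fixed budget is spread over the calls, so total effort does not
# move with the number of calls even though effort still drives sales.
sales_budget = [BETA * (BUDGET / c) * c + rng.gauss(0, 0.5) for c in calls]

slope_constant = ols_slope(calls, sales_constant)  # near BETA * EFFORT_PER_CALL
slope_budget = ols_slope(calls, sales_budget)      # near zero
```

In the first case the proxy traces out the sales response up to the unknown scale of effort per call. In the second, a researcher would wrongly conclude that selling has no effect, because variation in the proxy carries no information about variation in effort.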

2.1.3 Selling effort is multidimensional

Given the earlier discussion of what selling effort entails, it follows that there are distinct activities that the salesperson engages in that one might collectively call effort. While the ultimate goal is often the consummation of a sales transaction, these efforts also generate intermediate outcomes and signals. This aspect of effort has been discussed quite extensively in the theoretical literature, both in Economics and in Marketing. In a series of papers, Bengt Holmstrom and Paul Milgrom (see e.g. Holmstrom and Milgrom, 1991 and Holmstrom and Milgrom, 1994) deal with multi-task settings and investigate the theoretical underpinnings of the role incentive contracts play in the allocation of agent effort. More specific to selling, Lal and Srinivasan (1993) consider the case of a multi-product firm where the salesperson exerts effort for two products with two sales outcomes. A similar multi-product setup is investigated theoretically by Caldieraro and Coughlan (2007) in the context of “spiffs” (direct incentives that are offered by manufacturers to salespeople employed by retail establishments). One particularly interesting paper is by Inderst and Ottaviani (2009), who examine equilibrium levels of “mis-selling” (the mismatch between a customer’s needs and the product sold) when the salesperson engages in two sequential activities, “prospecting” and “advising”.

In spite of the many theory papers and an expanse of real-world contexts where agents engage in different types of effort to influence a variety of outcomes, there is very limited empirical work examining this facet of selling. There are, however, some notable exceptions. Slade (1996) uses data from gas stations to test theoretical predictions of the Holmstrom-Milgrom multi-task framework. In her framework the agent can engage in selling gas or in some other productive activity (selling items at the convenience store).
More recently, Kim et al. (2018a) model the effort decision of loan officers (effectively salespeople selling retail banking products) who are jointly optimizing effort levels for both loan acquisition as well as for repayment. To the best of my knowledge this is the only structural empirical work in the context of a salesperson’s multi-tasking effort.

2.1.4 Selling effort has dynamic implications

The next challenge in modeling sales response to effort is that effort levels chosen by salespeople will be dynamic. These dynamics operate on two levels. First, effort has some form of dynamic influence on sales. That is, current effort will potentially impact future levels of sales. This shouldn’t be surprising, since effort occurs in the context of repeated human interactions. Given that selling effort typically aims to inform and persuade customers, any future actions that these customers take could very well be influenced by past effort levels. There have been two modeling approaches to accounting for such dynamic effects of effort. The standard approach is to construct a stock of “goodwill” ($G_t$) that evolves by treating current effort as an investment and depreciates geometrically, with a constant fraction $\lambda < 1$ of the stock carried over from one period to the next. This process can be described as

$$G_t = e_t + \lambda G_{t-1} = \sum_{\tau=0}^{\infty} \lambda^{\tau} e_{t-\tau}.$$

This framework, due to Nerlove and Arrow (1962), is popular in advertising and has been used in numerous selling-related papers (see e.g. Gönül et al., 2001; Narayanan et al., 2005; and more recently Shapiro, forthcoming and Chung et al., 2018). Rather than effort ($e_t$), it is now goodwill ($G_t$) that enters the sales response model. Effectively, this is an infinite lag model with coefficients that decay exponentially. As a consequence, one can calculate the long-term effect of effort on sales.

In addition to the above construction of dynamic effects, researchers have also taken a more structural approach to specifying how selling effort influences customer behavior. One popular approach is to model the result of salesperson-customer interactions as providing the customer with an informative signal that updates their prior beliefs about the quality of the product or service being offered. As an example, Narayanan and Manchanda (2009) build a learning model wherein physicians learn about the efficacy of drugs based on interactions with the firms’ sales agents. These learning models can also be re-characterized as infinite lag models, albeit with a different structure.

An alternative model that allows for dynamic effects of selling is a distributed lag model with a pre-defined number of lagged elements. This is equivalent to thinking of goodwill as a truncated sum,

$$G_t = \sum_{\tau=0}^{\tilde{T}} \lambda_{t-\tau}\, e_{t-\tau}.$$

In this case, the effect of effort on sales doesn’t necessarily decay monotonically, and in that regard the specification is more flexible than the Nerlove-Arrow setup. Mizik and Jacobson (2004) use this specification to investigate the longer-term impact of detailing on physicians’ prescribing behavior. These approaches to investigating dynamic selling effects, while flexible, might suffer from serious empirical biases. For example, the truncation may result in biases if the truncation window is misspecified. Additionally, when fixed effects are used to account for heterogeneity, the presence of lagged constructs might induce an additional bias (see Narayanan and Nair, 2013 and Chung et al., 2018 for discussions).

The second aspect of dynamics has to do with how effort is chosen. If the salesperson has reason to act in a forward-looking manner (say on account of dynamics in their compensation contract), effort levels will be decided as some policy that emerges as the solution to a dynamic programming problem. In most contexts, sales compensation plans have targets, quotas, and bonuses that are only obtained at some future date. It is more than plausible that the effort decisions made by salespeople are indeed dynamic. This sort of dynamic behavior necessitates the construction and estimation of a structural dynamic model, which is a rather cumbersome undertaking.


The results of this undertaking though are rich since the framework, by construction, models both the supply and demand sides of the problem. As such, these models can be used to undertake a variety of counterfactual experiments that wouldn’t have been possible under more descriptive or reduced form approaches. In later sections we will discuss applications of these approaches by Misra and Nair (2011) and Chung et al. (2014).

2.1.5 Selling effort interacts with other firm decisions

Given the above discussions it should be obvious that selling effort will interact with the firm’s other economic decisions in particular ways. This will be particularly true of marketing decisions. As mentioned earlier, there is a real possibility that effort is constructed as a response to contractual terms and is conditioned on marketing decisions such as pricing and advertising. A more nuanced interaction occurs when the impact of effort is moderated by marketing instruments. For example, Gatignon and Hanssens (1987) consider how the effectiveness of salespeople varies with advertising levels both at the national and the local level. One could also consider the opposite context, where the effect of marketing activities is, to some extent, influenced by selling activities. In Hastings et al. (2017), the authors examine the degree to which the price sensitivity of individual customers is influenced by salespeople. In their case, the manner in which selling influences consumer utility is a function of marketing and vice versa, suggesting some degree of complementarity (or substitutability) between them. The empirical challenge is then to identify the true effect by ruling out other explanations that might be a result of spurious correlation. Ignoring these interaction effects in models of selling and marketing risks the introduction of biases (e.g. the classic pooling bias), especially in empirical contexts where these effects are explicitly considered by either the salesperson or the firm in their decisions. To further complicate matters, one might also note that, in keeping with the discussion of dynamic effects earlier, current effort decisions could, and in all likelihood will, depend on past marketing as well. Topics related to the interactions between selling and marketing will be explored in more depth at a later point in the chapter.

Ultimately, the goal of the empirical exercise is to estimate models of the type (1) that relate selling effort to sales. These models are often termed “sales response models” in the literature, but are essentially demand models with selling incorporated into them. As I mentioned earlier, there are essentially three strategies with regard to the treatment of selling: ignoring selling effort, using proxies, or building models of effort allocation. I now turn to discussing the latter two strategies.

3 Estimating demand using proxies for effort

3.1 Salesforce size as effort

A core problem in the management of salesforces is the determination of the budget required for selling and the allocation of this budget across customer segments. Much of the early empirical work on salesforces is motivated by these types of problems (see e.g. Lodish et al., 1988; Lucas et al., 1975). Given the issues with observing and measuring effort, researchers adopted a more practical approach: the use of the number of salespeople as a proxy for effort. In other words, budgeting decisions were recast in terms of the number of salespeople to employ and their distribution across customers, which were primarily segmented along product categories, industry lines, geographical ‘territories’, or combinations of these. In particular, geographical territories, or just territories, are widely used and remain a popular segmentation scheme today. As I discussed before, the use of the number of salespeople as a proxy for effort implicitly assumes constant effort that is independent of other factors, including marketing instruments and compensation. While these assumptions seem untenable today, they do afford a simplicity that allows the firm to arrive at solutions that might be significantly better than their current practice.

In order to arrive at this ‘optimal’ allocation there arose the need to estimate the responsiveness of sales to changes in selling effort. Unfortunately, there are practical limits to the empirical exercise here, since the variation in the size of the salesforce or the allocation across segments is often limited. As a solution, these models were calibrated on subjective data elicited from firm executives rather than on observed field data. In the cases where field data were available, they were used to adjust and adapt these subjective beliefs (e.g. see Lodish et al., 1988; Zoltners and Sinha, 2005). Ultimately, the goal of the endeavor was to infer the impact that a specified number of salespeople in a territory would have on the sales generated in that region. An excellent example of this belief elicitation approach (termed Decision Calculus) is the paper by Lodish et al. (1988).
The authors implement a set of subjectively calibrated models for the firm (Syntex Labs) with the goal of optimally sizing their salesforce. The authors construct a linear programming approach that relies on management’s beliefs to calibrate a response function that relates sales ($Q$) to the number of salespeople employed ($n$),

$$Q = b + (a - b)\,\frac{n}{d + n}. \tag{2}$$
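To make the response function in (2) concrete, the following minimal sketch evaluates it and backs out $d$ from subjective answers about sales at given staffing levels. All numbers are hypothetical and do not come from the paper, and averaging the implied values of $d$ is a simplification chosen here for illustration, not the calibration procedure the authors report.

```python
def sales_response(n, a, b, d):
    """Response function (2): Q(n) = b + (a - b) * n / (d + n)."""
    return b + (a - b) * n / (d + n)


def implied_d(n_k, q_k, a, b):
    """Solve (2) for d given one answer q_k at staffing level n_k (needs b < q_k < a)."""
    return n_k * (a - q_k) / (q_k - b)


# Hypothetical management answers (sales in $M); none of these come from the paper.
current_n = 400
q_zero, q_saturation = 20.0, 120.0            # answers for n = 0 and for saturation
answers = {0.5 * current_n: 60.0, 1.5 * current_n: 90.0}

b, a = q_zero, q_saturation                    # Q(0) = b, and Q(n) approaches a as n grows
d_values = [implied_d(n_k, q_k, a, b) for n_k, q_k in answers.items()]
d = sum(d_values) / len(d_values)              # crude reconciliation: average the implied values

print(d_values, d)
print(sales_response(current_n, a, b, d))      # implied sales at the current salesforce size
```

In this form $d$ is the salesforce size at which sales sit halfway between $b$ and $a$. With two intermediate answers and a single free parameter the answers will generally disagree, which is one reason a consensus or fitting step is needed.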

Responses are then elicited from the management team by asking them to reveal their beliefs about expected sales (for the focal product) under the following scenarios: No sales effort? ($n = 0$), One half of the current effort? ($0.5n$), Fifty percent greater effort? ($1.5n$), and Saturation level of sales effort? ($n = \infty$). Initial responses were shared among the management team and iterated until there was consensus. These responses taken together are then used to estimate the parameters of the response function. The identification strategy is quite straightforward and is operationalized by tailoring questions to identify key parameters. For example, we have $Q(0) = b$ and $Q(\infty) = a$, and so on. The authors then use the calibrated model to optimize for $n$ based on costs and other constraints. In their application, the authors sought to answer a number of questions including the number of salespeople to employ, the products to focus on, and the type of physicians to target with selling effort. The results from the analysis suggested that the firm increase their salesforce size from around 433 to something in the range of 750 salespeople and focus their attention on one particular drug and physician type. Over the course of a two-year period, the firm added close to 200 additional salespeople and estimated that the research project had created an 8% increase in revenues, to the tune of about 25 million dollars. While there were significant limitations in the data, the model, and the estimation, this paper speaks to the importance of the salesforce in the pharmaceutical industry and the value of optimizing that function.

A more econometric approach based on field data is presented in Horsky and Nelson (1996), who use cross-sectional data from two firms to calibrate models that relate the number of salespeople to sales at the territory level. The authors posit a specification that models expected sales $E[Q_j]$ as a function of territory potential $M_j$ and the number of salespeople $n_j$ as follows,

$$E[Q_j] = M_j\, n_j^{\gamma} \tag{3}$$

$\gamma$ is interpreted as the elasticity of sales with respect to salesforce size. Potential ($M$) is an unobserved construct and is consequently parameterized as a function of other observable variables such as last period sales, number of prospects (operationalized as number of potential customer firms), competitive salespeople or firms, and so on. For example, for one application Horsky and Nelson (1996) specify

$$M_j = e^{\alpha}\, L_j^{\beta_1} P_j^{\beta_2} C_j^{\beta_3} \tag{4}$$

where $L_j$ is the number of customers from the previous period, $P_j$ is population, and $C_j$ is the number of competitive salespeople in the territory. The model is completed by adding an error component and linearizing to obtain

$$\ln Q_j = \alpha + \beta_1 \ln L_j + \beta_2 \ln P_j + \beta_3 \ln C_j + \gamma \ln n_j + \varepsilon_j \tag{5}$$

The authors depart from the usual regression approach and instead propose a Data Envelopment Analysis (DEA) procedure to obtain parameters. DEA (Charnes et al., 1978) is a linear programming based estimator that assumes that $\varepsilon_j = -\mu_j$ and that $\mu_j \geq 0$. The authors interpret these “one-sided” errors as selling “inefficiencies” relative to territories that have $\mu_j = 0$ and operate on the efficient frontier. They compare their results to traditional regression based estimators and find that the selling elasticity $\gamma$ lies between 0.382 and 0.658 across the two methods and firms.

The two papers highlighted above are representative examples from a large literature on the topic. Rather than examine the entire literature, I will focus my attention on one application context that I had limited familiarity with (prior to writing this chapter) and found to be relevant to selling, marketing, and policy topics.
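As a brief aside on the log-linear model in (5) and its one-sided errors, the sketch below contrasts an ordinary least-squares fit with a frontier that lies on or above every observation. Note the substitution: it uses corrected OLS (shifting the fitted line upward), a simple stand-in for a deterministic frontier, and not the linear programming DEA procedure used by the authors. The data are simulated, territory potential is held fixed so that it folds into the intercept, and all parameter values are hypothetical.

```python
import math


def ols_line(x, y):
    """Slope and intercept of a least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical territories: salesforce sizes and one-sided inefficiencies mu_j >= 0.
salespeople = [1, 2, 4, 8, 16]
mu = [0.1, 0.0, 0.3, 0.0, 0.1]
true_intercept, true_gamma = 1.0, 0.5

ln_n = [math.log(n) for n in salespeople]
ln_q = [true_intercept + true_gamma * x - m for x, m in zip(ln_n, mu)]

gamma_hat, ols_intercept = ols_line(ln_n, ln_q)

# Corrected OLS: shift the fitted line up until no territory lies above it.
residuals = [y - (ols_intercept + gamma_hat * x) for x, y in zip(ln_n, ln_q)]
frontier_intercept = ols_intercept + max(residuals)
inefficiency = [frontier_intercept + gamma_hat * x - y for x, y in zip(ln_n, ln_q)]

print(gamma_hat, ols_intercept, frontier_intercept)
print([round(m, 3) for m in inefficiency])
```

In this constructed example the inefficiencies are uncorrelated with salesforce size, so both approaches recover the same elasticity and differ only in the intercept. With real data the two estimators can also disagree on $\gamma$.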

3.1.1 Recruitment as selling: Prospecting for customers

Recruiting agents are essentially salespeople who inform, persuade, and consequently “sell” a product or service to potential “customers”. Perhaps the best example of such agents is found in the case of armed services recruiting. In fact, there is a dedicated literature that examines the effectiveness of recruiting agents and other marketing activities on overall recruitment goals. The effect of selling here is again measured by the impact of the number of agents on overall recruiting outcomes (i.e. “Sales”). In a series of papers on the topic, a variety of authors (Carroll et al., 1985, 1986; Hanssens and Levien, 1983; Gatignon and Hanssens, 1987) examine the role that the number of recruiters has in influencing enlistment in the armed services. While the methods used in these papers vary, they are typically of the type described earlier. In some cases, where time series data are available, dynamic effects are incorporated through lagged constructs. Each of these papers reports some measure of recruiter elasticity, with the extremes being 0.26 and 0.98 and the majority falling in the 0.4–0.6 range. A significant contribution is by Carroll et al. (1985), who implement a large-scale experiment that randomly varies the number of recruiters across markets to assess the relevant elasticity. They find elasticity estimates in the 0.357–0.575 range as well as significant heterogeneity in these effects across different customer (enlistee) pools.

The notion of using salespeople to recruit customers is not limited to the armed services. Similar examples can be found in telecommunications and financial services. For example, Hastings et al. (2017) examine the role of selling in the context of financial products in Mexico. In their framework, selling plays two roles: first, it has a direct (persuasive) impact on the customer’s expected utility from the product in question and, second, selling moderates the sensitivity that a consumer has to prices. In particular, they specify a model where a customer’s utility ($u_{ij}$) is written as

$$u_{ij} = \lambda_i(n)\, C_{ij} + \delta_j n_j + \varepsilon_{ij}$$

In the above equation, $C_{ij}$ is the price that firm $j$ offers to customer $i$, $n_j$ refers to the number of firm $j$ salespeople in the market, and $n$ refers to the total (across firms) number of salespeople in the market. In other words, the authors allow the price sensitivity ($\lambda$) to depend on total salesforce presence in the market while the “brand value” or intercept depends on the size of the focal firm’s salesforce. The authors are careful in their analysis and account for the fact that the allocation of salespeople to markets is not random. They assume that firms trade off costs (wages) against benefits (expected revenues) and consequently a larger salesforce will be allocated to more lucrative territories. The analysis of this tradeoff suggests a number of instruments based on exclusion arguments. These include costs in neighboring markets, certain demographic variables, and the incidence of local bank branches. Using these instruments, the authors are able to estimate the salesforce effects on brand value and price sensitivity. They find that consumers are indeed less price sensitive when $n$ is larger and also that selling is persuasive ($\delta_j n_j > 0$). The effects estimated are large and significant, suggesting that salespeople play an important role in this context. Given that salespeople increase brand value and reduce price sensitivity, it follows that salespeople create significant value for firms. In a counterfactual conducted in the paper, the authors examine the equilibrium outcomes without salespeople and find that costs (revenues to the firm) would drop by about 61.6%.
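The two roles of selling in this utility specification can be illustrated with a small choice-share calculation. This is only a sketch under assumptions that are not taken from the paper: the error term is type-1 extreme value (so shares take the logit form), price sensitivity is linear in total salesforce, $\lambda(n) = \lambda_0 + \lambda_1 n$, there is no outside option, and every parameter value is hypothetical.

```python
import math


def choice_shares(prices, own_salesforce, lam0, lam1, delta):
    """Logit shares when u_ij = lambda(n) * C_ij + delta_j * n_j + eps_ij.

    Assumptions made for this sketch only: eps_ij is type-1 extreme value,
    lambda(n) = lam0 + lam1 * n with n the total salesforce in the market,
    and there is no outside option.
    """
    total_n = sum(own_salesforce)
    price_sensitivity = lam0 + lam1 * total_n
    utilities = [price_sensitivity * c + d_j * n_j
                 for c, n_j, d_j in zip(prices, own_salesforce, delta)]
    u_max = max(utilities)                        # subtract the max for numerical stability
    weights = [math.exp(u - u_max) for u in utilities]
    denom = sum(weights)
    return [w / denom for w in weights]


# Two hypothetical firms; firm 0 charges the lower price.
prices = [1.0, 1.5]
delta = [0.01, 0.01]
few_agents = choice_shares(prices, [10, 10], lam0=-2.0, lam1=0.02, delta=delta)
many_agents = choice_shares(prices, [40, 40], lam0=-2.0, lam1=0.02, delta=delta)

print(few_agents)
print(many_agents)  # the low-price firm's advantage shrinks as total n grows
```

Because both firms have equal salesforces in each scenario, the brand-value term cancels and the change in shares comes entirely from the weaker price sensitivity at the larger total $n$.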


3.1.2 Discussion: Salesforce size and effort

The interpretation of salesforce size as a measure of selling effort, while simple, is somewhat problematic. Recall the earlier discussion about the number of salespeople being, at best, an indicator for positive effort. In the special case where effort is constant we can write total effort (across the salesforce) as proportional to the salesforce size. More formally,

$$\sum_{i=1}^{n} e_i = n\bar{e} \propto n.$$

This characterization is useful if $\bar{e}$ remains constant as $n$ changes, that is, if $e_i = \bar{e}\ \forall i$. This is a rather strong assumption and any violation limits the usefulness of the model. For example, if $e_i \neq \bar{e}$ for some $i$, the derivative of sales with respect to $n$ will change in ways not accounted for by the model. By implicitly assuming constant effort, the relevant heterogeneities, dynamics, and true supply side effects that create effort are ignored or, worse, misunderstood. One might be inclined to argue that understanding the magnitude of salesforce size effects is interesting and relevant in its own right. From a descriptive point of view, there is nothing particularly wrong with that; however, interpreting these effects and using them for any relevant policy construction or counterfactual would be problematic. More broadly, these models should be viewed as serving a descriptive purpose. They cannot speak to the true process via which selling influences demand and, consequently, are not suitable for counterfactual policy exercises.

In spite of the problems I’ve outlined with the size proxy, models that have used it have had a positive impact on firm outcomes. This is demonstrated by some of the examples discussed above. I conjecture that these positive results are obtained on account of sub-optimal allocation of resources at the firm. In these cases, even an approximate model that optimizes effort with salesforce size proxies would be a useful tool. In more recent years, the availability of detailed data on interactions between salespeople and customers has shifted focus away from models that use salesforce size as a measure of selling effort (Hastings et al., 2017 is an exception). These models do, however, serve as a starting point for thinking about selling and sales management. Many of the issues, concerns, and questions raised in the early literature will remain relevant in models with other proxies for effort or even in structural models. Application areas such as the optimal allocation of salespeople across territories, industries, products, and time will continue to be relevant topics, as will the fundamental problem of salesforce sizing. More research on these topics using newer data and methods coupled with newer thinking would be welcome.
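Returning to the constant-effort assumption discussed in this section, a short derivation shows where it bites. As an illustrative assumption (the notation is not the chapter's), let sales depend on total effort through a differentiable function $f$, and let average effort $\bar{e}(n)$ possibly vary with salesforce size:

```latex
\begin{align*}
Q &= f(E), \qquad E = \sum_{i=1}^{n} e_i = n\,\bar{e}(n), \\
\frac{dQ}{dn} &= f'(E)\,\bigl[\bar{e}(n) + n\,\bar{e}'(n)\bigr].
\end{align*}
```

Under constant effort, $\bar{e}'(n) = 0$ and the size effect $f'(E)\,\bar{e}$ is simply a rescaled marginal effect of effort. If average effort responds to $n$, the term $n\,\bar{e}'(n)$ is absorbed into the estimated size effect, so the estimate no longer isolates the return to effort and need not carry over to a counterfactual in which incentives, and hence $\bar{e}(\cdot)$, change.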

3.2 Calls, visits, and detailing as selling effort

A second strand of the literature treats the quantity of interactions between salespeople and customers as a measure of effort. This quantity is often measured in terms of the number of interactions with the customer or related measures such as the number of calls made, the number of visits to the customer, the time spent with the customer, the number of customers interacted with per unit time, or other such metrics. While these measures, on the face of it, seem like better measures of effort, they suffer from the same interpretability problems as salesforce size ($n$). As such, one cannot directly treat the volume of interactions as effort. Having said that, there are indeed advantages to using the number of interactions as a proxy for effort, such as the increased granularity of the data and the ability to accommodate heterogeneity.

The literature in Marketing and Economics that investigates the impact of salesperson-customer interactions focuses primarily on the pharmaceutical industry. The face-to-face interactions that medical representatives (salespeople) have with physicians about drugs are commonly referred to as detailing, and detailing is typically measured in terms of the number of times the salesperson visits a physician. As data pertaining to these visits became widely available, a burgeoning literature emerged that looks at the degree and manner in which detailing impacts physician prescribing behavior (sales). In what follows, I will consider the literature on pharmaceutical detailing (interactions between salespeople and physicians) and discuss these effects further.

3.2.1 Does detailing work?

Early work on the effects of selling and detailing treated the visits (or calls) to physicians by salespeople as a decision variable controlled by the firm. As such, this strand of research formulated sales response models which modeled prescriptions ($Q$) as a function of the number of detailing visits ($D$). Very often, strong functional form restrictions were placed to ensure that the estimated demand curve was well behaved and within the range of acceptable possibilities. Indeed, some of the early models were calibrated on managerial expectations (as with salesforce size problems) rather than observed data. Perhaps the earliest paper to use detailing data in an econometric model is Parsons and Vanden Abeele (1981). Their stated goal was the estimation of the effectiveness of a sales call (detailing visit). Their specification was quite sophisticated, with sales ($Q$) in a territory $j$ at time $t$ being modeled as a time-varying Cobb-Douglas type function of various marketing instruments as follows,

$$\ln Q_{jt} = \beta_0 + \beta_1(Z_t) \ln D_{jt} + \beta_2 \ln M_{jt-1} + \beta_3 \ln Q_{jt-1} + \varepsilon_{jt}.$$

The authors allow the effectiveness of detailing ($D_{jt}$) to be a function of other marketing elements so that

$$\beta_1(Z_t) = \gamma_0 + \gamma_1 \ln S_{jt} + \gamma_2 \ln H_{jt} + \gamma_3 \ln H_{jt} \times \ln S_{jt}$$

In the above, $D$ is detailing, $M$ refers to mailed promotional materials, $H$ denotes promotional handouts, and $S$ refers to free samples. This is an interactions model, where the elasticity of detailing varies with the use of samples and handouts, suggesting that promotional tools and other information acquisition tools can increase the effectiveness of detailing. Also note that $H$ and $S$ can only exist for $D_{jt} > 0$. The authors find significant interaction effects, with the effect of detailing being described as

$$\beta_1 = -0.148 + 0.03 \ln S_{jt} + 0.029 \ln H_{jt} - 0.005 \ln H_{jt} \times \ln S_{jt}$$

which suggests diminishing effects of marketing tools (handouts and samples) in impacting the detailing effect. Overall, the authors suggest that while the elasticity of detailing is positive over the range of marketing investments, it is relatively flat.

The interpretation of the effects presented in Parsons and Vanden Abeele (1981) and in a large number of papers that followed (see Manchanda and Honka, 2005 for a detailed review) is difficult for a number of now well understood reasons. These include endogeneity concerns on account of targeting and omitted variables, the presence of an endogenous lagged construct on the right-hand side, and the absence of an accommodation for heterogeneity. Nevertheless, the paper proposed a sophisticated framework that used disaggregate data and laid the foundation for future work by others. Manchanda and Honka (2005) provide a comprehensive review of the literature pertaining to detailing and a nice discussion of the results obtained. I will refrain from revisiting that discussion and instead focus on a few selected papers, some of which are relatively recent and do not feature in their review.

A large portion of the literature estimating detailing effects ignores the fact that detailing could be targeted. In practice, firms often use historical data to segment physicians (usually on the basis of practice size) and allocate detailing effort as a function of that segmentation. This creates an endogeneity issue and a concern that the effects of detailing will be overstated as a result. Berndt et al. (2003) estimate how market shares respond to the depreciated stock of detailing minutes.
They use an instrumental variables strategy to account for the endogeneity of selling effort. In particular, they use the log of the wage rate in the industry and the cumulative stocks of detailing spent on other products as instruments. Mizik and Jacobson (2004) offer a careful attempt at dealing with the problem of targeted detailing and any endogeneity concerns that it would raise. To address the issue, the authors use panel data coupled with physician fixed effects to nonparametrically account for differences across physicians. The authors then use a distributed lag specification with six lags for detailing as well as other lagged endogenous constructs to allow for dynamic effects of the focal variables. These variables are treated as endogenous and are instrumented out. While one might quibble about the use of fixed effects with lagged dependent variables and potential biases therein, the empirical analysis is otherwise careful and well thought out. The authors find that while detailing effects do decay over time, the decay patterns do not conform to the standard exponential pattern assumed in the Nerlove-Arrow framework. Further, the authors also find that elasticity of detailing is relatively small (0.07–0.17 for 12-month elasticity) when compared to anecdotal claims or findings in past research. Based on their analysis, the authors question the current levels of detailing observed in the field.


Mizik and Jacobson (2004) are careful to point out that they have limited ability to interpret the effect and the reasons behind it. They provide a set of possible explanations, including a prisoner’s dilemma in the market. An alternative explanation is that detailing effects are driven by systematic factors and, for certain drugs, these factors make the estimated effects endogenously small. Consider that, in the context of the paper, the effects of detailing are low for an established drug (11 years in the market) and for a new drug (6 months), both of which are in well established categories. On the other hand, the highest effect of detailing is observed for a drug that is in a relatively new therapeutic area but has been around for about three years. One could conjecture that the two drugs (B & C in the paper) are more difficult to sell and consequently the incentives and relative differences in the cost of providing selling effort might have a role to play in the results we observe. A similar approach to assessing detailing is adopted by Datta and Dave (2016), who find similarly small effects of detailing.

A very different approach to modeling the targeting of detailing is taken by Manchanda et al. (2004). In that paper, the authors assume that detailing is allocated based on (partial) knowledge of the parameters in the demand system. To be more precise, the authors assume the number of prescriptions ($y_{it}$) written by physician $i$ at time $t$ follows a Negative Binomial distribution with mean $\lambda_{it}$, which is parameterized as a function of detailing ($D_{it}$) as follows,

$$\ln(\lambda_{it}) = \beta_{0i} + \beta_{1i} D_{it} + \beta_{2i} \ln(y_{it-1} + \zeta).$$

In the above, $\zeta$ is some constant that is pre-specified. The model allows for the heterogeneous parameters $\{\beta_{0i}, \beta_{1i}, \beta_{2i}\}$ and assumes them to be a function of observed covariates and a Normal stochastic component with mean zero and covariance $V_\beta$. The most critical component of the model is that detailing is assumed to be a Poisson distributed stochastic variable with a mean that is a function of $\{\beta_{0i}, \beta_{1i}, \beta_{2i}\}$. The authors write down the stationary level of prescriptions $\mu_i^*$ as an implicit equation and argue that an approximate solution to the fixed point can be described by

$$\ln \mu_i^* = \frac{\beta_{0i}}{1 - \beta_{2i}} + \frac{\beta_{1i}}{1 - \beta_{2i}}\,\mathit{Det},$$

for some level of equilibrium detailing $\mathit{Det}$. Using this as inspiration, the authors then assume detailing to be distributed as Poisson with rate $\eta_i$, which they express as

$$\ln(\eta_i) = \gamma_0 + \gamma_1 \frac{\beta_{0i}}{1 - \beta_{2i}} + \gamma_2 \frac{\beta_{1i}}{1 - \beta_{2i}}.$$

The specification implies that detailing levels could be allocated as a function of the effectiveness of detailing and the baseline level of sales. For example, $\gamma_2 = 0$ implies that detailing is set independently of the responsiveness to detailing, while $\gamma_1 = 0$ suggests that baseline sales have no impact on the detailing levels chosen.
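A small numerical sketch may help fix ideas about the two pieces of this specification: the approximate stationary level of prescriptions and the detailing rate. The values $\gamma_1 = 0.19$ and $\gamma_2 = -6.1$ are the posterior means reported in this chapter's discussion of the paper. Everything else ($\gamma_0$, the $\beta$ values, $\zeta$, the detailing level) is hypothetical, and reading the stationary level as the point where last period's prescriptions equal the current mean is an assumption made here.

```python
import math


def stationary_log_mean_approx(beta0, beta1, beta2, detailing):
    """Approximation from the text: ln(mu*) = (beta0 + beta1 * D) / (1 - beta2)."""
    return (beta0 + beta1 * detailing) / (1.0 - beta2)


def stationary_log_mean_exact(beta0, beta1, beta2, detailing, zeta, iterations=200):
    """Iterate x <- beta0 + beta1 * D + beta2 * ln(exp(x) + zeta) to a fixed point.

    Assumes the stationary level sets lagged prescriptions equal to the mean;
    the iteration is a contraction when 0 <= beta2 < 1.
    """
    x = stationary_log_mean_approx(beta0, beta1, beta2, detailing)
    for _ in range(iterations):
        x = beta0 + beta1 * detailing + beta2 * math.log(math.exp(x) + zeta)
    return x


def detailing_rate(beta0, beta1, beta2, gamma0, gamma1, gamma2):
    """Poisson rate: ln(eta) = gamma0 + gamma1*beta0/(1-beta2) + gamma2*beta1/(1-beta2)."""
    return math.exp(gamma0 + gamma1 * beta0 / (1.0 - beta2)
                    + gamma2 * beta1 / (1.0 - beta2))


# Hypothetical physician-level parameters.
approx = stationary_log_mean_approx(beta0=1.0, beta1=0.1, beta2=0.5, detailing=2.0)
exact = stationary_log_mean_exact(beta0=1.0, beta1=0.1, beta2=0.5, detailing=2.0, zeta=1.0)
print(approx, exact)  # the approximation drops zeta, so it sits slightly below the fixed point

# gamma1 and gamma2 are the reported posterior means; gamma0 = 1.0 is hypothetical.
less_responsive = detailing_rate(1.0, 0.05, 0.5, gamma0=1.0, gamma1=0.19, gamma2=-6.1)
more_responsive = detailing_rate(1.0, 0.15, 0.5, gamma0=1.0, gamma1=0.19, gamma2=-6.1)
print(less_responsive, more_responsive)
```

With a negative $\gamma_2$, the physician who responds more strongly to detailing is assigned the lower expected number of visits, holding the baseline fixed.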


This specification is novel and to my knowledge not one that had been proposed before in the literature.7 The parameters $\{\beta, \gamma\}$ of the complete model are sampled from the appropriate joint posterior using an MCMC routine. The results are both expected and striking: the authors find posterior means of $\gamma_1 = 0.19$ and $\gamma_2 = -6.1$, suggesting that detailing is allocated as a function of the baseline level of prescriptions made by the doctor ($\beta_{0i}$) and is negatively correlated with the individual level responsiveness ($\beta_{1i}$) to detailing that the doctor exhibits. This is the opposite of what an optimal (profit maximizing) detailing policy would imply. The discussion in the paper suggests that these results obtain on account of detailing policies that are based on sorting physicians into volume segments, and that such sorting results in a substantial amount of over-detailing. In spirit, the results echo the findings of Mizik and Jacobson (2004). Manchanda et al. (2004) are careful to point out that there might be other explanations for their findings, including but not limited to unobserved competitive effects. As with the Mizik and Jacobson (2004) paper, there might also be more nuanced supply side factors pertaining to compensation, incentives, and effort that are driving the results. I will discuss these possibilities in later sections.

3.2.2 How does detailing work?

In addition to ascertaining and quantifying the degree to which selling or detailing works in the pharmaceutical setting, there is also the question of the mechanism by which this process works. The term “detailing” has been used for a while and is associated with the notion of a salesperson providing the details of a drug’s performance and efficacy to the physician.8 At the same time, most descriptions of the sales process will also argue that persuasion plays an equally important role. For example, Leffler (1981) posits that detailing plays both an informative and a persuasive role. He documents a pattern where newer drugs receive higher levels of detailing (consistent with an argument of providing information about the new drug). He also finds that older drugs continue to get detailed to experienced physicians which, he argues, is consistent with persuasion if one assumes that the efficacy of the drug is already established. More recent work on the topic includes Azoulay (2002), who finds that detailing levels are positively correlated with cumulative clinical outcomes, suggesting that the effectiveness of detailing shifts as the total accumulated (positive) information increases. Similarly, Venkataraman and Stremersch (2007) and Kappe and Stremersch (2016) find that the effect of detailing on prescribing behavior, across a variety of therapeutic classes, is a function of the cumulative information available about the efficacy and side-effects of the focal drug.

7 To be fair, the paper does have some connection to an earlier paper by Manchanda and Chintagunta (2004) and draws on ideas therein.
8 For a truly fascinating documentation of the history of selling, and more particularly detailing in the pharmaceutical sector, I refer the reader to the works of Roy Church (Church, 2005, 2008) who examines the practices of salespeople in the late nineteenth and early twentieth century in the UK.


Shapiro (forthcoming) examines the interplay of information and detailing in a slightly different context. The key focus of the paper is to estimate the impact, if any, that detailing has on off-label prescriptions for a particular drug. The author builds a model similar to that of Datta and Dave (2016) and regresses prescriptions of a particular doctor on an Arrow-Nerlove type detailing stock variable. The model allows the effect of this stock to shift with information shocks that appear about the efficacy of the drugs. The results show that the effect of detailing on on-label prescriptions is similar to earlier work but there is very limited effect on off-label prescriptions. While the policy implications discussed in the paper are interesting in their own right, for the purposes of the discussion here, the variability of detailing effects as a function of information is particularly interesting. Once again, interpreting such heterogeneity can be complicated. On the one hand, this might be heterogeneity of the type in Montoya et al. (2010), where the doctors are in a different state of receptivity because of the informational shock; on the other, it could be that the salesperson’s effort levels or cost of selling shift because of such shocks. Either way, the reduced form evidence is informative, but understanding the mechanism behind the effect might move us in subtly different directions as far as counterfactual policy evaluations go.

While the analysis in these papers is mostly descriptive, they provide compelling evidence that is consistent with the dual facets of selling in this industry. A more formal approach is adopted by Narayanan and Manchanda (2009), who outline a structural model wherein detailing is allowed to play both an informative and a persuasive role in impacting the physician’s decision process. They model the physician’s (acting as an agent of the patient) utility as

$$U_{ijt} = -E\left[\exp\left(-r_i \tilde{Q}_{ijt}\right)\right] + X_{ijt}\beta_i + \varepsilon_{ijt} \tag{6}$$

where $\tilde{Q}_{ijt}$ is doctor $i$’s belief about drug $j$ at time $t$. $X_{ijt}$ comprises two elements: a detailing stock $DS_{ijt}$ as well as a measure of patient requests $PR_{ijt}$ for the drug. Finally, $r_i$ reflects the risk aversion of the physician and $\varepsilon_{ijt}$ is the usual i.i.d. Extreme Value utility shock. The authors then assume that both detailing and patient feedback (prescriptions) provide informative signals about the true quality of the drug $Q_{ij}$. In particular, physicians receive the composite mean (over multiple interactions) signals $\tilde{D}_{ijt}$, $\tilde{F}_{ijt}$ that are distributed around the truth as follows,

$$\tilde{D}_{ijt} \sim N\left(Q_{ij}, \frac{\sigma^2_{D_i}}{d_{ijt}}\right), \qquad \tilde{F}_{ijt} \sim N\left(Q_{ij}, \frac{\sigma^2_{F_i}}{f_{ijt}}\right). \tag{7}$$

In the above, $d_{ijt}$ and $f_{ijt}$ are the number of detailing and feedback interactions the physician has. Given these signals, the physician’s posterior beliefs are denoted as

$$\tilde{Q}_{ijt} \sim N\left(Q_{ijt}, \sigma^2_{Q_{ijt}}\right), \tag{8}$$

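where

$$Q_{ijt} = \frac{\sigma^2_{Q_{ijt}}}{\sigma^2_{Q_{ij(t-1)}}}\, Q_{ij(t-1)} + f_{ijt}\, \frac{\sigma^2_{Q_{ijt}}}{\sigma^2_{F_i}}\, \tilde{F}_{ijt} + d_{ijt}\, \frac{\sigma^2_{Q_{ijt}}}{\sigma^2_{D_i}}\, \tilde{D}_{ijt} \tag{9}$$

with

$$\sigma^2_{Q_{ijt}} = \frac{1}{1/\sigma^2_{Q_{ij(t-1)}} + f_{ijt}/\sigma^2_{F_i} + d_{ijt}/\sigma^2_{D_i}}. \tag{10}$$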
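The updating rule in (9) and (10) can be sketched numerically as follows. The function and argument names are illustrative, all input values are hypothetical, and the closed form for the expected exponential utility relies on the standard moment generating function of a normal random variable rather than on anything specific to the paper.

```python
import math


def update_beliefs(prior_mean, prior_var, detail_signal, n_details, var_d,
                   feedback_signal, n_feedback, var_f):
    """One step of the normal-normal updating in equations (9) and (10)."""
    precision = 1.0 / prior_var + n_feedback / var_f + n_details / var_d
    post_var = 1.0 / precision                                        # equation (10)
    post_mean = post_var * (prior_mean / prior_var
                            + n_feedback * feedback_signal / var_f
                            + n_details * detail_signal / var_d)      # equation (9)
    return post_mean, post_var


def expected_cara_utility(mean, var, risk_aversion):
    """-E[exp(-r * Q)] for Q ~ N(mean, var), via the normal moment generating function."""
    return -math.exp(-risk_aversion * mean + 0.5 * risk_aversion ** 2 * var)


# Hypothetical physician: diffuse prior, two detailing visits, no patient feedback yet.
post_mean, post_var = update_beliefs(prior_mean=0.0, prior_var=1.0,
                                     detail_signal=1.0, n_details=2, var_d=2.0,
                                     feedback_signal=0.0, n_feedback=0, var_f=1.0)
before = expected_cara_utility(0.0, 1.0, risk_aversion=1.0)
after = expected_cara_utility(post_mean, post_var, risk_aversion=1.0)

print(post_mean, post_var)
print(before, after)
```

In this example the two visits both raise the posterior mean and halve the posterior variance, and each change raises the risk-averse physician's expected utility from the drug.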

In this model selling, in the form of detailing, enters the physician’s utility in two ways. First, selling acts via an information based mechanism – that is, it creates an informative signal that updates the physician’s belief about the quality of the drug while also reducing any uncertainty the risk averse physician has about the drug. This process is represented as a Bayesian updating process and it is assumed to describe the manner in which physician learning occurs. To see this, notice that the number of detailing interactions dij t lowers the posterior variance (10) as well as shifting the posterior mean (9). Second, detailing enters the utility function directly via the detailing stock. The authors term this the persuasive effect, since it shifts the utility function directly. The key parameters of interest (from a selling perspective) are then the variance of the quality signal from detailing σD2 i and the effect of the detailing stock (β1i ). The authors use a sophisticated MCMC algorithm to sample from the joint posterior of interest. A novel aspect of their model and estimator is that it delivers heterogeneous estimates of the key parameters of interest. The authors find evidence for both an informative and persuasive effects with the relevant the constructs (σD , β1 ) being negatively correlated. The results are consistent with the idea that selling plays dual roles of creating information as well as persuasion effects, but that these forces do not necessarily act together or contemporaneously. The authors posit that the findings imply a pattern wherein different physicians transition from information effects to persuasion effects over time but in heterogeneous ways. While some physicians are fast learners, others need a lot longer before the informative effects wear out. The framework in this paper is sophisticated and quite complex, the authors do a good job of articulating and arguing for patterns in the data that identify the various parameters. 
As with most learning models, the identification restrictions are strong and informative, but even with those caveats the paper offers an interesting take on the manner in which detailing influences prescription outcomes. In a recent paper, Huang et al. (forthcoming) allow the signals from detailing to differ depending on the degree of contraindication at the patient level. Using a simplified learning model, the authors find, in support of their hypothesis, that detailing conveys information and that the match quality signal that arises via detailing is significantly worse for contraindicated patients than for other patients. In other words, detailing actually makes doctors less likely to prescribe to contraindicated patients, suggesting that detailing acts as a conveyor of information. In addition to

the papers mentioned, there are other papers that have delved into a structural investigation of selling effects. These include Narayanan et al. (2005) and Ching and Ishihara (2012) who use aggregate data on detailing and prescriptions to calibrate learning models similar to the one presented above.

3.2.3 Is detailing = effort?

The detailing literature is perhaps the largest attempt to document and understand how selling impacts demand. It also happens to be an area of significant policy relevance. The focus of the literature has primarily been on (a) understanding the causal impact that detailing has on prescriptions and (b) using these estimates to conduct counterfactuals, including the optimization of detailing levels. There is little effort to explicitly tie the construct of detailing back to the provision of selling effort, and I would argue that this endeavor is not a trivial matter. The issue with treating detailing (as with the number of salespeople) as a proxy for effort is that we cannot really believe that our proxy is correct. Clearly detailing is not effort, since if it were, firms would contract with salespeople on detailing rather than on sales: since detailing is observed, there would not be any need for incentives. Since we don’t see such contracts, it must follow that firms recognize that detailing is possibly correlated with, but not exactly equal to, effort. Following the constant effort argument I provided earlier, one could rationalize detailing as proportional to effort in special cases. As with the number of salespeople, the assumption would be hard to justify and could jeopardize counterfactuals. In an ongoing project, Harikesh Nair and I (Misra and Nair, 2018) formalize the idea that detailing is a censored indicator for the provision of effort and reinterpret detailing effects. The premise is that a rational agent will obviously not engage in costly effort when there is no customer interaction. As such, effort will be zero in the absence of detailing. On the other hand, the fact that detailing occurs doesn’t necessarily mean that effort was exerted. If we assumed that effort were always positive when detailing occurs, one could interpret the effect of detailing as effort.
We argue that this can only be done via a fully specified structural model with specific assumptions pertaining to (a) how effort is generated in the context of detailing, (b) the timing and content of the information sets agents have access to, and (c) the sequence of actions in the game. Using these assumptions we are able to construct a demand model that estimates effort in the context of stochastically targeted detailing policies. Detailing effects could be interesting in their own right, and this might lead to the argument that we can ignore effort, or treat it as some benign unobservable. Even if the details of the above discussion are ignored, one has to accept the argument that the effect of detailing must be a function of salesperson effort. If not, it would suffice for the salesperson to simply show up at the physician’s office, stand there, and engage in no other effortful activity. Clearly, this would not be productive (and possibly quite disturbing to people at the physician’s office). If we need the salesperson to engage with the physician at some level, we have accepted that human effort is relevant and that effort and detailing need to be jointly present to influence demand. As such, the

assessment of the effect of detailing requires us to consider the construct of effort. Without this consideration, counterfactuals related to detailing will be uninformative and possibly wrong. As a thought exercise, assume that we use one of the above models to estimate detailing effects. Now consider the counterfactual where the firm (since the firm is assumed to control detailing) increases detailing with no change to the compensation offered to the agent. If the salesperson is operating at the maximum level of effort, the effect of detailing must equal zero, since the agent will be unable to exert any additional effort. As such, we would see no shift in sales. What we are describing is a version of the Lucas critique that applies when the micro foundations of detailing have been ignored. Models where effort is constant can easily be constructed under the right set of assumptions, and in those cases the estimated effect of detailing can be interpreted and used as intended. The point of the discussion is to highlight the fact that in order to fully incorporate selling into demand models we need to model effort.

4 Models of effort

A more recent approach to evaluating the role of selling effort in demand is to move away from proxies and rely instead on economic theory. The idea then is to model the primitives that impact the effort generation process and assess the manner in which demand shifts as a function of these primitives. To formalize this intuition, consider a setup where sales are a function of effort, q (e, ), and such effort is costly, c (e). The agent receives some compensation that is a function of realized sales so that W = W (q (e, ) ; ϑ), where the ϑ are compensation parameters. In this framework, we must have (subject to some conditions) effort implicitly allocated according to the rule

$$\frac{\partial W}{\partial e} = \frac{\partial c}{\partial e}.$$

In other words, effort will be a function of compensation parameters, demand factors, and individual salesperson characteristics such as the cost of effort. Now, if selling effort is productive such that $\partial q / \partial e \neq 0$, it must be that the factors that influence such effort also influence demand. As such, any variation that is naturally excluded from directly entering demand, such as compensation or cost of effort elements, could help us understand the role of effort in demand. To make this example concrete, consider a simplified version of the model in Lal and Srinivasan (1993), based on the framework originally proposed by Holmstrom and Milgrom (1987). Assume that demand (q) is generated as a function of effort (e) so that

$$q = h + ke + \varepsilon, \tag{11}$$

where h is some baseline level of sales, k is the productivity of effort, and ε is an i.i.d. Normally distributed demand shock with mean zero and finite variance $\sigma^2$. The

salesperson is offered a compensation contract (W) that is linear in sales so that

$$W = \alpha + \beta q. \tag{12}$$

We will call α the salary and β the commission rate. The net payoff to the salesperson is compensation minus the cost of effort,

$$V = W - C(e). \tag{13}$$

The salesperson is assumed to have CARA (Constant Absolute Risk Aversion) utility with risk aversion parameter r,

$$U(e) = -\exp(-rV). \tag{14}$$

The salesperson then chooses effort to maximize expected utility net of quadratic costs of effort $C(e) = ce^2$, so that

$$e^* = \arg\max_{e} \left[ E\,U(e) \right]. \tag{15}$$

Equivalently, the salesperson can be thought of as maximizing her certainty equivalent, which is denoted by

$$CE(e) = \alpha + \beta (h + ke) - ce^2 - \frac{r \beta^2 \sigma^2}{2}. \tag{16}$$

Simple first order conditions give

$$e(\beta) = \frac{\beta k}{2c}. \tag{17}$$
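The step from the CARA utility in Eq. (14) to the certainty equivalent in Eq. (16) and the first order condition in Eq. (17) can be sketched as follows. This is a worked derivation under the assumptions already stated in the text (linear contract, quadratic cost, normal demand shock); it uses only the moment generating function of the normal distribution.

```latex
\begin{align*}
V &= \alpha + \beta (h + ke + \varepsilon) - ce^{2},
   \qquad \varepsilon \sim N(0, \sigma^{2}), \\
E\left[-\exp(-rV)\right]
  &= -\exp\!\left(-r\left[\alpha + \beta (h + ke) - ce^{2}\right]\right)
     E\left[\exp(-r\beta\varepsilon)\right] \\
  &= -\exp\!\left(-r\left[\alpha + \beta (h + ke) - ce^{2}
     - \tfrac{1}{2} r \beta^{2} \sigma^{2}\right]\right)
   = -\exp\left(-r\, CE(e)\right), \\
\frac{\partial CE}{\partial e} &= \beta k - 2ce = 0
   \;\Longrightarrow\; e^{*} = \frac{\beta k}{2c},
   \qquad \frac{\partial^{2} CE}{\partial e^{2}} = -2c < 0 .
\end{align*}
```

Because expected utility is a monotone transformation of $CE(e)$, maximizing one is equivalent to maximizing the other. The risk term $\tfrac{1}{2} r \beta^{2} \sigma^{2}$ does not depend on $e$, which is why it drops out of the first order condition.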

Finally, plugging this effort back into the demand equation (11), we have

$$q = h + \frac{\beta k^2}{2c} + \varepsilon. \tag{18}$$

There are a couple of takeaways that emerge from an examination of Eqs. (17) and (18). First, effort in this stylized model is a function of commissions (β), productivity (k), and effort costs (c). Consequently, so is realized demand. Typically, one would not consider incorporating compensation factors as determinants of sales; however, the structure imposed by economic theory argues for such inclusion. More generally, any factors that move these primitives (say, for example, if selling is less costly in certain geographies) will also influence effort. Again, as effort directly impacts sales, these factors will influence demand as well. Second, not all elements of the compensation plan and salesperson primitives will be relevant to effort and consequently to sales. In other words, only primitives that move effort will have an effect on sales. As an example, risk aversion does not appear in Eq. (17) and so has no direct impact on sales (in this model). This offers us some insights as to how we

might be able to identify effort relevant constructs without observing effort itself. For example, if the tenure of the salesperson shifts sales, there has to be some effort related mechanism that makes that happen. Finally, this simple model also outlines the need for (exogenous) variation to speak to effort. If compensation, productivity, and costs of effort are constant, there is no empirical way to disentangle effort from baseline sales and the demand shock. Later in this section, we will look at empirical models that use ideas as presented above to either describe the effort construct or to structurally model it.
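The comparative statics in Eqs. (16) to (18) can be checked numerically. The following is a minimal sketch, not taken from the papers discussed: all parameter values and function names are hypothetical and chosen only for illustration. It simulates sales at the optimal effort level for two commission rates and confirms that the salary and the risk aversion parameter leave the effort choice unchanged.

```python
import random
import statistics

def optimal_effort(beta, k, c):
    """Eq. (17): effort that maximizes the certainty equivalent."""
    return beta * k / (2.0 * c)

def certainty_equivalent(e, alpha, beta, h, k, c, r, sigma):
    """Eq. (16): CE(e) = alpha + beta*(h + k*e) - c*e^2 - r*beta^2*sigma^2/2."""
    return alpha + beta * (h + k * e) - c * e ** 2 - 0.5 * r * beta ** 2 * sigma ** 2

def simulate_sales(beta, h, k, c, sigma, n, rng):
    """Eq. (18): draw n sales realizations at the optimal effort level."""
    e_star = optimal_effort(beta, k, c)
    return [h + k * e_star + rng.gauss(0.0, sigma) for _ in range(n)]

# Hypothetical primitives (assumptions for illustration only).
h, k, c, sigma = 10.0, 2.0, 0.5, 1.0
rng = random.Random(0)

# Mean sales rise with the commission rate, as Eq. (18) implies.
mean_low = statistics.mean(simulate_sales(0.2, h, k, c, sigma, 20000, rng))
mean_high = statistics.mean(simulate_sales(0.4, h, k, c, sigma, 20000, rng))

# Salary (alpha) and risk aversion (r) shift the level of CE but not its maximizer.
effort_grid = [i / 100 for i in range(0, 301)]

def argmax_effort(alpha, r, beta=0.4):
    """Grid search for the effort level that maximizes Eq. (16)."""
    return max(
        effort_grid,
        key=lambda e: certainty_equivalent(e, alpha, beta, h, k, c, r, sigma),
    )
```

If the commission rate never varied in the data, `mean_low` and `mean_high` would collapse to a single number and the term $\beta k^2 / (2c)$ could not be separated from the baseline h, which is the identification point made in the text.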

4.1 Effort and compensation

There is substantial evidence that salesperson compensation shifts demand. While these effects are not often directly interpreted as operating via some structural effort construct, this is implicitly the point that has been made in a long literature on the topic. For example, Darmon (1974) uses a small dataset obtained from the International Harvester Company to point out that financial incentives do have an impact on sales. By analyzing a shift in the compensation scheme, he documents a significant increase in sales for one subgroup of the salesforce, while for another he finds a decrease in sales on account of the compensation change. He uses this to underline the importance of heterogeneity and argues that this may have implications for recruiting the appropriate type of salespeople, a point made more directly in Lazear (2000) and others. Banker et al. (1996) report results from a field test of the multi-period incentive effects of a performance-based compensation plan on the sales of a retail establishment. Analysis of panel data for 15 retail outlets over 66 months indicates a sales increase when the plan is implemented, an effect that persists and increases over time. Sales gains are significantly lower in the peak selling season, when more temporary workers are employed. In some recent work, Viswanathan et al. (2018) document the impact that the form of incentives (cash or non-cash bonuses) has on sales outcomes. Their analysis is not directly related to the identification of effort; nevertheless, they allow for sales to be shifted by salesperson targets and agent-specific fixed effects. These covariates, including the format chosen for compensation, have significant sales-related effects, which could only be possible on account of effort-related reasons.
Irrespective of the nature of the application, this literature provides direct evidence that shifts and changes in compensation have sales-related effects, suggesting an avenue to explore the selling effort mechanism.

4.2 Effort and nonlinear contracts

A starting point for building structural models that incorporate effort is to consider the manner in which effort reacts to the primitives discussed above. Oyer (1998), for example, investigates the impact that fiscal year ends, seasonality, and nonlinearities in compensation have on sales outcomes. His empirical findings suggest that fiscal year ends have a significant influence on sales, with year-end revenues being higher than those earlier in the year. His explanation for this phenomenon is the well

understood ‘hockey-stick’ effect where sales, and by implication effort levels, see a sharp uptick at the end of a quota horizon. The argument is that effort is increased with the goal of making the quota to avail of the incentive that lies on the other side of the quota threshold, such as a bonus, reward, incremental compensation, or recognition. A related issue raised by Oyer (1998) is that of inter-temporal substitution. In industries where demand is captured via the writing of contracts or orders, there is an incentive to move such orders on account of compensation related considerations. For example, orders may be moved forward to achieve a quota or moved back if the quota has already been achieved. Oyer (1998) provides some suggestive evidence to support this phenomenon. Larkin (2014) uses more granular data to test similar hypotheses and finds that, in a large software firm, compensation considerations (in the form of commission accelerators) lead salespeople to move the timing of sales. In particular, the number of transactions at a given time point is shown to be a function of the marginal incentives that accrue to the salesperson. In some cases, this mechanism can cause sales to move from one quarter to another. Further, Larkin (2014) provides evidence that the salespeople also strategically manipulate pricing on account of compensation considerations and that together these cost the firm 6%–8% in lost revenues. A similar finding is reported by Cao et al. (2018) who document distortions in the quality of loans approved by individual lenders working for Chinese financial institutions. They ascribe these distortions to month-end compensation targets and estimate that loans approved towards the end of the month are 16 percent more likely to be classified as bad at a future date. As a counterpoint to these findings, Steenburgh (2008) finds little to no evidence of such gaming practices. 
In the context of a large office products company and using granular data at the salesperson level, the author implements a model that relates sales to the distance between a salesperson’s quota and their current accumulated sales. The model, calibrated on individual salesperson level data, accounts for differences across salesforces and salespeople and examines the differences in sales before and after the end of a compensation period. In particular, the author interacts timing (in relation to the end of the compensation period) with performance (whether the quota was exceeded or not) and finds that the 90% credible region for the focal metric is (−12.6, 0.3) for the year end effect and (−1.9, 8.2) for end of quarter effects. Steenburgh (2008) points out that both intervals contain zero and concludes that the presence of quotas and bonuses does not create any incentives for gaming in the form of inter-temporal substitution of orders. Further, his analysis shows that the distance to quota metric has significant effects within the compensation horizon, which suggests that bonuses offer strong incentives to increase effort. There are a number of differences between the data, methods, and contexts used in these two papers that might contribute to the diversity of results and interpretations. The papers do agree on a few things though. First, taken together, the authors highlight issues with the identification of effort. Each of these authors points out the problem of separating seasonality from effort, the issue of customer purchase cycles,

the granularity of data, and other issues that arise because of the inherent unobservability of effort. More relevant to our discussion here, these authors tend to agree that compensation incentives, and the dynamics inherent therein, have a significant influence on sales. In other words, relating compensation to sales via effort is a viable empirical strategy. As Steenburgh (2008) points out in his concluding remarks, the descriptive results showing that effort acts as the mediating element between compensation and sales are the basic ingredient that can be used to build structural models of selling effort.

4.3 Structural models

As I mentioned before, the notion of effort as an input into a productive process is not confined to the domain of selling. A number of other researchers have taken the approach of modeling (rather than proxying for) effort as a response to compensation using a model of agent behavior (see, for example, Copeland and Monnet, 2009; Shearer, 1996; and Shearer, 2004). The general idea in these models is to recognize that effort arises as a strategic response to some compensation contract offered by the firm. As we have seen from the papers discussed in the prior section, the manner in which an agent chooses the effort level, and the level itself, then depend on the precise elements of that contract and also on other relevant primitives including, among other things, agent characteristics and market conditions. Put differently, in contrast to the proxy approach where effort was implicitly assumed to be constant, under a structural specification effort will be an equilibrium policy function that depends on the state variables in the system. In the context of selling, the idea of modeling effort as a structural construct is relatively new. While the idea that effort is exerted as a function of salesforce compensation is well established in the theory literature (see the reviews mentioned earlier, e.g. Coughlan and Sen, 1989; Coughlan, 1993; Lilien et al., 1992), empirical research on this topic is relatively sparse. To my knowledge, only a few published papers treat effort as a structural object; these include Misra and Nair (2011), Chung et al. (2014), and Daljord et al. (2016).9 The framework proposed and estimated in these papers begins by empirically describing a salesperson’s effort policy function as a function of supply side primitives.
While the exact methods differ across papers, the idea is to obtain (or inform) an estimate of the effort policy from sales data policies and then use this estimated policy to calibrate supply side parameters that describe the salesperson’s effort and utility. Once these primitives are available, a number of counterfactuals related to salesforce

9 That is not to say that the idea of combining supply side and demand side elements to understand effort is completely novel. For example, one could interpret the mathematical programming based literature as being in this vein. A particularly excellent example of this approach is Mantrala et al. (1994), who use structure, coupled with a mix of observational and stated preference data, to infer demand parameters and salesperson primitives (e.g. effort costs) jointly. These estimates are then used to design improved compensation plans.

management can be conducted. In what follows, I will focus my attention on Misra and Nair (2011) as an example of this approach, and add commentary on the other papers as and when appropriate. I will first describe the demand estimation approach contained in their paper and then follow that with a discussion of the supply side estimator.

4.3.1 Effort and demand

Misra and Nair (2011) structurally model selling effort as an optimal policy that arises via the salesperson solving an explicit dynamic optimization problem. To construct an empirical estimate of this effort policy, they formulate an econometric approach that inverts sales to obtain effort as a function of the relevant state variables. The demand models Misra and Nair (2011) consider are of the form

$$q_{jt} = g\left(e(s_t), z_j\right) + \varepsilon_{jt}. \tag{19}$$

In the above, $q_{jt}$ refers to prescriptions written by physician j at time t. Effort, denoted by $e(s_t)$, is a policy function that depends on a set of state variables $s_t$ that impact the salesperson’s decision. This effort is productive and generates sales via some transformation g that could also be influenced by other factors $z_j$ that are physician specific. The authors point out that the demand models they describe rely on three assumptions, namely:

1. Monotonicity: Current sales are a strictly increasing function of current effort. That is, $\partial g / \partial e > 0$.
2. Exclusion: Current sales are affected by the state variables only through their effect on the agent’s effort.
3. Additive Separability: Unobservable (to the agent) shocks to sales are additively separable from the effect of effort.

Condition 1 assumes monotonicity of the sales function in the effort allocation and affords the inversion of the effort policy function from observed sales data. Condition 2 is an exclusion restriction that facilitates semi-parametric identification of effort from sales data. The underlying assumption is that compensation elements do not have any direct effect on sales apart from those that operate via the effort policy chosen by the salesperson. Finally, condition 3 is the usual simplifying econometric assumption.
Since there is an identification problem, in that the transformation function g and effort policy e cannot be separately identified, the authors further simplify the model and write

$$q_{jt} = h(z_j) + e(s_t; \lambda) + \varepsilon_{jt},$$

which they then parametrize as

$$q_{jt} = \mu' z_j + \lambda' \vartheta(s_t) + \varepsilon_{jt}, \tag{20}$$

where ϑ are a set of (orthogonal polynomial) basis functions and λ are the relevant parameters. The idea is to recover effort as a flexible function of the state variables by constructing an estimate of the effort policy function in the form of

$$\hat{e}_t = \hat{\lambda}' \vartheta(s_t). \tag{21}$$

The estimation of (20) is relatively straightforward and demand parameters = {μ, λ} can be estimated using standard least squares. There are, however, a few points about the identification of effort and the interpretation of this estimator that are worth discussing. First, the absolute level of the effort policy cannot be recovered. Misra and Nair (2011) interpret the $\mu' z_j$ term in the regression (including the intercept) as the baseline level of sales in the salesperson’s territory. In other words, under the assumptions imposed, if selling effort were zero we would expect to see sales of $\mu' z_j$. This inability to ascertain the base level of the unobservable is a typical problem in any series estimator. An alternative, but equivalent, approach would be to normalize the scale of effort in some fashion. Either way, the fundamental unobservability of effort precludes any perfect solution. Second, the identification of effort relies critically on the absence of any other time varying factors (outside of the state variables) that might influence demand. For example, any form of unobserved seasonality, such as purchase cycles on the customer front, would negate the identification arguments espoused in Misra and Nair (2011). Similarly, any form of time varying marketing activity, such as seasonal pricing or advertising, would jeopardize the clean identification of effort. If such time varying factors do exist, they have to be explicitly included as part of the state space, since salespeople would typically condition on such information as well. If these factors are ignored, one can no longer interpret the estimator in (21) as effort. Finally, the construct of effort estimated using the proposed estimator relies on sales being a function of contemporaneous effort alone. If the effect of effort persists, one would need to alter the framework to allow for the incorporation of past effort(s) as state variable(s).
Identification in this context would be tricky, to say the least, and would depend on clever context-dependent exclusions that might be available. In the empirical application found in Misra and Nair (2011), the state variables include the month of the quarter (incentives are paid quarterly), the cumulative sales achieved at the beginning of the month, and the quota assigned for that quarter. The $z_j$ include physician level characteristics (including the precise tiers that the firm uses to assign detailing levels). An important point to note here is that the authors have rich data at the salesperson level. That is, each salesperson operates in an independent territory and is responsible for around 150 physicians in that territory. As such, the authors implement their demand estimation procedure to recover effort policies for each salesperson separately, thereby eliminating any worry about heterogeneity in ability or other salesperson level unobservables. This approach does require the assumption of homogeneity of treatment effects for physicians being detailed by the same agent, which may be limiting. The luxury afforded by granular data is often not available in other environments, and appropriate adjustments to the procedure will need to be made. For example, Chung et al. (2014) adopt a random coefficients approach and use a

finite mixture model to approximate heterogeneity across salespeople. As is well understood, these parametric assumptions come with obvious trade-offs and caveats. For example, when pooling across salespeople in a random-coefficients framework, one loses the ability to soak up any true heterogeneity across salespeople (say ability) that might be correlated with observed covariates and state variables. Having said that, limitations in the data environment may offer no other alternative but to use such methods and aggregation may avoid the need for other strong assumptions such as the homogeneity of treatment effects across physicians. The approach adopted by Chung et al. (2014) is quite econometrically sophisticated and adapts the methods proposed by Arcidiacono and Miller (2011) to offer a viable alternative when the granularity of data at the individual salesperson is limited.
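The series estimator in Eqs. (20) and (21) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors’ implementation: the data are simulated, the state is a single scalar, the basis uses plain powers rather than orthogonal polynomials, and every name and number below is hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def basis(s):
    """Polynomial basis in the state variable (plain powers, not orthogonalized)."""
    return [s, s ** 2]

rng = random.Random(1)
rows, sales = [], []
for _ in range(4000):
    z = rng.random()                   # physician characteristic z_j (simulated)
    s = rng.random()                   # scalar state s_t (simulated)
    effort = 1.0 + 2.0 * s - s ** 2    # true effort policy, unobserved by the analyst
    rows.append([1.0, z] + basis(s))
    sales.append(5.0 + 3.0 * z + effort + rng.gauss(0.0, 0.2))

intercept, mu_z, lam1, lam2 = ols(rows, sales)

def effort_hat(s):
    """Estimated effort policy as in Eq. (21), identified only up to its level."""
    return lam1 * s + lam2 * s ** 2
```

In this simulation the estimated intercept is close to 6 rather than 5: the constant part of the true effort policy is absorbed into baseline sales, which is the sense in which the absolute level of effort cannot be recovered. The shape of the policy in the state variable is recovered because the state enters sales only through effort (the exclusion assumption).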

4.3.2 The supply of effort

The supply side determination of effort will depend on the context of the application. In Misra and Nair (2011), the empirical context is a national medical devices company that has its salespeople call on (detail) physicians with the goal of influencing prescriptions. The salespeople are paid on a quarterly basis using a nonlinear compensation plan that includes a quota (target sales for the quarter), commissions, and a salary. The quota mechanism induces dynamic incentives for the salesperson, who internalizes these in the provision of effort. This is the same idea as that found in the earlier work (Oyer, 1998; Steenburgh, 2008), albeit in a more descriptive context. The mapping of the dynamic incentives (via the state space) into demand is what allows Misra and Nair (2011) to use the procedure described earlier to recover the effort policy function. Chung et al. (2014) use essentially the same idea in their empirical application, although their compensation plan is somewhat different and involves lump-sum bonuses and differential commissions. Misra and Nair (2011) outline a model where agents allocate effort in the context of a nonlinear compensation scheme in which the nonlinearity arises via the existence of quotas. The salesperson receives a fixed salary (α) and an incentive payout based on a commission rate β that is applied to the total sales accumulated at the end of the quarter ($Q_T$) in excess of the quota floor, $a_t$. Any sales that lie above the pre-specified quota ceiling $b_t$ do not generate incremental compensation. In other words, sales in excess of $b_t$ effectively have a commission rate of zero. The compensation scheme resets every quarter, so that at the beginning of each quarter total sales ($Q_t$) are reset to zero and new quotas are announced.
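The quota plan just described (salary, commission on sales above the floor, no incremental pay above the ceiling) can be written as a short function. This is a sketch of the verbal description only; the function name, argument names, and the numbers in the usage note are hypothetical.

```python
def quarterly_pay(total_sales, salary, commission, quota_floor, quota_ceiling):
    """Salary plus commission on end-of-quarter sales between the floor and the ceiling."""
    # Sales below the floor earn nothing extra; sales above the ceiling are capped.
    commissionable = min(
        max(total_sales - quota_floor, 0.0),
        quota_ceiling - quota_floor,
    )
    return salary + commission * commissionable
```

With a hypothetical salary of 1000, commission of 2 per unit, floor of 60 and ceiling of 80, end-of-quarter sales of 50, 70 and 100 pay 1000, 1020 and 1040 respectively. The two kinks at the floor and the ceiling are the source of the nonlinearity, and hence of the dynamic incentives within the quarter.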
Their application has the feature that the salary, α, and the commission rate, β, are time-invariant, and the ceiling $b_t = \tfrac{4}{3} a_t$ is a known deterministic function of the quota floor $a_t$. The authors make the additional assumption that the time-invariant compensation parameters (α, β) are fixed and known a priori. The salesperson has rational beliefs about how $a_t$ (and consequently $b_t$) evolves as a function of the state variables. As mentioned earlier, the relevant state variables in the system are $I_t$, the months since the beginning of the quarter; $Q_t$, the total accumulated sales; and $a_t$, the current quota floor. In their notation, $q_t$ denotes the agent’s sales in month t and $\chi_t$ is an indicator for whether the agent remains employed with the firm. Collecting these they

write, $s_t = \{Q_t, a_t, I_t, \chi_t\}$, and denote the observed parameters of compensation as = {α, β}. Also recall that demand parameters (in (19) above) and are collected as = {μ, λ}. Using this notation, the salesperson’s compensation in a given month t will be

$$W_t = W(s_t, e_t, \varepsilon_t;\ ,\ ) \tag{22}$$

and the salesperson’s per-period utility can be written as

$$u_t = u(Q_t, a_t, I_t, \chi_t = 1) = E[W_t] - r\,\mathrm{var}[W_t] - C(e_t; c). \tag{23}$$

Here, r is a parameter indexing the agent’s risk aversion, $C(e_t; c)$ is the cost of effort, which will be assumed to be quadratic, and the expectation and variance of wages are taken with respect to the demand shocks, $\varepsilon_t$. The specification in Eq. (23) is attractive since it can be regarded as a second order approximation to an arbitrary utility function.10 The payoff from leaving the firm and pursuing the outside option is normalized to zero so that

$$u_t = u(Q_t, a_t, I_t, \chi_t = 0) = 0. \tag{24}$$

10 In the case of the standard linear compensation plan, exponential CARA utilities, and normal errors, this specification corresponds to an exact representation of the agent’s certainty equivalent utility.

The transitions for two of the state variables, $I_t$ and $Q_t$, are straightforward since they involve deterministic updates. Misra and Nair (2011) assume that salespeople have rational (but idiosyncratic) beliefs about the quota setting process, which they estimate using the observed data on quotas. In particular, they estimate the following transition function that models the quota in the next time period as a function of the current quota. This is represented as

$$a_{t+1} = \begin{cases} a_t & \text{if } I_t < T \\ \sum_{k=1}^{K} \theta_k (a_t, Q_t + q_t) + v_{t+1} & \text{if } I_t = T \end{cases} \tag{25}$$

Note that quotas do not change within the quarter, so the only relevant source of dynamics arises from the quota setting process across quarters. In Eq. (25) the new quota depends on $a_t$ and $Q_t + q_t$ via a K-order polynomial basis indexed by parameters $\theta_k$. This belief process is estimated separately in a first step by making the additional assumption that the terms v are i.i.d. and consequently independent of other shocks in the system. As a point of contrast, Chung et al. (2014) do not find any evidence of dynamics in quota setting behavior and treat each quarter as an independent spell. As such, the dynamics in their model occur only within the quarter and not across quarters. Misra and Nair (2011) posit that, given the beliefs the salesperson has, effort will be set by solving a dynamic programming problem. So, conditional on staying with the firm, the optimal effort in any month t, $e_t = e(s_t;\ ,\ )$, will maximize the value function,

$$e(s_t;\ ,\ ) = \arg\max_{e>0} \left\{ V(s_t;\ ,\ ) \right\}. \tag{26}$$

The decision rule for the salesperson is to remain with the firm as long as the value from employment exceeds the outside option,

$$\chi_{t+1} = 1 \quad \text{if} \quad \max_{e>0} \left\{ V(s_t;\ ,\ ) \right\} \geq 0.$$
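The idea that effort is the solution to a dynamic program over the quarter can be illustrated with a deliberately stripped-down backward induction. This sketch is not the model in Misra and Nair (2011): it assumes a risk neutral agent, no discounting, no exit decision, no quota updating across quarters, and a discrete effort and shock grid. All constants are hypothetical.

```python
from functools import lru_cache

T = 3                                        # months in the quarter
EFFORTS = (0, 1, 2, 3)                       # discrete effort grid (hypothetical)
SHOCKS = ((-1, 0.25), (0, 0.5), (1, 0.25))   # demand shock and its probability
BETA, COST = 1.0, 0.2                        # commission rate, quadratic cost parameter
FLOOR, CEILING = 3, 6                        # quota floor a and ceiling b

def commission(total_sales):
    """End-of-quarter incentive pay: commission on sales between floor and ceiling."""
    return BETA * min(max(total_sales - FLOOR, 0), CEILING - FLOOR)

@lru_cache(maxsize=None)
def value(month, cum_sales):
    """Expected payoff from `month` onward under the optimal effort policy."""
    if month > T:
        return commission(cum_sales)
    return max(action_value(month, cum_sales, e) for e in EFFORTS)

def action_value(month, cum_sales, effort):
    """Expected continuation value of choosing `effort` today, net of effort cost."""
    expected = sum(
        p * value(month + 1, cum_sales + max(effort + eps, 0))
        for eps, p in SHOCKS
    )
    return expected - COST * effort ** 2

def policy(month, cum_sales):
    """Effort policy as a function of the state (month, cumulative sales)."""
    return max(EFFORTS, key=lambda e: action_value(month, cum_sales, e))
```

In the last month, `policy(3, 3)` selects positive effort because each extra unit of sales earns commission, while `policy(3, 7)` selects zero effort because sales are already above the ceiling. Effort is therefore a function of the state variables, which is the property the demand-side estimator relies on.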

Given the complexity of the framework, a few points of discussion are warranted. The empirical strategies adopted by Misra and Nair (2011) and Chung et al. (2014) are somewhat different, but they both rely on very similar variation in the data. Misra and Nair (2011) adapt a version of the estimator proposed by Bajari et al. (2007). They use the approach described in the earlier section to estimate demand parameters () and construct an estimator of the observed effort policy, $\hat{e}_t = \hat{\lambda}' \vartheta(s_t)$. Effort is then simulated for each time period by solving the agent’s dynamic program for a guess of the supply side parameters () and using forward simulations that condition on the demand side estimates. A viable estimator of would then be the set that satisfies the empirical inequalities implied by the salesperson’s individual rationality (IR) and incentive compatibility (IC) constraints. These constraints can be represented as follows, IR: IC:

ˆ ≥0 ˆ , V s0 ; e, ˆ ≥ V st ; e , , ˆ V s0 ; e, ˆ ,

(27)

ˆ is the policy specific value function where s0 is In the above, V s0 ; e, , some initial state, eˆ is the observed policy (estimated from demand), and e is some alternative policy that differs from the observed policy. In their implementation the authors perturb eˆ with some noise to obtain e . The authors then search the parameter space for elements that best satisfy the above conditions and consider that their estimates. Note that, this procedure has to be implemented separately for each salesperson and, consequently, is quite time consuming. I will refer the interested reader to the paper in question for more details on the computational implementation. The key advantage of estimating the model separately for each salesperson is that it accounts for heterogeneity in a complete nonparametric manner. Different from the approach presented above, Chung et al. (2014) use a parametric approach to heterogeneity and adapt the estimator proposed by Arcidiacono and Miller (2011). Their approach is useful when there is limited data and a fully nonparametric treatment of heterogeneity is infeasible. Another point of difference is that they allow for hyperbolic discounting and estimate the discount function rather than assume it as Misra and Nair (2011) do. Their estimator uses an EM type algorithm to jointly estimates the parameters of

4 Models of effort

interest and accounts for heterogeneity using a finite mixture approach. Once again, I refer the interested reader to their paper for implementation details.

Once the structural model has been estimated, the analyst has the ability to evaluate any reasonable counterfactual. Misra and Nair (2011) examine the optimality of the compensation scheme used at the firm and find that there is room for improvement. Using a simulation based approach, they search for these improvements subject to constraints laid out by the firm. They find that a simplification of the quota policy via the removal of quota ceilings, a shortening of the quota horizon (from quarterly to monthly), and accounting for heterogeneity in salesperson types all lead to increased revenues and profits (revenues net of compensation costs). A selected set of these improvements11 was then implemented at the firm and the authors report on the results from that implementation. They find that aggregate revenues at the firm improved by around 9% (as opposed to their expectation of an improvement of 8.2%), which translates to a $12MM improvement in revenues. The authors compare predictions from their estimated model to those generated in the field at the salesperson level and conclude that their model performed accurately at the granular level, not just in aggregate. Finally, they also report high levels of satisfaction with the new plan among the salespeople and management.

Chung et al. (2014) use their model estimates to conduct a series of counterfactual simulations with regard to the compensation scheme and find the current compensation policy to be close to optimal. Their results suggest that revenue changes would range from −17.9% to −0.5% across a variety of alternative plans. They find strong evidence that an overachievement bonus (a bonus paid for exceeding quota) and an overachievement commission (an incremental commission rate paid when quotas are exceeded) result in profitable increases in effort and sales.
When overachievement commissions are eliminated, revenues drop by 10.7% and profits are lower by about 2%. One explanation for the preference (of salespeople) for convex plans in Chung et al. (2014) is that the estimated risk aversion levels are quite small, suggesting an appetite for highly leveraged incentive compensation plans.

While these methods are relatively new to the selling and salesforce arena, there has been some slow but growing interest in using them to address salesforce related topics. Daljord et al. (2016) use the framework and data from Misra and Nair (2011) to investigate the complementarity between composition (recruitment and retention) and compensation policies at the firm. In a recent working paper, Kim et al. (2018a) examine the context of micro-financing where loan officers act as sales agents who are responsible for both loan acquisition and repayment. The authors extend the above framework to allow for such multitasking effort and multiple outcomes of interest and calibrate this using observed data from a Mexican bank. To my knowledge, this is the first structural model of salesperson behavior in an agency theoretic model with multitasking. The authors’ proposed estimator allows for the joint estimation of demand and supply primitives using an adapted version of Arcidiacono and Miller (2011). Using the obtained estimates, they conduct counterfactuals that show that aggregating performance additively (across multiple tasks) leads to substantial “adverse specialization” in the sense of MacDonald and Marx (2001) and that making agents jointly responsible for both tasks leads to better outcomes for the firm.

11 The non-disclosure agreement stipulated by the firm precludes the authors from revealing the exact plan implemented.
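The logic of the inequality-based estimation strategy that Misra and Nair (2011) adapt from Bajari et al. (2007) can be sketched in a deliberately stripped-down setting. Everything specific here is an assumption for illustration and is not the authors' model: the agent is risk neutral, is paid a linear commission with no quotas, faces a quadratic effort cost with a single unknown parameter `d`, and only the incentive compatibility inequalities are used. The observed policy is forward simulated, perturbed policies are simulated on the same shocks, and the estimate is the grid point that minimizes the squared violations of the inequalities.

```python
import random

BETA = 0.95      # assumed monthly discount factor
COMMISSION = 1.0 # assumed linear commission rate
HORIZON = 24     # months simulated forward


def productivity(state: int) -> float:
    """Hypothetical first-stage demand estimate: sales per unit of effort."""
    return 1.0 + 0.5 * state


def simulate_value(policy, d: float, shocks) -> float:
    """Discounted utility from following `policy` when the effort cost is d."""
    value = 0.0
    for t, eps in enumerate(shocks):
        state = t % 3  # month within the quarter
        effort = policy(state)
        sales = productivity(state) * effort + eps
        value += BETA ** t * (COMMISSION * sales - 0.5 * d * effort ** 2)
    return value


def objective(d: float, observed_policy, perturbations, shock_paths) -> float:
    """Sum of squared violations of V(observed) >= V(perturbed)."""
    total = 0.0
    for shocks in shock_paths:
        v_obs = simulate_value(observed_policy, d, shocks)
        for eta in perturbations:
            def alternative(state, eta=eta):
                return observed_policy(state) + eta
            v_alt = simulate_value(alternative, d, shocks)
            total += min(0.0, v_obs - v_alt) ** 2
    return total


def estimate_cost(observed_policy, grid, n_paths: int = 20, seed: int = 0) -> float:
    """Grid search for the cost parameter that best satisfies the inequalities."""
    rng = random.Random(seed)
    paths = [[rng.gauss(0.0, 0.1) for _ in range(HORIZON)] for _ in range(n_paths)]
    return min(grid, key=lambda d: objective(d, observed_policy, (-0.05, 0.05), paths))


TRUE_D = 2.0  # used only to generate the "observed" policy in this toy example


def observed(state: int) -> float:
    """Optimal effort under TRUE_D, standing in for the estimated policy."""
    return COMMISSION * productivity(state) / TRUE_D


d_hat = estimate_cost(observed, [1.0, 1.5, 2.0, 2.5, 3.0])
```

In this toy setting the inequalities are violated for every grid value other than the one that generated the observed policy, so the grid search returns it. In general the inequalities identify a set rather than a point, which is why the text speaks of the set of parameters that satisfies them.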

4.4 Remarks

Whether it be via the use of proxies or structural models of behavior, the empirical implementation of models that relate selling effort to economic primitives requires access to rich data sources. For a while, the biggest roadblocks in this regard were the absence of organized data and the unwillingness of firms to share data on account of privacy or other legal and regulatory concerns. Even when some of this data is available, as in the context of detailing, there has been a lack of detailed institutional knowledge. For example, details about the nature of the compensation plans used by the drug manufacturers, data on other promotional and marketing activity that the firm might engage in, or elements such as formulary lists and/or copays have often been unavailable. To make things more difficult, the available data often lacks enough (exogenous) variation to permit the efficient implementation of the types of models and estimators described above.

In more recent times there has been a remarkable shift on each of these fronts. Firms have been more willing to partner with academics on projects of mutual interest, there are start-ups (e.g. Salesforce.com) that have contributed to the creation of fairly standardized data structures, and there has also been an increased willingness to engage in experimentation. I am optimistic that these factors will create opportunities for empirical researchers in Economics and Marketing. In the next couple of sections, I will look at various substantive areas for research related to selling and salesforces, first with topics related to the interplay between selling and other marketing mix elements and then with a discussion of topics related to the management of salesforces.

5 Selling and marketing

The relation between effort and other marketing elements is a complicated one. Once we recognize that effort is a decision that the salesperson makes, given her available information set and the incentives implied by that set, we have to allow for the possibility that effort could be a function of marketing decisions made by the firm. This conditioning, of course, depends crucially on the timing of the other marketing investments as well as the availability of information regarding such investments to the salesperson. Moreover, selling and marketing could act as complements or substitutes in the demand context, and discerning such effects could be challenging. Below, I briefly examine the role selling plays in decisions pertaining to the elements of the marketing mix.

5.1 Product

To begin, consider the interplay between selling and the product or service the firm wishes to sell. More often than not, the product will be held fixed and selling effort will be allocated by the salesperson with full knowledge of the product. There are, however, a number of scenarios where this is not exactly the case. For example, the agent may have to construct bundles (e.g. in financial services or banking) that meet the needs of a given customer or be required to design and create a customized version of the product (say for airplanes, construction equipment or consulting services). In these cases, selling effort is a complex function of a number of design related activities as well as the classic selling activities such as information provision and persuasion. Another context where product decisions and selling are co-mingled is that of customer service. Post purchase, customers often reach out to salespeople as a first point of contact and expect them to play the role of service providers as well.

While still limited, there is some emerging research that aims to understand how these product related factors work with selling to influence demand. As discussed earlier (see section on Detailing), there is a well documented literature on how selling operates via the provision of information about medical products. Kappe and Stremersch (2016), for example, use data on the content of the discussions between physicians and salespeople to examine the effect that information content has on physicians’ prescription decisions. Their results suggest that immediately following generic entry, it is more effective for salespeople of incumbent brands to focus on providing information related to drug contraindications and indications that differentiate these brands from generics. In contrast, when faced with branded entry, this strategy is significantly less effective.
The authors uncover substantial heterogeneity among doctors in their response to this information content and suggest that their results could be useful in crafting optimal messaging policies for individual doctors. In a different context, Allcott and Sweeney (2017) implement a field experiment in partnership with a nationwide retailer to estimate the effects that information disclosure, customer rebates, and salesperson incentives have on demand. Their case study focuses on energy-efficient durable goods and the disclosure of information pertaining to energy efficiency features of these products. The authors find that while large rebates and sales incentives work together to significantly increase sales, there is virtually no effect of information and sales incentives when acting alone. They find evidence of gaming as salespeople target the provision of information to more interested consumers. Taken together, these results imply that the equilibrium information content of a sales call depends on the demand for such information as well as the incentives to provide it.

These papers, broadly speaking, touch upon the role of the salesperson as a gatekeeper for information regarding the product. The choice of what information to disseminate, what products to recommend, or what bundles to construct for customers has implications for a firm’s product strategy as well as for salesforce management decisions (such as compensation/promotions). There are a number of other open topics for research at the intersection of product and salesforce strategy. These include selling in multi-product firms (see Kim et al., 2018a, 2018b for recent work on this), the organization and design of salesforces related to product expertise (Misra et al., 2004), and the design of product dependent compensation and incentives such as spiffs (Grimes, 1992; Caldieraro and Coughlan, 2007), to name a few.

5.2 Pricing

Perhaps the most visible interaction between selling and marketing relates to the area of pricing. There are two aspects of this interaction that are relevant for our discussion. First, selling has been shown to influence the degree to which consumers are sensitive to prices. While there is limited research outlining the processes that describe exactly how this mechanism works, the fact that these effects exist is now well documented. As described earlier, Rizzo (1999) and Hastings et al. (2017) allow for explicit interactions between selling and pricing in their models and, to some degree, find evidence that supports the existence of such interaction effects. They interpret these effects as selling effort acting to reduce the attention the firm’s consumers pay to pricing cues. Of course, an alternative interpretation could be that lower prices decrease the cost of providing selling effort.

The second avenue that integrates pricing and selling decisions is that of delegated pricing. In a number of industries (e.g. automobiles, retail, consulting services) the salesperson has partial or complete control over the price offered to the potential customer. Consequently, a question that has been of interest to Marketing is the degree to which such price delegation should be adopted by firms. The theory literature (see for example Lal, 1986; Mishra and Prasad, 2005) offers some insights on this and discusses conditions under which such delegation might be optimal. The empirical literature is somewhat mixed, with only limited research on the topic that uses observational data. Stephenson et al. (1979) argue that price delegation is correlated with lower revenues and profits. Using a survey of 108 firms, the authors find that those giving salespeople the highest degree of pricing authority generated the lowest sales and profit performance. In a similar type of study, Frenzen et al. (2010) point out that in their data pricing delegation increases as information asymmetry between the salesperson and sales manager increases and when monitoring is costly. They then argue that their results imply that “one price fits all” policies are often inappropriate in B2B settings and that allowing salespeople the freedom to adapt prices to customer requirements and market conditions, especially in uncertain selling environments, could be optimal. To be clear, these papers are descriptive and do not aim to present causal findings.

Larkin (2014) possibly provides the most direct empirical evidence in support of the argument that price delegation is not profitable. He finds that pricing distortions arise on account of the nonlinear compensation contract used to compensate salespeople employed at a leading enterprise software vendor. In particular, his results show that salespeople game the timing of the transaction to take advantage of the nonlinear commission scheme and negotiate significantly lower prices when such compensation considerations arise. These distortions in pricing result in lost revenue to the tune of 6%–8% to the firm and cannot be explained away by price discrimination arguments.

In a tangentially related paper, Chan et al. (2014a) outline the impact that sales compensation contracts have on pricing decisions made by salespeople. While their main goal is to examine how compensation systems impact peer effects and competition in sales teams, they also use the collected data to examine the degree to which differences in compensation plans (individual vs. team based compensation) influence the discounts that salespeople offer customers. Their main finding in this regard is that individual compensation contracts (as opposed to team based ones) create an additional level of competition that induces salespeople to resort to offering customers discounts to close the sale.

5.3 Advertising and promotions

The marketing literature has always thought of advertising and selling as being two distinct elements of the firm’s toolkit. In Economics, this distinction is muddier and consequently there has been limited exploration in that area of how these elements interact. Advertising has been known to play two distinct roles vis a vis consumers – to inform and/or to persuade. It is not completely surprising then that the manner in which selling effort relates to advertising depends on the roles that these constructs are playing. One might imagine advertising working by making potential consumers aware of the product and leading them to interact with salespeople before a sale is closed. In this scenario, advertising and selling work together and could be thought of as being strategic complements in the sense that the cross-partial derivative of sales with respect to the two constructs is positive. Examples where advertising plays such a ‘traffic-generation’ role abound and include traditional automobile advertising that leads customers to car dealerships, infomercials that lead television viewers to call centers, or even direct-to-consumer advertising in pharmaceuticals that leads patients to the offices of physicians, who are then also influenced by salespeople.

An alternative mechanism is where advertising and selling both play similar roles and consequently ‘compete’ for the customer. For example, informative advertisements (or advertising that points to information sources) may obviate the need for a customer to interact with a salesperson. This might be the case of medical journal advertising that provides enough information as to render detailing less relevant. Alternatively, as new direct and online channels emerge, advertising may reduce the foot-traffic at stores by pointing customers to these channels, thereby taking away traffic from salespeople.
In either of these cases advertising and selling might be considered substitutes in their ability to generate demand.

As with pricing, the degree to which there is a direct interaction effect between selling and advertising is of interest. Since each might increase (or decrease) the effectiveness of the other, there are direct implications for the optimal allocation of resources to these functions. In the Navy recruitment experiment described earlier (Carroll et al., 1985, 1986), the authors examine the results from an experiment in which the investments in selling and advertising were randomized across markets. While the authors find and report strong effects of selling, they find little evidence that advertising works in generating leads for enlistment. Unfortunately, they do not investigate or report the interaction effects available in the experiment, which would be informative about the nature of the interplay between these constructs. A set of related papers (Gatignon and Hanssens, 1987; Hanssens and Levien, 1983), however, do find strong evidence that local (but not national) advertising complements selling investments in the form of the number of salespeople. They are able to show that the patterns of interactions in their model lead to an allocation of resources across selling and advertising that is a complex function of the marketing budget available and other environmental factors.

In the pharmaceutical marketing arena a number of authors have examined the role of detailing and advertising. Azoulay (2002) suggests that there is a negative interaction between detailing and direct to consumer (DTC) advertising. One possibility is that DTC advertising creates counterarguments in the mind of the physician that negate the effectiveness of detailing; another is that DTC advertising creates enough demand from patients that detailing effects are reduced. Narayanan et al. (2004) use data on detailing, DTC advertising, and pricing to examine interactions between the various elements of the marketing mix. They find support for positive and significant interactions between detailing and DTC advertising in terms of shifting market share. The impact of this interaction on category sales is not significant. To convert their estimates into economically meaningful numbers the authors compute return on investment for each marketing investment. Their calculations imply that detailing ROI increases by 3 with a 5% increase in DTC. Similarly, the decrease in ROI for DTC is between 6% and 16% for a 5% decrease in detailing, while the ROI for DTC increases by between 9% and 12% with a 5% increase in detailing. The asymmetry in the ROI effects arises because the detailing interaction constitutes a larger proportion of the DTC effect than it does of the detailing effect. Given that the data in this industry is readily available, it could be used for a more in-depth exploration of how advertising and selling impact prescribing behavior. Similar data could be sought out for other industries (as in Larkin, 2014 or Allcott and Sweeney, 2017) to explore such interactions in other contexts as well.

Apart from the ‘big-three’ marketing mix elements described above, selling related decisions are also relevant in the context of other marketing decisions such as distribution channels. The decision to have salespeople call on channel partners, as in the case of retail, the provision of spiffs (incentives provided by manufacturers to salespeople at retail stores), or even the decision to outsource the selling function to independent agents are all examples of such interactions. Ultimately, the nature of the interaction between selling and marketing will depend on a number of factors including the role of selling (information/persuasion), the timing of effort, the information set available to agents, the allocated decision rights, and the compensation structure. Each of these warrants further empirical exploration.
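The complement versus substitute distinction used in this section can be made concrete with a purely illustrative sales response function. The functional form and the symbols \(A\) (advertising), \(E\) (selling effort), and \(\gamma\) are assumptions introduced here and are not taken from any of the studies cited.

```latex
\[
S(A, E) = \alpha + \beta_A A + \beta_E E + \gamma A E,
\qquad
\frac{\partial S}{\partial E} = \beta_E + \gamma A,
\qquad
\frac{\partial^2 S}{\partial A \, \partial E} = \gamma .
\]
```

Under this specification advertising and selling are complements when \(\gamma > 0\), since more advertising raises the marginal return to selling effort, and substitutes when \(\gamma < 0\). An interaction term of this kind is what an experiment that randomizes both instruments across markets would, in principle, allow one to estimate.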

6 Topics in salesforce management

The estimation of demand models that incorporate selling is a starting point for a large number of decisions pertaining to salesforce management. In this section I will examine a selection of such decisions and offer my own opinions on the promise for research relating to these. To reiterate my point from the introduction, this section is not intended to be a review of the literature; rather, it offers a personal interpretation of certain aspects of the literature and some thoughts on research directions.

6.1 Understanding salespeople

Given that selling effort is a human decision, it seems obvious that we should attempt to understand the primitives that generate these decisions. While the focus of the discussion above has been on understanding the impact that selling effort has on sales (with the exception of the structural models discussed), it is just as important to understand the preferences that salespeople have for various characteristics of their employment environment and the consequent choices that these preferences create. The descriptive literature on this topic is large, and I again refer the reader to the reviews that have been mentioned before. I will focus my attention on examples that I believe might serve as useful idea generators.

The use of stated preferences (via a conjoint exercise) to elicit preferences of salespeople over contract elements is an effective and successful tool in practice. The original idea is due to Darmon (1979) and has been extended by others including Mantrala et al. (1994). The approach outlined in Darmon (1979) is simple in that we offer salespeople choices in the form of compensation tradeoffs (bonus and quota combinations in that paper) and elicit their choices or rankings. These data are then used to calibrate preferences via a model and used to optimize compensation for the salespeople. While the choices can themselves be used as the compensation plan, Darmon (1979) suggests that managers use these as inputs in their decision making process. Mantrala et al. (1994) is a more sophisticated implementation of the original ideas in Darmon (1979). The authors construct an agency theoretic model in the context of a multi-product firm.
The goal is the design of a compensation plan, assumed to be of the quota-bonus form, for “a geographically specialized heterogeneous sales force operating in a repetitive buying environment.” The authors outline a recipe that begins by estimating each salesperson’s utility for income and effort (defined as time spent selling) based on a salesperson’s preference rank-ordering of alternative sales quota-bonus plans. These estimated utilities are then incorporated into a mathematical program that searches for the best compensation scheme. In both of the above papers, and others of similar spirit, the core idea is that the researcher can construct contract choices that are incentive compatible. If this is done accurately, the agent will self-select into the optimal contract and reveal their type. Usually, I (and I am sure others) have a healthy skepticism of stated preference data; however, in the context of the application at hand there is reason to believe that such data may be informative.12 Combining more formal structural models with such conjoint tasks may offer a viable avenue for future research in estimating salesperson preferences in the absence of observational data. Such an approach has already had some success in estimating discount factors (Dubé et al., 2014).

Other aspects of salesperson preferences over compensation are also worth investigating. In Viswanathan et al. (2018), for example, the central question is how salespeople evaluate cash versus non-cash compensation. The paper reports on a large-scale field intervention that switched salespeople from a cash plus “merchandise points” bonus to a commensurate all-cash bonus. After incorporating controls for a number of factors including salesperson, season, year, and target effects, the authors document that sales dropped by about 4.36%. In addition to compensation related primitives there is also a need to understand how the salesperson reacts to other elements of the job environment. These could include organizational features such as hierarchy and reporting structures, promotion opportunities, preferences over geographies and customer groups, compensation timing and horizons, and other factors that might impact employee effort levels.

6.2 Organizing the salesforce

The earliest papers on sales management focused primarily on the optimal size of the salesforce (or the optimal level of selling investments) and the allocation of selling effort across the firm. The papers discussed earlier in the context of estimating the sales response to the number of salespeople (e.g. Lodish et al., 1988; Gatignon and Hanssens, 1987; Horsky and Nelson, 1996) all aimed to use the estimates of the demand system to compute the ‘optimal’ size of the salesforce and/or the allocation of salespeople across geographic territories. In most of these models, the firm trades off incremental sales arising from increased selling against incremental expenses on account of compensation. The models are set up so that the program is concave and an interior solution exists. The early literature on sales-calls and detailing similarly focused on optimizing the volume of this effort as well as the allocation across territories, products, and other customer groups. While these papers offered a starting point to thinking about the problem, more recent research (Manchanda et al., 2004; Narayanan and Manchanda, 2009; Shapiro, forthcoming) has moved beyond the simple estimation approaches and has begun to think carefully about issues pertaining to managing sales effort. The set of opportunities on this front remains large, and approaches that combine some form of structural thinking with salesforce allocation problems would be a welcome addition to the literature.
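The tradeoff between incremental sales and incremental compensation expense can be illustrated with a minimal sketch. The constant-elasticity sales response, the parameter names, and the numbers below are assumptions chosen for illustration and do not come from the papers cited. With sales \(S(n) = a n^{b}\) and \(0 < b < 1\), profit \(m\,a\,n^{b} - w\,n\) is concave in the number of salespeople \(n\), and the first-order condition gives an interior optimum.

```python
def profit(n: float, margin: float, scale: float,
           elasticity: float, cost_per_rep: float) -> float:
    """Gross margin on sales minus compensation cost for n salespeople."""
    return margin * scale * n ** elasticity - cost_per_rep * n


def optimal_salesforce_size(margin: float, scale: float,
                            elasticity: float, cost_per_rep: float) -> float:
    """Interior optimum from margin * scale * elasticity * n**(elasticity - 1) = cost."""
    if not 0.0 < elasticity < 1.0:
        raise ValueError("concavity requires 0 < elasticity < 1")
    return (margin * scale * elasticity / cost_per_rep) ** (1.0 / (1.0 - elasticity))


# Illustrative numbers only: 50% margin, sales of 1000 * n**0.5, cost of 50 per rep.
n_star = optimal_salesforce_size(margin=0.5, scale=1000.0,
                                 elasticity=0.5, cost_per_rep=50.0)
```

With these numbers the marginal profit of an additional salesperson, 250 / sqrt(n), equals the cost of 50 at n = 25. Treating n as continuous is a simplification; in practice the neighboring integers would be compared directly.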

12 It also helps that some of the authors (Andy Zoltners in particular) have created a very successful consulting firm around some of these ideas (https://www.zs.com/).

6.2.1 Territory decisions

One key area is that of territory design and assignment – that is, how should an organization structure territories for salespeople so as to maximize profits while satisfying constraints pertaining to travel costs, agent preferences, as well as balance and fairness concerns. There is a significant literature on this topic, and I refer the reader to the excellent reviews by Zoltners and Sinha (1983) and Mantrala (2014). The literature on territory design and assignment has traditionally focused on the optimization aspect of the problem and treated this as a mathematical programming problem. Once again, bringing together ideas that treat effort as a structural outcome of the economic environment with the optimization problem of allocating agents to territories would be an interesting avenue for further research. In particular, the empirical literature on matching could be used as a starting point. As a thought experiment, consider using the framework and estimates in Daljord et al. (2016) to create a matching function between territories and agents. Unlike the extant literature, this approach would be matching on agent and territory primitives – for example, risk averse salespeople might be matched with territories that have lower variability in sales. Such matches will have intermediate economic outcomes that may not currently be accounted for. As in Daljord et al. (2016), matches will result in an immediate adjustment to the optimal compensation plan as a function of the match. If there are rigidities in the compensation structure or other constraints to the system, there may well be equilibrium territory allocations coupled with compensation plans that are quite different from those that only focus on the assignment problem. This is just one example where the territory assignment problem can be enriched by using theoretical constructs discussed earlier. Others may include the joint optimization of price delegation issues and territory assignment or accounting for other marketing investments that interact with effort when assigning territories.
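The thought experiment of matching on primitives can be sketched as a small assignment problem. This is only an illustration under strong assumptions that are not taken from Daljord et al. (2016): agents have CARA preferences with known risk aversion, are paid a fixed linear commission, territory sales are normal with known mean and standard deviation, effort is ignored, and the planner maximizes the sum of the agents' certainty equivalents. All names and numbers are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def certainty_equivalent(risk_aversion: float, mean_sales: float,
                         sd_sales: float, commission: float) -> float:
    """CARA-normal certainty equivalent of commission income (effort ignored)."""
    income_mean = commission * mean_sales
    income_var = (commission * sd_sales) ** 2
    return income_mean - 0.5 * risk_aversion * income_var


def best_assignment(risk_aversions: Sequence[float],
                    territories: Sequence[Tuple[float, float]],
                    commission: float = 0.1) -> Tuple[int, ...]:
    """Brute-force search; entry i of the result is the territory given to agent i."""
    n = len(risk_aversions)

    def total(perm: Tuple[int, ...]) -> float:
        value = 0.0
        for agent, territory in enumerate(perm):
            mean_sales, sd_sales = territories[territory]
            value += certainty_equivalent(risk_aversions[agent],
                                          mean_sales, sd_sales, commission)
        return value

    return max(permutations(range(n)), key=total)


# Three agents and three territories with equal mean sales but different variability.
agents = [0.5, 2.0, 1.0]                                      # risk aversion
areas = [(1000.0, 100.0), (1000.0, 300.0), (1000.0, 200.0)]   # (mean, sd) of sales
match = best_assignment(agents, areas)
```

In this example the most risk averse agent (index 1) receives the least variable territory (index 0) and the least risk averse agent (index 0) receives the most variable one (index 1). Brute-force enumeration is only feasible for a handful of agents; larger problems would call for a dedicated assignment algorithm.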

6.2.2 Salesforce structure

Other aspects relating to the organization of salesforces could also benefit from new perspectives. These include decisions the firm makes about the structure of the salesforce. For example, we know very little about the interplay between managers and salespeople and the impact these have on firm outcomes. Empirically, one line of examination would be to attempt to decouple the demand contributions of managers from those of salespeople. Since selling effort will be influenced by the choices managers make, one could argue that incorporating manager fixed effects might allow us to infer such influence. Unfortunately, this estimate will be confounded with other factors within the manager’s purview, such as the products and markets they manage. Ideally, one would need a random allocation of managers or, at the very minimum, some exogenous shock that moves managers (or salespeople) around so as to facilitate identification.

Apart from managers, the influence of peers (other salespeople) on demand outcomes is something that is both practically and academically relevant. Some recent work on the topic (Chan et al., 2014a, 2014b) examines the role of peer effects among salespeople in a department store environment. In Chan et al. (2014a), the authors examine the interplay between compensation systems and peer effects among salespeople. Using data from a Chinese department store, they show that team based compensation allows for a softening of competition between peers while individual compensation intensifies competition. In addition, these individual compensation plans also create incentives to discount prices strategically to corner sales. Chan et al. (2014b) exploit a shift in the assignment policy at the store to examine how salespeople learn. In particular, they find that learning from peers is more salient in their data than is learning-by-doing (experience). Further, using variation across products in the compensation policy and the variability in the difficulty of selling, they are able to decouple passive observational learning from active interactive learning. The results have implications for the organization of salesforces into teams, compensation policy, and training.

Related to structure is the issue of job design. Given the various elements of the selling process, a natural question is whether a single salesperson should engage in all tasks in the process or whether the firm should seek to create specialization. For example, in a very stylized setting Misra et al. (2004) examine the interplay between generalists and specialists. A more recent look at the topic is contained in Kim et al. (2018b), who look at the role of information asymmetry in these decisions.

6.2.3 Decision rights

As I discussed earlier, salespeople are often given decision rights over non-effort investments. These decision rights could be related to pricing (such as the ability to set or discount prices), promotional decisions (such as travel and entertainment budgets that salespeople can use at their discretion), or even product decisions (such as the decision to bundle features or products to sell). Empirical research on most of these topics is scant, and the area is a fruitful one for further exploration. One aspect of selling where the salesperson has decision rights is the selection of customers to target. In some of my own recent work (Jain et al., 2016), my co-authors and I examine the endogeneity of interactions between salespeople and customers in a retail setting. Using in-store video data, we are able to show that salespeople use visual cues (e.g. the attire of the customer) to determine whether or not to interact with customers, and that this choice also possibly depends on other supply side factors such as the cumulative effort exerted by the salesperson on that day. More generally, salespeople often make choices over customers, offers, and projects – to the extent that these choices are made strategically, they offer insights about salesperson behavior and opportunities for firms to consider interventions. One example of such choices is the nature or type of selling to adopt. In the context of detailing, we have seen evidence that selling could operate via the provision of information or via some persuasion based mechanism. I would conjecture that the salesperson chooses the mix of these approaches rather than making a binary choice. Recently, data has become available about the duration of detailing calls and to some extent the content of these calls (such as the script followed, drugs detailed, etc.). Such data could be used in the context of economically derived models to examine how these choices are made and the extent to which they impact the selling process and outcomes.

6.3 Compensating and motivating the salesforce

As I have mentioned before, the imbalance between the empirical and theoretical literatures on salesforce compensation is quite large. The literature on this topic has been organized and summarized excellently by Coughlan and Joseph (2012). In my commentary below, I will highlight certain aspects of compensation that I think deserve attention.

6.3.1 Contract elements

Following Holmström (1979), there was a surge of interest in modeling salesforce compensation from an agency theoretic standpoint. The work by Basu et al. (1985) has had a lasting and indelible impact on the marketing literature on this topic. The equilibrium contracts posited in these papers, though, are not typically observed. I would conjecture that the ideas of aggregation and linearity of contracts outlined in Holmstrom and Milgrom (1987) were motivated, at least in part, by the dominance of the linear contract in practice. Even so, real world sales contracts have a plethora of features that are not completely well understood – these include quotas, bonuses, draws, accelerators, caps, ceilings and floors, contests and tournaments, prizes and recognition, team awards, and a variety of others (Coughlan and Narasimhan, 1992; Misra et al., 2005). Empirical and theoretical work is needed to understand the justification for these contract elements. Each of these features attempts to address some economic distortion or friction while at the same time possibly creating another. For example, compensation caps could guard against windfall gains on account of non-effort related shocks, but may result in gaming (Misra and Nair, 2011). Kishore et al. (2013) document differential effects of lump-sum (bonus) versus piece-rate (commission) constructions of quota related compensation. They find that a shift in compensation plans from bonus to commissions led to significant and heterogeneous sales productivity improvement. They argue that the bonus plan was strictly inferior to the implemented commission plan with respect to short-term revenues and the gaming of sales. At the same time, commissions tended to foster neglect of non-incentivized tasks. In the same vein, another aspect of salesforce compensation that requires attention is the possible complementarity across compensation elements.
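
To make the distinction between a lump-sum bonus and a piece-rate commission concrete, the following is a minimal illustrative sketch. It is not the specification used by Kishore et al. (2013) or Misra and Nair (2011); the function names, plan terms, and numbers are all hypothetical. It compares the pay earned from one additional unit sold under a quota bonus and under a capped commission on sales above quota.

```python
def bonus_plan_pay(sales, salary, quota, bonus):
    """Salary plus a lump-sum bonus, paid only if sales reach the quota."""
    return salary + (bonus if sales >= quota else 0.0)


def commission_plan_pay(sales, salary, quota, rate, cap=None):
    """Salary plus a per-unit commission on sales above quota.

    If `cap` is given, the variable (commission) part cannot exceed it.
    """
    variable = rate * max(0.0, sales - quota)
    if cap is not None:
        variable = min(variable, cap)
    return salary + variable


def marginal_pay(plan, sales, step=1.0, **terms):
    """Extra pay from selling `step` more units at the current sales level."""
    return plan(sales + step, **terms) - plan(sales, **terms)


# Hypothetical plan terms, chosen only for illustration.
bonus_terms = dict(salary=3000.0, quota=100.0, bonus=500.0)
capped_terms = dict(salary=3000.0, quota=100.0, rate=10.0, cap=400.0)

# Print the marginal pay under each plan at several sales levels.
for units in (50.0, 99.0, 120.0, 150.0):
    print(
        units,
        marginal_pay(bonus_plan_pay, units, **bonus_terms),
        marginal_pay(commission_plan_pay, units, **capped_terms),
    )
```

Under these assumed terms the bonus plan rewards the marginal sale only at the quota threshold, and the capped commission stops rewarding it once the cap binds. This is one simple way to see why such features can create incentives to time sales around the threshold.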

6.3.2 Contract shape and form

Apart from linearity, perhaps the most popular feature in compensation contracts offered to salespeople is the presence of quotas. Raju and Srinivasan (1996) make a compelling argument that quotas are essentially a mechanism that approximates the ‘optimal’ nonlinearity found in the contracts discussed in Holmström (1979) and Basu et al. (1985). The idea is that quotas are nothing more than piece-wise linear approximations that have the additional advantage that they are easy to communicate and understand. In addition, these contracts can also be made relatively heterogeneous by altering some aspects of the plan while leaving others constant. For example, salaries and quotas (targets) can be made individual specific while letting the commission rates (slopes) be common across salespeople. They show that, with simple adjustments like these, piece-wise linear plans can achieve close to optimal profits even when the optimal plan is heterogeneous, smooth, and nonlinear. This idea is compelling and intuitive but to my knowledge the hypothesis has not been empirically tested. Other forms of nonlinear contracts, while popular in pricing and promotions, are somewhat rarer in the salesforce domain. At the same time the variety of shapes that are seen in quota based contracts is quite large (Oyer, 1998). For example, progressive and regressive plans are both observed, as are plans with multiple kinks (multiple quotas). We know very little about the justification of these choices on the part of the firm or the particular economics that motivate them.
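
The piece-wise linear structure described above can be sketched as follows. This is an illustration under stated assumptions rather than the model in Raju and Srinivasan (1996): the function name, the representation of kinks as multiples of the quota, and the example salaries, quotas, and rates are all hypothetical. Salary and quota are individual specific, while the segment rates are shared across salespeople.

```python
def piecewise_linear_pay(sales, salary, quota, kink_multiples, rates):
    """Pay under a piece-wise linear plan with kinks at multiples of quota.

    kink_multiples: ascending multiples of quota where the slope changes,
                    e.g. [1.0, 1.5] puts kinks at 100% and 150% of quota.
    rates:          commission rate on each segment, one more than the kinks.
    """
    if len(rates) != len(kink_multiples) + 1:
        raise ValueError("need exactly one rate per segment")
    thresholds = [m * quota for m in kink_multiples]
    lower_bounds = [0.0] + thresholds
    upper_bounds = thresholds + [float("inf")]
    pay = salary
    for lo, hi, rate in zip(lower_bounds, upper_bounds, rates):
        # Only the part of sales that falls inside [lo, hi] earns this rate.
        pay += rate * max(0.0, min(sales, hi) - lo)
    return pay


# Hypothetical common slopes: nothing below quota, then rising (progressive)
# or falling (regressive) rates above it.
KINKS = [1.0, 1.5]
PROGRESSIVE = [0.0, 0.05, 0.10]
REGRESSIVE = [0.0, 0.10, 0.05]

# Hypothetical individual-specific salary and quota for two salespeople.
rep_a = dict(salary=40000.0, quota=100000.0)
rep_b = dict(salary=30000.0, quota=50000.0)

pay_a = piecewise_linear_pay(160000.0, kink_multiples=KINKS, rates=PROGRESSIVE, **rep_a)
pay_b = piecewise_linear_pay(60000.0, kink_multiples=KINKS, rates=PROGRESSIVE, **rep_b)
print(pay_a, pay_b)
```

Swapping `PROGRESSIVE` for `REGRESSIVE` changes only the shared slopes, so the same function covers both plan shapes and any number of kinks.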

6.3.3 Dynamics

Nonlinearities in the compensation contract often result in dynamic considerations. As we have seen earlier (Misra and Nair, 2011; Chung et al., 2014), quotas tend to generate dynamic incentives which result in differential allocation of effort as well as a potential for gaming. Similar dynamic considerations must also arise for other sales incentives, such as contests or prizes, that have some deadline effects baked into them. Alternatively, there may be dynamic considerations that arise on account of other marketing activities that influence the selling effort of salespeople. For example, dealer incentives in the case of automobile selling or spiffs for electronic goods arise at certain points of time, often tied to the time the product has been on the market. Similarly, information shifts or changes in sampling or promotional vehicles will force the medical representative to alter effort levels in detailing calls. Moreover, such reallocation of effort can occur even with anticipated changes to the environment – for example, a salesperson might ask a customer to come back to the store because there is an upcoming sale. If, how, and when these dynamics in the firm’s activities endogenously change selling effort warrants further research. On the firm’s side, quotas are strategic choices and are updated across compensation periods based on past sales information and other information the firm might have about the market. A better understanding of the optimal form of quota updating and the way salespeople learn (form beliefs) about these updates is useful in the design of quota based plans. For some initial thoughts on this front see Mantrala et al. (1997). One intriguing possibility is to think of quotas as a contract form with some promised utility (Spear and Srivastava, 1987; Sannikov, 2008) type construct offered to salespeople. Similar to the types of arguments in Raju and Srinivasan (1996), we might attempt to measure how well quota based contracts approximate optimal dynamic contracts.
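
As a purely illustrative sketch of quota updating, consider a rule in which next period's quota moves part of the way toward last period's realized sales. This rule is an assumption made here for exposition, not the mechanism studied in Mantrala et al. (1997), and the names `update_quota`, `simulate_quotas`, and the `ratchet` weight are hypothetical.

```python
def update_quota(quota, sales, ratchet=0.5, floor=0.0):
    """Move next period's quota partway toward last period's realized sales.

    ratchet = 0 leaves the quota unchanged; ratchet = 1 sets it to last sales.
    """
    return max(floor, quota + ratchet * (sales - quota))


def simulate_quotas(initial_quota, sales_path, ratchet=0.5):
    """Return the quota in force in each period, given a path of realized sales."""
    quotas = [initial_quota]
    for sales in sales_path:
        quotas.append(update_quota(quotas[-1], sales, ratchet))
    return quotas


# Hypothetical sales path: two strong periods followed by a weak one.
print(simulate_quotas(100.0, [120.0, 120.0, 80.0], ratchet=0.5))
# prints [100.0, 110.0, 115.0, 97.5]
```

Under a rule of this kind, strong sales today raise tomorrow's quota, so a forward-looking salesperson has a reason to weigh current pay against a harder future target.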

6.3.4 Other issues

Ultimately, the role of compensation is to motivate and incentivize salespeople to provide optimal levels of effort. In most cases the economics are first order in that they describe the salesperson’s decision making process reasonably well. However, there are other aspects of compensation, such as balance, fairness, and equity, which may also influence the effort a salesperson puts in. Chung and Narayandas (2017) examine issues of reciprocity and the difference between regular (a reward for meeting quota) and punitive (a penalty for not meeting quota) compensation. Similarly, the territory design and assignment literature has long debated the trade-offs between profitability and balance (equity across salespeople in terms of territory size and potential). On a related note, ideas of fairness and equity are also directly related to the case of team compensation and relative performance contracts, where we have seen limited empirical work apart from the notable exceptions discussed earlier (Chan et al., 2014a, 2014b).

7 Some other thoughts

7.1 Regulation and selling

While it may not be completely obvious, there are a number of regulatory agencies that frame the environment in which selling interactions occur. In some cases the regulations are directly tied to the selling process, while in others they influence related constructs (such as pricing) that indirectly influence the manner in which selling occurs. Riggs (2012), through an extensive search and interviews, collects close to a hundred regulatory restrictions that impact the process of selling in the pharmaceutical sector. These range from prohibitions on gift giving (prohibiting the salesperson from “providing items for healthcare professionals’ use that do not advance disease or treatment education”) to restrictions on meals (“occasional meals may be offered as a business courtesy to healthcare professionals (including members of their staff) attending sales/marketing presentations as long as the presentations provide scientific or educational value”). He then uses a survey to examine the (perceived) impact these regulations have on various selling related activities. While the research is qualitative and specific to a particular firm, it offers some insight into how regulatory restrictions might play a role in selling. At a more aggregate level, Stremersch and Lemmens (2009) consider the case of new pharmaceuticals (15 new molecules) and the degree to which regulatory regimes impact sales across 34 countries. They find that regulation substantially explains cross-country variation in sales of pharmaceutical drugs. While some regulatory restrictions, such as manufacturer price controls, have positive effects on drug sales, others, such as restrictions of physician prescription budgets and the prohibition of direct-to-consumer advertising (DTCA), tend to hurt sales. These effects persist even when accounting for controls such as national culture, economic wealth, and lagged sales.
Aside from these and a handful of others, there is very little policy relevant research on the topic of detailing. This is not to say that these topics are not important. There are fundamental questions of whether or not selling creates any value or if it is harmful. For example, Larkin et al. (2017) show that when hospitals restricted the access salespeople had to physicians, there was a small but significant shift in prescriptions away from branded drugs to generics. This shift created a significant drop in the overall costs at the hospital. One could argue that selling in this case was contributing to the cost of healthcare. On the other hand, Chressanthis et al. (2012) suggest that limiting access to salespeople reduces the timeliness of information flows and slows the adoption of drugs that may be more efficacious.¹³ Obviously the value selling creates will depend on the mechanism and context via which it occurs. Having said that, there is ample evidence that there are aspects of selling and salesforce management that are relevant to policy makers. Regulations imposed on the players in the selling system are also relevant for selling. Consider the case of Sorrell v. IMS Health Inc.¹⁴ In 2007, the state of Vermont passed a Prescription Confidentiality Law that required that a doctor’s past prescribing data not be sold or used for marketing purposes without the consent of the doctor in question. IMS Health (and pharmaceutical manufacturers) challenged the law in court and argued that the law violated their First Amendment rights. The case wound up (on appeal) at the Supreme Court, which struck down the Vermont statute. While the case was primarily focused on free speech issues, the economics of such a regulation are interesting. How would selling be done without access to historical data? Would such selling be warranted in equilibrium? What would the welfare implications be? There are a number of questions that are worth considering. Similarly, with the adoption of the Physician Payments Sunshine Act (PPSA), manufacturers now must submit annual data on payments and transfers of value made to covered recipients to the Centers for Medicare & Medicaid Services (CMS). While the aim of this law is to “increase transparency around the financial relationships between physicians, teaching hospitals, and manufacturers of drugs, medical devices, and biologics,” there has been some discussion of how the law has induced a reluctance of physicians to engage with salespeople. 
For a more complete discussion of these and other effects of the PPSA see Gorlach and Pham-Kanter (2013). These are isolated examples of how regulation interacts with selling. It could be argued that any regulation related to pricing, marketing, or even data and privacy has implications for selling.

¹³ The reader should be aware that some of the authors of this paper have ties to a consulting firm for pharma manufacturers.
¹⁴ Sorrell v. IMS Health Inc., 131 S. Ct. 2653, 2659 (2011).

7.2 Selling in the new world

Decisions pertaining to the organization and structuring of salesforces could be quite different in the new data rich world. While there are arguments that selling and salespeople will remain for the foreseeable future (Mantrala and Albers, 2010), there are structural changes to the selling environment that require new thinking and research. Hiring decisions are now routinely a function of data on a potential hire that is available on the web. There are questions related to if, and how, such information should be used in the hiring of salespeople. For example, is it OK to infer personality based on social media and use that to assess a ‘cultural’ fit with the firm? Territory design and allocation also need to be fundamentally rethought in a world where geography is increasingly less relevant. The emergence of communication technologies has reduced in-person interactions while at the same time increasing the ease with which face-to-face communication is possible. Does this technology change where salespeople need to physically be located and obviate the need for geographical territories? The construct of territories will need to be generalized to include non-geographical constructs, maybe even to the extent that individual customers are assigned to salespeople based on the match between their types. Such matching could even be done in real time based on moods or emotional states. Consider for example a recent patent filed by Google¹⁵ that aims to match agents to customers based on the emotional state of the customer as inferred from facial recognition data. Similar ideas, with personality matches inferred from customer-agent call transcripts and past behaviors, are now real possibilities. Research on the viability of such methods, the economics inherent therein, and the potential risks are obvious needs. Data has become close to ubiquitous, methods of analyzing such data are now more sophisticated and accessible than ever before, and we have begun to see the emergence of AI tools that are close to mimicking human behavior.¹⁶ Between eDetailing, chatbots, robo-calls, AI based emails, and other such tools, there is a fundamental shift in selling that is on the horizon. The provision of customized, timely, and on-demand information will almost surely be delegated to machines. Exactly what role the salesperson plays in this new environment is an open question.

¹⁵ https://patents.google.com/patent/US9648171.
¹⁶ https://www.theverge.com/2018/5/9/17334658/google-ai-phone-call-assistant-duplex-ethical-social-implications.

7.3 Concluding remarks

I started this chapter with a set of facts about the importance of selling in our economy. My goal in this chapter was to offer the reader some perspective on why treating selling as a fundamental economic construct is of relevance to researchers in Marketing and Economics. In addition, I am hopeful that the discussion above outlines research possibilities and avenues moving forward.

References

Allcott, Hunt, Sweeney, Richard L., 2017. The role of sales agents in information disclosure: evidence from a field experiment. Management Science 63 (1), 21–39. https://doi.org/10.1287/mnsc.2015.2327.
Anderson, Erin, 1988. Transaction costs as determinants of opportunism in integrated and independent sales forces. Journal of Economic Behavior & Organization 9 (3), 247–264. https://doi.org/10.1016/0167-2681(88)90036-4.
Anderson, Erin, Schmittlein, David C., 1984. Integration of the sales force: an empirical examination. The Rand Journal of Economics 15 (3), 385–395. http://www.jstor.org/stable/2555446.
Arcidiacono, Peter, Miller, Robert A., 2011. Conditional choice probability estimation of dynamic discrete choice models with unobserved heterogeneity. Econometrica 79 (6), 1823–1867. https://doi.org/10.3982/ECTA7743.

Azoulay, Pierre, 2002. Do pharmaceutical sales respond to scientific evidence? Journal of Economics & Management Strategy 11 (4), 551–594. https://ideas.repec.org/a/bla/jemstr/v11y2002i4p551-594.html.
Bajari, Patrick, Benkard, C. Lanier, Levin, Jonathan, 2007. Estimating dynamic models of imperfect competition. Econometrica 75 (5), 1331–1370. https://doi.org/10.1111/j.1468-0262.2007.00796.x.
Banker, Rajiv D., Lee, Seok-Young, Potter, Gordon, 1996. A field study of the impact of a performance-based incentive plan. Journal of Accounting & Economics 21 (2), 195–226. https://doi.org/10.1016/0165-4101(95)00418-1.
Basu, Amiya K., Lal, Rajiv, Srinivasan, V., Staelin, Richard, 1985. Salesforce compensation plans: an agency theoretic perspective. Marketing Science 4 (4), 267–291. http://www.jstor.org/stable/184057.
Berndt, Ernst R., Pindyck, Robert S., Azoulay, Pierre, 2003. Consumption externalities and diffusion in pharmaceutical markets: antiulcer drugs. Journal of Industrial Economics 51 (2), 243–270. https://doi.org/10.1111/1467-6451.00200.
Berry, Steven, Levinsohn, James, Pakes, Ariel, 1995. Automobile prices in market equilibrium. Econometrica 63 (4), 841–890. http://www.jstor.org/stable/2171802.
Caldieraro, Fabio, Coughlan, Anne T., 2007. Spiffed-up channels: the role of spiffs in hierarchical selling organizations. Marketing Science 26 (1), 31–51. http://www.jstor.org/stable/40057072.
Campbell, Sheila, 2009. Promotional Spending for Prescription Drugs. Technical Report. Congressional Budget Office. https://www.cbo.gov/sites/default/files/111th-congress-2009-2010/reports/1202-drugpromo_brief.pdf.
Cao, Yiming, Fisman, Raymond, Lin, Hui, Wang, Yongxiang, 2018. Target Setting and Allocative Inefficiency in Lending: Evidence from Two Chinese Banks. Working Paper 24961. National Bureau of Economic Research. http://www.nber.org/papers/w24961.
Carroll, Vincent P., Rao, Ambar G., Lee, Hau L., Shapiro, Arthur, Bayus, Barry L., 1985. The navy enlistment marketing experiment. Marketing Science 4 (4), 352–374. http://www.jstor.org/stable/184061.
Carroll, Vincent P., Lee, Hau L., Rao, Ambar G., 1986. Implications of salesforce productivity heterogeneity and demotivation: a navy recruiter case study. Management Science 32 (11), 1371–1388. http://www.jstor.org/stable/2631498.
Chan, Tat Y., Li, Jia, Pierce, Lamar, 2014a. Compensation and peer effects in competing sales teams. Management Science 60 (8), 1965–1984. https://doi.org/10.1287/mnsc.2013.1840.
Chan, Tat Y., Li, Jia, Pierce, Lamar, 2014b. Learning from peers: knowledge transfer and sales force productivity growth. Marketing Science 33 (4), 463–484. https://doi.org/10.1287/mksc.2013.0831.
Charnes, A., Cooper, W.W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of Operational Research 2 (6), 429–444. https://doi.org/10.1016/0377-2217(78)90138-8.
Ching, Andrew T., Ishihara, Masakazu, 2012. Measuring the informative and persuasive roles of detailing on prescribing decisions. Management Science 58 (7), 1374–1387. http://www.jstor.org/stable/41499562.
Chressanthis, George A., Khedkar, Pratap, Jain, Nitin, Poddar, Prashant, Seiders, Michael G., 2012. Can access limits on sales representatives to physicians affect clinical prescription decisions? A study of recent events with diabetes and lipid drugs. The Journal of Clinical Hypertension 14 (7), 435–446. https://doi.org/10.1111/j.1751-7176.2012.00651.x.
Chung, Doug J., Narayandas, Das, 2017. Incentives versus reciprocity: insights from a field experiment. Journal of Marketing Research 54 (4), 511–524. https://doi.org/10.1509/jmr.15.0174.
Chung, Doug J., Steenburgh, Thomas, Sudhir, K., 2014. Do bonuses enhance sales productivity? A dynamic structural analysis of bonus-based compensation plans. Marketing Science 33 (2), 165–187.
Chung, Doug J., Kim, Byungyeon, Park, Byoung G., 2018. How Do Sales Efforts Pay Off? Dynamic Panel Data Analysis in the Nerlove-Arrow Framework. Technical Report.
Church, Roy, 2005. The British market for medicine in the late nineteenth century: the innovative impact of S M Burroughs & Co. Medical History 49 (3), 281–298.
Church, Roy, 2008. Salesmen and the transformation of selling in Britain and the US in the nineteenth and early twentieth centuries. The Economic History Review 61 (3), 695–725. http://www.jstor.org/stable/40057607.

Copeland, Adam, Monnet, Cyril, 2009. The welfare effects of incentive schemes. The Review of Economic Studies 76 (1), 93–113. https://doi.org/10.1111/j.1467-937X.2008.00513.x.
Coughlan, Anne T., 1993. Salesforce compensation: a review of MS/OR advances (Chapter 13). In: Marketing, pp. 611–651.
Coughlan, Anne T., Joseph, Kissan, 2012. Sales force compensation: research insights and research potential. In: Handbook of Business-to-Business Marketing. Edward Elgar Publishing, United Kingdom, pp. 473–495.
Coughlan, Anne T., Narasimhan, Chakravarthi, 1992. An empirical analysis of sales-force compensation plans. The Journal of Business 65 (1), 93–121. http://www.jstor.org/stable/2353176.
Coughlan, Anne T., Sen, Subrata K., 1989. Salesforce compensation: theory and managerial implications. Marketing Science 8 (4), 324–342. https://doi.org/10.1287/mksc.8.4.324.
Daljord, Øystein, Misra, Sanjog, Nair, Harikesh S., 2016. Homogeneous contracts for heterogeneous agents: aligning sales force composition and compensation. Journal of Marketing Research 53 (2), 161–182. https://doi.org/10.1509/jmr.14.0018.
Darmon, René Y., 1974. Salesmen’s response to financial incentives: an empirical study. Journal of Marketing Research 11 (4), 418–426. http://www.jstor.org/stable/3151288.
Darmon, René Y., 1979. Setting sales quotas with conjoint analysis. Journal of Marketing Research 16 (1), 133–140. http://www.jstor.org/stable/3150884.
Datta, Anusua, Dave, Dhaval, 2016. Effects of physician-directed pharmaceutical promotion on prescription behaviors: longitudinal evidence. Health Economics 26 (4), 450–468. https://doi.org/10.1002/hec.3323.
Dubé, Jean-Pierre, Hitsch, Günter J., Jindal, Pranav, 2014. The joint identification of utility and discount functions from stated choice data: an application to durable goods adoption. Quantitative Marketing and Economics 12 (4), 331–377.
Frenzen, Heiko, Hansen, Ann-Kristin, Krafft, Manfred, Mantrala, Murali K., Schmidt, Simone, 2010. Delegation of pricing authority to the sales force: an agency-theoretic perspective of its determinants and impact on performance. International Journal of Research in Marketing 27 (1), 58–68. https://doi.org/10.1016/j.ijresmar.2009.09.006.
Gatignon, Hubert, Hanssens, Dominique M., 1987. Modeling marketing interactions with application to salesforce effectiveness. Journal of Marketing Research 24 (3), 247–257. http://www.jstor.org/stable/3151635.
Gönül, Füsun F., Carter, Franklin, Petrova, Elina, Srinivasan, Kannan, 2001. Promotion of prescription drugs and its impact on physicians’ choice behavior. Journal of Marketing 65 (3), 79–90. http://www.jstor.org/stable/3203468.
Gorlach, Igor, Pham-Kanter, Genevieve, 2013. Brightening up: the effect of the physician payment sunshine act on existing regulation of pharmaceutical marketing. The Journal of Law, Medicine & Ethics 41 (1), 315–322. https://doi.org/10.1111/jlme.12022.
Grimes, Warren S., 1992. Spiff, polish, and consumer demand quality: vertical price restraints revisited. California Law Review 80 (4), 815–856.
Hanssens, Dominique M., Levien, Henry A., 1983. An econometric study of recruitment marketing in the U.S. navy. Management Science 29 (10), 1167–1184. http://www.jstor.org/stable/2631347.
Hastings, Justine, Hortaçsu, Ali, Syverson, Chad, 2017. Sales force and competition in financial product markets: the case of Mexico’s social security privatization. Econometrica 85 (6), 1723–1761. https://doi.org/10.3982/ECTA12302.
Holmström, Bengt, 1979. Moral hazard and observability. The Bell Journal of Economics 10 (1), 74–91. http://www.jstor.org/stable/3003320.
Holmstrom, Bengt, Milgrom, Paul, 1987. Aggregation and linearity in the provision of intertemporal incentives. Econometrica 55 (2), 303–328. http://www.jstor.org/stable/1913238.
Holmstrom, Bengt, Milgrom, Paul, 1991. Multitask principal-agent analyses: incentive contracts, asset ownership, and job design. Journal of Law, Economics, & Organization 7, 24–52. http://www.jstor.org/stable/764957.
Holmstrom, Bengt, Milgrom, Paul, 1994. The firm as an incentive system. The American Economic Review 84 (4), 972–991. http://www.jstor.org/stable/2118041.

Horsky, Dan, Nelson, Paul, 1996. Evaluation of salesforce size and productivity through efficient frontier benchmarking. Marketing Science 15 (4), 301–320. http://www.jstor.org/stable/184167.
Huang, Guofang, Shum, Matthew, Tan, Wei, forthcoming. Is advertising informative? Evidence from contraindicated drug prescriptions. Quantitative Marketing and Economics. https://doi.org/10.2139/ssrn.1992182.
Inderst, Roman, Ottaviani, Marco, 2009. Misselling through agents. The American Economic Review 99 (3), 883–908. http://www.jstor.org/stable/25592486.
Jain, Aditya, Misra, Sanjog, Rudi, Nils, 2016. Search, sales assistance and purchase decisions: an analysis using retail video data. https://dx.doi.org/10.2139/ssrn.2699765.
Kappe, Eelco, Stremersch, Stefan, 2016. Drug detailing and doctors’ prescription decisions: the role of information content in the face of competitive entry. Marketing Science 35 (6), 915–933. https://doi.org/10.1287/mksc.2015.0971.
Kim, Minkyung, Sudhir, K., Uetake, Kosuke, 2018a. A Structural Model of a Multi-Tasking Salesforce. Technical Report.
Kim, Minkyung, Sudhir, K., Uetake, Kosuke, Canales, Rodrigo, 2018b. When Salespeople Manage Customer Relationships: Multidimensional Incentives and Private Information. Technical Report.
Kishore, Sunil, Rao, Raghunath Singh, Narasimhan, Om, John, George, 2013. Bonuses versus commissions: a field study. Journal of Marketing Research 50 (3), 317–333. https://doi.org/10.1509/jmr.11.0485.
Lal, Rajiv, 1986. Delegating pricing responsibility to the salesforce. Marketing Science 5 (2), 159–168. http://www.jstor.org/stable/183670.
Lal, Rajiv, Srinivasan, V., 1993. Compensation plans for single- and multi-product salesforces: an application of the Holmstrom-Milgrom model. Management Science 39 (7), 777–793. http://www.jstor.org/stable/2632418.
Larkin, I., Ang, D., Steinhart, J., et al., 2017. Association between academic medical center pharmaceutical detailing policies and physician prescribing. JAMA 317 (17), 1785–1795. https://doi.org/10.1001/jama.2017.4039.
Larkin, Ian, 2014. The cost of high-powered incentives: employee gaming in enterprise software sales. Journal of Labor Economics 32 (2), 199–227.
Lazear, Edward P., 2000. Performance pay and productivity. The American Economic Review 90 (5), 1346–1361. https://doi.org/10.1257/aer.90.5.1346.
Leffler, Keith B., 1981. Persuasion or information? The economics of prescription drug advertising. The Journal of Law & Economics 24 (1), 45–74. http://www.jstor.org/stable/725202.
Lilien, G.L., Kotler, P., Moorthy, K.S., 1992. Marketing Models. Prentice-Hall International Editions. Prentice-Hall. https://books.google.com/books?id=Pw2oPwAACAAJ.
Lodish, Leonard M., Curtis, Ellen, Ness, Michael, Simpson, M. Kerry, 1988. Sales force sizing and deployment using a decision calculus model at Syntex laboratories. Interfaces 18 (1), 5–20. http://www.jstor.org/stable/25061045.
Lucas, Henry C., Weinberg, Charles B., Clowes, Kenneth W., 1975. Sales response as a function of territorial potential and sales representative workload. Journal of Marketing Research 12 (3), 298–305. http://www.jstor.org/stable/3151228.
MacDonald, Glenn, Marx, Leslie M., 2001. Adverse specialization. Journal of Political Economy 109 (4), 864–899. https://doi.org/10.1086/322084.
Manchanda, Puneet, Chintagunta, Pradeep K., 2004. Responsiveness of physician prescription behavior to salesforce effort: an individual level analysis. Marketing Letters 15 (2/3), 129–145. http://www.jstor.org/stable/40216650.
Manchanda, Puneet, Honka, Elisabeth, 2005. The effects and role of direct-to-physician marketing in the pharmaceutical industry: an integrative review. The Yale Journal of Health Policy, Law, and Ethics 5 (2), 785–822.
Manchanda, Puneet, Rossi, Peter E., Chintagunta, Pradeep K., 2004. Response modeling with nonrandom marketing-mix variables. Journal of Marketing Research 41 (4), 467–478. http://www.jstor.org/stable/30164711.

Mantrala, Murali K., 2014. Sales force productivity models (Chapter 16). In: The History of Marketing Science, vol. 3. World Scientific, pp. 427–462.
Mantrala, Murali K., Albers, Sönke, 2010. Impact of the internet on B2B sales force size and structure. In: Handbook of B2B Marketing. ISBM & Elgar Publishing.
Mantrala, Murali K., Sinha, Prabhakant, Zoltners, Andris A., 1994. Structuring a multiproduct sales quota-bonus plan for a heterogeneous sales force: a practical model-based approach. Marketing Science 13 (2), 121–144. https://doi.org/10.1287/mksc.13.2.121.
Mantrala, Murali K., Raman, Kalyan, Desiraju, Ramarao, 1997. Sales quota plans: mechanisms for adaptive learning. Marketing Letters 8 (4), 393–405. http://www.jstor.org/stable/40216466.
Mantrala, Murali K., Albers, Sönke, Caldieraro, Fabio, Jensen, Ove, Joseph, Kissan, Krafft, Manfred, Narasimhan, Chakravarthi, Gopalakrishna, Srinath, Zoltners, Andris, Lal, Rajiv, Lodish, Leonard, 2010. Sales force modeling: state of the field and research agenda. Marketing Letters 21 (3), 255–272. http://www.jstor.org/stable/40959645.
Mishra, Birendra K., Prasad, Ashutosh, 2005. Delegating pricing decisions in competitive markets with symmetric and asymmetric information. Marketing Science 24 (3), 490–497. http://www.jstor.org/stable/40056976.
Misra, Sanjog, Nair, Harikesh, 2018. Selling to physicians: revisiting and reinterpreting detailing effects. Work in progress.
Misra, Sanjog, Nair, Harikesh S., 2011. A structural model of sales-force compensation dynamics: estimation and field implementation. Quantitative Marketing and Economics 9 (3), 211–257. https://doi.org/10.1007/s11129-011-9096-1.
Misra, Sanjog, Pinker, Edieal J., Shumsky, Robert, 2004. Salesforce design with experience-based learning. IIE Transactions 36 (10), 941–952. https://doi.org/10.1080/07408170490487777.
Misra, Sanjog, Coughlan, Anne T., Narasimhan, Chakravarthi, 2005. Salesforce compensation: an analytical and empirical examination of the agency theoretic approach. Quantitative Marketing and Economics 3 (1), 5–39. https://doi.org/10.1007/s11129-005-0164-2.
Mizik, Natalie, Jacobson, Robert, 2004. Are physicians “easy marks”? Quantifying the effects of detailing and sampling on new prescriptions. Management Science 50 (12), 1704–1715. http://www.jstor.org/stable/30048061.
Montoya, Ricardo, Netzer, Oded, Jedidi, Kamel, 2010. Dynamic allocation of pharmaceutical detailing and sampling for long-term profitability. Marketing Science 29 (5), 909–924. http://www.jstor.org/stable/40864673.
Narayanan, Sridhar, Manchanda, Puneet, 2009. Heterogeneous learning and the targeting of marketing communication for new products. Marketing Science 28 (3), 424–441. https://doi.org/10.1287/mksc.1080.0410.
Narayanan, Sridhar, Nair, Harikesh S., 2013. Estimating causal installed-base effects: a bias-correction approach. Journal of Marketing Research 50 (1), 70–94.
Narayanan, Sridhar, Desiraju, Ramarao, Chintagunta, Pradeep K., 2004. Return on investment implications for pharmaceutical promotional expenditures: the role of marketing-mix interactions. Journal of Marketing 68 (4), 90–105. https://doi.org/10.1509/jmkg.68.4.90.42734.
Narayanan, Sridhar, Manchanda, Puneet, Chintagunta, Pradeep K., 2005. Temporal differences in the role of marketing communication in new product categories. Journal of Marketing Research 42 (3), 278–290. https://doi.org/10.1509/jmkr.2005.42.3.278.
Nerlove, Marc, Arrow, Kenneth J., 1962. Optimal advertising policy under dynamic conditions. Economica 29 (114), 129–142. http://www.jstor.org/stable/2551549.
Neslin, Scott, 2001. ROI analysis of pharmaceutical promotion (RAPP): an independent study. https://amm.memberclicks.net/assets/documents/RAPP_Study_AMM.pdf.
Oyer, Paul, 1998. Fiscal year ends and nonlinear incentive contracts: the effect on business seasonality. The Quarterly Journal of Economics 113 (1), 149–185.
Parsons, Leonard Jon, Vanden Abeele, Piet, 1981. Analysis of sales call effectiveness. Journal of Marketing Research 18 (1), 107–113. http://www.jstor.org/stable/3151321.
Raju, Jagmohan S., Srinivasan, V., 1996. Quota-based compensation plans for multiterritory heterogeneous salesforces. Management Science 42 (10), 1454–1462. http://www.jstor.org/stable/2634377.

495

496

CHAPTER 8 Selling and sales management

Riggs, John, 2012. A Taxonomy of Regulations: The Effect of Regulation on Selling Activities. Dissertations, Theses and Capstone Projects. https://digitalcommons.kennesaw.edu/etd/516. Rizzo, John A., 1999. Advertising and competition in the ethical pharmaceutical industry: the case of antihypertensive drugs. The Journal of Law & Economics 42 (1), 89–116. http://www.jstor.org/stable/ 10.1086/467419. Rockoff, Jonathan D., 2012. Drug reps soften their sales pitches. https://www.wsj.com/articles/ SB10001424052970204331304577142763014776148. Sannikov, Yuliy, 2008. A continuous-time version of the principal: agent problem. The Review of Economic Studies 75 (3), 957–984. http://www.jstor.org/stable/20185061. Shapiro, Bradley T., forthcoming. Informational shocks, off-label prescribing, and the effects of physician detailing. Management Science. https://doi.org/10.1287/mnsc.2017.2899. Shearer, Bruce, 1996. Piece-rates, principal-agent models, and productivity profiles: parametric and semiparametric evidence from payroll records. The Journal of Human Resources 31 (2), 275. https:// doi.org/10.2307/146064. Shearer, Bruce, 2004. Piece rates, fixed wages and incentives: evidence from a field experiment. The Review of Economic Studies 71 (2), 513–534. https://doi.org/10.1111/0034-6527.00294. Slade, Margaret E., 1996. Multitask agency and contract choice: an empirical exploration. International Economic Review 37 (2), 465–486. http://www.jstor.org/stable/2527333. Spear, Stephen E., Srivastava, Sanjay, 1987. On repeated moral hazard with discounting. The Review of Economic Studies 54 (4), 599–617. http://www.jstor.org/stable/2297484. Steenburgh, Thomas J., 2008. Effort or timing: the effect of lump-sum bonuses. Quantitative Marketing and Economics 6 (3), 235. https://doi.org/10.1007/s11129-008-9039-7. Stephenson, P. Ronald, Cron, William L., Frazier, Gary L., 1979. Delegating pricing authority to the sales force: the effects on sales and profit performance. 
Journal of Marketing 43 (2), 21–28. http://www. jstor.org/stable/1250738. Stremersch, Stefan, Lemmens, Aurélie, 2009. Sales growth of new pharmaceuticals across the globe: the role of regulatory regimes. Marketing Science 28 (4), 690–708. https://doi.org/10.1287/mksc.1080. 0440. Venkataraman, Sriram, Stremersch, Stefan, 2007. The debate on influencing doctors’ decisions: are drug characteristics the missing link? Management Science 53 (11), 1688–1701. https://doi.org/10.1287/ mnsc.1070.0718. Viswanathan, Madhu, Li, Xiaolin, John, George, Narasimhan, Om, 2018. Is cash king for sales compensation plans? Evidence from a large-scale field intervention. Journal of Marketing Research 55 (3), 368–381. https://doi.org/10.1509/jmr.14.0290. Williamson, Oliver E., 1979. Transaction-cost economics: the governance of contractual relations. The Journal of Law & Economics 22 (2), 233–261. Zoltners, Andris A., Sinha, Prabhakant, 1983. Sales territory alignment: a review and model. Management Science 29 (11), 1237–1256. http://www.jstor.org/stable/2630904. Zoltners, Andris A., Sinha, Prabhakant, 2005. Sales territory design: thirty years of modeling and implementation. Marketing Science 24 (3), 313–331. http://www.jstor.org/stable/40056963.

CHAPTER 9

How price promotions work: A review of practice and theory

Eric T. Anderson a,∗, Edward J. Fox b
a Kellogg School of Management, Northwestern University, Evanston, IL, United States
b Cox School of Business, Southern Methodist University, Dallas, TX, United States
∗ Corresponding author: e-mail address: [email protected]

Contents
1 Introduction
2 Theories of price promotion
  2.1 Macroeconomics
  2.2 Price discrimination
    2.2.1 Inter-temporal price discrimination
    2.2.2 Retail competition and inter-store price discrimination
    2.2.3 Manufacturer (brand) competition and inter-brand price discrimination
  2.3 Demand uncertainty and price promotions
  2.4 Consumer stockpiling of inventory
  2.5 Habit formation: Buying on promotion
  2.6 Retail market power
  2.7 Discussion
3 The practice of price promotion
  3.1 Overview of trade promotion process
  3.2 Empirical example of trade rates
  3.3 Forms of trade spend
    3.3.1 Off-invoice allowances
    3.3.2 Bill backs
    3.3.3 Scan backs
    3.3.4 Advertising and display allowances
    3.3.5 Markdown funds
    3.3.6 Bracket pricing, or volume discounts
    3.3.7 Payment terms
    3.3.8 Unsaleables allowance
    3.3.9 Efficiency programs
    3.3.10 Slotting allowances
    3.3.11 Rack share
    3.3.12 Price protection
  3.4 Some implications of trade promotions
  3.5 Trade promotion trends
  3.6 Planning and tracking: Trade promotion management systems
4 Empirical literature on price promotions
  4.1 Empirical research – an update
    4.1.1 Promotional pass-through
    4.1.2 Long-term effects of promotion
    4.1.3 Asymmetric cross-promotional effects
    4.1.4 Decomposition of promotional sales
    4.1.5 Advertised promotions result in increased store traffic
    4.1.6 Trough after the deal
  4.2 Empirical research – newer topics
    4.2.1 Price promotions and category-demand
    4.2.2 Cross-category effects and market baskets
    4.2.3 Effectiveness of price promotion with display
    4.2.4 Coupon promotions
    4.2.5 Stockpiling and the timing of promotions
    4.2.6 Search and price promotions
    4.2.7 Targeted price promotions
  4.3 Macroeconomics and price promotions
  4.4 Promotion profitability
5 Getting practical
  5.1 Budgets and trade promotion adjustments
  5.2 Retailer vs. manufacturer goals and issues
  5.3 When decisions happen: Promotion timing and adjustments
  5.4 Promoted price: Pass-through
  5.5 Durable goods price promotion
  5.6 Private label price promotions
  5.7 Price pass through
6 Summary
References

Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.006 Copyright © 2019 Elsevier B.V. All rights reserved.

1 Introduction

Price promotions, which are temporary price changes offered by a seller, are recognized by academics as an important source of price variation (Klenow and Malin, 2010). A simplistic view would hold that price promotions are unilateral decisions by a seller to respond to immediate changes in supply or demand. For economists, it is important to understand the extent to which price promotions conform to what Robert Hall referred to as the Keynesian sticky price paradigm of “call options with unlimited quantities” (Klenow and Malin, 2010). While some price promotions do conform to this paradigm, the vast majority do not. Most price promotions involve (i) joint, or coordinated, decision-making among retailers and manufacturers, and (ii) long-term planning. This has important implications for marketing, industrial organization, and macroeconomics, which we will highlight in this chapter.

The financial transfers from manufacturers to retailers that lead to most price promotions, which are known as trade spend, represent a substantial part of the global economy. For consumer packaged goods (CPG), trade spend by manufacturers is estimated to be as much as $500 billion globally, which represents more than half of the marketing budgets of CPG retailers (Corstjens and Corstjens, 1995, p. 239; Gómez et al., 2007). Price promotions are also critical for driving demand in many durable goods markets, such as automobiles, and in retail markets, such as sporting goods, apparel, and electronics. Among academics, there has been a heavy emphasis on CPG and retail markets due to the wide availability of data. We focus much of our attention on CPG markets but broaden our discussion in the final section to address durable goods markets.

Within CPG, many studies have shown that, though a small percentage of items are price promoted at any given time, together they represent a disproportionate fraction of overall sales. ACNielsen found that 42.8% of US grocery store sales in 2009 were of price-promoted products. That same year, 40.4% of US drug store sales were of price-promoted products (Felgate et al., 2012). The importance of price promotions is not limited to US markets. ACNielsen found that 12% to 25% of European grocery retail revenues in 2004 came from price-promoted products (Gedenk et al., 2006). A senior CPG leader whom we interviewed for this chapter noted that price promotions are commonplace all over the world, in both developed and emerging markets.

Trade spend represents the second largest category of CPG manufacturer expenditures, after cost of goods (Gómez et al., 2007, p. 410). In a recent publication based on a survey of managers, Acosta (2012a, 2016) reported that trade spend averages 10% to 25% of manufacturer revenue (i.e., revenue derived from retailers, not point-of-sale revenue). In addition, a study by Boston Consulting Group (2012) reported that trade spend was 17.3% of revenues for nine large firms with a total of fifty billion in sales.
Gartner (2015) reports that “upwards of 25%” of revenue is spent on trade promotions. The research reported in this chapter provides corroborating evidence that supports these estimates.¹

¹ To add further credibility to these metrics, we reviewed financial reports of CPG manufacturers. Many publicly traded CPG manufacturers report trade spend as an accrued liability in their financial statements. For example, P&G reported marketing and promotion accrued liabilities of $2.9 billion in 2015. If trade funds take 90 days to settle, this implies an annual trade spend of 4 × $2.9 billion = $11.6 billion. In 2015, P&G reported net sales of $76.2 billion and our rough calculation would suggest that trade spend is 11.6/76.2 = 15%, which is consistent with Acosta’s findings. In the same 2015 financial report, P&G reported $8.3 billion in advertising expenditure (11% of sales), which is also consistent with the Acosta surveys.

To many academics, price promotions are synonymous with “sales” or price discounts. There has been considerable debate among macroeconomists about whether to include or exclude “sales” from various metrics like the CPI or PPI, the impact of “sales” on the duration or stickiness of prices, and whether promotional price changes should be expected to be related to broader macroeconomic activity (e.g., Bils and Klenow, 2004; Nakamura and Steinsson, 2008; Klenow and Malin, 2010; Anderson et al., 2017). While we recognize the importance of these issues, our goal in this chapter is to provide institutional details about the practice of price promotions to allow experts in these areas to arrive at more informed answers.

To microeconomists and empirical marketing academics, price promotions provide variation in price that is fundamental to demand estimation (Nevo, 2000). Far too many papers claim that “prices are endogenous” and then pursue various solutions, including instrumental variables. This chapter provides institutional details about how price promotion decisions are made. For example, few academics are aware of the roles of scan backs, bill backs, off-invoice allowances, trade rates, accrual accounts, and deal sheets. Such details will allow researchers to more carefully assess whether and why there is a price endogeneity problem. Again, we don’t want to diminish the issue of price endogeneity but we do hope that our chapter allows researchers to make more informed assessments.

With this as a backdrop, we have four broad goals for the chapter. First, we highlight the theoretical foundations of price promotions, drawing on literature from both economics and marketing. We believe that theoretical models of price promotions are incomplete and no single model integrates four key features:

a. Vertical Channel: manufacturers and retailers influence price promotions via a planned, negotiated process.
b. Price Discrimination: manufacturers and retailers have incentives to price discriminate due to demand heterogeneity.
c. Competition: manufacturers and retailers operate in oligopolistic markets.
d. Consumer Behavior: stockpiling by consumers and habit formation.

Existing theoretical models of price promotions fail to incorporate all of these features, which may cloud our understanding of how markets function.

Second, we describe the mechanisms by which price promotions are implemented in practice, based on the experiences of retailers and manufacturers.
We will show that price promotions are typically a planned, negotiated process in a vertical channel that involves both retail and manufacturer competition and price discrimination in a market with strategic consumers.

Third, we summarize the recent empirical literature on price promotions. The last major review of this literature was published in marketing by Blattberg et al. (1995). Blattberg and Neslin’s (1990) seminal book on sales promotion was updated more recently (Neslin, 2002). Given this history, the focus of our empirical literature review is on published work between 1995 and 2018, though we do review foundational research prior to 1995. We attempt to highlight where empirical findings are consistent or inconsistent with economic models of price promotions. We believe that many of the current inconsistencies may be attributed to not capturing the institutional factors that we highlight in this chapter.

Fourth, we identify gaps between academic and practitioner perspectives with an eye towards increasing the impact and relevance of academic research. To that end, we summarize key findings from numerous in-depth interviews with managers and identify academic research opportunities. Integration of academic thought leadership with practical problems has been a hallmark of research in marketing and economics for decades. Our hope is that this chapter helps to advance this mission.

Before proceeding, we want to broadly address two important questions. First, why do price promotions exist? In particular, one might ask how price promotions benefit manufacturers and retailers. Second, how important are financial flows related to price promotions compared to other marketing activities, like advertising?

To answer the first question, we offer both short-term and long-term perspectives. In the short term, price promotions increase the sales of promoted products (see Blattberg et al., 1995 for a summary of the evidence). Interestingly, the resulting benefits accrue to manufacturers and retailers differently (Srinivasan et al., 2004; Ailawadi et al., 2007), reflecting their differing incentives. Increased sales of promoted products benefit manufacturers primarily by attracting consumers to switch from competing brands. To a lesser extent, manufacturers also benefit because price promotions cause consumers to accelerate their purchases and stockpile, that is, to buy earlier, or for a longer consumption horizon, than they would have otherwise (Gupta, 1988; Bell et al., 1999). Purchase acceleration and stockpiling benefit manufacturers by effectively precluding consumers from switching to competing brands on the purchase occasions forgone. Finally, manufacturers may also benefit if price promotions increase consumption rates (Assuncao and Meyer, 1993), thus increasing demand for products in the category.

Retailers benefit similarly from higher category demand but typically not from brand switching, which is the primary source of manufacturers’ benefit. Purchase acceleration and stockpiling may benefit retailers if they discourage shopping at competing retailers (Walters, 1991).
Importantly, retailers also benefit from price promotions that prompt consumers to make incremental visits to their stores, not only because those consumers purchase the promoted products, but also because they make other planned purchases (Kollat and Willett, 1967), unplanned purchases (Walters, 1991), and purchases of complementary products (Manchanda et al., 1999). These “basket building effects” are the primary benefit for retailers.

The long-term effects of price promotion are decidedly less beneficial. The primary drawback is that consumers “learn” to buy on promotion, which makes them more price sensitive and more responsive to future promotions (Papatla and Krishnamurthi, 1996; Mela et al., 1997; Jedidi et al., 1999). If consumers are trained to “buy on deal” then sellers face increasingly price-sensitive consumers. For manufacturers, an adverse long-term outcome of price promotions is the reduction of product differentiation and brand equity (Jedidi et al., 1999; Sriram and Kalwani, 2007). Manufacturers may benefit from price promotions causing consumers to try new products, but this benefit is offset by the long-term drawbacks (Ataman et al., 2008). In sum, price promotions offer short-term benefits but typically exact a long-term cost from both manufacturers and retailers.

Regarding the importance of price promotions, a recent study by Boston Consulting Group (2012) noted that trade spend by large CPG firms averaged 12 times their R&D budget. Further, trade spend was 1.5 times the size of the typical advertising budget. For many CPG manufacturers, trade spend dominates advertising spend. As a specific example, Kraft publicly reported advertising expenses of $652 million while P&G reported advertising spending of $9 billion in 2014. If trade spend is roughly 15% of revenue (Acosta, 2012a, 2012b) then Kraft spent approximately $2.7 billion and P&G spent $12.4 billion on trade. In sum, while academics have recognized the importance of topics like innovation and advertising, trade spend is, on average, larger for CPG firms yet has been largely neglected by researchers.
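The back-of-envelope figures in this section can be reproduced in a few lines. The sketch below is only an illustration: it recomputes the accrual-based estimate from footnote 1 and the trade-to-advertising comparison above, using the dollar amounts quoted in the text. The function name `annualized_trade_spend` and the 360-day year (chosen so that a 90-day settlement cycle yields the footnote's factor of 4) are assumptions made for this example, not an accounting convention taken from the sources.

```python
def annualized_trade_spend(accrued_liability_bn, settlement_days=90.0,
                           days_per_year=360.0):
    """Scale an accrued trade liability (in $ billions) to a yearly figure.

    Assumes the accrued balance turns over once every `settlement_days`;
    both defaults are illustrative assumptions, not reported values.
    """
    turns_per_year = days_per_year / settlement_days
    return accrued_liability_bn * turns_per_year


# Footnote 1: P&G, 2015 (all amounts in $ billions, as quoted in the text).
pg_accrual_2015 = 2.9
pg_net_sales_2015 = 76.2
pg_advertising_2015 = 8.3

pg_trade_2015 = annualized_trade_spend(pg_accrual_2015)
pg_trade_share = pg_trade_2015 / pg_net_sales_2015
pg_ad_share = pg_advertising_2015 / pg_net_sales_2015

# 2014 comparison: trade spend implied by the 15%-of-revenue rule of thumb,
# set against reported advertising (figures as quoted in the text).
kraft_trade, kraft_advertising = 2.7, 0.652
pg_trade_2014, pg_advertising_2014 = 12.4, 9.0

kraft_trade_to_ad = kraft_trade / kraft_advertising
pg_trade_to_ad = pg_trade_2014 / pg_advertising_2014

print(f"P&G 2015 trade spend: ${pg_trade_2015:.1f}bn "
      f"({pg_trade_share:.0%} of sales); advertising {pg_ad_share:.0%}")
print(f"Trade-to-advertising ratio: Kraft {kraft_trade_to_ad:.1f}x, "
      f"P&G {pg_trade_to_ad:.1f}x")
```

Under these inputs, implied trade spend is roughly four times advertising for Kraft and about 1.4 times for P&G. The estimates are only as reliable as the 90-day settlement and 15%-of-revenue assumptions behind them.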

2 Theories of price promotion

Before reviewing empirical findings, we will discuss the theoretical foundations of price promotions from macroeconomics, microeconomics, and marketing. To clarify our discussion, we distinguish retail promotions from trade promotions. Retail promotions include price promotions that retailers offer to consumers as incentives to make purchases. Trade promotions are promotions that manufacturers offer to retailers as incentives to offer retail promotions. These definitions suggest that trade promotions precede retail promotions and, perhaps more importantly, imply that retail promotions are a consequence of trade promotions. Manufacturer-retailer interactions require more complicated models. As we will argue, this complexity has limited the development of trade promotion theory and has resulted in theoretical models that are not necessarily consistent with empirical observations.

2.1 Macroeconomics

We start with the macroeconomics literature, which has attempted in recent decades to reconcile models of the economy with pricing patterns observed in grocery stores and other retailers of fast-moving consumer goods. The emphasis on this industry is due, at least in part, to availability of data from syndicators such as IRI and Nielsen, and from the Bureau of Labor Statistics (BLS). Recently, scraped data from web sites has become another relevant data source. Much of the emphasis in this literature is on documenting the consistency between observed prices and theoretical macroeconomic models. In contrast to microeconomic and marketing models, which we will address later, macroeconomic models typically focus on understanding the broader economy. All of the macro models that we are aware of abstract away from the vertical channel (i.e., manufacturer and retailer) and consider a single firm.

In addition to widespread availability of data, the CPG industry may have attracted interest among macroeconomists because of “moderate” price adjustment costs. For example, retail petroleum has very low costs of adjusting prices but business-to-business markets have relatively high adjustment costs. To some extent, CPG is an interesting laboratory for testing sticky price models.

Eichenbaum et al. (2011) studied data from a large retailer in the CPG industry and demonstrated that a variant of the Dixit-Stiglitz model of monopolistic competition is broadly consistent with empirical prices, including price promotions. In contrast, they found that menu cost models, such as Golosov and Lucas (2007) and Burstein and Hellwig (2007), are generally inconsistent with empirical pricing patterns. A key inconsistency is that the latter models imply that prices are less volatile than marginal costs; empirically, the opposite is observed. They also found that the Calvo (1983) model, which is among the most widely used pricing models in macroeconomics, is inconsistent with the data.

Klenow and Kryvtsov (2008) distinguished sticky price models, which vary with time (“time dependent”), from menu cost models (“state dependent”). Time-dependent models, such as Calvo (1983) and Taylor (1980), imply that firms adjust prices randomly or every N periods. State-dependent models, such as Dotsey et al. (1999) and Golosov and Lucas (2007), assume that prices adjust with the state of the economy. One key metric that Klenow and Kryvtsov focused on is how these models explain price inflation. They decomposed the variance in inflation into two parts: the intensive margin and extensive margin of adjustment. With BLS data that is used to construct the Consumer Price Index, they showed that the variance in price inflation is largely due to the intensive margin (i.e., the size of price changes) rather than the extensive margin (i.e., the fraction of items with price changes).

Time-dependent models, such as Calvo and Taylor, assume that all variance in inflation is explained by the intensive margin and don’t allow for any variation in the extensive margin. While time-dependent models are broadly consistent with the decomposition, they cannot explain the fact that some adjustment occurs on the extensive margin. In contrast, state-dependent models can potentially explain inflation variance on both margins of adjustment. However, simulations of Dotsey et al. (1999) show that most of the variance in inflation is on the extensive margin, which is inconsistent with the data. In addition, state-dependent models like Golosov and Lucas (2007) fail to explain the sizable number of small price changes observed in the BLS data.

Klenow and Kryvtsov (2008) briefly mentioned that newer macroeconomic models can explain more of the empirical facts. For example, Kehoe and Midrigan (2007) and Midrigan (2011) allow menu costs to vary in magnitude for regular versus sale prices, which leads to greater consistency with observed prices in the BLS data. While there is substantial evidence of menu costs when setting prices (Slade, 1998; Levy et al., 1997; Anderson and Simester, 2010), standard menu cost models (i.e., state-dependent models) may need to be more flexible to explain observed prices.
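The intensive/extensive decomposition discussed above can be made concrete with a small simulation. The sketch below assumes that inflation in period t is written as the product of the fraction of items changing price (`fr`) and the average size of those changes (`dp`). It measures the intensive margin as the variance of `dp` scaled by the squared mean of `fr`, and treats everything else as the extensive margin. The function name, the simulated series, and the stylized "state-dependent" rule are hypothetical choices for illustration; they are not taken from Klenow and Kryvtsov (2008) or from BLS data.

```python
import random
from statistics import fmean, pvariance


def decompose_inflation_variance(fr, dp):
    """Split var(pi_t), with pi_t = fr_t * dp_t, into two margins.

    fr: per-period fraction of items changing price
    dp: per-period average size of those price changes
    """
    pi = [f * d for f, d in zip(fr, dp)]
    total = pvariance(pi)
    # Intensive margin: variation in the size of price changes, holding the
    # fraction of items changing price fixed at its sample mean.
    intensive = pvariance(dp) * fmean(fr) ** 2
    # Extensive margin, measured here as the residual: every term that
    # involves variation in fr (its variance, its covariance with dp,
    # and higher-order terms).
    extensive = total - intensive
    return {
        "total": total,
        "intensive": intensive,
        "extensive": extensive,
        "intensive_share": intensive / total,
    }


random.seed(0)
periods = 240
# Simulated average price changes (made-up parameters, for illustration only).
dp = [random.gauss(0.01, 0.02) for _ in range(periods)]

# Time-dependent (Calvo-style) pricing: a fixed share of items reprices each period.
calvo_fr = [0.25] * periods
# Stylized state-dependent pricing: more items reprice when the change is large.
state_fr = [min(1.0, 0.25 + 5.0 * abs(d)) for d in dp]

calvo = decompose_inflation_variance(calvo_fr, dp)
state = decompose_inflation_variance(state_fr, dp)

print(f"Calvo-style intensive share: {calvo['intensive_share']:.3f}")
print(f"State-dependent intensive share: {state['intensive_share']:.3f}")
```

With a constant repricing fraction, all of the variance falls on the intensive margin by construction, which mirrors the statement above about time-dependent models. Once the fraction responds to the size of desired changes, the extensive margin is no longer zero.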

2.2 Price discrimination

A common view in microeconomics and marketing is that price promotions are used to price discriminate among end-users or consumers. Interestingly, none of the practitioners that we consulted while preparing this chapter mentioned price discrimination as a rationale for price promotions. Thus, while academics widely cite price discrimination to explain price promotions, practitioners apparently do not.

A central premise of models that lead to variability in pricing over time (among sellers or products) is heterogeneity in supply and/or demand. Our review of the literature shows that heterogeneity in demand is the primary mechanism in most of these models. This may be due to the observation that cost heterogeneity among sellers may not be sufficient to generate price variation, as high cost suppliers would eventually be driven out of the market.


We categorize the literature into three ways that price discrimination may arise: (1) inter-temporal, (2) inter-store, or (3) inter-brand.

In the case of inter-temporal price discrimination, some shoppers are able to wait for a price promotion to be offered, when they will be able to buy at a low price. Other shoppers are unable to wait for a future price promotion, so they will buy at the current price, which is often not a low promoted price. Inter-temporal price discrimination implicitly assumes that consumers strongly prefer to shop at a particular store and purchase a particular brand, so they will not switch.

The case of inter-store price discrimination is conceptually similar, but assumes that shoppers have a uniformly strong brand preference. In this case, some shoppers (“switchers”) are willing to buy from whichever store advertises a lower promoted price on their preferred brand. Others (“loyals”) will pay whatever price their preferred retailer sets for that brand, which is often not a low promoted price.

In the case of inter-brand price discrimination, consumers have a strong preference to shop at a given store, but some consumers are willing to switch brands to take advantage of price promotions while others are not. Switchers will generally purchase whichever brand the store offers at a low promoted price; loyals will purchase their preferred brand at whatever price the store offers.

2.2.1 Inter-temporal price discrimination

Varian (1980) is often credited with introducing price discrimination as a theoretical basis for price promotion. He analyzed the case of a monopolist retailer facing some consumers who are informed about prices and others who are not. This might reflect the fact that only some consumers read retailers’ feature ads. Owing to the discrete nature of demand, the solution to this model is a mixed strategy equilibrium with a mass point where consumers are indifferent between buying and not buying. While the model treats all prices equivalently, the maximum price is interpreted as the regular price and the continuous distribution of lower prices as “sale” or promotional prices.

A limitation of Varian’s model is its focus on a single firm that sets prices, which abstracts away from the vertical channel (e.g., retailer and manufacturer). In addition, the model predicts that price promotions follow a continuous distribution, conditional on the price being less than the regular price. This is inconsistent with empirical evidence that many brands are promoted at a small set of prices that are predictable (i.e., not random). As we will show later, one common strategy is to simply repeat the price promotion from a previous year.

Despite its limitations, Varian’s seminal work motivated various demand-based explanations for price promotions including Narasimhan (1984), Raju et al. (1990), Rao (1991), Simester (1997), Anderson and Kumar (2007), and Sinitsyn (2008). Based on these theoretical studies, Rao et al. (1995) concluded that “competitive promotions are [emphasis ours] mixed strategies” (p. G96). In the spirit of Varian’s original work, these extensions typically assumed discrete demand (e.g., loyal and switching consumer segments) and interpreted mixed pricing strategies as price promotions. While this interpretation is intuitive and convenient, these models suffer from the same limitations as Varian’s original work.
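To illustrate how a mixed pricing strategy arises with loyal and switching segments, the sketch below works through a deliberately simplified textbook-style case rather than the exact specification of Varian (1980) or any of the extensions cited above. It assumes two symmetric sellers with zero marginal cost, each with a captive share `loyal_share` (below one half) of consumers, a remaining group of switchers who buy from the cheaper seller, and a common reservation price. Under those assumptions, a seller must earn the same expected profit at every price it might charge, which pins down the price distribution. All function names and parameter values are hypothetical.

```python
import random


def min_price(loyal_share, reservation_price):
    """Lowest price in the support: winning every switcher at this price
    earns the same profit as selling only to loyals at the reservation price."""
    return loyal_share * reservation_price / (1.0 - loyal_share)


def price_cdf(p, loyal_share, reservation_price):
    """Probability that the rival seller's price is at or below p."""
    if p <= min_price(loyal_share, reservation_price):
        return 0.0
    if p >= reservation_price:
        return 1.0
    switchers = 1.0 - 2.0 * loyal_share
    return 1.0 - loyal_share * (reservation_price - p) / (switchers * p)


def draw_price(u, loyal_share, reservation_price):
    """Inverse-CDF draw: u in [0, 1) maps to a price in [min_price, r)."""
    switchers = 1.0 - 2.0 * loyal_share
    return loyal_share * reservation_price / (loyal_share + (1.0 - u) * switchers)


def expected_profit(p, loyal_share, reservation_price):
    """Profit per unit mass of consumers when the rival prices according to price_cdf."""
    switchers = 1.0 - 2.0 * loyal_share
    win_probability = 1.0 - price_cdf(p, loyal_share, reservation_price)
    return p * (loyal_share + switchers * win_probability)


ALPHA, R = 0.3, 2.0  # illustrative loyal share and reservation price
random.seed(1)
prices = [draw_price(random.random(), ALPHA, R) for _ in range(10)]
profits = [expected_profit(p, ALPHA, R) for p in prices]

print(f"support: [{min_price(ALPHA, R):.3f}, {R:.3f}]")
print(f"profit at each drawn price: {[round(x, 6) for x in profits]}")
```

Every drawn price yields the same expected profit (the loyal share times the reservation price), which is the indifference condition behind a mixed strategy. Read in the spirit of the models above, the top of the support plays the role of the regular price and draws below it play the role of promotional prices.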

2 Theories of price promotion

More recent price discrimination models incorporate retail competition and yield more nuanced equilibrium solutions. Díaz et al. (2009) developed a sequential model of price competition in a retail duopoly. In the first stage, the retailers simultaneously choose a regular, or “list” price; in the second stage, the retailers simultaneously choose a discount. They found a subgame perfect equilibrium in which both retailers play pure strategies. An implication of this work is that retail price promotions are not random, but rather reflect strategic decisions about regular and discounted prices. This is more consistent with empirical evidence and practitioners’ self-described decision-making processes.

2.2.2 Retail competition and inter-store price discrimination The literature reviewed in the previous section focuses primarily on inter-temporal discrimination.2 Marketing researchers have also focused on inter-store price discrimination. While retail competition is incorporated in some models of inter-temporal price discrimination, it plays a broader role in models in which the retailer’s objective is to drive traffic to the store (Hess and Gerstner, 1987). Marketing researchers have documented that price promotions may lead to store switching (Bell and Lattin, 1998). A long-held strategy for many retailers is to drive store traffic with the expectation that a customer will buy other items on a shopping trip (Richards, 2006). Many retailers fear that, if they do not price promote, they will not be able to generate customer excitement and trips to their stores or web site. In theory, retailers could increase revenue by increasing trip frequency, increasing spending per trip, or both. In practice, it is generally easier for retailers to grow via increased trip frequency than via larger basket size (Simester et al., 2009). When retailers select which items to promote, those with broad appeal among consumers are more desirable because they can efficiently generate store traffic. This tilts the playing field towards large brands with greater market share. A second factor is the magnitude of the price promotion. Saving ten cents on a low-priced item is unlikely to affect a store trip; saving hundreds of dollars on an expensive item may. In grocery stores, diapers are frequently used to drive store traffic because they are relatively expensive and are in high demand among families with young children (Gönül and Srinivasan, 1996). If discounting a single item, like diapers, does not provide enough incentive to visit a store, then offering discounts on many items simultaneously can be an effective strategy (Lal and Matutes, 1994).
In either case, promoted items typically have some degree of durability (i.e., lasting for weeks or months). To generate regular, weekly visits from the same consumer, a retailer may rotate the items that are discounted from period to period (Anderson and Kumar, 2007). In addition, retailers may differentiate themselves by promoting different brands, Pepsi

2 Two of the price discrimination models, Díaz et al. (2009) and Braido (2009), analyzed duopoly/ oligopoly markets and so incorporated inter-store as well as inter-temporal price discrimination.


CHAPTER 9 How price promotions work: A review

versus Coca-Cola for example, or different brand-packages, such as Pepsi 2 liter bottles versus Pepsi 24 pack cans (Anderson et al., 2004; Cao, 2011). These intuitions for competitive, multi-product retailers have been captured in various theoretical models. Braido (2009) incorporated retail competition and other factors in a very general model of price discrimination. The model accommodates both symmetric and asymmetric retail competition, multiple products, and arbitrary cost functions. In contrast to demand-based explanations in the Varian tradition, this model relies on costs rather than varying levels of consumer price sensitivity to drive price variation. Braido showed the existence of a Nash equilibrium in mixed strategies with an endogenous sharing rule. He proceeded to identify scenarios in which prices are necessarily random, which is interpreted as retail price promotions. This work represents a flexible, supply-side variation on the price discrimination explanation of retail price variation. Rhodes (2014) investigated important issues of pricing and promotion for multiproduct retailers selling to consumers who incur a search cost to learn prices. This reflects a perhaps outdated reality that consumers must visit a store to learn product prices at any particular time. Rhodes found that a retailer that offers more products should charge lower prices, though with little discounting. On the other hand, a retailer that offers fewer products should charge higher prices with deeper discounts. Generalizing to the case of competing multiproduct retailers,3 he found that competing retailers should charge a high regular price with occasional random discounts. 
Sinitsyn (2012) developed a model of competition between retail firms that sell complementary products, and investigated coordination of price promotions for those products.4 In his model, retailers face consumers who are either loyal, and so would rather purchase complementary products from their preferred retailer, or non-loyal. Analysis of the model for various combinations of parameter values showed that most equilibria involve discounting complementary products at the same time. Sinitsyn’s result suggests that multi-category retailers selling to loyal and non-loyal consumers should generally synchronize their price promotions. Shelegia’s (2012) model of multiproduct retailer price competition allows for not just complementary products, but also for products to be substitutes or independent in use. Using a duopoly model in which retailers sell two products to consumers of differing loyalty levels (see, for example, Sinitsyn, 2008), he showed that the equilibrium solution involves mixed strategies for both products’ prices. Only in the case of complementary products should product prices be related; in the cases of substitutes and independence, product prices should be unrelated. It is worth noting that Pesendorfer (2002) had previously conducted a detailed empirical study of synchronization in multiproduct retailers’ price promotions.

3 The case of competing multiproduct retailers was limited to just two products per retailer.
4 Note that this model assumes generic sellers offering products directly to consumers. The results are applicable to retailers as sellers.


2.2.3 Manufacturer (brand) competition and inter-brand price discrimination Researchers have also rationalized price promotions as a competitive tool for brands. Price promotions typically have a short-term impact on brand switching (Gupta, 1988; Bell et al., 1999), effectively stealing share from competitors. However, as discussed previously, the long-term benefits of price promotions for brands are less clear (Anderson and Simester, 2004). Lal and Villas-Boas (1998) extended Varian’s framework to include multiple manufacturers and retailers, and also addressed the case of retailers selling exclusive products. Their model assumes two competing single-product manufacturers selling through two competing retailers. The retailers sell to both loyal consumers and switchers. First considering the case in which manufacturers sell exclusively to one retailer, Lal and Villas-Boas found that incorporating retailers in the channel of distribution raises prices and lowers competition compared to direct consumer sales. They also found consistent retailer pass-through of trade promotions (i.e., manufacturer discounts), but that retailer margins increase with trade promotions. Interestingly, trade promotions should occur more frequently than retailer price promotions. Lal and Villas-Boas also analyzed the non-exclusive case in which manufacturers sell their products to both retailers, which is more common in practice. In this case, the lower-priced brand will always have more manufacturer discounts and the retailer will, in turn, adopt a mixed strategy in setting the retail price. Across the two cases, the most compelling result is that larger trade promotions should result in a lower percentage pass-through of those promotions by retailers. Rao (1991) modeled promotional decisions by manufacturers (vs. retailers) in a multistage game.
This work recognizes the role of manufacturers in incentivizing retail price promotions and allows for asymmetric competition between a national brand and a private label. Each manufacturer chooses a single regular price, then the depth(s) of discount from regular price (many different discounts could be chosen), then the frequency of discounts. The manufacturers face consumers who differ in terms of their preference for the national brand, resulting in different degrees of price sensitivity. Rao’s primary finding is that, in equilibrium, the weaker private label manufacturer is unlikely to offer incentives for price promotion. Manufacturer inventory considerations have also been shown to play a strategic, competitive role in price promotions. Lal et al. (1996) specifically addressed forward buying by retailers that is induced by manufacturer trade promotions. They developed a dynamic model with a monopolist retailer to investigate manufacturer competition via trade promotions. Manufacturers offer branded products; consumers are assumed to vary in their brand (but not store) loyalties. Analysis of the model shows that forward buying is profitable for both retailers and manufacturers. Retailers benefit because forward buying enables them to stockpile products when trade deals are offered. Manufacturers benefit because forward buying causes the frequency of trade deals to be reduced. As we will discuss later, this has become less of an issue in practice as manufacturers have shifted towards trade promotions designed to eliminate forward buying. In related work, Cui et al. (2008) showed that price promotions may


FIGURE 1 Theoretical rationales for price promotions.

allow manufacturers to price discriminate among large, dominant retailers and small retailers based on their inventory holding costs and ordering practices. Cao (2011) developed a relatively robust model, albeit with a monopolist retailer, to explain a set of empirical observations about retail price promotions. His model specifies duopoly manufacturers selling multiple products to the aforementioned single retailer, which in turn resells them to consumers with heterogeneous reservation prices. The combination of reservation price heterogeneity and the retailer’s monopoly power is sufficient to support an equilibrium that matches the empirical observations about price promotions. Specifically, Cao determined that the monopolist retailer should employ a pure strategy of price promotions if the low-reservation price segment is attractive enough. The low-reservation price segment is also sufficient for the competing manufacturers to employ a mixed strategy in offering trade promotions.

2.3 Demand uncertainty and price promotions The price discrimination models reviewed above typically require heterogeneity in demand to yield firm policies recognizable as price promotions. Demand uncertainty provides another rationale for price promotions, and Fig. 1 provides a comprehensive taxonomy of these various theories.5 The first branch of the taxonomy in Fig. 1 divides theories into those with known versus uncertain demand. When demand is known, we further divide explanations depending on whether demand has a temporal component, and so is time-driven, or

5 We thank Greg Shaffer, University of Rochester Simon School of Business, for bringing this framework to our attention, which is from Dolan and Simon (1996).


not. Demand for seasonal goods, like cranberries (holiday), snow shovels (winter), and swimwear (summer), is time driven, which may lead to inter-temporal price discrimination as in Section 2.2.1. Non-seasonal goods, such as laundry detergent and coffee makers, do not have naturally time-varying demand. When demand information is known and not time driven, the firm may offer price promotions to induce trial, accelerate purchases, and/or take advantage of built-up demand. One rationale for inducing trial is that, while the firm may know the true quality or value of its product, consumers may not. By incentivizing purchases, price promotions can therefore lead to consumers learning about quality, which in turn leads to future purchases (Erdem and Keane, 1996). A firm can also use price promotions to shift or accelerate purchases from the future into the present. For example, if the price discount is deep enough, consumers may be willing to purchase enough carbonated beverages for the next several weeks. Research has shown that purchase acceleration can boost short-term sales but may not have a long-term impact; in other words, firms often steal from their own future demand (Anderson and Simester, 2004; Srinivasan et al., 2004). In consumer packaged goods, Gupta (1988) and Bell et al. (1999) found that purchase acceleration is generally small in magnitude compared to brand switching (discussed in more detail later in this section). Finally, firms may offer price promotions to take advantage of a “build up” in demand among low-value consumers who are unwilling to pay a high price (Conlisk et al., 1984; Sobel, 1984; Pesendorfer, 2002). Such a strategy may fail if high-value consumers are strategic and wait for price discounts. Fairness can also be a concern if firms skim the market and charge higher prices initially while offering price promotions later (Anderson and Simester, 2008).
When demand is known and is time driven, we are in a situation of peak load pricing. Price promotions or discounts may be used to shift the timing of consumer purchases. This type of pricing is more common when there is a fixed capacity constraint, such as a restaurant with a limited number of seats. Price promotions may be used to encourage consumers to dine on a weekday rather than weekend, for example. A critical link between price discrimination and capacity constraints is illustrated by Anderson and Dana (2009), who showed that price discrimination is not profitable for a monopolist, absent a quality constraint. Anderson and Dana’s model integrates work by Stokey (1979), who focused on the demand-side, and Salant (1989), who focused on the supply side, on whether price discrimination is profitable for a monopolist. Surprisingly, we often observe retailers and manufacturers promoting items during peak season. For example, it is very common to offer price promotions during holiday periods. Peak load pricing implies the opposite: prices should rise in peak demand periods (holding supply fixed). In consumer packaged goods, this puzzle was examined empirically by Chevalier et al. (2003). They found that prices tend to fall during peak holiday periods and attribute this to loss-leader pricing behavior on the part of retailers. When demand is unknown, price promotions may be offered to either learn about demand—demand probing—or to engage in yield management (Misra et al., 2019).


A/B testing of price promotions is now extremely common, and such tests often allow managers to answer the question: “Do I have the right promoted price?” Many online retailers regularly conduct A/B price promotion tests. Yield management is well known for airline pricing, but is also implicitly used for fashion goods, which are highly seasonal and for which demand is uncertain. Retailers launch such products with high prices and, if the item does not sell, the price is then reduced. In both A/B testing and yield management, the goal is to learn about demand and then respond with an optimal price. One difference is that, in A/B testing, price promotions may be inputs to the learning process (e.g., low and high price test conditions); in yield management, price promotions may be outcomes (e.g., a low price in response to low demand). Excess supply is a rationale for price promotions that follows from demand probing or yield management. When retailers or manufacturers find themselves facing excess supply, it is common to offer a price promotion. Excess supply can occur for many reasons, including overly optimistic demand forecasts as well as unanticipated value decay, spoilage or obsolescence (e.g., Lazear, 1986; Pashigian, 1988; Pashigian and Bowen, 1991; Rakesh and Steinberg, 1992). For example, in the U.S. car industry there are regular audits of inventory and, when thresholds are exceeded, either consumer cash or dealer cash is offered by manufacturers to induce lower prices (Busse et al., 2006). When products are in excess supply, the price promotions may be permanent rather than temporary, and remain until all excess units are sold. This is particularly true for seasonal goods.
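As a rough illustration of how an A/B price test can inform the question of whether the promoted price is right, the Python sketch below compares two price conditions on profit per visitor and computes a pooled two-proportion z-statistic for the difference in conversion. All names, counts, prices, and the unit cost are hypothetical, and the sketch ignores dynamic effects such as stockpiling that a real test would need to consider.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceArm:
    """One test condition in a hypothetical A/B price promotion test."""
    price: float
    visitors: int
    purchases: int

    @property
    def conversion(self) -> float:
        # Share of visitors who purchased at this price.
        return self.purchases / self.visitors


def profit_per_visitor(arm: PriceArm, unit_cost: float) -> float:
    """Unit margin times conversion: the quantity the manager compares."""
    return (arm.price - unit_cost) * arm.conversion


def conversion_z_stat(a: PriceArm, b: PriceArm) -> float:
    """Pooled two-proportion z-statistic for conversion(a) - conversion(b)."""
    pooled = (a.purchases + b.purchases) / (a.visitors + b.visitors)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / a.visitors + 1 / b.visitors))
    return (a.conversion - b.conversion) / std_err


# Hypothetical test data: high (regular) versus low (promoted) price condition.
regular = PriceArm(price=4.99, visitors=10_000, purchases=300)
promoted = PriceArm(price=3.99, visitors=10_000, purchases=520)
UNIT_COST = 2.50  # assumed constant marginal cost

best = max((regular, promoted), key=lambda arm: profit_per_visitor(arm, UNIT_COST))
z = conversion_z_stat(promoted, regular)

if __name__ == "__main__":
    print("more profitable price:", best.price)
    print("z-statistic for conversion lift:", round(z, 2))
```

Comparing profit rather than conversion matters here: the lower price always converts at least as well in this kind of test, so the question is whether the extra volume outweighs the lost margin.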

2.4 Consumer stockpiling of inventory Another possible explanation for retail price promotions is that they influence consumers’ purchase timing and quantity, which is a specific form of inter-temporal price discrimination (see Section 2.2.1). Among the first to study this phenomenon were Blattberg et al. (1981), who proposed an inventory-theoretic model of retail pricing. The link between consumer stockpiling and price promotions has been examined subsequently by numerous researchers, including Jeuland and Narasimhan (1985), Assuncao and Meyer (1993), Bell et al. (2002), and Hendel and Nevo (2013). In their pioneering work, Blattberg et al. (1981) posited that promotions are a mechanism to shift inventory-carrying costs between retailer and consumer. In their inventory control model, a monopolist retailer sells a single product with an exogenous regular price.6 There are two types of consumers: high holding cost and low holding cost. This typology might reflect the difference between consumers who live in apartments with limited pantry space and consumers who live in larger homes with less binding constraints. Both retailer and consumers try to minimize their inventory costs, trading off acquisition costs (“cost of goods” for the retailer; “retail price” for the consumer) against holding costs. Analysis of this model shows that consumers

6 Blattberg et al. further assumed no manufacturer incentives, i.e., trade promotions.


should stockpile products when prices are discounted, and that higher-demand products should be discounted less deeply but more often. Finally, the frequency of discounts should increase with holding costs and demand. An accompanying empirical analysis found support for this model, preferring it to an alternative model of retail price promotions as a mechanism for stimulating consumer trial. Hong et al. (2002) analyzed a model with n identical retailers selling to two types of consumers, loyals and switchers. Consumers hold different levels of inventory, depending upon the time since their last purchase. Analysis of this model for subgame perfect Markov equilibria (in which decisions depend only on inventories) shows that retailer prices should have negative serial correlation. This is because consumer stockpiling when prices are low depresses demand in the following period. More generally, stockpiling was found to increase retail price competition. Guo and Villas-Boas (2007) investigated stockpiling with a two-period model in which differentiated retailers face consumers with varying preferences for given products. Importantly, both retailers and consumers are strategic in this model. Consumers with relatively high preferences for a product are more likely to stockpile, thereby taking themselves out of the market in the second period. In equilibrium, Guo and Villas-Boas found that retailers should avoid deep discounts so as not to trigger consumer stockpiling. This finding is consistent with Hong et al. (2002) in that consumer product storability, and with it the opportunity for consumers to stockpile, increases retail price competition. On the other hand, retailers have an incentive not to offer price discounts sufficient to trigger widespread stockpiling. The net effect should be to reduce the depth of promotional price discounts.
More recently, Hendel and Nevo (2013) tested a similar inventory-theoretic model empirically using storable consumer goods.7 They found that consumers who store more product are more price sensitive. Further, they calculated that retail price promotions, as implemented, enable retailers to recover a substantial proportion of the possible gains from price discrimination.
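The trade-off between acquisition cost and holding cost that runs through this section can be illustrated with a toy decision rule. The Python sketch below is not the model of Blattberg et al. (1981) or of any other paper cited here; it assumes a consumer who uses one unit per period, faces a one-time discount today, pays a constant holding cost per unit per period, and does not discount the future. The function name and all numbers are hypothetical.

```python
def units_to_buy_on_deal(discount, holding_cost, periods_until_next_deal):
    """Number of units a consumer buys today under the assumed rule.

    A unit bought now for consumption k periods ahead saves `discount`
    relative to buying at the regular price later, but costs
    `holding_cost * k` to store. The consumer buys that unit today
    whenever the saving is at least the storage cost.
    """
    if holding_cost <= 0:
        raise ValueError("holding_cost must be positive")
    units = 0
    for k in range(periods_until_next_deal):
        if discount >= holding_cost * k:
            units += 1  # k = 0 is the unit for current consumption
    return units


if __name__ == "__main__":
    # Hypothetical high holding cost consumer (limited pantry space).
    print(units_to_buy_on_deal(discount=0.60, holding_cost=0.25,
                               periods_until_next_deal=8))
    # Hypothetical low holding cost consumer (ample storage).
    print(units_to_buy_on_deal(discount=0.60, holding_cost=0.05,
                               periods_until_next_deal=8))
```

Even in this stripped-down form, the rule reproduces the qualitative pattern in the text: the low holding cost consumer stockpiles through to the next deal, while the high holding cost consumer buys only a few units ahead.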

2.5 Habit formation: Buying on promotion Once consumers are habituated to buying on price promotion and shopping for weekly deals, removing price promotions can be extremely difficult for both manufacturers and retailers. Models of habit formation can explain such behavior. Rozen (2010) formalized a model of habit formation and distinguished between habits that persist and those that are responsive. If consumer preferences are responsive, then there exists a compensating stream of utility such that a consumer can be weaned from his/her habits. Standard economic models assume that consumers respond to price promotions because of monetary savings, hence monetary benefit is the source of their habit. Yet

7 Hendel and Nevo (2013) assumed product storability, but further assumed that product can be stored for only a fixed period.


several puzzles or counter-examples from the marketing literature suggest that monetary savings alone cannot explain consumer behavior (e.g., Dhar and Hoch, 1996; Hoch et al., 1994; Inman et al., 1990; Schindler, 1992). Chandon et al. (2000) developed a framework to reconcile much of this work. They proposed that consumers broadly enjoy three types of hedonic benefits and three types of utilitarian benefits from price promotions. A model with forward-looking consumers who are sufficiently patient and have price expectations can be consistent with the habit of “buying on deal.” Consumers may rationally form price expectations that “deals are expected to happen” when past price promotions are offered in the market. If a consumer is sufficiently patient, then that consumer will find it optimal to wait and buy only when a price is lower than some threshold. Models of habit formation with price expectations typically require that price promotions have been offered in the past (and become a habit) or that consumers believe they will be offered in the future (and are expected). An equilibrium that arises from these models might be characterized as a prisoner’s dilemma. If a firm had never offered a price promotion, then the habit may not have been formed and there would be no expectation of a future deal. Once consumers start buying on deal, it can be very difficult or costly to change their behavior and eliminate price promotions. In other words, the compensating stream of utility required to change consumer habits can be very large. The experiences of JCPenney (JCP) illustrate this problem (Mourdoukoutas, 2017). CEO Ron Johnson discovered that JCP was offering more than 365 promotions each year, which he believed was excessive. To address this issue, JCP dropped this promotional approach and offered lower regular prices every day instead. 
The drastic reduction in price promotions had a devastating effect on the business, as many consumers stopped shopping at JCP because they had become habituated to buying on deal. Johnson’s tenure as CEO ended after seventeen months and Mike Ullman, who succeeded him, quickly reinstated price promotions to lure customers back to JCP.

2.6 Retail market power Another view of price promotions is that they are the result of retail market power. For example, if manufacturers cannot profitably sell direct to consumers, then retailers may hold some degree of market power due to their ability to efficiently reach those consumers. Under this view, trade funds (i.e., payments from manufacturers to retailers) can be viewed as a financial transfer from a manufacturer to a retailer to obtain access to the retailer’s scarce resources (in other words, its customers). While this view is somewhat pessimistic, it is not without merit. Most retail markets are oligopolies and retailers have some degree of market power due to their geographic location. Further, conditional on a consumer visiting a store, a retailer has even more market power due to its ability to control prominent in-store locations, such as end-of-aisle displays. Price promotions that generate huge increases in volume are typically located in these prominent in-store locations. Under this view, price promotions can be viewed as part of an auction or negotiation by the retailer to sell its most valuable, scarce assets each week. While this view is plausible, we are not aware of any papers that explicitly take this perspective.

2.7 Discussion In sum, there are numerous theories of price promotion and, to the best of our knowledge, no single theory explains all of the empirical observations that we will discuss later in this chapter. While we do not propose a specific theory in this chapter, we would argue that a complete explanation must account for these major factors:

a. Vertical Channel: manufacturers and retailers influence price promotions via a planned, negotiated process.
b. Price Discrimination: manufacturers and retailers have incentives to price discriminate due to demand heterogeneity.
c. Competition: manufacturers and retailers operate in oligopolistic markets.
d. Consumer Behavior: purchase acceleration and stockpiling by consumers, as well as habit formation.

To date, no single model that we know of incorporates all these factors. Perhaps the one common theme among these models is that they have considered many different rationales grounded in price discrimination. In this sense, the literature is quite long on a set of theoretical possibilities. But many of the early models abstract away from the vertical channel and consider only a single firm. Numerous models also consider monopoly markets, ignoring competition entirely or focusing on competition at either the manufacturer or retailer level. As we will document later in this chapter, the nature of vertical contracts and price promotion planning among retailers and manufacturers has also been largely ignored in theoretical models. Finally, while habit formation is a well-understood concept, we have few, if any, theoretical models that seriously tackle the implications of habit formation in the context of price promotions.

3 The practice of price promotion As noted in the opening paragraph of this chapter, it is important for economists to understand whether wholesale and retail prices conform to what Robert Hall referred to as the Keynesian sticky price paradigm of “call options with unlimited quantities” (Klenow and Malin, 2010). We believe that this interpretation is unlikely due to the process by which promotional prices are set in most markets. The goal of this section is to describe the process by which regular and promoted prices are determined in the packaged goods industry, but we believe the key properties of (i) coordination and (ii) planning among retailers and manufacturers generalize to many other markets.


3.1 Overview of trade promotion process In practice, price promotions are co-created by manufacturers and retailers. Manufacturers offer trade promotions (e.g., trade funds) to retailers as an inducement to offer, in turn, retail promotions (in particular, price promotions) to consumers. In this section, we provide an overview of the trade promotion process from the perspectives of both manufacturer and retailer. Trade promotions are incentives that manufacturers offer retailers in return for marketing, merchandising, selling activities, and scarce resources (such as shelf space, display space, and advertising pages) that differentially benefit the manufacturer’s products and brands. A broader interpretation also includes incentives for retailers to transport and warehouse products and pay for them in ways that improve manufacturer efficiency. Under the umbrella of trade promotions, manufacturers also offer outcome-based incentives; i.e., payments for selling the manufacturer’s products through to consumers. Incentives for these activities take various forms, which we describe in more detail later in this chapter. The money that manufacturers allocate to these incentives is known as trade spend. While manufacturers have long been concerned about “bang for the buck” from trade spend8 and profitability of individual trade promotions,9 these concerns have not been reflected in reduced manufacturer profitability.10 Manufacturers’ trade spend is allocated to retailers using a two-step process (Gómez et al., 2007). First, a fixed amount of trade spend is budgeted for each retailer. For large manufacturers, these budgets are typically based on accruals (which we explain later) but budgets can also be lump-sum amounts. Trade spend budgets determine how much the manufacturer may spend in incentives for each retailer over the course of a year (in some cases, a quarter or a month). 
Second, throughout the budgeted period, money is allocated to specific incentive payments and performance requirements for individual promotional events, each negotiated between manufacturer and retailer. When the money allocated to a promotional event is spent, the remaining trade spend budget is reduced. Manufacturers often determine the amount of trade spend allocated to a retail account based on a percentage of the revenues generated from that retailer. This process is known as accrual of promotional funds, and the percentage is known as the accrual rate or trade rate. The budgeted trade spend for the current year is often based on accrual from revenues during the previous year. The rationale for setting budgets in this manner is to align incentives. If a retailer supports a manufacturer’s product line and generates more revenue, there is an increase in next year’s trade spend for that retailer. In contrast, if a retailer does not support a brand then there may be a subsequent reduction in trade spend. In theory, this process aligns the incentives of the retailer and manufacturer. It is worth noting that trade spend budgets are generally set by the manufacturer and not negotiated with the retailer. This was confirmed by interviews with a sample

8 See generally Besanko et al. (2005); Srinivasan et al. (2004); Tyagi (1999).
9 Cf. Dreze and Bell (2003).
10 Messinger and Narasimhan (1995); Ailawadi et al. (1995).

of buyers from fifteen supermarket companies, who indicated that, in general, budget decisions are not jointly determined by manufacturer and retailer. In fact, ten of the buyers explained “that the manufacturer determines the budget first and then negotiates with the retailer on its allocation” (Gómez et al., 2007, p. 412). The allocation process often occurs via annual joint business planning (JBP). Nearly every manufacturer has a major annual meeting with large retail accounts to plan 12 months of trade and marketing spend. There are often quarterly reviews of these annual plans to make adjustments during the year. As we will document later in this chapter, promotions are planned months in advance of their execution; for academics, this is an important institutional fact. In sum, the process we describe suggests that price promotions may be modeled as a sequential game in which manufacturers first determine the budget and then negotiate with a retailer on how to spend the budget over the course of a year.
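The two-step process described in this section (accrue a budget from the prior year's revenue, then draw it down event by event) can be sketched as a small ledger. The Python example below is an illustrative simplification only: the class name, the 10% trade rate, the revenue figure, and the event names are all hypothetical, and real agreements also attach performance requirements to each event.

```python
class TradeBudget:
    """Hypothetical ledger for one manufacturer-retailer account."""

    def __init__(self, prior_year_revenue, trade_rate):
        if not 0.0 <= trade_rate <= 1.0:
            raise ValueError("trade_rate must be a fraction between 0 and 1")
        # Step 1: the budget accrues as a percentage of last year's revenue.
        self.accrued = prior_year_revenue * trade_rate
        self.remaining = self.accrued
        self.events = []

    def allocate(self, event_name, amount):
        """Step 2: commit funds to a negotiated promotional event."""
        if amount < 0 or amount > self.remaining:
            raise ValueError("allocation must fit within the remaining budget")
        self.remaining -= amount
        self.events.append((event_name, amount))
        return self.remaining


# Hypothetical account: $2,000,000 in prior-year revenue at a 10% trade rate.
budget = TradeBudget(prior_year_revenue=2_000_000, trade_rate=0.10)
budget.allocate("spring feature ad", 60_000)
budget.allocate("end-of-aisle display", 45_000)

if __name__ == "__main__":
    print("accrued:", round(budget.accrued, 2))
    print("remaining:", round(budget.remaining, 2))
```

Keeping the accrual fixed at construction mirrors the institutional point made above: in this sketch the manufacturer sets the budget first, and only its allocation across events is open to negotiation.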

3.2 Empirical example of trade rates

While trade rates are common in practice, there is a lack of empirical information regarding trade rates in the academic literature. Manufacturers make offers to retailers and the terms of these trade deals vary across time, retail account, and product. To our knowledge, trade rates have not been studied in previous academic papers. Trade rate data is difficult to obtain and hence we are unable to provide a broad set of empirical generalizations about trade rates. But, we hope that our unique example illustrates that trade rates exist and vary among retail chains, products, and time.

Our data is from an anonymous, mid-size manufacturer that sells two different brands of cheese. Each observation in the data is a promotion event that has a start and end date for a specific retail account and a specific product. For each promotion event, there is a trade rate that indicates how the retailer accrues trade funds. For example, if the trade rate is 7% then $100 in purchases by the retailer yields $7 in trade funds.

In Fig. 2, panel A, we plot a histogram of the raw data for product 1 and we see that the average trade rate is 12%, which is again consistent with the Acosta report. Notably, there is substantial variation and the trade rate ranges from near zero to more than twenty percent. What is important for academic researchers to notice is that the trade rate varies by both retail chain and promotion event – it is not a constant percentage. In Fig. 2, panel B, we plot the average trade rate by retail chain, which collapses the temporal dimension of the data, and observe a bimodal distribution. While somewhat speculative, this is consistent with some chains executing a hi-lo strategy that is funded by a trade rate of 14.5%. For comparison, in Fig. 3 we also plot the same information for another product offered by the same manufacturer (product 2).
Here we see much lower trade rates (2.2%) and no longer see a bimodal distribution (see Fig. 3, panel B). Empirically, we observe that product 2 is not as reliant on trade funding as product 1. We look at two facts to explain this difference in reliance on trade promotions between products 1 and 2. First, while product 1 has a larger trade rate, the number of promotion events is 4,000 versus nearly 25,000 for product 2. Thus, product 2 has a higher frequency of events but a lower funding rate for each event. Second, an online search reveals that product 2 is heavily advertised while product 1 has virtually no advertising. Hence, it appears that marketing dollars are directed towards the trade for product 1 and towards consumer advertising for product 2, consistent with Allender and Richards (2012).

FIGURE 2 Panel A shows a histogram of all trade promotion events for product 1. Panel B shows a histogram of the average trade rate by retail chain.

FIGURE 3 Panel A shows a histogram of all trade promotion events for product 2. Panel B shows a histogram of the average trade rate by retail chain.

Another challenge with point-of-sale (POS) data is that researchers do not directly observe the duration of a promotion event. In Fig. 4, we plot a histogram of the promotion duration, which is recorded in our data. The duration of trade promotion events is roughly equivalent for products 1 and 2. 88% of promotion events are less than 30 days in duration and more than 96% are less than 60 days. The modal number of days is 12 for both products, and the mean durations are 16.9 and 18.47 days for products 1 and 2, respectively. Keep in mind that this characterizes the supply side of the market: the trade deal (i.e., trade funding) is available for an average of 16-18 days. The data do not describe the price promotion offered by a retailer as a result of this trade deal.

FIGURE 4 Panel A shows a histogram of the duration in days of each promotion event for product 1. Panel B shows the same information for product 2.

In Fig. 5, we look at the number of promotion events each month and the average trade rate by month for product 1; Fig. 6 contains similar information for product 2. Panel A of Fig. 5 shows that promotions are concentrated in the first six months of the year and there are almost no promotions from September through December, which suggests that product 1 is seasonal. Panel B of Fig. 5 shows that the trade rate is also not constant over time and peaks in April, May, and January. For product 2 (Fig. 6), there are promotions in every month with a spike in promotions during January. The average trade rate is relatively constant across months and varies between 1.9% and 2.6%.

FIGURE 5 Panel A shows a histogram of the number of promotion events each month for product 1. Panel B shows the average trade rate each month for product 1.


FIGURE 6 Panel A shows a histogram of the number of promotion events each month for product 2. Panel B shows the average trade rate each month for product 2.

Again, our goal for this analysis is to briefly illustrate the variability of trade rates across retailers, products, and time. For academic researchers, it is important to be aware of the existence of trade rates as a funding mechanism for trade promotion budgets. Empirically, there is a need for a deeper analysis of this type of data that could ultimately link promotion offers from manufacturers to their realization at the point of sale.
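For readers who obtain event-level data of this kind, the summaries behind Figs. 2 through 6 amount to simple group averages. The sketch below assumes a record layout of (retail chain, start date, end date, trade rate); the four events and the inclusive day count are invented for illustration and are not the manufacturer's data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical promotion events: (retail chain, start, end, trade rate).
EVENTS = [
    ("chain_a", date(2015, 1, 5), date(2015, 1, 16), 0.145),
    ("chain_a", date(2015, 4, 6), date(2015, 4, 17), 0.150),
    ("chain_b", date(2015, 1, 12), date(2015, 2, 8), 0.060),
    ("chain_b", date(2015, 5, 4), date(2015, 5, 15), 0.070),
]


def mean_rate_by(events, key):
    """Average trade rate within each group defined by key(event)."""
    groups = defaultdict(list)
    for event in events:
        groups[key(event)].append(event[3])
    return {group: mean(rates) for group, rates in groups.items()}


# Collapse the time dimension: average trade rate by retail chain.
by_chain = mean_rate_by(EVENTS, key=lambda e: e[0])

# Collapse the chain dimension: average trade rate by start month.
by_month = mean_rate_by(EVENTS, key=lambda e: e[1].month)

# Duration of each trade deal in days, counting both endpoints (assumption).
durations = [(end - start).days + 1 for _, start, end, _ in EVENTS]

print(by_chain, by_month, durations)
```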

3.3 Forms of trade spend

Trade rates typically lead to the creation of a manufacturer promotion budget for a retail account. One might think of this as the creation of a pool of dollars that can be spent on retail marketing activities. These manufacturer dollars are then allocated, or transferred, to retailers in different ways. This is often referred to as the form of trade spend. These transfers can vary in at least three important ways: (1) retailer performance requirements, (2) which retailer costs are defrayed, and (3) the form and timing of payment. We discuss each form of trade spend in more detail below. Note that the discussion below includes forms of trade spend that are not trade promotions per se, but rather incentives to make the manufacturers’ transportation, warehousing, and cash flow more efficient.

3.3.1 Off-invoice allowances

Off-invoice allowances are discounts from the list price, either a percentage off list price or a fixed amount off per case or unit. While it is difficult to determine when off-invoice allowances originated, evidence points to the Nixon administration’s implementation of a retail price freeze to stem inflation in August 1971 (Acosta, 2012b). To create financial flexibility in advance of the impending price freeze, many manufacturers raised wholesale prices, then immediately offered off-invoice allowances to retailers to offset part or all of those price increases. Perhaps unknowingly, this led the CPG industry down a path of what ultimately became complex financial transfers between manufacturers and retailers.

One characteristic of off-invoice allowances is that the manufacturer does not actually pay them; the price is reduced before the retailer is ever invoiced (Bell and Dreze, 2002). A second characteristic of off-invoice allowances is that they do not depend on the retailer’s sales performance. This raises obvious moral hazard problems where retailers can accept the off-invoice allowances but may not lower the price to increase sales. A third characteristic is that the retailer is only required to buy the product to get the off-invoice allowance. When off-invoice allowances are offered, some retailers buy larger quantities than they expect to sell and either warehouse the product to sell after the promotion at regular retail prices (“forward buying”) or resell the excess product to other retailers that did not have access to the off-invoice allowances (“diverting”).

3.3.2 Bill backs

Bill backs are payments made to retailers based on the number of units they purchase from the manufacturer during a specified period.11 One primary difference between an off-invoice allowance and a bill back is the timing of the payment: off-invoice allowances are deducted from the invoice before it is paid; bill backs are rebated to the retailer after the specified period (Blattberg and Neslin, 1990). Bill backs are often used by small manufacturers who may rely on wholesalers and brokers. The bill back documents that the retailer purchased a manufacturer’s product and so enables a direct financial transfer from the manufacturer to the retailer. This financial transfer can bypass intermediaries like wholesalers, which avoids double marginalization in the channel.

3.3.3 Scan backs

In scan back promotions, the manufacturer typically pays a fixed dollar amount for each unit that the retailer sells during a specified promotion period, not for products that the retailer purchases as with bill backs and off-invoice allowances. And, like bill backs, scan back payments are made after the retailer has purchased and paid for the manufacturer’s products (Dreze and Bell, 2003). The retailer documents its sales by sending scanner data to the manufacturer, which then pays the retailer based on the number of units sold during the promotion period.12 Much of the trade spend that had previously been offered as off-invoice allowances is now offered as scan backs, which enables manufacturers to limit forward buying and diverting (Bell and Dreze, 2002). Scan backs also allow manufacturers to monitor retailer performance for their trade spend dollars by observing the prices at which their products are sold.

11 The term “bill back” may be used for other rebates, including payments made for retailer performance.
12 In some cases, scan back payments are not made to the retailer but rather deducted from future invoices.
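The difference between the three forms above comes down to which quantity the payment is based on. The following sketch makes that explicit; the function name, the per-unit rate, and the unit counts are hypothetical, and timing differences (deducted up front versus rebated later) are deliberately not modeled.

```python
def trade_payment(form, rate_per_unit, units_purchased, units_sold_on_promo):
    """Trade dollars earned by the retailer under one form of trade spend.

    Off-invoice allowances and bill backs are based on units the retailer
    purchases; scan backs are based on units sold during the promotion.
    """
    if form in ("off_invoice", "bill_back"):
        return rate_per_unit * units_purchased
    if form == "scan_back":
        return rate_per_unit * units_sold_on_promo
    raise ValueError(f"unknown form of trade spend: {form}")


# Hypothetical forward-buying retailer: buys 10,000 units on deal but sells
# only 6,000 during the promotion, warehousing the rest for later.
purchased, sold = 10_000, 6_000
off_invoice = trade_payment("off_invoice", 0.50, purchased, sold)
scan_back = trade_payment("scan_back", 0.50, purchased, sold)
print(off_invoice, scan_back)
```

Under these made-up numbers the scan back funds only the units actually sold on promotion, which is why it limits the payoff from forward buying and diverting.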


3.3.4 Advertising and display allowances

The greatest differential benefit that retailers provide to manufacturers, in addition to retail price discounts, is by advertising or displaying the manufacturers’ products (Waller et al., 2010; Blattberg and Neslin, 1990). As a result, manufacturers offer allowances for retailers to allocate their limited advertising pages or display space on end-of-aisle, free-standing or other in-store displays to the manufacturers’ products (Blattberg and Neslin, 1990). Advertising allowances, sometimes called co-op funds, often take the form of lump sum payments, although they can also be funded via other mechanisms like scan backs. Ad allowances are used to defray the costs of creating, printing, and distributing retailers’ feature advertisements. Retailers often advertise families of products rather than individual items, typically by manufacturer by pack type. Allocating a lump sum allowance for advertising a family of products to each individual product in that family is typically not done in practice. Display allowances, which are paid for temporary placement in-store in locations such as end-of-aisle, can also take the form of lump sum payments or scan backs. Scan backs are sometimes offered for a combination of retailer price discounts, advertising, and display.

3.3.5 Markdown funds

Markdown funds are offered to mitigate retailers’ costs to mark products down at the end of a season (or for discontinued items). Demand for many products is highly seasonal. Retailers mark down the prices of such products at the end of the high demand season in order to avoid carrying unsold inventory into low demand seasons. Markdown funds encourage retailers to buy seasonal products in larger quantities, because the risk of overstocking is shared by the manufacturer. In CPG, markdown funds may be negotiated after there is joint realization that a product did not sell. In the apparel industry, markdown funds are often negotiated at the time an order is placed. For example, an apparel item may have a suggested retail price of $99 and wholesale price of $50. If the retailer discounts the retail price to $79, the manufacturer may make $10 of markdown funds available, maintaining the retailer’s gross margin at roughly 50%. In apparel, the schedule of markdown dollars and prices is often known when an order is placed.

This practice raises agency issues and has led to fraud. For example, in 2007, Saks Incorporated (which owns the retail chain Saks Fifth Avenue) faced a lawsuit from the Securities and Exchange Commission for fraudulently claiming vendor allowances and illegally accounting for markdown funds from vendors, a practice referred to as the rolling of markdowns. The New York Times (Barbaro, 2007) reported that “At Saks, according to the S.E.C. documents, buyers routinely misled suppliers by overstating the number of products sold at a deep discount to collect greater payments.” The improper treatment of markdown dollars was alleged to have overstated annual net income by six to nine million dollars per year. Ultimately, Saks settled the lawsuit, fired several senior employees, and changed internal processes to address the issue.
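The apparel example can be checked with a few lines of arithmetic. The sketch below uses the prices from the text; the function name is illustrative, and treating the $10 of markdown funds as a per-unit reduction in the retailer's cost is an assumption about how the funds are applied.

```python
def gross_margin_pct(retail_price, unit_cost):
    """Retailer gross margin as a fraction of the retail price."""
    return (retail_price - unit_cost) / retail_price


WHOLESALE = 50.0       # wholesale price from the example
MARKDOWN_FUNDS = 10.0  # per-unit markdown funds from the manufacturer

regular = gross_margin_pct(99.0, WHOLESALE)
marked_down_unfunded = gross_margin_pct(79.0, WHOLESALE)
marked_down_funded = gross_margin_pct(79.0, WHOLESALE - MARKDOWN_FUNDS)

for label, margin in [
    ("regular price", regular),
    ("markdown, no funds", marked_down_unfunded),
    ("markdown, with funds", marked_down_funded),
]:
    print(f"{label}: {margin:.1%}")
```

Without the funds the margin falls well below its regular level; with them it returns to just under 50%, which is the sense in which the manufacturer shares the markdown risk.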


3.3.6 Bracket pricing, or volume discounts

Bracket pricing is an incentive for retailers to purchase in volume. Truckload purchases generally earn retailers the highest bracket discounts. By purchasing and taking delivery of larger quantities, however, retailers incur additional inventory holding costs. Thus, bracket pricing effectively shifts inventory holding costs from the manufacturer to the retailer. Note that the bracket price is reflected on the invoice as a discount from the list price.

3.3.7 Payment terms

Also known as prompt payment discounts, payment terms are manufacturer incentives for retailers to pay their invoices quickly. Payment terms typically give a deadline for retailers to earn a discount from the invoiced amount as well as a date by which the entire invoiced amount must be paid. For example, payment terms of “2/10 net 30” offer a 2% discount from the invoice price if payment is made within ten days; if not, the entire invoiced amount is due in 30 days.
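A short sketch of how “2/10 net 30” terms translate into the amount owed. The function and parameter names are illustrative, the invoice amount is made up, and payments later than the net date are not modeled.

```python
def amount_due(invoice, days_to_pay, discount=0.02, discount_days=10):
    """Amount owed on an invoice under "discount/discount_days net N" terms.

    Paying within the discount window earns the discount; otherwise the
    full invoiced amount is owed (late payment past the net date is not
    modeled here).
    """
    if days_to_pay < 0:
        raise ValueError("days_to_pay must be non-negative")
    if days_to_pay <= discount_days:
        return invoice * (1 - discount)
    return invoice


# Hypothetical $1,000 invoice under 2/10 net 30.
early = amount_due(1_000.0, days_to_pay=8)
late = amount_due(1_000.0, days_to_pay=25)
print(round(early, 2), round(late, 2))
```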

3.3.8 Unsaleables allowance

This allowance, alternatively known as waste, spoils, or swell allowance, is offered by manufacturers to pay for products that are delivered in an unsaleable condition. Unsaleables can be contentious because there may be uncertainty about how the product became unsaleable and whether the manufacturer, retailer, or another party is responsible. Processing unsaleables and determining responsibility can add administrative costs. To avoid such costs, manufacturers often simply offer a percentage of the invoice price to the retailer to offset the cost of unsaleable products.

3.3.9 Efficiency programs

Separate from bracket pricing, many manufacturers offer incentive programs for retailers to increase the manufacturer’s supply chain efficiency. Criteria to receive efficiency funds include large order sizes (full truckloads and pallets) and low order cancellation rates. By taking delivery in larger quantities, the retailer again incurs additional inventory holding costs.

3.3.10 Slotting allowances

Slotting allowances are one-time payments that manufacturers make to retailers in return for putting a new product in distribution. “Although these fees [allowances] help to defray the costs of adding (and deleting) an item from the system, they also cover the retailer’s opportunity costs for allocating shelf space to one item over another” (Dreze et al., 1994).

3.3.11 Rack share

Rack share is effectively a rental payment to the retailer for shelf space on the racks in the checkout area at the front of a store. The payment is typically charged per linear inch of rack space, although it may be a fixed fee instead.


3.3.12 Price protection

When manufacturers raise their prices, they typically offer their retail customers funds to cover the difference between old and new prices for a period of time corresponding to merchandising commitments made before the price increase was announced (e.g., eight or twelve weeks). Price protection is paid to retailers in one lump sum after the protection period or at the end of shorter windows during the protection period.

3.4 Some implications of trade promotions

Nearly every type of trade promotion is offered with an expectation of retailer performance. Scan backs and bill backs are offered for short-term promotions; markdown allowances are used to run through excess inventory of seasonal items; price protection extends pricing and merchandising agreements made before a manufacturer price increase. And even though most types of trade promotion lower the retailer’s cost of goods, the retailer often incurs a cost to implement retail promotions. Such costs include changing prices at the shelf and in point-of-sale systems, putting up and then taking down shelf tags, building in-store displays, laying out feature ads, etc.

In a typical week, the vast majority of prices in a retail store are regular shelf prices, not promotional prices (McShane et al., 2016). Retailers generally set regular shelf prices to meet gross margin targets, consumer expectations, or competitor prices. This varies by retailer, and sometimes by category and by the perceived importance of an item to the retailer’s price image. Categories that are more important in store choice decisions are called destination categories (Briesch et al., 2013); individual items perceived to be more important in store choice decisions are often called key value items. By definition, gross margin targets for retailer prices are based on manufacturers’ gross, or list, prices, not on manufacturers’ net prices. In fact, most retailers never calculate manufacturers’ net prices. Thus, retailers’ regular shelf prices, the vast majority of prices in their stores, are not influenced at all by manufacturer trade spend. Bracket pricing, payment terms, unsaleables allowances, efficiency programs, slotting allowances, and rack share are all similarly ignored for the purposes of retail shelf prices or regular prices.
The existence of an annual manufacturer trade budget, which funds negotiated price promotions, casts doubt on the notion that price promotions and trade deals represent “call options with unlimited quantities.” Quantities are clearly bounded by the trade budget. While the implied constraint of a trade budget is rarely captured in empirical studies, it does exist and our discussions with managers illustrate its relevance.

3.5 Trade promotion trends

A study by the Point of Purchase Advertising Institute famously concluded that “P-O-P is significant as the ‘last three feet’ of a brand’s marketing campaign, and serves as the ‘closer’ for in-store purchasing decisions as well as an influencer for impulse purchases” (Consumer Buying Habits Study, 1995, p. 3). Because the retailer has exclusive control of the ‘last three feet,’ it is not surprising that CPG manufacturers have reconfigured their marketing budgets to provide more incentives for retailers to perform activities at point-of-purchase that benefit their brands. In fact, many CPG manufacturers shifted their marketing budgets away from media advertising and toward trade promotions.

The allocation of trade spend between different types of promotions has changed as well. Two decades ago, the vast majority of trade spend was allocated to off-invoice allowances. Industry experts reported that, for CPG manufacturers, off-invoice allowances fell from 90% of all trade spend in the mid-1990s to about 35% by 2003 (Gómez et al., 2007). This drop in off-invoice allowances as a percentage of trade spend was driven by manufacturers’ desire to prevent forward buying and diverting (Bell and Dreze, 2002), and to monitor retailer performance.

A recent study by Boston Consulting Group (2012) highlighted that trade spend growth exceeded revenue growth between 2008 and 2010. Given the emergence of online shopping, one may speculate as to whether this trend will continue. As of the writing of this chapter, we see no signs that trade spending is declining with the growth of online shopping.

3.6 Planning and tracking: Trade promotion management systems

In the academic literature, we often conceptualize the financial transfer from a manufacturer to a retailer as a simple wholesale discount. In practice, the financial transfers between manufacturers and retailers are much more complex. In the previous sections, we have documented how trade rates vary across retailers and products. Due to this complexity, manufacturers typically need tools to assist with promotion planning, tracking financial flows, and evaluating promotion performance. For many years, this information was often planned manually, tracked in a spreadsheet, and often not evaluated in terms of ROI. Today, nearly all large manufacturers use trade promotion management (TPM) systems to assist with these tasks, though the adoption rate among small manufacturers is much lower.

Gartner (2015) conducted a comprehensive review of all TPM vendors, which includes large IT providers like Accenture, SAP, and Oracle. The Gartner report notes that TPM systems should provide five broad functions for manufacturers: (1) promotion planning and budgeting at various levels, (2) abbreviated P&L statements prior to the promotion, (3) promotion execution and encumbering funds, including accruals, (4) settlement of funds with retailers, wholesalers, brokers, and (5) post promotion analysis. A typical TPM system has a web-based dashboard that allows managers to visualize planned promotions for the year as well as track planned vs. actual promotion volume by retail account, product, and time. TPM systems also help managers keep track of planned vs. actual trade budgets and provide financial summaries at different levels of granularity.


FIGURE 7 Example of a deal sheet from a TPM system.

The exact details of each promotion event are stored in the TPM system as what one manager we spoke with referred to as a “deal sheet.” We were provided with numerous examples of deal sheets and, in Fig. 7, we provide a mock-up of one example. While the details are disguised, it is intended to provide academics with a better understanding of how promotions are planned. At the top of Fig. 7, we see the start and end date of the promotion event, which is December 1 to December 31. The communication between the manufacturer and retailer is stored in the TPM system. We see that the deal is created by copying a previous deal (from the same time period in the previous year), and interviews with practitioners confirmed that this is a very common practice: the promotion from a previous year is very likely to be run in a subsequent year. For academics, this suggests state dependence among promotion events.

Next we see that the final negotiation process consists of a single text message. In this case, the manufacturer and retailer had planned for a specific scan back promotion as part of joint business planning or the annual promotion process. In August, four months before the promotion, there is a last-minute adjustment in the magnitude of the scan back. Notice that the negotiation is in increments of ten cents, which a manager we spoke with referred to as “dimes.” Managers often anticipate that retail price changes will be in increments of dimes and may negotiate in multiples of dimes. For academics, this suggests that retail price adjustments are lumpy (e.g., dimes) rather than continuous (e.g., pennies). While there is consistency between this empirical fact and the menu cost literature (Dotsey et al., 1999; Golosov and Lucas, 2007), we are not aware of any menu costs associated with this practice. Instead, the practice of pricing in dimes appears to be grounded in managerial norms or habits.

The TPM system provides detailed financial information about each of the three UPCs that are part of the promotion. The deal sheet contains information about the retail regular price and the gross margin at that price, which is 38%. Notice that one of the UPCs has a slightly lower regular price, $3.99, than the other two UPCs, $4.19. During the promotion, however, the manufacturer would like the retailer to lower the price to $2.99 for all three UPCs. Offering the same price on all UPCs, called line pricing, is a common practice. To achieve this discount of between $1.00 and $1.20, the retailer accepts a scan back of between $0.57 and $0.68. The scan back keeps the retailer’s percentage gross margin at roughly the same level with and without the promotion (38% vs. 36%). Presumably, the retailer is willing to accept a lower dollar margin and make this up via increased volume. For this deal, the pass through rate (price change/cost change) is significantly greater than 100%. As we prepared this chapter, a manager shared six other scan back deal sheets with us that included sixty UPCs. The pass through rate on all sixty UPCs exceeded 100%.
Among these examples, we found no evidence that the gross margin percentage increased during a promotion, which must occur if pass through is less than 100%. McShane et al. (2016) showed that managers pay close attention to gross margin percentage when adjusting the regular price and that there may be a similar emphasis on gross margin percentage when determining the promoted price.

Our discussion thus far has focused on TPM systems. Many vendors also offer trade promotion optimization (TPO) systems, which focus on using data to design promotions. In 2015, Gartner reported that TPO systems were not as widely adopted as TPM systems but were growing in popularity. The underlying tools and methods used in TPO systems are often grounded in demand models which have their origins in academic research. TPO is an area where academics have a considerable amount of expertise to share with practitioners. For example, a TPO system that uses historical data will likely suffer from endogeneity concerns, a well-known issue in the academic literature. Improving the quality of data via A/B testing, quasi-experimental methods, or valid instrumental variables are approaches that academics have used to overcome these challenges. TPO systems represent an opportunity for academic research to advance managerial practice.
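The deal sheet arithmetic discussed above can be reproduced in a few lines. The sketch assumes that the retailer's unit cost is the one implied by the 38% regular margin, and that the $0.57 scan back applies to the $3.99 UPC and the $0.68 scan back to the $4.19 UPCs; the text gives only the ranges, so this pairing is an assumption.

```python
def promo_economics(regular_price, promo_price, regular_margin, scan_back):
    """Return (pass-through rate, promoted gross margin) for one UPC.

    Pass-through is the retail price change divided by the cost change,
    where the cost change is the per-unit scan back.
    """
    unit_cost = regular_price * (1 - regular_margin)  # implied by the margin
    promo_cost = unit_cost - scan_back                # scan back lowers cost
    pass_through = (regular_price - promo_price) / scan_back
    promo_margin = (promo_price - promo_cost) / promo_price
    return pass_through, promo_margin


# (regular price, scan back) pairs; the pairing is assumed, see lead-in.
DEAL = [(3.99, 0.57), (4.19, 0.68)]
results = [promo_economics(price, 2.99, 0.38, sb) for price, sb in DEAL]

for (price, sb), (pt, margin) in zip(DEAL, results):
    print(f"regular ${price:.2f}, scan back ${sb:.2f}: "
          f"pass-through {pt:.0%}, promoted margin {margin:.1%}")
```

Under these assumptions the promoted margin stays close to 36% and pass-through is well above 100% for both price points, in line with the figures reported for the deal sheet.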


4 Empirical literature on price promotions

4.1 Empirical research – an update

In 1995, Blattberg, Briesch, and Fox (hereafter BBF) compiled an extensive set of empirical generalizations, uncertain results, and insufficiently researched topics related to promotions. In this section, we update their results with more recent findings, with special emphasis where previous research was limited or inconclusive.

4.1.1 Promotional pass-through

Promotional pass-through captures the proportion of the manufacturer’s trade spend that is passed through to consumers in the form of retail price reductions.13 BBF found that, in general, retailer pass-through was less than 100%. Yet recent studies suggest that this generalization requires refinement. Notably, Ailawadi and Harlam (2009) found that “. . . the retailer actually spends a little more on price promotions across all products than the total funding it receives from all manufacturers.”

Moorthy (2005) developed a theoretical model of price promotion pass through that accounts for retail concerns of category management and store competition. In theory, a price promotion on one brand can affect the price of all other brands in the category, which has been referred to as cross-brand pass-through. Empirically, there is a lack of consensus as to whether cross-brand pass through exists (McAlister, 2007; Dubé and Gupta, 2008).

Besanko et al. (2005) evaluated pass-through rates for 78 brands across 11 categories, finding that pass-through was systematically higher for high-share brands than for low-share brands.14 Pauwels (2007) evaluated pass-through rates for 75 brands across 25 categories, finding an average pass-through rate of 65% though it was higher for larger brands and more expensive categories. Nijs et al. (2010) analyzed product shipments in a single category to more than 1,000 retailers, and found the mean pass-through rate to be 69% for retailers that purchase directly from manufacturers, but 106% for retailers that purchase from wholesalers.

Trade promotion spend, and hence pass-through rates, can be difficult to calculate for specific products. Promotional incentives sometimes apply to multiple products; for example, a single lump-sum allowance may be paid to advertise or display all flavors of a particular brand-size of yogurt.
While previous analyses included only the incentives reflected in the manufacturer’s price, Ailawadi and Harlam (2009) incorporated the full array of manufacturer incentive payments. In aggregate, they found that pass-through rates exceeded 100%, varying by department from 65% to more than 200%. Their findings imply that failing to incorporate all manufacturer incentives may substantially underestimate pass-through.

13 Pass-through is the ratio of the total retail price discount (= average discount per unit × number of units sold) to the manufacturer trade spend for a given promotion. Similarly, pass-through has also been calculated as the change in retail price for a given change in manufacturer (or wholesaler) price.
14 McAlister (2007) noted the consequences of this paper incorporating variation across different price zones.


Ailawadi and Harlam found a great deal of variation in pass-through rates across manufacturers. Using data from a major US retailer over two years, they found the median pass-through rate to be 20%, but aggregate pass-through exceeded 100% and 14% of manufacturers offered trade promotion rates exceeding 250%. Interestingly, they found that 34% of manufacturers offered no trade promotions, though retailers sometimes contributed funds to the promotion of these manufacturers’ products. Nijs et al. (2010) focused on a single category at various manufacturers, wholesalers, and retailers around the US. They found the mean pass-through rate was higher for wholesalers, 71%, than for retailers, 59%.

Meza and Sudhir (2006) evaluated variation in pass-through over time using a structural model. Not surprisingly, they found that pass-through was higher for loss-leader products compared to other regular products. More interestingly, they found this difference to be larger during periods of high demand. They concluded that retailers augment manufacturers’ trade spend for loss-leader products during periods of high demand, compared to regular products and periods of lower demand.

In a study of automobile purchasing, Busse et al. (2006) assessed the simultaneous use of trade and consumer promotions. They found pass-through rates for trade promotions (dealer discounts) to be only 30-40%, less than half the pass-through rates of consumer promotions (customer rebates). Our field would benefit from further studies of the simultaneous use of consumer and trade promotions by different types of retailers.
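A low median pass-through rate and an aggregate rate above 100% are not contradictory, because the aggregate is a ratio of totals while the median treats each manufacturer equally. The sketch below uses invented numbers chosen only to show the mechanics; they are not Ailawadi and Harlam's data, and manufacturers with no trade funding are left out.

```python
from statistics import median

# Hypothetical (trade funding received, retailer promotion spending)
# in dollars, one pair per manufacturer.
MANUFACTURERS = [
    (1_000.0, 100.0),
    (1_000.0, 150.0),
    (1_000.0, 200.0),
    (500.0, 1_500.0),
    (500.0, 2_500.0),
]

# Per-manufacturer pass-through: promotion spending per dollar of funding.
rates = [spending / funding for funding, spending in MANUFACTURERS]
median_rate = median(rates)

# Aggregate pass-through: total spending over total funding.
total_funding = sum(funding for funding, _ in MANUFACTURERS)
total_spending = sum(spending for _, spending in MANUFACTURERS)
aggregate_rate = total_spending / total_funding

print(f"median {median_rate:.0%}, aggregate {aggregate_rate:.0%}")
```

A few manufacturers whose products receive far more promotion spending than they fund are enough to pull the aggregate above 100% while most manufacturers see low pass-through.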

4.1.2 Long-term effects of promotion

Do price promotions decrease brand differentiation? Do they increase consumers’ price sensitivity? In their review, BBF found conflicting evidence, concluding that the “jury is still out” (p. G127). However, they determined that promotional sales decrease with the frequency of price promotions. Although the objectives of price promotions are short-term, the growth of trade promotion budgets has led to an increased interest in long-term effects.

Mela et al. (1997) analyzed 8¼ years of panel data to determine how promotions affect consumer response over time. They found that price promotions make consumers more price sensitive in the medium- and long-term. This increase in price sensitivity is much greater for non-loyal customers than for loyal customers. In contrast, non-price promotions cause loyal customers to become less price sensitive. Analyzing the same data, Mela et al. (1998) determined that brands had become less differentiated over time as manufacturers shifted dollars from media advertising to trade promotion, which increased retailer price promotion. This loss of differentiation negatively affected premium brands.

Based on the same data, Jedidi et al. (1999) conducted an extensive investigation of the long-term effects of price promotions and advertising. In general, they found that price promotions had a negative long-term effect on brand equity, while advertising had a significant positive effect. In the long term, price promotions were also found to make consumers more price sensitive yet less responsive to discounts. These results suggest that price promotions become less effective over time, even though they cause consumers to become more price sensitive.

CHAPTER 9 How price promotions work: A review

Using a much shorter panel dataset of detergent purchases, Papatla and Krishnamurthi (1996) found that purchasing products that were displayed or feature advertised, in particular if they were also discounted, caused consumers to be more responsive to subsequent promotions. More recently, Sriram and Kalwani (2007) used eight years of data to study competition between two brands of orange juice. They found that the positive short-term effects of trade promotions are partially offset by the negative long-term effects on brand equity. Sriram and Kalwani determined that an optimized sequence of trade promotions would have a net positive long-term effect, yet they found that the retailer they studied spent too much money on trade promotions with adverse long-term consequences. Taken together, these investigations suggest that current trade promotion spending increases consumer price sensitivity and decreases brand loyalty in the long-term. The result is a negative long-term effect on brand differentiation. Fok et al. (2006) assessed both the immediate and dynamic effects of price (promotional and non-promotional) on brand sales. Estimating hierarchical Bayesian error correction models on data for 100 brands across 25 categories, they determined that long-term cumulative effects are significant, though smaller in magnitude than immediate effects. Of relevance here, they found that the cumulative effect of promotional pricing is to increase price sensitivity. This negative long-term effect of price promotion is mitigated somewhat by brand differentiation. In a related study, Ataman et al. (2008) investigated the drivers of success for newly launched brands. Using five years of retail sales and advertising data for 225 new brands, they found that trade promotion investments were less effective drivers of long-term brand performance than other marketing investments (in particular investments in product line and retail distribution). 
Interestingly, promotional discounting was found to have a negative long-term effect on sales for new brands, as well. Sahni et al. (2017) analyze seventy field experiments from a company that sells tickets to events. They document that price promotions not only influence immediate demand, as expected, but have a spillover effect and influence demand for several weeks after the promotion.

4.1.3 Asymmetric cross-promotional effects In their literature review, BBF observed that cross-promotional effects are asymmetric, with higher quality brands disproportionately impacting lower quality brands. More recent studies show that this generalization is subject to boundary conditions. Bronnenberg and Wathieu (1996) decomposed brand positioning into two orthogonal components, “positioning advantage” and “brand distance.” Using these two components for estimation, their results varied by category. For orange juice, lower quality/lower price brand promotions were more effective than higher quality/higher price brand promotions; for peanut butter, promotional effectiveness results were reversed. They concluded that the prevailing promotional asymmetry result (favoring higher quality/higher price brands) can be offset when lower quality/lower price brands enjoy a positioning advantage.

4 Empirical literature on price promotions

Based on a meta-analysis of 1,060 estimated cross-price effects, Sethuraman et al. (1999) found evidence for a “neighborhood” effect. Specifically, they found that a brand’s sales are most affected by promotional discounts of the immediately higher-priced brand, and affected almost as much by discounts of the immediately lower-priced brand. Similarly, store brand promotions were found to affect sales of national brands priced near the store brand. Conversely, promotions on those low-priced national brands were found to have the largest impact on store brands. Sethuraman et al. measured both absolute cross-price effects and cross-price elasticities, arguing that absolute effects are more relevant for profit maximization. Sethuraman and Srinivasan (2002) compared these same two metrics in a study of cross-effects for brands with differing market shares. Analysis of elasticities led to the conclusion that promotion of higher-share brands has a larger effect on lower-share brands than the reverse, a conclusion consistent with the view of market share as reflective of brand power. Analysis of absolute cross-price effects, their preferred measure, led to the opposite conclusion: that promotion of lower-share brands has a larger effect on higher-share brands than the reverse. This result suggests that higher-share brands may not be able to exploit their market power with price promotions. Lemon and Nowlis (2002) used panel data and experimental evidence to assess the effects of synergies between price, feature, and display promotions on brands in different price-quality tiers. They found that high-tier brands benefit more from price, feature, and display promotions than low-tier brands do (consistent with Blattberg and Wisniewski, 1989). 
However, the benefit that high-tier brands enjoy from price promotions was found to disappear when (1) price promotions are used in combination with feature or display, and (2) in settings where comparison is difficult, such as on end-of-aisle displays.

4.1.4 Decomposition of promotional sales The incremental sales that result from promotions have usually been partitioned into a few categories: brand switching, category expansion, and stockpiling/purchase acceleration. Category expansion represents a true increase in primary category demand; brand switching does not; stockpiling/purchase acceleration may generate some primary demand. BBF reported conflicting evidence about the proportion of promotional sales attributable to brand switching: some studies found brand switching to account for the majority of promotional sales; other studies did not. Further, they were unable to generalize about incremental sales due to category expansion, or to compare the incremental sales attributable to stockpiling vs. purchase acceleration. Fortunately, a good deal of additional work has been published since that article. Bell et al. (1999) decomposed promotional price elasticities for 173 brands across 13 product categories into choice, incidence, and quantity components. They found that the majority of incremental promotional sales (75% on average) were a result of brand switching (choice), with incidence and quantity responsible for the remainder. Van Heerde et al. (2003) took a different approach to the promotional decomposition, focusing on unit sales of the promoted brand rather than its promotional price elasticity. Using the results of Bell et al., they calculated that the 75% average of promotional price elasticity attributable to brand switching corresponded to a 33% average of unit sales switching to the promoted brand. While this approach did not invalidate previous analyses, it highlighted the limits of interpreting the elasticity decomposition. Van Heerde et al. (2004) applied a similar unit sales approach to aggregate store data, decomposing promotional sales for brands in four different categories. While the decomposition varied depending on support for the price promotion (feature, display, or feature and display), they found brand switching to be lower than 50% in almost every case. Category expansion was found to be larger than in previous studies, averaging roughly 35%. Leeflang et al. (2008) extended this approach to a multi-category framework in order to quantify the effects of cross-category complementarity and substitution resulting from price promotions. Using data from a Spanish supermarket, they found these effects to be modest, with category complementarity usually exceeding substitution (for example, a beer price promotion would probably generate more sales increases in complementary categories such as salty snacks than sales decreases in substitute categories like wine). Within category, incorporating substitution and complementarity led to smaller cross-item effects (22% on average). Interestingly, category expansion (72% on average) was found to be larger than in previous studies. Chan et al. (2008) incorporated consumer heterogeneity (in brand preference and usage rate) in their promotional sales decomposition. Applying a dynamic structural model to canned tuna and paper towel data, they determined that brand switching was not the primary driver of promotional sales. They found that brand-loyal shoppers’ primary response to price promotions was to stockpile, while brand switchers did not stockpile at all. 
Heavy users stockpiled more than light users did; light users’ primary response to price promotions was increasing their consumption rate.
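
The gap between an elasticity-based share of brand switching and a unit-sales share can be illustrated with a small numerical sketch. It assumes a stylized demand model in which expected brand sales equal purchase incidence times conditional brand choice times quantity, and it uses a log-change split as a simplified stand-in for the elasticity decomposition. All numbers are hypothetical; this is not the procedure or the data of Bell et al. (1999) or Van Heerde et al. (2003).

```python
import math

def brand_sales(incidence, share, quantity):
    """Expected units per shopper: P(buy category) * P(brand | buy) * units bought."""
    return incidence * share * quantity

# Hypothetical values, chosen only for illustration.
base = {"incidence": 0.10, "share": 0.20, "quantity": 1.0}
promo = {"incidence": 0.11, "share": 0.30, "quantity": 1.0}

own_base = brand_sales(**base)
own_promo = brand_sales(**promo)

# Rival brands hold the remaining conditional choice probability.
rival_base = brand_sales(base["incidence"], 1 - base["share"], base["quantity"])
rival_promo = brand_sales(promo["incidence"], 1 - promo["share"], promo["quantity"])

# Elasticity-style view: split the log change in own sales into multiplicative parts
# and ask how much of it comes from the conditional choice (switching) component.
log_total = math.log(own_promo / own_base)
choice_share_of_elasticity = math.log(promo["share"] / base["share"]) / log_total

# Unit-sales view: what fraction of the promoted brand's extra units did rivals lose?
own_gain = own_promo - own_base
rival_loss = rival_base - rival_promo
switching_share_of_units = rival_loss / own_gain

print(f"choice share of log change:   {choice_share_of_elasticity:.0%}")
print(f"switching share of unit gain: {switching_share_of_units:.0%}")
```

In this example the choice component explains most of the proportional change, yet rivals lose far fewer units than the promoted brand gains, because higher category incidence also feeds rival sales.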

4.1.5 Advertised promotions result in increased store traffic In their review, BBF generalized from the extant literature that advertised promotions increase store traffic. Since their review, panel data has become widely available, enabling household-level studies of store choice. The general picture painted by these studies is that the traffic impact of price promotions is primarily the result of shoppers making additional store visits to purchase promoted products, not switching stores. Bell and Lattin (1998) used data from 1042 households in two geographic markets to investigate the relationship between store format (Everyday Low Price (EDLP) vs. Promotional Pricing (HiLo)) and store choice, controlling for factors such as feature advertised promotions.15 They found that large basket (infrequent) shoppers are less responsive to price promotions than small basket (frequent) shoppers, who can postpone category purchases to take advantage of promotional price variation over time. Note that their feature ad control variable had a strong positive impact on store choice probability across basket size segments. Ho et al. (1998) modeled price variability, induced by promotions, as offering consumers “option value” which can be exploited to pay lower prices. Using data from 513 households making 66,694 store visits, they found that shoppers acted like cost minimizers, visiting stores more often and buying smaller quantities in response to price variability. In a study focused on supermarkets, Rhee and Bell (2002) used data from 548 households shopping at five stores to investigate the causes of store switching. They determined that shoppers did not switch from their main store to a secondary store to take advantage of price promotions on a common basket of items. On the other hand, Rhee and Bell noted that shoppers did cherry-pick their secondary stores for price deals. Fox and Hoch (2005) focused on cherry-picking (shopping at two stores on the same day). Using household-level data for 9,562 cherry-picking trips, they found that shoppers bought 25% more price promoted items and 33% more feature advertised items when cherry-picking. Interestingly, shoppers who cherry-picked often bought more price promoted and more feature advertised items than those who did not, even when they did not cherry-pick. Fox and Hoch concluded that the propensity to shop at multiple stores is an individual characteristic, observing that it was correlated with a lower opportunity cost of time. In a more recent study, Breugelmans and Campo (2016) addressed the effects of price promotions on multi-channel grocery retail. They investigated multi-channel shopping behavior using 78 weeks of U.K. panel data with both online and in-store shopping trips. They found that price promotions in one channel can negatively affect contemporaneous category purchases in the other channel. This effect was asymmetric, with in-store promotions hurting online purchases more than the converse. 
Breugelmans and Campo also found that promotional frequency can hurt the effectiveness of future promotions in the other channel.

15 EDLP is an acronym for Everyday Low Price and refers to retailers who typically do not offer discounts and instead maintain a constant price. HiLo refers to retailers who offer periodic, temporary price discounts (low prices) that contrast with regular prices (high prices).

4.1.6 Trough after the deal Researchers have long expected that, because price promotions cause purchase acceleration and stockpiling, a sales dip, or “trough,” should follow a promotion. BBF observed that store sales data seldom reveal these troughs, which “is surprising and needs to be better understood” (p. G127). Van Heerde et al. (2000) investigated troughs before and after price promotions by estimating three different distributed-lag dynamic models. Using two multi-store time series datasets, they found pre-promotion and/or post-promotion effects for nearly all brands in the data. The magnitude of these troughs ranges from 4% to 25% of current brand sales, and so can be managerially significant. Interestingly, this appears to be the only empirical investigation to address this topic explicitly.

4.2 Empirical research – newer topics Since BBF reviewed the empirical research on promotions, several new topics have become more prominent in the marketing literature. These topics are discussed below.


4.2.1 Price promotions and category demand Related to the long-term effects of promotions, a few recent studies have used time series models to investigate the effects of price promotions on category demand over time. Nijs et al. (2001) applied such models to data from 560 product categories over a 4-year period in Dutch supermarkets in order to assess the impact of price promotions on category demand. They found category demand to be stationary, either around a constant mean or a trend line. Although they found short-term elasticities to be of customary magnitudes, the effects of promotions almost always dissipated within a 10-week period. In general, perishable categories were found to be more responsive to price promotions. Interestingly, non-price advertising was found to decrease the effectiveness of price promotions. Lim et al. (2005) used similar methods to study heterogeneity in demand due to price promotions across consumer segments. Analysis of 138 weeks of data in four product categories showed that promotional effects lasted longer for light users in perishable product categories and for other-brand loyals. In non-perishable categories, heavy users had a negative adjustment effect, generally reducing the effectiveness of promotions in these categories. Lim et al. found no permanent effects of price promotions in any segment studied, confirming the finding of Nijs et al. In a related study, Ailawadi et al. (2007) identified four stockpiling effects on promotional purchasing: (1) increased consumption rate, (2) purchase acceleration by brand loyals, (3) preemption of future brand switching, and (4) changing subsequent brand purchase probabilities. The latter effect could be positive or negative for a brand, depending upon how a brand’s purchase probability is affected. 
Estimating promotional decomposition models using data for brands in two categories, the authors determined that all four effects contributed meaningfully to the promotional sales via stockpiling. Increased consumption rate, however, was the most important effect. This study offers a more nuanced understanding of how the timing and depth of price promotions affect stockpiling. Bell et al. (2002) noted that price promotions can have an endogenous impact on consumption. If a consumer responds to a price promotion and increases their inventory, then this can affect the consumer’s consumption rate. They show that this can then lead to increased frequency or depth of price promotions and more intense price competition.

4.2.2 Cross-category effects and market baskets Related to price promotions is the question of how prices in one category affect sales in other categories. Blattberg and Neslin (1990) concluded that cross-category effects are small, implying that studying these effects should not have a high priority. Nevertheless, more recent work has generated useful insights. Manchanda et al. (1999) partitioned cross-category price effects into category complementarity, coincidence (due to similar purchase cycles), and other household factors. Using data from intentionally complementary category pairs, they found that (1) cross-price and cross-promotion effects are smaller than own-price and own-promotion effects, and (2) cross-effects are asymmetric across complementary category pairs, even after controlling for coincidence and other factors. This study therefore supported extant empirical generalizations, eliminating alternative explanations. Erdem and Sun (2002) developed a model to account for spillover effects from marketing activities, including price promotions. They study categories that are connected via an umbrella brand and show that price promotions for a brand in one category affect purchase probabilities in other categories. Song and Chintagunta (2007) proposed a structural multi-category model that allows consumers to purchase in multiple categories, subject to a budget constraint, while also controlling for coincidence and complementarity/substitution. Using data from four categories (including an intentionally complementary category pair), they also found that cross-price effects for brands in different categories are small. Interestingly, they found that these cross-price effects are due to coincidence, rather than category complementarity. These two papers provide support for the generally accepted notion that cross-category promotional and price effects are small, but differ in their findings about the role of category complementarity in cross-category effects. Another more recent topic of empirical research is the shopper’s market basket. The market basket is of particular importance to retailers, who are not only interested in where consumers choose to shop, but also in what they choose to buy at that store. Moreover, the market basket represents a disaggregate look at multi-category purchases. Mulhern and Padgett (1995) used an in-store survey along with purchase data at a home improvement store to determine how promotional purchases are affected by purchases at regular price. Noting that purchases in this store are much sparser than in grocery stores, they found that over ¾ of shoppers who visited the store for a promotion also bought items at regular price. 
Of particular importance, shoppers who visited the store for the promotion spent more on regular priced items than on promoted items. Arora and Henderson (2007) documented spillovers both within category and across category in the context of embedded premium promotions that are linked with social causes. For example, if a consumer purchases a product then there is a donation to charity. In addition to documenting cross-category spillovers, they showed that traditional price promotions are less effective than embedded premium promotions.

4.2.3 Effectiveness of price promotion with display Narasimhan et al. (1996) estimated price promotion elasticities, including those for promotions that are advertised and/or displayed, in order to investigate heterogeneity in promotional response across categories. Based on data from 108 categories, they found that categories that are more responsive to price promotions have the following characteristics: (1) higher penetration, (2) shorter interpurchase times, and (3) greater ease of stockpiling. Interestingly, they found a negative relationship between the actual price level and the elasticity of promotions, when those promotions are accompanied by display. This relationship implies that displayed promotions are more effective in categories with lower prices than in categories with higher prices. In a related study, Swait and Erdem (2002) evaluated how the temporal consistency of retail promotions affects consumer utility and choice, with implications for brand equity. Based on analysis of panel data for a single category, they found that consistency in feature advertising affects consumer utility and choice positively. In contrast, pricing consistency (i.e., fewer promotions) has a more nuanced effect, with the increased utility of price changes somewhat offset. Taken together, “these results suggest that sales promotion mix. . . inconsistency have an overall negative impact on consumers’ utilities and thus their choices” (p. 318).

4.2.4 Coupon promotions Dhar and Hoch (1996) compared the effectiveness of retailers’ in-store coupon promotions to shelf price discounts. Based on data from five field tests, they found that in-store coupons generated a greater increase in the promoted brand’s sales (35% greater) compared to shelf price discounts. Using the observed in-store coupon redemption rate of 55% together with product cost data, they found that in-store coupons actually generated much greater profit increases (108% greater) compared to shelf price discounts. Sales in the rest of the category were not affected by promotion type. The authors further demonstrated that an in-store coupon with an optimal discount would generate higher category unit sales, dollar profits, and higher pass-through of promotional funds to consumers compared to an optimized shelf price promotion. Anderson and Song (2004) also addressed coupon promotions, but incorporated contemporaneous shelf price reduction as an additional managerial variable. They used data from over 400 coupon promotions (not in-store, but in free-standing inserts) across six CPG categories to test predictions of their economic model about the relationships between coupon face value, coupon redemption rate, and shelf price. They found that lower coupon face values are associated with greater contemporaneous discounts from non-coupon prices and with lower prices. They also found that reducing the retail price and/or increasing the coupon face value increases the efficiency of a coupon promotion. Ramanathan and Dhar (2010) applied regulatory focus theory to determine how promotional cues affect purchases of other, non-promoted brands. Using data from two experiments and a field study, they found that compatibility of regulatory orientation (promotion vs. prevention) induced by a coupon promotion with preexisting orientation influences people to buy not just the promoted brand, but other non-promoted brands as well. 
Generalizing from this finding is difficult, but it suggests that retailers should design coupon promotions to be consistent with the regulatory orientation induced by the rest of the retail mix.
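
The profit advantage of in-store coupons reported by Dhar and Hoch (1996) follows in part from simple arithmetic: a shelf discount is given on every unit sold, whereas a coupon discount is given only on redeemed units. The sketch below illustrates that mechanism. Only the 55% redemption rate and the 35% larger sales lift come from the text; the prices, costs, unit volumes, and the assumption that the retailer funds the discount itself are hypothetical, so the resulting profit gap is not meant to reproduce the 108% figure.

```python
# Hypothetical retailer economics (illustrative assumptions).
PRICE = 2.00
UNIT_COST = 1.00
DISCOUNT = 0.30
BASE_UNITS = 100.0
SHELF_LIFT_UNITS = 80.0  # assumed extra units sold under a shelf price cut

# Figures taken from the text: 55% redemption, coupon lift 35% greater.
REDEMPTION_RATE = 0.55
COUPON_LIFT_UNITS = SHELF_LIFT_UNITS * 1.35

def profit(units, avg_discount_per_unit):
    """Retailer profit, assuming the retailer funds the discount itself."""
    return units * (PRICE - UNIT_COST - avg_discount_per_unit)

baseline = profit(BASE_UNITS, 0.0)

# A shelf discount is given on every unit sold.
shelf_gain = profit(BASE_UNITS + SHELF_LIFT_UNITS, DISCOUNT) - baseline

# A coupon discount is given only on the redeemed share of units.
coupon_gain = profit(BASE_UNITS + COUPON_LIFT_UNITS, REDEMPTION_RATE * DISCOUNT) - baseline

print(f"profit gain, shelf discount:  {shelf_gain:.2f}")
print(f"profit gain, in-store coupon: {coupon_gain:.2f}")
```

Under these assumptions the coupon earns a higher average margin per unit and sells more units, so both effects push its profit gain above that of the shelf discount.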

4.2.5 Stockpiling and the timing of promotions Pesendorfer (2002) used supermarket data in a single category to investigate the timing of promotions. He found evidence that retailers appear to be timing their price promotions in response to consumer stockpiling. After controlling for retailer competition, the incidence of price promotions is well explained by the accumulation of demand since the most recent promotion.

Hendel and Nevo (2006) analyzed an inventory model of the timing of price promotions using data from nine supermarkets and about 1,000 households. They found (1) that promotional sales volume increases in the time since the most recent promotion, (2) that the timing of promoted and non-promoted purchases is fundamentally different (consistent with the authors’ inventory theory), and (3) that differences in the timing of promotional purchases across categories are consistent with differences in storage costs between those categories. Together, these studies suggest that retailers time their price promotions to accommodate consumer stockpiling.

4.2.6 Search and price promotions Banks and Moorthy (1999) developed a theoretical model of consumer search for price promotions. They assumed that firms first select a regular price and then choose to periodically offer price promotions. Consumers are assumed to have full information about regular prices but need to search to learn about promoted prices. A key finding from this paper is that frequency and depth of promotion increase with consumer search cost. Seiler (2013) developed a model of consumer search that was estimated using consumer panel data for laundry detergent. A core intuition in Seiler’s model is that consumers may have limited information for products that are infrequently purchased and hence may not be aware of a price promotion. For example, if a product is offered on deep discount and a consumer does not purchase (after controlling for consumer inventory), this is an indication that the consumer did not know about the deep discount. Seiler estimated that consumers are unaware of prices on 70% of their shopping trips and that price promotions can be used to increase consumer search.

4.2.7 Targeted price promotions A review article by Grewal et al. (2011) provides an excellent overview of emerging opportunities in price promotion. In particular, they explain how various technologies and databases enable targeted price promotions. For example, mobile technologies now allow firms to customize offers to specific individuals in different locations. The topic of targeting individual consumers was explored theoretically by Shaffer and Zhang (2002) and empirically by Zhang and Krishnamurthi (2004) for an online grocery store. More recently, mobile technology allows targeting of a specific consumer in a specific geographic location. Chen et al. (2017) developed a theoretical model that illustrates how geo-targeted price promotions (or geo-conquesting) can increase firm profits. Fong et al. (2015) conducted one of the first randomized field experiments to investigate the impact of geo-conquesting in practice. Importantly, they showed that deep discounts in a competitor’s local market can lead to incremental profits but result in cannibalization in the focal market.


4.3 Macroeconomics and price promotions As previously discussed, the macroeconomics literature has focused on understanding which models can explain typical pricing patterns observed for fast moving consumer goods. This includes regular prices, with low frequency price movements, and promoted prices, with high frequency price movements. A conventional view among macroeconomists was confirmed by Kashyap (1995) who analyzed a single product category from one retailer and showed that nominal prices are typically fixed for more than one year. In contrast, Bils and Klenow (2004) showed that price adjustments are quite frequent, roughly every 4 months, which challenged the conventional view that prices are relatively sticky. However, subsequent research by Nakamura and Steinsson (2008) showed that much of this variation was due to price promotions (i.e., temporary sales). They showed that, after excluding price promotions, the average duration of a price is eight to eleven months. They further noted that “some type of sales may be orthogonal to macroeconomic conditions.” Klenow and Malin (2010) provided a comprehensive review of the empirical macroeconomics literature on price setting, which covers several global markets and includes many categories (i.e., beyond just grocery stores). They concluded with ten empirical facts regarding price setting behavior of firms. Among these facts is that price promotions are critical to price flexibility, and that this effect is more pronounced in the United States. The authors also noted that many price changes have memory in the sense that the price reverts to the previous regular price after a price promotion is offered (Nakamura and Steinsson, 2008). This empirical observation is consistent with views held by marketing academics for decades (Blattberg and Neslin, 1990). 
The resulting “sawtooth” pricing pattern is also consistent with regular prices and promotional prices arising from two distinct processes (McShane et al., 2016; Anderson et al., 2017). Klenow and Malin also observed that, on average, price changes are large; again, consistent with the marketing literature (McShane et al., 2016) and our sample deal sheets. In a recent paper, Anderson et al. (2017) showed that regular prices are largely responsive to changes in demand and supply but that price promotions are not. For macroeconomists, this finding is critical because, while price promotions are common, they do not appear to be responsive to large economic forces. Instead, price promotions are “sticky plans” that are determined well in advance and respond sluggishly to unanticipated economic forces. Via simulation, Anderson et al. (2017) found that “while the use of sale prices to price discriminate is crucially important, varying the extent of price discrimination in response to a cost shock is not.” Recently, Chevalier and Kashyap (2019) showed that frequent price promotions introduce complexities for accurately constructing a price index. One solution is to obtain detailed data on both prices and quantities, such as in IRI or Nielsen scanner data, though this may not be a practical solution for many reasons. The authors propose an elegant alternative that is both intuitive and practical to implement and mimics an approach used in non-food categories such as airline pricing. Their proposed price index uses a weighted average of normally sampled prices for the CPI along with the lowest price in a category of similar goods. The approach is practical and addresses an ongoing concern that price promotions may distort the CPI.
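
The contrast between frequent posted-price changes and infrequent regular-price changes depends on how temporary sales are identified. The sketch below shows one simple way to do this on a weekly price series: a dip that returns to the previous price within a short window is treated as a temporary sale and replaced by the prior regular price. The filter rule, the function names, the window length, and the price series are all assumptions made for illustration; this is not the procedure used by Nakamura and Steinsson (2008) or by Klenow and Malin (2010).

```python
def strip_temporary_sales(prices, max_sale_length=2):
    """Replace short price dips that return to the prior price with that prior price."""
    regular = list(prices)
    i = 1
    while i < len(regular):
        if regular[i] < regular[i - 1]:
            prior = regular[i - 1]
            # Look ahead for a return to the pre-dip price within the window.
            for j in range(i + 1, min(i + 1 + max_sale_length, len(regular))):
                if regular[j] == prior:
                    for k in range(i, j):
                        regular[k] = prior
                    i = j
                    break
        i += 1
    return regular

def count_changes(series):
    """Number of period-to-period price changes in a series."""
    return sum(1 for a, b in zip(series, series[1:]) if a != b)

# Hypothetical weekly shelf prices showing a sawtooth pattern and one regular price increase.
weekly_prices = [2.99, 2.99, 1.99, 2.99, 2.99, 2.99, 2.49, 2.49,
                 2.99, 3.19, 3.19, 2.79, 3.19]
regular_prices = strip_temporary_sales(weekly_prices)

print("posted price changes: ", count_changes(weekly_prices))
print("regular price changes:", count_changes(regular_prices))
```

In this toy series the posted price changes seven times, but only one change survives once the temporary dips are filtered out, which mirrors the qualitative point that excluding promotions lengthens measured price durations.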

4.4 Promotion profitability Anecdotally, many pundits believe that price promotions are unprofitable investments by retailers and manufacturers. However, measuring the return on investment of promotions requires one to take a stand empirically on how to measure return on investment. For example, one view is to evaluate the optimal price promotion strategy conditional on offering a promotion during a specific period. By conditioning on offering a promotion, say in a given year, this may influence the regular price. A retailer may choose to set a higher regular price in anticipation of periodic price promotions. Under this view, the question is not whether a price promotion is profitable; rather, it is which price promotion strategy is most profitable. An alternative view is to compare the optimal price promotion strategy (conditional on offering a price promotion) with a strategy of not offering price promotions. For example, one might compare an every-day-low-price (EDLP) strategy with the optimal price promotion strategy. Empirically, it can be extremely difficult to determine return-on-investment due to limited variation in prices. To illustrate this problem we note many manufacturers want their brands promoted during peak periods such as 4th of July in the United States. As a result, the counterfactual of “What would prices and demand have been had we not promoted?” is not observed. Inferring what would happen if a price promotion is removed on these peak weeks is difficult to impossible. As a result, it can be extremely difficult to evaluate the return on investment of offering a price promotion on 4th of July. If it is difficult to evaluate a single promotion event due to limited price variation, it is even more difficult to evaluate a price promotion strategy (i.e., a vector of prices) in a competitive, dynamic market.

5 Getting practical

While trade spend continues to be an enormous marketing expenditure, incentivizing a variety of retailer activities, the topic does not receive commensurate attention among academics. For example, nearly all MBA programs offer a course on advertising, but issues like trade funding are no more than a small part of a typical MBA pricing course. At Kellogg School of Management, for example, students are more likely to learn about trade funds, trade budgets, and price promotions through their job internship or work experience than from the classroom.

In reviewing the academic literature for this chapter, we determined that there was far more practical research on price promotions between 1995 and 2013 than in the last five years (i.e., 2013 to 2018). For example, work by Bucklin and Gupta (1999) summarized the ways in which managers used scanner data and found that price promotions were at the top of the list. Cooper et al. (1999) demonstrated the implementation of an academic model (PromoCast) to forecast volumes for promotion events. Similarly, Natter et al. (2007) showed how a better approach to pricing and promotion planning led to increased profits and sales for an Austrian retailer. These types of papers are important, as they help bridge the gap between academic and practitioner research.

The operations management literature has recently tackled the challenge of optimizing prices for a grocery store (Cohen et al., 2017) and for an online retailer (Ferreira et al., 2016). In both papers, the authors’ core contribution involved solving a complex, high dimensional, constrained optimization problem. The objective function is realistic from the perspective of incorporating managerial constraints in the optimization algorithm. While such research is impressive, the authors did not address a core question raised in empirical studies in marketing and economics: empirical identification. The authors demonstrated that their models fit well in a predictive sense, but did not address the question of whether and how their models could recover causal estimates. We believe that truly making progress on these types of difficult problems requires insights from multiple academic disciplines, including marketing, economics, and operations management, along with practitioner integration. This is clearly a high bar.

A further limitation of the price promotions literature is that it is heavily focused on CPG. Price promotions occur in many markets, yet most researchers focus on CPG because the data is available and abundant. This narrow focus limits the impact of academic work on practice. Examples of other product types that have been studied include automobiles (Bruce et al., 2005; Busse et al., 2006; Pauwels et al., 2004; Silva-Risso and Ionova, 2008), magazines (Esteban-Bravo et al., 2005, 2009), pharmaceuticals (Gonul et al., 2001), and produce (Richards, 1999).

To help bridge this gap, we conducted interviews with more than twenty managers who would prefer to remain anonymous. We hope that insights from these interviews can help academics identify new, impactful problems that can help advance the theory and practice of price promotions. The main insights and themes from those interviews are summarized below.

5.1 Budgets and trade promotion adjustments

There was consensus among the managers we spoke with that best practice in price promotions was typically the result of joint annual planning among retailer, manufacturer, and agency (e.g., digital marketing agency). The expectation for all brands is to have a 52-week plan. Developing this plan can take as long as six months. Thus, planning typically begins in Q3 and is finalized in Q4 with a strategic plan for the next 12 months as output. This plan sets joint expectations for the year and represents what economists now refer to as “sticky plans.” While adjustments in execution can be made during the year, there is typically little change in overall expectations over that period.

Blattberg and Neslin (1990) highlighted the importance of long-term promotion planning nearly thirty years ago, but few empirical or theoretical models have tackled the challenge of developing a long-term budget and strategic plan. Managers struggle with this issue and, in lieu of generating a new optimal plan each year, it is common practice to take last year’s plan and make small adjustments. The result is considerable state dependence in price promotions from year to year.

Managers indicated that adjustments to the promotion plan were based on a mix of top-down and bottom-up assessments and included questions such as: Did a promotion work as expected nationally? Did a promotion work as expected at a specific chain and region? Based on answers to these types of questions regarding past performance, the promotion plan is adjusted.

One difficulty in the planning process is risk management; some retailers act as if promotion planning guarantees performance and carries no risk. Effective planning requires a shared understanding of the promotion plan and its associated risks. One manager we spoke with indicated that some retailers view trade spend as an entitlement (i.e., a guaranteed financial transfer), which they bank on when creating their own financial plans.

Because of demand uncertainty, preserving flexibility is critical in trade promotion planning and budgeting. One manager we spoke with explained that, if a manufacturer commits to $10 million in trade funds for a retail account but volume subsequently fails to meet plan (i.e., a “soft” market), then the manufacturer finds itself in trouble. This can lead to a long-term financial spiral in which the manufacturer embarks on cost-cutting to meet financial expectations, both internally and with retailers. Cost-cutting may lead to short-term success, but result in long-term negative consequences for brand health. To manage these risks, a manufacturer may “hold back” trade funds. For example, rather than committing to $10 million, the manufacturer may commit to only $8 million (80% of allocated funds). If the market is soft, then the trade spend can be aligned with market volume. If volume forecasts are met, however, then the additional $2 million can be allocated strategically. Such an approach avoids the pre-commitment problem and allows trade funds to be allocated more profitably.

Research opportunities: There is very little academic research focused on how to create and manage promotion budgets, long-term planning, and risk sharing. These involve the creation of an annual budget, optimal adjustments as uncertainty is resolved throughout the year, and management of the negotiation process. There is also a lack of agreement among academics on how the budgeting process affects the true economic, marginal costs faced by sellers. While wholesale prices may exist, there is also the practical reality of a budget constraint that is rarely observed by academics but cannot be ignored.
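
The hold-back arithmetic in the manager's example can be illustrated with a minimal Python sketch. The $10 million allocation and the 80% commitment come from the example above; the function names, the `hold_back_pct` parameter, and the all-or-nothing release rule are hypothetical simplifications introduced here for illustration, not a description of any manufacturer's actual process.

```python
def plan_trade_funds(allocated, hold_back_pct=20):
    """Split an allocated trade budget into a committed amount and a reserve.

    hold_back_pct is a hypothetical parameter: the percentage of the budget
    withheld until demand uncertainty is resolved. Integer arithmetic is used
    so that whole-dollar amounts stay exact.
    """
    reserve = allocated * hold_back_pct // 100
    committed = allocated - reserve
    return committed, reserve


def release_reserve(reserve, actual_volume, forecast_volume):
    """Assumed rule: release the whole reserve only if volume meets forecast."""
    return reserve if actual_volume >= forecast_volume else 0


# Figures from the example in the text: $10 million allocated, 80% committed.
committed, reserve = plan_trade_funds(10_000_000, hold_back_pct=20)
print(committed, reserve)

# Soft market (volume below forecast): the reserve stays uncommitted.
print(release_reserve(reserve, actual_volume=90, forecast_volume=100))
```

In practice the release rule would presumably be negotiated and partial rather than all-or-nothing; the sketch only shows how holding back funds keeps the commitment aligned with realized volume.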

5.2 Retailer vs. manufacturer goals and issues

As we have noted, price discrimination is the most common rationale for price promotions offered by academics. It was illuminating for us that this topic never surfaced in our interviews with managers. That does not mean that price discrimination is not a rationale for price promotions in practice, but managers are clearly not voicing this as a key, strategic goal.


Among retailers that we interviewed, motivations for price promotions include driving store traffic and maintaining retail brand awareness. Weekly price promotions offer retailers an opportunity to engage the consumer in store, and to support a low price image to maintain shopper loyalty. Yet one manager told us that, if store operations were not prepared to handle a major price promotion, then the event could be extremely disruptive and costly. Successful execution of promotional events is more likely with careful planning.

Retail managers expressed a desire to have a unique promotion each week that allowed them to differentiate themselves from a competing retailer. Managers were quick to note that they did not demand exclusivity from a manufacturer on a promotion, but that they did pay close attention to competing promotions at other retailers. In periods of peak seasonal demand, retailers often end up promoting the same items (e.g., turkey and cranberries before Thanksgiving). In all periods, retailers “keep score” by noting whether they had the only product on price promotion and, if not, whether they had the lowest price point for that product.

Managers on the manufacturer side that we spoke with indicated that price promotions are part of their overall business planning process. Their broad goal was to hit targets on key metrics such as share, unit volume, margin, and profit. Winning share from competing brands was highlighted as a primary motivation for price promotions. There was also a shared belief that price promotions result in the “big weeks” that enable brand managers to hit their numbers. Academic theories of stockpiling and forward buying resonated with managers. Loading the channel with product was recognized to be inefficient, but was nevertheless viewed favorably from a competitive standpoint as it helped keep other brands off the shelf.

In our conversations, we found substantial differences in the relative emphasis on trade spend vs. consumer spend by size of manufacturer and by brand. Among small CPG firms, there was a heavy emphasis on trade spend with little to no consumer spend. This was based on three beliefs. First, managers believe that trade spend is a prerequisite to working with a retailer. Most retailers are unlikely to work with a small CPG firm that offers insufficient trade funds. Second, we found a widely held belief that trade spend is more certain in terms of demand generation than consumer spend (e.g., media advertising). In other words, increased trade promotion budgets are due to beliefs about the relative riskiness of these two approaches. Third, once money is allocated for trade spend, there is typically nothing left over for consumer promotions. Many small CPG firms would love to fund consumer marketing programs, but they simply cannot afford them.

Both retailers and manufacturers indicated that it is important to time promotions so that they coincide with times when consumers are in “shopping mode.” This implicitly means that many promotions are coordinated with peak demand periods such as holidays, back to school, grilling season or hunting season. One manager referred to these periods as “power windows,” and noted that they can be quite specific to a brand. To managers, it is perfectly logical to offer price promotions during peak demand, yet academics have struggled to rationalize this practice (Chevalier et al., 2003).

Finally, managers indicated that price promotions have historically been aligned around manufacturer goals. There was a perception that, while the retailer may benefit, the manufacturer’s goals are paramount. This perspective has evolved and become more balanced with the role of “category captain,” which often requires a broader view of both retailer and manufacturer objectives. Despite this trend, retail managers indicated that they want to become more independent, more strategic, and have greater control over price promotions. Research opportunities: Retailers’ competitive concerns offer a great opportunity to apply game theory to promotion planning and timing. Interestingly, managers don’t think of this in terms of an equilibrium. If promotions are timed with peak periods, this raises questions of causal inference. In other words, how does one separate the incremental impact of a price promotion from the impact of a seasonal shock? Finally, there is an opportunity to extend recent models of category management (Nijs et al., 2013; Alan et al., 2017) that address how the joint concerns of manufacturers and retailers may affect price promotions. Within the industrial organization literature, it has been common to abstract from vertical channel relationships. For example, Nevo (2000) studied competition among brands, but retail goals such as driving store traffic and retail brand awareness are not considered. Explicitly modeling the goals of retailers and manufacturers represents an opportunity for future research.

5.3 When decisions happen: Promotion timing and adjustments

Based on our review of the academic literature, few researchers have given serious consideration to the practical issues that affect when promotions occur and whether adjustments can be made. For example, the theoretical idea that adjustments can be made and executed each week simply has little connection to practice. Models that make such assumptions are divorced from reality.

Among all managers we spoke with, there was a clear indication that decisions are made well in advance and are very hard to change at the last minute. In CPG, plans occur annually and the actual execution starts twelve to sixteen weeks in advance of a price promotion. The deal sheet in Fig. 6 notes that a December 1 price promotion was finalized on August 22, 100 days or 14 weeks in advance.

When price promotions are combined with advertising in a retail flyer, the planning process typically starts months in advance. These flyers are commonly used by many types of retailers, such as department stores (e.g., Macy’s, JCPenney), sporting goods retailers (e.g., Dick’s, Cabela’s), mass merchants (e.g., Target, Walmart), and grocery stores (e.g., Safeway, Kroger). As the execution date nears, details such as the promoted price point, ad copy, images, etc. are finalized. To allow time for printing and distribution of flyers, all details are typically finalized four weeks in advance of a promotion. One manager commented that this constraint is relaxing as they move from print to digital flyers. Many retailers are looking to eliminate print flyers altogether. As this happens, there will be increased flexibility in the timing of advertised price promotions.
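
The lead times described above can be turned into calendar dates with a short Python sketch. The twelve-to-sixteen-week execution window, the four-week flyer deadline, and the December 1 and August 22 dates come from the text; the function name `planning_milestones`, the choice of 2018 as the year, and the reading of the deal-sheet date as the start of execution are assumptions made only for illustration.

```python
from datetime import date, timedelta


def planning_milestones(promo_date, exec_weeks=(12, 16), flyer_weeks=4):
    """Back out planning dates from a promotion date.

    exec_weeks: (min, max) weeks before the promotion when execution starts.
    flyer_weeks: weeks before the promotion when print-flyer details lock.
    """
    earliest_start = promo_date - timedelta(weeks=exec_weeks[1])
    latest_start = promo_date - timedelta(weeks=exec_weeks[0])
    flyer_lock = promo_date - timedelta(weeks=flyer_weeks)
    return earliest_start, latest_start, flyer_lock


# The year is an arbitrary assumption; the text gives only month and day.
promo = date(2018, 12, 1)
deal_finalized = date(2018, 8, 22)

earliest, latest, flyer_lock = planning_milestones(promo)
print(earliest, latest, flyer_lock)

# Check whether the deal-sheet date falls inside the execution window.
print(earliest <= deal_finalized <= latest)
```

Under these assumptions the deal-sheet date falls inside the twelve-to-sixteen-week window, which is consistent with the managers' description. Moving to digital flyers would correspond to shrinking `flyer_weeks`.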


Within CPG, the ability to execute last-minute price promotions varies by product category. For example, meat and produce are highly perishable, and both retailers and manufacturers are set up to run last-minute price promotions to liquidate excess inventory. These price promotions are more likely to be in-store only and not have advertising support outside the store.

Manufacturers noted that the ability to make last-minute adjustments often depends on the retailer’s flexibility. Larger retailers are generally viewed as being less flexible and more process-driven, which limits their ability to execute a last-minute promotion. This is understandable, since a last-minute promotion may create unintended negative spillovers such as stealing demand from other products. Managing the spider-web of price promotion spillovers among products is perhaps easier for small retailers.

Research opportunities: Many theoretical and structural models assume that prices are set on a weekly basis. In reality, price promotions arise from sticky plans, making adjustments difficult. The shift from print to digital flyers will reduce adjustment costs, and the subsequent impact on price promotions is an interesting topic for future research.

5.4 Promoted price: Pass-through

The stylized view among academics is that manufacturers provide off-invoice wholesale price reductions, and then a retailer decides how much of the discount to pass through to consumers. Our interviews suggested that this may not be the best characterization of how price promotion decisions are made. First, the broad plan of when to promote, the desired price point, the associated trade funding, and other marketing activities are laid out in the annual plan. Second, multiple marketing activities often coincide with a price promotion, the in-store portion of which is paid for by the same vendor allowances that fund price promotions. Managers indicated that it was often difficult to separate how these funds were allocated to specific activities. Third, manufacturers have promotion guidelines for different types of retailers, depending upon their preferred promotional pricing strategy (Hi-Lo, EDLP, BOGO [buy one get one], etc.). The actual pass-through on a given price promotion is the result of considerable joint planning, and should be viewed through a broad lens that may involve multiple marketing activities that depend on the retailer’s strategic positioning.

When we asked managers whether retailers would pocket trade dollars and not pass through discounts to consumers, there were mixed responses. Yes, we were told that this happens. Yet we were surprised to learn that running a “hot price,” a very deep discount, is of greater concern among brand managers. A retailer may offer an extremely low price to drive store traffic, which may benefit that retailer but could also disrupt the marketplace, anger competing retailers, and damage brand equity.

Research opportunities: Causal inference for price promotions in the presence of coordinated marketing activities demands more attention from academic research. In addition, there are opportunities, both theoretical and empirical, to improve our understanding of price promotions by recognizing that observed prices are the result of long-term plans, involve multiple marketing activities, and are influenced by the retailer’s overall pricing strategy.

5.5 Durable goods price promotion

While most research on price promotion has been in CPG, price promotions are common in many other industries. Minimum advertised price (MAP) is a common practice among the durable goods manufacturers that we interviewed. Within the MAP framework, a manufacturer may offer PMAP, or promotional MAP. Our small sample of interviews suggested that, while PMAP is possible, it is not common. Instead, if a price promotion is required, then MAP is often removed from the product.

One rationale for price promotion in durable goods is to drive volume of noncurrent merchandise (e.g., last year’s model). A major manufacturer of sporting goods products stated that roughly one-third of price promotions were used to push obsolete inventory out of the channel. There is a similar motivation in the apparel industry, where price promotions are frequently used to manage excess inventory. Managers distinguish between in-season markdowns (discounts), which are typically part of the marketing plan, and end-of-season markdowns, which are the result of excess inventory.

One manager of a national brand noted that price promotions are localized geographically to take advantage of regional opportunities. For example, a regional sporting event that generates consumer buzz may be combined with a price promotion. Local events create an opportunity for brands to establish a call to action among consumers and encourage them to purchase.

Durable goods do not depend on repeat purchasing in the same way that frequently purchased consumer goods do. Once a consumer has bought a tennis racket, she may have little or no demand for that item for a long period of time. As a result, the timing of price promotions for durable goods is a critical concern. On the one hand, there is a temptation to offer a price promotion early in the season to be first to promote, getting all consumers to buy your tennis racket and locking out the competition.
But managers warned that price promotions are ineffective if they are too early and not coordinated with consumers’ normal shopping patterns. For example, demand for swimsuits starts in Spring and ends in Summer; trying to offer price promotions too early, say in January, may beat the competition to market but won’t succeed because consumer interest is low.

Demand uncertainty and inventory considerations also affect a retailer’s willingness to promote a durable good multiple times in a narrow window. A manufacturer, for example, may want to encourage trial for a product and ask a retailer to offer consecutive price promotions in the first two weeks of a month. A manager called our attention to the paradox that success in week 1 can create a stockout problem in week 2. If all inventory is sold in week 1, it is often impossible to get additional product for week 2. If an item is sourced domestically, additional supply can be obtained in four weeks. However, if an item is sourced internationally, the minimum lead time is 12-14 weeks. Because flyers and advertising also have long lead times, there is no way to remove a product from the weekly circular if it is sold out. Retailers never want a prominently advertised item to drive traffic to the store but then disappoint consumers when they arrive at the store and find a stock-out. As a result, inventory considerations must be carefully integrated with price promotion timing.

A senior manager noted that it is common to offer incentives to the channel (e.g., wholesale price discounts, cooperative marketing funds) to encourage price promotions in the store. However, it is often entirely up to the retailer whether and how a promotion is implemented. Hence, what academics refer to as agency concerns are very real for durable goods manufacturers.

Related to this point, many durable goods require extensive sales support, in-store education, and post-sales support. Manufacturers indicated that they are more likely to support retailers whose goals are aligned with their brand, but discussion of whether the manufacturer favors some retailers over others is very sensitive. Joint training programs funded by a manufacturer can help retailers with best practices in merchandising, marketing, and sales. However, the reality is that some retailers are simply better business partners. Manufacturers must strike a balance between rewarding committed partners and treating all partners fairly and equitably.

Research opportunities: In general, price promotions for durable goods are a wide open area for researchers. The process by which decisions are made is often driven by heuristics, so there is an opportunity to add rigor to these decisions. Agency theory is clearly applicable, but we have little empirical evidence to explain how these concerns affect price promotions.
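
The week 1 versus week 2 stockout paradox described in this section reduces to a comparison between the gap separating two promotions and the replenishment lead time. The sketch below uses the lead times quoted by managers (four weeks domestic, a minimum of 12-14 weeks international); the function name `can_restock`, the dictionary layout, and the use of the lower bound of the international range are illustrative assumptions.

```python
# Lead times in weeks taken from the text. The international figure is a
# stated minimum range of 12-14 weeks; the lower bound is used here.
LEAD_TIME_WEEKS = {"domestic": 4, "international": 12}


def can_restock(weeks_until_next_promo, sourcing):
    """Return True if an order placed now could arrive before the next promotion."""
    return weeks_until_next_promo >= LEAD_TIME_WEEKS[sourcing]


# Consecutive promotions in weeks 1 and 2 leave only one week to replenish.
print(can_restock(1, "domestic"))
print(can_restock(1, "international"))

# Spacing the promotions four weeks apart would allow domestic replenishment.
print(can_restock(4, "domestic"))
```

The comparison ignores demand uncertainty and partial shipments, so it is best read as a lower bound on how far apart two promotions must be for a sell-out in the first to be recoverable.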

5.6 Private label price promotions

Private label and store brands are growing parts of nearly every retail product assortment. In academic research, private label and store brands have been modeled as vertical integration decisions where the retailer is assumed to own and control the product. In reality, the retailer has considerable control of the brand name, product design, etc., but manufacturing of private label products is often done by third-party manufacturers. This has important practical implications for price promotions. A retail manager we spoke with indicated that private label manufacturers offer trade funds (e.g., cooperative advertising funds) to support their products in-store. When pressed about why this happens, the manager noted that volume declines dramatically when products are not promoted (e.g., not in the weekly flyer, no in-store merchandising).

Research opportunities: Theoretically, there is an opportunity to improve models of retail private label. Empirically, the impact and pass through of trade funds for private label vs. national brand manufacturers has not been studied.

5.7 Price pass through

Economists and marketers have studied the topic of price pass through through various lenses. In marketing, the literature has tackled pass through of promoted prices as well as pass through of regular prices. In contrast, the macroeconomic literature has addressed exchange rate pass through. In a global economy, it is important to understand how cost shocks flow across geographies (i.e., countries) and impact both promoted and regular prices.

Research opportunities: There is a near-term opportunity to compare exchange rate pass through with price pass through and a longer-term opportunity to understand when and why cost changes impact prices.

6 Summary

In this chapter, we have provided an overview of the academic literature on price promotions in marketing and economics. We also have provided extensive discussion of how price promotions work in practice. We have concluded the chapter by pointing out opportunities for future academic research.

We hope that our summary of how price promotions work in practice is of interest to academics. In particular, we found that price promotions are coordinated, planned activities among retailers and manufacturers. For economists, we hope that this chapter spurs greater interest in understanding the data generating process for promoted prices, which commonly appear in the BLS and/or CPI and are more widely available in syndicated data sets from IRI and Nielsen. In any empirical study, it is critical to understand not only what prices are offered but also the underlying data generating process. We believe the insights from this chapter have direct implications for work on price stickiness, promotion pass through, and demand estimation. This chapter also suggests that one should think carefully about structural models that involve short-term price competition among sellers, such as weekly brand competition in a grocery store. Such models may ignore the coordinated, planned data generating process.

We believe that there is considerable opportunity for collaboration with and impact on practice. Trade spending represents hundreds of billions of dollars and is clearly allocated inefficiently. Academic theories need not only to explain what happens (descriptive) but also to take a normative position and explain how price promotions can become more profitable. We hope that this chapter spurs researchers to look at the major issues facing practitioners today.

References

Acosta, 2012a. The trend behind the spend: client spending study. AMG Strategic Advisors presentation. Acosta, 2012b. Trade Promotion: A Shift in the Lift. White Paper. Acosta, 2016. Reversing the Diminishing Returns of Trade Promotion. White Paper.


Ailawadi, Kusum L., Borin, Norm, Farris, Paul W., 1995. Market power and performance: a cross-industry analysis of manufacturers and retailers. Journal of Retailing 71 (3), 211–248. Ailawadi, Kusum, Gedenk, Karen, Lutzky, Christian, Neslin, Scott, 2007. Decomposition of the sales impact of promotion-induced stockpiling. Journal of Marketing Research 44 (3), 450–467. Ailawadi, Kusum L., Harlam, Bari A., 2009. Retailer promotion pass-through: a measure, its magnitude, and its determinants. Marketing Science 28 (4), 782–791. Alan, Yasin, Kurtulus, Mumin, Wang, Chunlin, 2017. The Role of Store Brand Spillover in a Retailer’s Category Management Strategy. Vanderbilt Owen Graduate School of Management Research Paper 3042004. Allender, William J., Richards, Timothy, 2012. Brand loyalty and price promotion strategies: an empirical analysis. Journal of Retailing 88 (3), 323–342. Anderson, Eric T., Dana, James D., 2009. When is Price Discrimination Profitable? Northeastern University College of Business Administration Research Paper No. 08-003. Anderson, E.T., Kumar, N., 2007. Price competition with repeat, loyal buyers. Quantitative Marketing and Economics 5 (4), 333–359. Anderson, Eric T., Kumar, Nanda, Rajiv, Surendra, 2004. A comment on: “Revisiting dynamic duopoly with consumer switching costs”. Journal of Economic Theory 116 (1), 177–186. Anderson, E., Malin, B.A., Nakamura, E., Simester, D., Steinsson, J., 2017. Informational rigidities and the stickiness of temporary sales. Journal of Monetary Economics 90, 64–83. Anderson, Eric T., Simester, Duncan I., 2004. Long-run effects of promotion depth on new versus established customers: three field studies. Marketing Science 23 (1), 4–20. Anderson, Eric T., Simester, Duncan I., 2008. Does demand fall when customers perceive that prices are unfair? The case of premium pricing for large sizes. Marketing Science 27 (3), 492–500. Anderson, E.T., Simester, D.I., 2010. Price stickiness and customer antagonism. 
The Quarterly Journal of Economics 125 (2), 729–765. Anderson, Eric T., Song, Inseong, 2004. Coordinating price reductions and coupon events. Journal of Marketing Research 41 (4), 411–422. Arora, Neeraj, Henderson, Ty, 2007. Embedded premium promotion: why it works and how to make it more effective. Marketing Science 26 (4), 514–531. Assuncao, Joao L., Meyer, Robert J., 1993. The rational effect of price promotions on sales and consumption. Management Science 39 (5), 517–535. Ataman, M. Berk, Mela, Carl F., van Heerde, Harald J., 2008. Building brands. Marketing Science 27 (6), 1036–1054. Banks, Jeffrey, Moorthy, Sridhar, 1999. A model of price promotions with consumer search. International Journal of Industrial Organization 17 (3), 371–398. Barbaro, Michael, 2007. Saks settles with S.E.C. on overpayments. The New York Times. 9/6/2007:C12. Bell, David R., Chiang, Jeongwen, Padmanabhan, V., 1999. The decomposition of promotional response: an empirical generalization. Marketing Science 18 (4), 504–526. Bell, David R., Dreze, Xavier, 2002. Changing the channel: a better way to do trade promotions. MIT Sloan Management Review 43 (2), 42–49. Bell, David R., Iyer, Ganesh, Padmanabhan, V., 2002. Price competition under stockpiling and flexible consumption. Journal of Marketing Research 39 (3), 292–303. Bell, David R., Lattin, James M., 1998. Shopping behavior and consumer preference for store price format: why “large basket” shoppers prefer EDLP. Marketing Science 17 (1), 66–88. Besanko, David, Dubé, Jean-Pierre, Gupta, Sachin, 2005. Own-brand and cross-brand retail pass-through. Marketing Science 24 (1), 123–137. Bils, Mark, Klenow, Peter J., 2004. Some evidence on the importance of sticky prices. Journal of Political Economy 112 (5), 947–985. Blattberg, Robert C., Briesch, Richard, Fox, Edward J., 1995. How promotions work. Marketing Science 14 (3), G122–G132. Blattberg, Robert C., Eppen, Gary D., Lieberman, Joshua, 1981. 
A theoretical and empirical evaluation of price deals for consumer nondurables. Journal of Marketing 45 (1), 16–29.


CHAPTER 10 Marketing and public policy✩

Rachel Griffith a, Aviv Nevo b,∗

a Institute for Fiscal Studies and University of Manchester, Manchester, United Kingdom
b University of Pennsylvania, Philadelphia, PA, United States
∗ Corresponding author: e-mail address: [email protected]

Contents
1 Introduction
2 The impact of academic research on policy
3 Competition policy
  3.1 Market definition and structural analysis
  3.2 Economic analysis of competitive effects
  3.3 A few recent examples
    3.3.1 The Aetna-Humana proposed merger
    3.3.2 The AT&T-DirecTV merger
    3.3.3 Mergers that increase bargaining leverage
  3.4 Looking forward
4 Nutrition policy
  4.1 Objectives of nutrition policy
  4.2 Nutrient taxes
    4.2.1 The effects of taxes
    4.2.2 Estimating pass-through
  4.3 Restrictions to advertising
    4.3.1 The mechanisms by which advertising might affect demand
    4.3.2 Empirically estimating the impact of advertising
  4.4 Labeling
  4.5 Looking forward
5 Concluding comments
References

✩ We thank Phil Haile and Anna Tuchman, as well as the Editors and three anonymous referees for comments on earlier versions of this chapter. Griffith gratefully acknowledges financial support from the European Research Council (ERC) under ERC-2015-AdG-694822, the Economic and Social Research Council (ESRC) under the Centre for the Microeconomic Analysis of Public Policy (CPP), RES-544-28-0001.

Handbook of the Economics of Marketing, Volume 1, ISSN 2452-2619, https://doi.org/10.1016/bs.hem.2019.04.005 Copyright © 2019 Elsevier B.V. All rights reserved.


1 Introduction

Research in quantitative marketing has broadly focused on two areas, as the previous chapters in this Handbook demonstrate: understanding consumer demand and deriving the implications for firm strategy. A logical next step is to consider the interaction of consumer behavior and firm strategy with public policy. Historically, quantitative marketers have largely left public policy and social considerations to economists. However, with the inflow of researchers trained in economics into the field of marketing this has started to change.1 In our view, the true impact marketing can have on policy debates has yet to be realized. Large parts of the policy space are directed at affecting the behavior of consumers and firms, yet policy work has not always relied as much as it could on the insights from marketing. The main thesis of this chapter is that research in quantitative marketing, and the marketing profession more generally, has important tools and insights that can be useful to public policy, and that there are potentially large gains from collaboration that are only starting to be realized.2

1 For some recent examples where researchers have extended the traditional areas of quantitative marketing to study public policy issues see Bollinger and Gillingham (2012), Bollinger (2015), and Bollinger and Karmarkar (2015), looking at environmental questions, Shapiro (2018a, 2018b), looking at pharmaceutical drugs, Lewis et al. (2014), Wang et al. (2016), and Tuchman (2017) looking at smoking behavior, and Wei et al. (2016) looking at credit markets.

2 We are not the first to realize the potential of the field of marketing to contribute to public policy. For example, since 1982 the American Marketing Association has been publishing the Journal of Public Policy & Marketing. The focus of the journal has been mostly “managerial”, and as far as we can tell very little overlap exists between authors who publish in that journal and authors of chapters in this Handbook.

There are potentially many areas of public policy that we could discuss, including, for example, health (nutrition, smoking, pharmaceutical drugs, and medical treatment), antitrust, intellectual property and innovation, environmental, financial regulation, privacy, economic development, globalization, and more. We could not do justice to all these areas in one chapter, so we explicitly focus on only two areas: competition policy and nutrition policy. Furthermore, we generally do not attempt to provide a comprehensive literature review of all the academic work that has been written in these areas. Instead, we use these areas to demonstrate how the field of marketing can contribute to policy questions.

We focus on competition policy and nutrition policy for two main reasons. First, these are areas that are near and dear to our hearts. Besides dealing with them in our research we have both spent time working on them in actual policy settings.3 Second, we think these areas, and the contrast between them, demonstrate both how quantitative marketing can be helpful for policy and how there is room for improvement.

3 Rachel Griffith served as a Senior Economist in the UK Competition Commission 2001–2002, she is a co-Investigator on the Obesity Policy Research Unit for the UK Department of Health and she is the Research Director of the Institute for Fiscal Studies in London. Aviv Nevo served as the Deputy Assistant Attorney General (“Chief Economist”) in the Antitrust Division of the US Department of Justice 2013–2014. He has also worked on several co-op agreements with the USDA looking at various aspects of food policy.

Competition policy is an area where economists have long had an impact. As Tim Muris, then Chairman of the U.S. Federal Trade Commission (FTC), noted in 2003: “Policy discourse no longer focuses on whether economics should guide antitrust policy. That debate was settled long ago. The pressing question today is how.”4 Historically, how economics has shaped competition policy has tracked the academic field of Industrial Organization (IO) rather closely, but with a lag of 20 years or so. This, combined with the availability of rich data, suggests that modern empirical IO, which is closely intertwined with modern quantitative marketing, is bound to take a leading role in shaping the future of competition policy. As we discuss in more detail below, a key input in almost any analysis in competition policy involves consumer diversion: if prices were to go up or quality were to go down, where would consumers substitute? This same question is key to understanding many marketing problems and is therefore an area where marketers and applied economists have developed significant expertise.

4 “Improving the Economic Foundations of Competition Policy,” Remarks before George Mason University Law Review’s Winter Antitrust Symposium, Washington, D.C., January 15, 2003. Available at https://www.ftc.gov/public-statements/2003/01/improving-economic-foundations-competition-policy.

In the area of nutrition policy a different picture emerges of the role that economists and marketers have played, despite the fact that many of the fundamental issues, such as understanding the role of consumer substitution and heterogeneity, are similar. For example, policy makers around the world are considering various forms of nutrition taxes; methods and insights from economics and quantitative marketing have a lot to offer in terms of understanding the impact of such policies. Suppose, for example, we want to quantify the effect of a tax on sugary drinks. We first need to know the incidence of the tax, i.e., how much of the tax would be passed on to consumers. Second, we need to know how consumers would respond to the higher price. Would they switch to alternative products? If so, which ones? Would they reduce overall consumption? Will the effect vary across consumers? If so, which consumers will be impacted the most? To answer these questions we need to understand consumer behavior and firms’ strategies.

Quantitative marketing also has a lot to add in other policy areas. For example, if we want to understand the effects of restrictions on advertising unhealthy food we would need to understand how firms will respond to the restrictions and the implications this has for consumer behavior. Suppose advertising of unhealthy food is banned. Intuitively, we think this should lead to lower consumption of unhealthy food. However, in response to the ban firms might charge different prices. Indeed, if prices are reduced, consumption of unhealthy food might actually increase. Once again, to answer these questions we need to understand firms’ pricing and consumer behavior, both of which are at the core of quantitative marketing.

The rest of this chapter proceeds as follows. In the next section we discuss the impact that academics and academic research can and does have on policy making. In Section 3 we discuss antitrust and in Section 4 nutrition policy. A final section makes a few concluding remarks.
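The sugary-drink tax questions raised in the Introduction (how much of the tax reaches the shelf price, and where consumers go when it does) can be made concrete with a small numerical sketch. The block below is only an illustration under strong assumptions: demand is a simple multinomial logit with an outside option, and every number in it (the price sensitivity `ALPHA`, the `quality` terms, the prices, the tax, and the 80% `PASS_THROUGH` rate) is hypothetical and not taken from this chapter or from any data.

```python
import math

def logit_shares(utilities):
    """Multinomial logit shares; the outside option has utility normalized to 0."""
    exp_u = {name: math.exp(u) for name, u in utilities.items()}
    denom = 1.0 + sum(exp_u.values())
    shares = {name: value / denom for name, value in exp_u.items()}
    shares["outside"] = 1.0 / denom
    return shares

# All values below are hypothetical, chosen only for illustration.
ALPHA = 2.0          # price sensitivity
PASS_THROUGH = 0.8   # assumed fraction of the tax passed on to the shelf price
TAX = 0.20           # tax per unit levied on the sugary drink
quality = {"sugary": 1.5, "diet": 1.0, "juice": 0.8}
price = {"sugary": 1.00, "diet": 1.00, "juice": 1.20}

def utilities(prices):
    """Mean utility of each product: quality minus price disutility."""
    return {j: quality[j] - ALPHA * prices[j] for j in prices}

before = logit_shares(utilities(price))

# Step 1 (incidence): the shelf price rises by pass-through times the tax.
taxed_price = dict(price, sugary=price["sugary"] + PASS_THROUGH * TAX)

# Step 2 (consumer response): recompute shares at the new price.
after = logit_shares(utilities(taxed_price))

# Diversion: the fraction of lost sugary-drink share picked up by each alternative.
lost = before["sugary"] - after["sugary"]
diversion = {j: (after[j] - before[j]) / lost for j in after if j != "sugary"}

for name, ratio in diversion.items():
    print(f"{name}: {ratio:.3f} of lost sugary-drink share")
```

A plain logit forces diversion to be proportional to the alternatives' initial shares, which is exactly the kind of restriction that richer demand models with consumer heterogeneity relax. The sketch therefore shows the structure of the calculation, not a credible estimate.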

2 The impact of academic research on policy As we noted in the Introduction, academic work in quantitative marketing has had more impact on policy in some areas than in others. Specifically, in the two areas we focus on in this chapter the impact of economists and marketers in antitrust policy has been larger than in nutrition policy. However, even in the competition policy setting, where academics have been influential, it is not necessarily marketers who have taken the lead. In this section we discuss why this is the case. Up to this point we have not distinguished between research done by IO economists and quantitative marketers. Indeed, we referred to the fields as being intertwined and one could claim that there is little difference between IO economists and modern quantitative marketers. However, it was not so long ago that one would draw a line between IO and marketing, by saying that IO looked at public policy while marketing stopped at the interaction of firms and consumers. Some marketers defined their role this way. For example, a prominent marketing professor writing in Forbes claims that the difference between marketers and economists is “The marketer’s desire is to understand the way the world really works – not how it should work [italics in the original] . . . ”5 There are various ways to read this quote. One interpretation, which explains why economists are often more involved in policy, is that economists are more focused on how the world can and should change rather than merely describing how it really is. We suspect that many quantitative marketers, especially those trained as economists, would disagree with the above characterization. We surely disagree with it. Nonetheless, this does reflect the view that some in the marketing profession hold. If marketers want to have an impact on policy and be part of the public discourse they need to learn from economists and focus not just on describing the world but also on describing how it should be. 
By the same token, the knowledge that marketers have on how consumers (really) make choices and how firms (really) respond, as opposed to how theory tells us that they should act, is extremely valuable addition to policy debates. We strongly believe that there are true intellectual gains from trade. We are not the first to point to the value marketers can bring to public policy. For example, Stewart (2015) directly addresses why marketers should study public policy, and what marketers can contribute to the understanding of public policy – because they have a lot to add to the debate in terms of methods and insights into how businesses and markets work, which is important for the formation of effective

5 https://www.forbes.com/sites/wharton/2012/06/15/marketing-vs-economics-gymnastics-or-high-wireact/.

2 The impact of academic research on policy

and efficient policy. Indeed, a new generation of quantitative marketers are taking on this challenge and expanding the methods and ways of thinking of marketing into new directions (see footnote 1 for examples). We now turn to the question of why has the impact of academic research been so different in antitrust and nutrition policy? In both these settings, understanding consumer behavior is very important. So the answer to the question of why the outcome is different does not lie, in our view, in the social return to the insight quantitative marketing might have. A cynical view might be that the private monetary returns to consulting in the competition policy area are higher than the private return in nutrition policy. This might be true, and as economists, we would have to admit that economic agents likely respond to monetary incentives. However, this is unlikely to explain why the research that has been done is less influential. In our view, there are more fundamental reasons for the difference. Empirical work in policy analysis can take on two complementary roles: ex-post and ex ante analysis. Ex-post analysis involves the evaluation of a policy after it has occurred. The goal is to measure the effect of the policy, possibly to learn ways to improve it, and maybe consider ways to generalize beyond the specific setting. Two key challenges are internal validity – how to infer the causal effect of the policy – and external validity – how to extrapolate beyond the specific setting to other environments (see Deaton, 2009, and Heckman and Urzua, 2010). A different approach is one that involves ex ante policy analysis, where we ask what the likely effects of a policy are before the policy is implemented. When doing ex ante analysis the researcher needs to figure out not just what would happen in the absence of the policy, as in ex-post analysis, but also what would be the effect of the policy (which is observed in the case of ex-post analysis). 
To perform the analysis one needs to develop an economic model, estimate its primitive parameters using historical data, and use the estimates to compute counterfactuals. Many economists, and certainly many non-economists, feel uncomfortable with this approach, since they believe it relies on many untested assumptions. Some have argued that a better way to perform ex-ante analysis is to rely on ex-post analysis of past events and adjust the outcomes as needed (see Angrist and Pischke, 2010, and the response by Nevo and Whinston, 2010). In our view this sidesteps the key question, which is how the outcome should be adjusted. In order to know how the outcomes should be adjusted we need a “model”. This can be a purely statistical model or it can be an economic model, which is what the ex-ante analysis relies on. Economists do, and should, rely on both carefully done ex-post and ex-ante analysis. However, in our view, the tools and methods of economics and quantitative marketing have a clear comparative advantage in the more structural, ex-ante analysis. This is both a curse and a blessing. It means that where structural ex-ante analysis is accepted, the work of quantitative marketing will be accepted as well. However, where ex-post, program-evaluation-type analysis is the prominent paradigm, quantitative marketing might play a lesser role.


CHAPTER 10 Marketing and public policy

For a variety of reasons ex-ante structural analysis is more prevalent in the competition policy world. Consider the case of mergers. The basis of the analysis is prospective: a competition agency has to decide in advance if the merger is “likely to lessen competition.”6 By nature, this requires ex-ante policy analysis. One could argue that the inference should be based on the outcomes of past mergers, but not surprisingly it is rare to have outcomes of similar mergers available. Furthermore, given the adversarial process in which mergers are evaluated, even if outcomes of past mergers were observed one could imagine that the relevance of these outcomes would be an area of great controversy. The adversarial process might also lead to an “arms race” in economic analysis, with each side trying to get the advantage by bringing in the latest and greatest methods from academic research.

In the nutrition policy world the decision makers are usually politicians and public servants. The decision process could be laced with controversy, but it is less adversarial and therefore rarely leads to the same type of arms race as with competition policy.7 Decision makers are content to rely on simple, intuitive, and easy-to-explain methods, which often leave out important insights, in particular about the way that markets work; these insights are central to the design of effective and efficient policy. In order to provide these insights economists have to work harder to explain the value of their analysis.8

There is an additional component that contributes to the difference. In many areas of public policy there is a sense of “us” and “them” and that business is the “enemy”. Under this (misguided) view of the world, marketers whose day job involves teaching future executives how to sell products and maximize profits cannot contribute an unbiased view to the policy discourse.
Indeed, the close connections marketers often have with firms – the same connections that often lead to incredible data sources – lead to concerns about impartiality. It is not uncommon to hear of research proposals that are not funded because of research collaboration with a firm.

3 Competition policy

In this section we discuss the role that economics and quantitative marketing do, and can, play in competition policy analysis. We start by discussing the policy framework in order to motivate how methods from quantitative marketing can be useful in

6 The legal standard that controls mergers in the US is Section 7 of the Clayton Act that states that a merger should be deemed unlawful if “the effect of such acquisition may be substantially to lessen competition, or to tend to create a monopoly.”
7 While firms do sometimes resist policies such as taxes, advertising restrictions or labeling in court, these typically do not revolve around the functioning of markets or an economic analysis of the impacts of the policies.
8 Interestingly, one area of policy closely related to nutrition policy is tobacco control. In this area, litigation and an adversarial process are prevalent and indeed economists have played a much more prominent role.


competition policy. Specifically, we focus on the key role that consumer demand and diversion play. Next, we discuss a few recent examples that demonstrate the use of the methods and how they can be influential in both litigation and regulatory settings. A key point is that it is unrealistic to expect policy to be based on state-of-the-art methods that have not been fully vetted and might require data and time that are not always available. Finally, we discuss directions moving forward.

In order to understand how economics and quantitative marketing can contribute to competition policy analysis we need to understand how competition policy is conducted. Take for example the case of a merger. If two firms want to merge they typically have to notify the government. Once they do, the competition agencies have a pre-determined period to evaluate the merger and decide whether they want to challenge it.9 The exact legal standard for challenging a merger differs from country to country but generally involves a question as to whether the merger will lessen competition. The key questions are: (i) How does a competition agency determine whether a merger will lessen competition? (ii) How can it prove in court that the merger does indeed reduce competition?

Broadly speaking there are two approaches to answering these questions. First is an approach based on market definition and concentration analysis, often referred to as “structural analysis” (not to be confused with structural modeling). Second is an approach based on economic modeling of the likely competitive effects. Economics can play a crucial role in both of these approaches, as we detail below.

Similar types of questions arise in non-merger cases. The one exception is explicit collusion among firms, which is generally per se illegal, and therefore does not depend on the competition agency demonstrating that the conduct had anti-competitive effects.
The mere fact that firms colluded is illegal, and therefore little economic analysis is needed. Other forms of anticompetitive behavior, such as exclusionary conduct, predatory pricing, and agreements that restrain trade, are generally evaluated under the rule of reason, and therefore typically require significant economic analysis. Since these cases are somewhat more complicated and somewhat case-specific we will not discuss them here.

3.1 Market definition and structural analysis

A starting point in many, if not all, competition matters is the definition of the relevant antitrust market. This is done for two main reasons. First, to frame the relevant area in which there will likely be competitive harm. One goal of market definition is to

9 The exact process differs from country to country. In the US, for example, if the merger is above a certain threshold the parties generally need to notify both the DOJ and the FTC. The agencies will then have 30 days to decide if they want to further review the merger. If an agency wants further review it will issue a second request to get more detailed information from the parties. Usually the parties and the agency will negotiate the scope of the request and the time the agency has to review the transaction. By the end of the agreed upon period the agency needs to either clear the transaction, possibly subject to some divestitures, or challenge it in court.


understand in what market there will be harm to the competitive process, who the competitors in the market are, and how difficult it would be to enter this market. Second, in the case of a merger, another goal of market definition is to conduct a structural analysis. Once the market is defined one can compute the Herfindahl-Hirschman Index (HHI), which is the sum of the squares of the market shares of all firms. Typically the index is multiplied by 10,000.10 The higher the level of the HHI, the more concentrated the industry. For example, the FTC-DOJ joint Horizontal Merger Guidelines define a market as highly concentrated if the HHI is over 2,500. In addition to the level of the HHI one also computes the change in the HHI due to the merger.11 The Guidelines describe a merger that raises the HHI by over 200 points and generates a highly concentrated industry, i.e. a post-merger HHI of over 2,500, as one that “will be presumed to be likely to enhance market power”.

At this point it might be unclear what any of this has to do with quantitative marketing. However, it turns out that it has everything to do with a key element of quantitative marketing, namely, consumer choice.

There are several ways to define a relevant market, including qualitative analysis that relies on market realities and normal course of business documents. However, whenever possible the agencies and courts tend to rely on the so-called hypothetical monopolist test (HMT). The HMT tests if a candidate market is too narrow by asking whether a (hypothetical) profit-maximizing monopolist over this market would impose a small but significant and non-transitory increase in price (SSNIP). If the answer is no then the market is too narrow, because even a monopolist over this market would face significant competition from outside the market that limits its ability to (significantly) raise prices.
There are potentially many markets that would pass the test, so the Guidelines state that typically the smallest market is preferred to avoid defining markets too broadly. The key force limiting the hypothetical monopolist’s ability to raise prices is consumer substitution. If consumers have close enough substitutes then the monopolist will not raise prices because doing so would imply losing too many consumers. The HMT lays the framework but requires the key input of consumer substitution, or diversion, which is where quantitative marketing comes in.

In practice, the HMT is usually implemented in two ways. First, we ask whether the hypothetical monopolist would find it profitable to impose a SSNIP on at least one of the merging parties’ products. Note that this asks whether a SSNIP is more profitable than the benchmark (in the case of a merger, usually pre-merger prices) and not whether a SSNIP is the profit-maximizing price. To answer the question of whether a SSNIP is profitable we compute the “critical loss”, namely what share of demand has to be lost to make the SSNIP unprofitable. We then compare this critical loss to actual estimates of diversion in response to a SSNIP. If the estimate of diversion is higher than the critical loss a SSNIP will not be

10 In a monopoly market the index will equal 10,000, while in a market with n equally sized firms the index will equal 10,000/n.
11 It is easy to show that this change is equal to 2 times the product of the shares of the merging parties.


profitable. We do not need an exact estimate of diversion, or loss, due to the SSNIP; we only need to know whether it is higher than the critical loss. Katz and Shapiro (2003) show that the critical loss is equal to

$$\frac{X}{X + M}, \tag{3.1}$$

where $X$ is the percent price increase and $M$ is the percent gross margin (equal to $(P - MC)/P$), where $P$ is price and $MC$ is marginal cost. The inputs required for this computation are margins and demand substitution, both of which are key inputs into pricing decisions that we regularly teach in pricing classes. Indeed, the idea behind critical loss analysis is very similar to the break-even analysis typically taught in these pricing classes (for example, see Nagle et al., 2011). Finally, the estimates of diversion are usually either recovered from normal course of business documents or estimated from data using the methods discussed earlier in this volume.

A slightly different way to conduct the HMT is to compute the monopolist’s profit-maximizing price. This is done by simulating a merger to monopoly using merger simulation methods (Werden and Froeb, 1994; Nevo, 2000). The basic idea is to recover demand elasticities, either from normal course of business documents or using data to estimate them. These elasticities are then fed into a supply model and used to recover pre-merger margins and simulate the likely effects of the merger. The supply model can be very rich and include many of the models discussed earlier in this volume, but a typical model to use is the Nash-Bertrand pricing model for differentiated products. The difference between the pre-merger and post-merger simulated price directly answers the question of whether a profit-maximizing monopolist would impose a SSNIP.
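The two screening computations of this subsection, the HHI thresholds and the critical loss in (3.1), are simple enough to sketch in a few lines. The Python below is only an illustration: the market shares, the 5% price increase, the 45% margin, and the 15% estimated loss are hypothetical numbers, and the function names are ours rather than anything used by the agencies.

```python
def hhi(shares_pct):
    """HHI on the 0-10,000 scale from market shares given in percent."""
    return sum(s ** 2 for s in shares_pct)


def hhi_change(share_a_pct, share_b_pct):
    """Change in HHI when firms A and B merge: 2 * s_A * s_B (footnote 11)."""
    return 2 * share_a_pct * share_b_pct


def presumed_to_enhance_market_power(shares_pct, a, b):
    """Screen quoted in the text: post-merger HHI above 2,500 and a change above 200."""
    delta = hhi_change(shares_pct[a], shares_pct[b])
    post = hhi(shares_pct) + delta
    return post > 2500 and delta > 200


def critical_loss(x, m):
    """Equation (3.1): x is the price increase, m the gross margin, both as fractions."""
    return x / (x + m)


# Hypothetical market with five firms; the firms at index 2 and 3 merge.
shares = [30.0, 25.0, 20.0, 15.0, 10.0]
pre_hhi = hhi(shares)
delta_hhi = hhi_change(shares[2], shares[3])
flagged = presumed_to_enhance_market_power(shares, 2, 3)

# Hypothetical SSNIP check: 5% price increase, 45% margin, 15% estimated loss of sales.
cl = critical_loss(0.05, 0.45)
ssnip_profitable = 0.15 < cl  # profitable only if the actual loss is below the critical loss
print(pre_hhi, delta_hhi, flagged, round(cl, 3), ssnip_profitable)
```

In this made-up example the merger is flagged by the concentration screen, while the SSNIP is not profitable because the estimated loss exceeds the critical loss, which would suggest that the candidate market is too narrow.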

3.2 Economic analysis of competitive effects

Most challenges that competition agencies make to mergers will involve the structural analysis described in the previous subsection. However, many economists are critical of it and would rather see a more economics-based analysis (Kaplow, 2010; Ginsburg and Wright, 2015). This economics-based analysis can take many forms that differ in the degree of econometric sophistication. In a data-rich environment, which is more and more the case, econometric analysis of some form usually plays a role. Broadly speaking there are three types of econometric analysis that are used.

The first type of empirical analysis often conducted is a regression of price (or other outcomes) on concentration. Often this will be a cross-sectional regression correlating concentration with price across different geographic or product markets. Occasionally, the regression will look at changes over time and sometimes use a panel. The correlation between market concentration and price is used to predict the effect of a merger on price by using the change in concentration caused by the merger. This type of analysis was motivated by Structure-Conduct-Performance type regressions originally proposed by Bain (1956) and applied by hundreds of academic papers (Schmalensee, 1989). Interestingly, a typical grad level class in IO will start


by explaining why these regressions are no longer used in academic work (see, for example, the discussion in Salinger, 1990).

A second approach, which in some ways builds on the first but tries to deal with some of its concerns, looks at a regression of price on market structure but uses discrete events. A prime example is a merger retrospective, where data before and after the merger are compared to a control market in a “diff-in-diff” analysis (see, for example, Weinberg et al., 2013). This can obviously only be done retrospectively and therefore relies on having similar enough past mergers to evaluate. In many cases such events are not available.

The third approach, and the one most closely related to the themes of this chapter, is merger simulation. Broadly speaking, merger simulation involves writing down an economic model of the industry and using it to simulate the likely outcome of the merger. The inputs for the model typically include demand, cost, and a model of how firms interact. Parameters of the model can either be estimated, if data are available, or recovered from internal documents of the merging firms.

To demonstrate the effect of a merger consider a merger in a differentiated products industry, where demand for each product $j$ is given by $Q_j(p_1, \ldots, p_J)$. For simplicity, assume single-product firms that maximize static profits. The optimal price balances higher margins and lower volume and is given by the first-order condition of the profit maximization problem,

$$p_j^* = mc_j - \frac{Q_j}{\partial Q_j / \partial p_j}, \tag{3.2}$$

where the last term is evaluated at the equilibrium prices of all firms. How does the merger change this? Suppose firms 1 and 2 merge. Firm 1’s optimal price then takes into account that some of the previously lost sales now go to its newly owned product 2. The first-order condition becomes

$$p_1^* = mc_1 - \frac{Q_1}{\partial Q_1 / \partial p_1} - (p_2^* - mc_2)\,\frac{\partial Q_2 / \partial p_1}{\partial Q_1 / \partial p_1}. \tag{3.3}$$

The main difference from before is the addition of the last term. This term is the margin of product 2 times the diversion ratio from product 1 to product 2. The diversion ratio measures what fraction of the demand that product 1 loses, as its price increases, goes to product 2. The product of these two terms generates what is often called an “upward pricing pressure”. The larger the margin and the larger the diversion, the larger this pressure will be. In the above, we left the marginal cost unchanged, but the merger might also generate efficiencies and reduce the marginal cost, which will reduce the optimal price. There are several ways to use this sort of analysis to compute the likely effect of a merger. One approach is to compute the magnitude of the upward pricing pressure (Farrell and Shapiro, 2010). The idea is to use normal course of business financial and marketing documents to compute the margins and diversion, and use these to directly compute the upward pricing pressure. Once we have the inputs the computation is trivial, and the hope is that the inputs are known to managers of the firms.
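The upward pricing pressure computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the demand derivatives, price, and costs are hypothetical, and netting out a proportional marginal cost saving on product 1 (the `efficiency` argument) is one common way to put the pricing pressure and the cost reduction in the same metric, not a prescription from the text.

```python
def diversion_ratio(dq2_dp1, dq1_dp1):
    """Fraction of the sales product 1 loses when p1 rises that goes to product 2."""
    return -dq2_dp1 / dq1_dp1


def upward_pricing_pressure(p2, mc2, diversion_12, mc1=0.0, efficiency=0.0):
    """Margin of product 2 times diversion from 1 to 2, net of an assumed cost saving.

    efficiency is the assumed proportional reduction in product 1's marginal cost.
    """
    return diversion_12 * (p2 - mc2) - efficiency * mc1


# Hypothetical inputs: a unit increase in p1 loses 100 units of product 1,
# of which 25 units are picked up by product 2.
d12 = diversion_ratio(dq2_dp1=25.0, dq1_dp1=-100.0)

# Product 2 sells at 10 with marginal cost 6.
upp_gross = upward_pricing_pressure(p2=10.0, mc2=6.0, diversion_12=d12)

# Same merger with an assumed 10% marginal cost saving on product 1 (mc1 = 5).
upp_net = upward_pricing_pressure(p2=10.0, mc2=6.0, diversion_12=d12,
                                  mc1=5.0, efficiency=0.10)
print(d12, upp_gross, upp_net)
```

The result is expressed in cost units per unit sold of product 1. Translating it into a price effect requires an assumption about pass-through.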


One common complaint about this computation is that it is not clear what this upward pricing pressure index means. One way to think about this computation is that it is like a change in the cost of the firm. Post-merger, firm 1 has an opportunity cost for each unit it sells: the lost margin on product 2. How this cost translates into a price increase depends on the curvature of the demand curve, which determines the degree of pass-through. In order to compute a price effect one needs to determine the curvature of the demand curve (for a more detailed discussion of pass-through see Section 4.2). Advocates of this method propose leaving everything in the “cost space” by computing both the change in the pricing pressure and the change in marginal cost in the same metric.

Another, often missed, issue with this approach is that in principle both the margin and the diversion are functions of equilibrium prices. Even if we knew exactly what they were pre-merger we might do a bad job of predicting the post-merger effect. To see this, consider an extreme example: a merger to monopoly in a nearly homogeneous good industry with two firms. Assuming Bertrand competition pre-merger, the margins should be very low, because the products are nearly homogeneous. This implies a very low upward pricing pressure. However, if the two firms merge the price effect would potentially be very high. The reason is that in equilibrium the margin of firm 2 would significantly increase. If we knew what the margin would be post-merger then we could compute the correct pricing pressure, but of course the whole point of the exercise is to compute what the margin will be post-merger. This example is obviously extreme but it does provide a cautionary tale.
An alternative way to use the above information is to use the first-order conditions from all the products to compute the post-merger equilibrium (Berry and Pakes, 1993; Hausman et al., 1993; Werden and Froeb, 1994; Nevo, 2000). Formally, the post-merger prices solve the following set of first-order conditions:

$$p^* = mc + \Omega^{-1}(p^*)\,Q(p^*), \tag{3.4}$$

where $p^*$ is the vector of equilibrium prices, and $\Omega(p^*)$ is a matrix with $\Omega_{jr}(p) = -\partial Q_j(p)/\partial p_r$ if $j$ and $r$ are produced by the same firm and zero otherwise. These prices can be compared to the pre-merger prices. Note that the computation can easily account for marginal cost reductions.

To modern quantitative marketers this approach should seem quite natural, as it fits very well with many of the topics and methods discussed in the previous chapters. Merger simulation offers a coherent approach that allows the decision maker to conduct sensitivity analysis as well as account for various effects such as cost reductions, product re-positioning, and entry. On the other hand, merger simulation models are often perceived as being “too complicated” and dominated by the “simple” regression models described above that are viewed as more intuitive, or by the upward pricing pressure methods described earlier. These claims are not well founded. Consider the upward pricing pressure models. Conditional on having the inputs, merger simulation does not really require more assumptions. Margin and diversion information (and the pricing equation) can


be used to recover costs as well as own- and cross-price elasticities. Merger simulation does require knowing the shape of the demand curve, which one does not need in order to compute the upward pricing pressure, but which is needed in order to interpret this measure in a meaningful economic way. Merger simulation is often equated with demand estimation. But this need not be the case. Of course if data are available then the tools of modern quantitative marketing can be brought to bear. Because merger simulation can only be as good as the model (and inputs) that it is based on, quantitative marketers can have a big impact. As demonstrated in earlier chapters, quantitative marketing has developed models for flexibly estimating heterogeneity, cross-price effects, and demand curves more generally. Many of these methods are not ready for policy work quite yet, but as we demonstrate in the next subsection demand estimation and merger simulation can be very useful and powerful if used correctly.
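To make the first-order-condition system (3.4) concrete, here is a minimal merger simulation sketch in Python. Everything specific in it is an assumption for illustration: demand is a plain logit with market size normalized to one, the qualities, costs, and price coefficient are made up, and the solver exploits a property of plain logit demand (all products of a firm carry the same markup, $1/(\alpha(1 - s_f))$, with $s_f$ the firm's total share). Richer demand systems would require solving (3.4) with the full ownership matrix. The `foc_residuals` helper checks the general ownership-based first-order conditions; for plain logit the matrix of share derivatives is symmetric, so the index order in the definition of the matrix does not matter here.

```python
import math


def logit_shares(prices, quality, alpha):
    """Plain logit shares with an outside good whose utility is normalized to zero."""
    expu = [math.exp(a - alpha * p) for a, p in zip(quality, prices)]
    denom = 1.0 + sum(expu)
    return [e / denom for e in expu]


def solve_prices(mc, quality, alpha, owner, tol=1e-12, max_iter=10_000):
    """Nash-Bertrand prices by fixed-point iteration on the logit markup formula."""
    prices = [c + 1.0 / alpha for c in mc]  # start from the zero-share markup
    for _ in range(max_iter):
        s = logit_shares(prices, quality, alpha)
        firm_share = {}
        for f, sj in zip(owner, s):
            firm_share[f] = firm_share.get(f, 0.0) + sj
        new = [c + 1.0 / (alpha * (1.0 - firm_share[f])) for c, f in zip(mc, owner)]
        if max(abs(a - b) for a, b in zip(new, prices)) < tol:
            return new
        prices = new
    raise RuntimeError("price iteration did not converge")


def foc_residuals(prices, mc, quality, alpha, owner):
    """Residuals of s_j + sum over same-firm r of (p_r - mc_r) * ds_r/dp_j."""
    s = logit_shares(prices, quality, alpha)
    residuals = []
    for j in range(len(prices)):
        total = s[j]
        for r in range(len(prices)):
            if owner[r] != owner[j]:
                continue
            if r == j:
                deriv = -alpha * s[j] * (1.0 - s[j])  # own-price derivative
            else:
                deriv = alpha * s[r] * s[j]           # cross-price derivative
            total += (prices[r] - mc[r]) * deriv
        residuals.append(total)
    return residuals


# Hypothetical three-product industry; products 0 and 1 merge.
mc = [1.0, 1.0, 1.0]
quality = [1.0, 1.0, 1.0]
alpha = 1.0
pre = solve_prices(mc, quality, alpha, owner=[0, 1, 2])
post = solve_prices(mc, quality, alpha, owner=[0, 0, 2])
print([round(p, 3) for p in pre], [round(p, 3) for p in post])
```

A marginal cost reduction can be explored by lowering the relevant entries of `mc` in the post-merger call, and sensitivity analysis amounts to re-running the two calls over a grid of parameter values.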

3.3 A few recent examples

In this subsection, we discuss a few recent examples where empirical quantitative models were used successfully in merger investigations and litigation. We first discuss the Aetna-Humana proposed merger that was successfully challenged by the DOJ. In this litigation the DOJ’s economic expert, Aviv Nevo, relied heavily on quantitative methods, specifically a Nested Logit model of demand and merger simulation methods. The methods used in this case followed the academic literature quite closely. The analysis did not necessarily push the academic frontier forward. Instead, it provides a good example of how methods and results from the quantitative literature can be used successfully in court.12

Next, we discuss the merger of AT&T and DirecTV, where a merger simulation model, constructed by Steve Berry and Phil Haile, was offered by the merging parties and relied upon by the U.S. Federal Communications Commission (FCC) in its order approving the merger. The model used here was in many ways not strictly “off-the-shelf” and in some ways advances the state of knowledge, which at least in part was feasible because it was used in the context of a regulatory investigation and not litigation. The analysis is also a good example of how data collection methods can be applied productively. The analysis was quite successful and helped the parties get the merger approved.

Finally, we conclude with a discussion of bargaining models in merger analysis. These models are an example where recent advances in academic work have had a great impact on competition policy in industries as varied as health care and video markets.

12 As one reviewer of this chapter noted, it seems surprising that we use as an example a case where the methods used did not push the frontier of research. That this is a surprise shows the disconnect between research and practice: just because a paper was published in an academic journal, even a leading one, does not make it ready for policy work, especially in the context of litigation. The mere fact that the court accepted the analysis, and dove into the technical details, is considered a significant advancement in practice.


3.3.1 The Aetna-Humana proposed merger

Aetna and Humana are health insurance companies that provide a wide range of insurance products. In 2015, they announced their intention to merge and in July 2016, the Department of Justice and several state attorneys general filed a complaint seeking to enjoin the merger. In January 2017, after a three-week trial, the court decided to block the merger, at which point the parties abandoned the transaction. The DOJ’s main concern was a loss of competition in the Medicare Advantage insurance market (for a more complete discussion of the case see Bayot et al., 2018).

Medicare is a program administered by the federal government through the Centers for Medicare and Medicaid Services (“CMS”) to provide health insurance to eligible seniors aged 65 or older. The program partially covers the costs of hospital care, outpatient care, medical supplies, and preventive services. Enrollees pay deductibles, coinsurance, copayments, and a monthly premium for the outpatient services. Enrollees can seek care from any provider that accepts Medicare rates, which is the vast majority of all medical providers in the United States. Enrollees can offset the out-of-pocket costs by purchasing a Medigap plan and/or Medicare Part D coverage from a private insurer at additional premiums. Together these different combinations of Medicare with or without supplements are referred to as “Original Medicare options”.

Alternatively, seniors can enroll in a Medicare Advantage insurance plan, which is administered by private insurers and uses a network of providers. Unlike Original Medicare, Medicare Advantage insurers require or encourage their enrollees to use in-network providers. In exchange for the network restrictions, Medicare Advantage plans provide seniors with potentially lower cost and higher benefits. In 2016, Humana was the largest insurer in individual Medicare Advantage plans, with a nationwide market share of 21.2%.
Aetna was the fourth-largest Medicare Advantage insurer nationally, with a 6% nationwide market share. Within the 364 counties that the DOJ focused on, Aetna and Humana jointly had a 59% market share, and a 100% market share within 70 of these counties. On the other hand, nationally at least half of the potential enrollees choose Original Medicare, and in some (mostly rural) counties as much as ninety percent choose this option. Therefore, the key question became to what degree competition from Original Medicare would constrain any anti-competitive effects that might arise from the merger.

The court’s decision to enjoin the merger relied, at least in part, on the results of demand estimation and merger simulation.13 The demand estimation presented in court was of a Nested Logit model of plan choice, where all the Medicare Advantage plans were in one nest, $g = 1$, and Original Medicare options were the outside good in another nest, $g = 0$. The utility of enrollee $i$ from plan $j$ in market $m$ is given by

$$u_{ijm} = x_{jm}\beta - \alpha p_{jm} + \xi_{jm} + \zeta_{ig} + (1 - \sigma)\,\varepsilon_{ijm}, \tag{3.5}$$

13 The decision, along with other material from the trial, is available at https://www.justice.gov/atr/case/us-and-plaintiff-states-v-aetna-inc-and-humana-inc.


where $x_{jm}$ is a vector of characteristics, such as plan quality and additional benefits offered, $p_{jm}$ is a measure of the price of the plan, and $\xi_{jm}$ is an unobservable demand shock. A key parameter was the nesting parameter, $\sigma$, which determines the degree of substitution between Medicare Advantage and Original Medicare. The parameters were estimated using instrumental variables following the specification in Berry (1994). The instrumental variables used were attributes of competitors’ plans.

Both sides estimated very similar models. The main difference was in the exact variables used as IVs. As a result the exact estimate of the nesting parameter (and the price elasticity) varied somewhat. The real difference, however, was in how these estimates were used. The expert for the government used the estimates to compute diversion and to perform a variety of HMTs, as described in the previous section. The tests were both of the critical-loss variety and of a simulation of a merger to monopoly. The estimates were also used to simulate the likely effect of the merger. An important point in the trial was that, as a robustness test, the estimates of the defendants’ expert were used. The fact that results were qualitatively unchanged gave the analysis significantly more credibility.14 Robustness is very important in academic settings but is significantly more important in a policy setting.

The merging parties’ expert mainly used the nesting parameter to claim that Original Medicare should be included in the market because his estimate of the parameter was below 0.5. He also presented a regression that correlated the plan premium with concentration and found little correlation. In some ways, this was set up as a battle between a simple and intuitive regression based on “real world” data and a complicated abstract model that is ungrounded in reality. Many would have predicted the former would win. Based on the outcome and on reading the decision, this was clearly not the case.
How is it that the “complicated” model won? In large part because the demand analysis was tied to the rest of the evidence. First, the approach was well grounded in market realities. The fundamental decision structure of the nested logit model was clearly supported in business documents. Second, the key result was supported by easy to understand simple patterns in raw data. Switching data on where consumers went once they left a Medicare Advantage plan clearly showed that at least eighty-five percent went to another Medicare Advantage plan. Third, results were consistent with the academic literature.15 The model and estimates followed closely what was done in the academic literature studying this market. Finally, the results were extremely robust. The basic conclusions regarding market definition and competitive harm held

14 A summary of the results presented in court is available at https://www.justice.gov/atr/us-and-plaintiff-states-v-aetna-and-humana.
15 Several academic studies have used the Nested Logit model to study this market, and all estimated a nesting parameter in the same range as what was presented to the court. These studies include Guglielmo (2016), Curto et al. (2015), Dunn (2010), Hall (2011), Dafny and Dranove (2008), Town and Liu (2003), and Atherly et al. (2004).


under a wide range of parameter values including those produced by the merging parties’ expert. The above principles help explain the court’s decision and willingness to accept the choice modeling. But maybe more importantly, they give general guidance on how to convince policy decision makers to rely on the types of models used in quantitative marketing and economics. The models need to (i) be based on market realities; (ii) relate to simple statistics; and (iii) yield robust conclusions.
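The role of the nesting parameter in (3.5) can be illustrated with a small numerical sketch. The Python below is not the model presented in court: it assumes three hypothetical inside plans with equal mean utility, a single inside nest, an outside option with mean utility normalized to zero, and a price coefficient of one. It computes nested logit shares and then, by finite differences, the share of a plan's lost enrollees that diverts to other inside plans versus the outside option.

```python
import math


def nested_logit_shares(delta, sigma):
    """Shares when all inside plans share one nest and the outside good sits alone.

    delta holds the mean utilities of the inside plans; the outside good has utility 0.
    Returns (list of inside shares, outside share).
    """
    scaled = [math.exp(d / (1.0 - sigma)) for d in delta]
    d_g = sum(scaled)
    inclusive = d_g ** (1.0 - sigma)
    s_nest = inclusive / (1.0 + inclusive)
    inside = [s_nest * e / d_g for e in scaled]
    return inside, 1.0 - s_nest


def diversion_from(j, delta, sigma, alpha, h=1e-5):
    """Diversion from plan j to (other inside plans, outside good) for a small price rise."""
    up = list(delta)
    up[j] -= alpha * h  # a price increase of h lowers mean utility by alpha * h
    down = list(delta)
    down[j] += alpha * h
    s_up, out_up = nested_logit_shares(up, sigma)
    s_down, out_down = nested_logit_shares(down, sigma)
    lost = -(s_up[j] - s_down[j])
    to_inside = sum(s_up[k] - s_down[k] for k in range(len(delta)) if k != j) / lost
    to_outside = (out_up - out_down) / lost
    return to_inside, to_outside


# Hypothetical market: three identical plans, price coefficient alpha = 1.
delta = [0.0, 0.0, 0.0]
inside_0, outside_0 = diversion_from(0, delta, sigma=0.0, alpha=1.0)
inside_8, outside_8 = diversion_from(0, delta, sigma=0.8, alpha=1.0)
print(round(inside_0, 3), round(outside_0, 3))
print(round(inside_8, 3), round(outside_8, 3))
```

With these illustrative numbers, diversion to the outside option falls from one third when the nesting parameter is zero (plain logit) to roughly 0.13 when it is 0.8, even though the outside share itself is larger in the second case. This is the sense in which a higher nesting parameter weakens the competitive constraint from the outside option, and it is the kind of model prediction that can be compared with raw switching data.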

3.3.2 The AT&T-DirecTV merger

AT&T is a telecommunication company that offers many services including cellular phone service, landline phone, broadband and wireline TV service in many, but not all, states. DirecTV is a satellite video provider. In the spring of 2014 AT&T announced its intention to buy DirecTV. The merger was reviewed by the DOJ and the FCC. Because the FCC was involved there are public documents explaining the decision, and redacted versions of submissions are available online.16

Broadly speaking the parties put forward two arguments why the merger would be pro-competitive. First, they claimed that post-merger AT&T would have increased incentives to invest. The argument made was mostly theoretical, and while the argument might be correct, it seemed to gain little traction with the FCC. The second argument was that the parties offer complementary products and therefore post-merger would have incentives to reduce prices. The parties backed up this claim with demand estimation and a merger simulation model. Judging by the FCC’s order, which mostly accepted and slightly modified the merging parties’ model, the empirical analysis was influential in getting the merger approved.

AT&T standalone video, offered mostly through its U-verse product, and DirecTV video are substitutes in areas where both are offered. It is well known that absent cost savings a merger between substitutes will lead to higher prices. On the other hand, AT&T broadband service and DirecTV video may be complements, in which case the merger will lead to lower prices. Internet and video service might be complements because a provider offering both could offer a bundle discount, as most cable companies do, and install and service both together. The complementarity will not only lower the pricing of post-merger AT&T but also increase competition for bundles with the local cable provider. In theory it is unclear which effect dominates: the substitution or the complementarity.
16 See https://docs.fcc.gov/public/attachments/FCC-15-94A1.pdf.

In order to answer this question the merging parties offered a simulation model that was estimated using market-level data and used to simulate the likely effect of the merger. A key modeling challenge is that standard discrete choice models have built in that all products are substitutes. In order to model complementarity, the model allowed for discrete choices that included video alone, internet alone, or both. Specifically, each consumer can choose between several options that were grouped into four nests: video only, broadband only, both video and broadband, and the “outside option” of no video or broadband. The mean (conditional indirect) utility of product $jk$ (where $j$ indexes internet choice and $k$ indexes video choice) in market $m$ is given by

$$\delta_{jkm} = x_{jkm}\beta - \alpha p_{jkm} + \xi_{jkm}, \tag{3.6}$$

where $x_{jkm}$ is a vector of characteristics, such as a product dummy, max speed offered, and DMA attributes interacted with product attributes, $p_{jkm}$ is the price of the product, and $\xi_{jkm}$ is an unobservable demand shock. Each of the products was placed in one of four groups, or nests. Let the market share, conditional on a choice set $\chi_m^l$, be denoted by $\sigma_{jk}(\delta_m, \chi_m^l, \theta)$, where $\theta$ denotes the parameters to be estimated. The Nested Logit distributional assumptions imply that

$$\ln\big(\sigma_{jk}(\delta_m, \chi_m^l, \theta)\big) = \frac{\delta_{jkm}}{1-\psi_g} - \psi_g \ln(D_{glm}) - \ln\Big(1 + \sum_h D_{hlm}^{1-\psi_h}\Big), \tag{3.7}$$

where the nesting parameter $\psi_g$ measures the within-nest substitution in nest $g$. A simple specification, which is the one mostly relied upon, sets the parameter equal across groups. $\ln(D_{glm})$ is the “inclusive value” for group $g$ and is given by

$$\ln(D_{glm}) = \ln\Big(\sum_{jk \in g \cap \chi_m^l} e^{\delta_{jkm}/(1-\psi_g)}\Big), \tag{3.8}$$

where the summation is over the set of products that are in the nest and in the choice set, $\chi_m^l$. One complication they needed to deal with is that the choice set varied within each DMA: not all consumers had access to the same choice set. The DMA-level market shares are a weighted average of the shares conditional on the choice sets facing the different consumers,

$$s_{jkm}(\delta_m, \theta) = \sum_l \sigma_{jk}(\delta_m, \chi_m^l, \theta)\, w_m^l, \tag{3.9}$$

where the weight $w_m^l$ is the fraction of DMA households facing choice set $\chi_m^l$. The model was estimated using DMA-level data. The model parameters were identified using instrumental variables including product characteristics, characteristics of other products in the choice set, DMA-level product availability, and other DMA attributes. Because the choice set varied within DMA, the estimation could not follow the method suggested by Berry (1994) and instead was closer to the estimation in Berry et al. (1995). One of the novelties of the analysis was the creation of different data sets and estimating using GMM based on marginal information (shares for video provider or broadband provider, but not shares at the level of the $jk$ “product”). Among other things, the authors utilized a survey conducted by a marketing professor. Such a survey would have been impossible to fund in an academic setting, but was possible here.17

The demand estimates were then fed into a merger simulation, like the one discussed above, and used to compute the likely effects of the merger. Reading through the FCC decision it is clear that the simulation model played an important role in getting the merger approved. The FCC used the merging parties’ model, modified it somewhat, and “kicked the tires” a bit to check how robust it was. This analysis allowed the FCC to state: “We find that the combined AT&T-DIRECTV will increase competition for bundles of video and broadband, which, in turn, will stimulate lower prices, not only for the Applicants’ bundles, but also for competitors’ bundled products – benefiting consumers and serving the public interest. We also expect that this improved business model will spur, in the long term, AT&T’s investment in high-speed broadband networks, driving more competition and thus expanding consumer access and choice. This is, in other words, a bet on competition.”18

In many ways, the analysis done in this case was more novel than in the Aetna-Humana merger discussed earlier. The model was not a standard Nested Logit model, in the sense that it allowed for complements. Because of within-DMA variation in choice sets, the estimation was not the garden variety estimation of Nested Logit as in Berry (1994). Finally, the construction of market shares required a massive data collection effort that was only feasible with the financial backing of a company like AT&T. One could complain about several aspects of the model, such as lack of additional heterogeneity and the fact that options that included the same product got independent unobserved shocks. Indeed, if this model were used in litigation one could imagine these becoming real issues. However, in a regulatory setting the key issue was whether any of these concerns were significant enough to overturn the results.
The above quote suggests that this was not the case and that the results gave the FCC comfort in approving the merger.
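The share equations (3.7)-(3.9) can be illustrated with a small numerical sketch. The code below is not the merging parties' model: the five products, their mean utilities, the nest assignment, the common nesting parameter, and the choice-set weights are all hypothetical values chosen for illustration. It computes nested logit shares conditional on a choice set and then averages them across choice sets, as in (3.9).

```python
import numpy as np

def nested_logit_shares(delta, nest, available, psi):
    """Shares conditional on one choice set, as in (3.7)-(3.8).

    delta: mean utilities; nest: integer nest id per product;
    available: boolean mask for the choice set; psi: common nesting parameter.
    """
    delta = np.asarray(delta, dtype=float)
    nest = np.asarray(nest)
    available = np.asarray(available, dtype=bool)
    # exp(delta / (1 - psi)) for products in the choice set, zero otherwise
    scaled = np.where(available, np.exp(delta / (1.0 - psi)), 0.0)
    D = np.bincount(nest, weights=scaled)        # D_g, summed within each nest
    denom = 1.0 + np.sum(D ** (1.0 - psi))       # the outside option contributes 1
    shares = np.zeros_like(delta)
    shares[available] = scaled[available] / (D[nest[available]] ** psi * denom)
    return shares

def dma_shares(delta, nest, choice_sets, weights, psi):
    """Weighted average of conditional shares across choice sets, as in (3.9)."""
    return sum(w * nested_logit_shares(delta, nest, cs, psi)
               for cs, w in zip(choice_sets, weights))

# Hypothetical products and values (assumptions, not estimates from the case)
products = ["DirecTV video", "U-verse video", "AT&T broadband",
            "cable broadband", "cable bundle"]
nest = [0, 0, 1, 1, 2]            # 0 = video only, 1 = broadband only, 2 = bundle
delta = [0.2, 0.0, 0.1, 0.3, 0.5]
psi = 0.5

full_set = [True, True, True, True, True]
no_uverse = [True, False, True, True, True]   # households without U-verse access
weights = [0.6, 0.4]                          # fraction of households per choice set

s_full = nested_logit_shares(delta, nest, full_set, psi)
s_limited = nested_logit_shares(delta, nest, no_uverse, psi)
s_dma = dma_shares(delta, nest, [full_set, no_uverse], weights, psi)

for name, share in zip(products, s_dma):
    print(f"{name:16s} {share:.3f}")
```

Because the DMA-level share mixes over choice sets, it is no longer a simple function of the mean utilities that can be inverted in closed form, which is consistent with the remark above that the estimation could not follow Berry (1994) directly.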

3.3.3 Mergers that increase bargaining leverage

So far the examples have focused on cases where modern methods of demand estimation have been used to estimate consumer demand and in turn simulate the likely effect of a merger. The merger simulation models used in these cases were relatively simple and based on linear Nash-Bertrand pricing. This is a reasonable assumption in some industries but not in others. For example, in many industries prices are negotiated. Recent academic work has empirically estimated bargaining models in a variety of industries (Draganska et al., 2010; Crawford and Yurukoglu, 2012; and Grennan, 2013). This is a prime example of recent academic work in marketing and economics changing competition policy, in industries as varied as health care and broadcast television.

17 See https://ecfsapi.fcc.gov/file/60001044292.pdf and https://ecfsapi.fcc.gov/file/60000973737.pdf for a more detailed discussion.

18 https://apps.fcc.gov/edocs_public/attachmatch/FCC-15-94A1.pdf at paragraph 5 on p. 4.


What is the effect of a merger in industries where prices are negotiated? Horn and Wolinsky (1988), Chipty (1995), and Chipty and Snyder (1999) study this question. The basic ideas are described in Nevo (2014) and we follow that exposition closely here. Consider an industry characterized by bargaining between providers, who produce content or provide services, and distributors, who sell the products or services to final consumers as part of a bundle. For example, in health care, the providers are hospitals or physicians, and the distributors are the insurers. The consumers are patients, who choose an insurance plan, and use providers as medical needs arise. In video markets providers license content to cable companies or satellite distributors and the consumers are viewers who choose a provider and bundle, and watch content.

The loss of competition from mergers in this setting is similar to the more standard setup, discussed above. Each provider supplies a potential improvement in the quality of a distributor’s bundle, or network. Providers compete to not be the one left out of any distributor’s bundle. That competition will impact the prices negotiated between the providers and distributors.

The bargaining is assumed to follow Nash bargaining (Nash, 1950). The parties split the surplus from agreement, that is, the difference between their payoffs from reaching agreement and their payoffs from disagreeing. The outcome relies on two key factors: the division of these gains, which we will call the bargaining power, and the leverage that each party has. The bargaining power each party has can be motivated by requiring certain axioms to hold, as Nash did, or by looking at the relative patience of the parties (Rubinstein, 1982). As Horn and Wolinsky (1988) show, the effect of a merger depends on the curvature of the distributor’s value function: if the value function is concave, so that each additional provider adds less value than the previous one, then the merger will lead to higher fees, and if it is convex to lower fees. This can be illustrated numerically.

Let’s assume that the parties have equal bargaining power and that the split is fixed at 50:50.19 Suppose a distributor negotiates with two providers. The distributor nets $120 if its bundle includes both providers, $100 if its bundle includes either provider but not both, and nothing with neither provider in its bundle. Each provider earns only the fees it receives if it is in the bundle, and zero otherwise. The incremental gain from adding either provider to the bundle, relative to disagreeing, is $20 when the other provider is already in the bundle. Hence, the gain from making a deal is $20. Split equally with the provider, this results in a fee of $10. Now suppose the two providers merge and negotiate as a single unit. The gain from making a deal with the merged provider is $120. Split equally, this results in the two providers acting together getting a combined $60, while acting separately they were only able to negotiate for $10 each, or a total of $20 for both. The providers gained from joint negotiation.

If we change the numbers slightly the result could change. For example, if each provider generated a value of $60 regardless, then negotiating jointly or separately the providers would gain the same: $30 each. On the other hand, if the value of having either provider in the bundle alone was only $20, and both $120, then negotiating separately the providers would get $50 each, so more than negotiating jointly (where they would get a combined $60). This might seem surprising, but it is just the counterpart of two complements merging in a price setting framework.

There are several reasons why the distributor’s value function might be concave. For example, if consumers view the providers as substitutes then every provider adds value to the distributor, by making its plan more attractive, because some subscribers prefer each provider over all others. But the more providers already in a bundle, the lower is the incremental value of an additional provider.

Horn and Wolinsky cast this Nash bargaining model inside a Nash equilibrium, where each bargaining pair negotiates taking the bargaining outcome of other pairs as given. This is a strong assumption but has led to tractable empirical models (Crawford and Yurukoglu, 2012).

In practice, this bargaining model has been taken to the data in a variety of ways. The first method, often called the willingness-to-pay (WTP) model, relates prices to measures of competition. The effect of the merger is estimated by computing how the merger changes the measure of competition. The second method is the equivalent of merger simulation but uses the bargaining model described above instead of a Cournot or Bertrand model in the simulation. Capps et al. (2003) propose estimating the WTP model in two steps.20 In the first step one uses historical data on provider choices to estimate a provider choice model.

19 The level of the split is not important for what follows, but the assumption that it will not change with the merger is potentially important. A merger in this setting will have an effect on the fees negotiated between providers and distributors if it changes the value of an agreement relative to the value of disagreement. This change can happen for different reasons, but the key is whether the value of an agreement post-merger is more or less than the sum of the pre-merger values.
The model estimates the weight that consumers put on different attributes by choosing the parameters that best explain why consumers chose the providers they did, over those they did not. The estimates from the consumer choice model allow us to compute what consumers are willing to pay to add an option to various bundles. For example, suppose providers A and B are very close substitutes in the eyes of consumers, but distant substitutes for any other provider. In that case consumers will not be willing to pay much to add provider A to a network that already includes provider B, nor to pay much to add provider B to a network that includes A. Since neither A nor B adds much incremental value to consumers if the other is already in the network, they also do not add much value to the distributor trying to construct a network. Thus, on their own neither provider can obtain favorable rates and we would expect prices for providers A and B to be low.

The second step of the WTP analysis simply correlates the expected value, or WTP, with prices paid historically and uses this relationship to simulate the likely effect of the merger. This regression is loosely motivated by the bargaining model, which says that WTP should be related to prices, but does not fully impose the relationship implied by the model. It parallels the idea of using historical data to estimate the relationship between prices and concentration, as measured for example by HHI, and using it to predict the effect of a change in concentration. The main difference is in the measure of competition used, WTP instead of HHI.

The second approach to empirically applying this model to the analysis of mergers expands merger simulation to these types of situations (Gowrisankaran et al., 2015). The key insight is that the equilibrium price equation given above can be modified to

$$p^* = mc + [\Omega(p^*) + \Lambda(p^*)]^{-1} Q(p^*), \tag{3.10}$$

where $\Lambda(p^*)$ is a function of the relative bargaining power of the two sides and various other bargaining terms (for details see Gowrisankaran et al., 2015). Ho and Lee (2017) build on this approach to study monopsony power.

The bargaining model has influenced the FTC’s approach to mergers between health care providers (Farrell et al., 2011). The principles and ideas have been accepted by courts and let the FTC and DOJ break a long streak of losing hospital merger cases.21 Bargaining models also played a key role in the challenge of Comcast’s attempted acquisition of Time Warner Cable (Rogerson, 2018) as well as the attempt by the DOJ to block the acquisition of Time Warner by AT&T. In the latter case the district court did not accept the government’s bargaining theory. Interestingly, a key empirical dispute involved a measure of diversion where two marketing professors – John Hauser for the DOJ and Peter E. Rossi for AT&T – were on either side.

20 See Farrell et al. (2011) for details on how this approach is implemented in practice.
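The numerical bargaining example earlier in this section can be reproduced in a few lines. The sketch below assumes, as the text does, a fixed 50:50 split and symmetric providers; the function name and argument names are illustrative and not part of any cited model.

```python
def negotiated_fees(v_none, v_one, v_both, split=0.5):
    """Fees under a fixed split of the gains from agreement.

    v_none, v_one, v_both: distributor's value with zero, one, or both providers.
    Returns (fee per provider when negotiating separately,
             combined fee when the providers negotiate as one unit).
    """
    # Separately: if talks fail, the other provider is still in the bundle.
    separate_fee = split * (v_both - v_one)
    # Merged: if talks fail, the distributor has neither provider.
    merged_fee = split * (v_both - v_none)
    return separate_fee, merged_fee

cases = {
    "substitutes (0, 100, 120)": (0, 100, 120),
    "additive (0, 60, 120)": (0, 60, 120),
    "complements (0, 20, 120)": (0, 20, 120),
}
for label, values in cases.items():
    separate_fee, merged_fee = negotiated_fees(*values)
    print(f"{label}: separate total = {2 * separate_fee}, merged total = {merged_fee}")
# substitutes: 20.0 vs 60.0; additive: 60.0 vs 60.0; complements: 100.0 vs 60.0
```

The comparison depends only on whether the value of the full bundle relative to no agreement is larger or smaller than the sum of the two incremental values, which is the point made in footnote 19.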

3.4 Looking forward

As we noted earlier, while academics and academic research have been influential in the area of competition policy, there is still a wide gap between the academic frontier and the models used for policy analysis. For example, both the Aetna-Humana and the AT&T-DirecTV cases, discussed above, relied on the Nested Logit model and not a more general Random Coefficients model. Furthermore, the models used are generally static and do not include demand side dynamics (such as those in Hendel and Nevo, 2006, Hartmann, 2006, or Gowrisankaran and Rysman, 2012) or supply side dynamics (such as Jeziorski, 2014). Dynamic models have not been used in policy work because of their greater data requirements, the time it takes to get a working version, and because empirical estimates of these models are often not as robust as those of the simpler models. However, looking forward, as the profession’s understanding of these models increases, better computational methods are developed, and more detailed data, such as consumer-level data, become available, we could see the methods used in practice get closer to the frontier of research.

21 See for example the decisions in FTC v. ProMedica Health Sys., Inc., 2011-1 Trade Cas. (CCH) 77,395 (N.D. Ohio Mar. 29, 2011) and FTC v. OSF Healthcare Sys., 852 F.Supp.2d 1069, 1084 (N.D. Ill. 2012).


4 Nutrition policy

Nutrition policy is a much newer area of activity, and one in which there is currently only a limited formal role for economics or quantitative marketing. The impact of economics on policy in this area to date has largely been to quantify the costs of public health interventions or provide cost benefit analysis. However, both ex post analysis, where we aim to measure the impact of an observed intervention, and ex ante analysis, where we aim to identify structural parameters that we can use to compute counterfactuals of situations that we have not observed, have important contributions to make to our understanding of the impact of policy interventions that aim to address problems associated with nutrition. As with antitrust analysis, key to understanding the effects of most policy interventions are empirical estimates of consumers’ demand behavior and an understanding of how firms might strategically respond.

In this section we first discuss the general objectives of nutrition policy and why quantitative market research has a contribution to make. We focus on nutrition policy in developed countries; nutrition policy in developing countries is often of a different nature, and is a place where economists have played a more significant role. We then discuss three active policy areas – nutrient based taxes, restrictions on advertising, and front of package nutrition labeling.22 We use these to illustrate some of the general points raised above, and some of the ways that quantitative research has made, and could make, important contributions to our understanding of the effects of public policy.

4.1 Objectives of nutrition policy

What is nutrition policy and what is it trying to achieve?23 There are many indicators that people are making poor food choices from a nutritional perspective. Obesity and the rise in diet-related disease are a major public policy issue: according to the World Health Organization (WHO), worldwide obesity has more than doubled since 1980, and most of the world’s population now lives in countries where overweight and obesity kill more people than underweight. WHO estimates that 42 million children under the age of 5 were overweight or obese in 2013. Quality-adjusted life years lost due to obesity in U.S. adults more than doubled from 1993 to 2008 (Jia and Lubetkin, 2010). The concern arises not only over poor food choices, but also regarding other potentially poor lifestyle choices over smoking, levels of physical activity, and alcohol consumption.

22 There are many other policies and research areas that we could discuss but have chosen not to; one example is work by Allcott et al. (2018a) and Handbury et al. (2015), who use household-level marketing data (from Nielsen) to analyze the roles of supply (i.e. availability and prices) and demand (i.e. consumer preferences) in the nutrition-income gap observed in the US. These analyses have been heavily cited in the popular press and will likely inform future policy over “food deserts”, i.e. whether restricted supply plays an important role in determining the nutritional quality of the shopping baskets of poorer households in particular.

23 This section draws heavily on Griffith et al. (2017b).


There is widespread concern that excess sugar consumption in particular is contributing not only to growing rates of obesity, but also to other diet-related diseases, including diabetes, cancers, and heart disease, and that excess sugar consumption is particularly detrimental for children.24 There is also evidence that poor nutrition, particularly early in life, leads to poor later life outcomes.25 This evidence has led to a deafening call for policy intervention26 based on the idea that such behaviors create “externalities”, costs that fall on others as a result of excess consumption, in the form of public health care costs, lost productivity, etc., and “internalities”, where excess consumption imposes costs on the person themselves in future that they do not account for at the time of consumption.27 A report by McKinsey Global Institute (Dobbs et al., 2014) estimates that globally obesity has roughly the same economic impact as smoking or armed conflict. They document 74 different policies that have been implemented to target obesity, including policies focused on changing food choices, on promoting exercise, on improving the balance between food and exercise, and other policies such as surgery or medication. In this chapter, we focus on policies that target the food and drink industry, including taxes, restrictions to advertising, and regulation over labeling food and drink products, as these are areas in which we think that quantitative marketing clearly has a lot to contribute. These policies are aimed at getting people to make better choices from a nutritional perspective over the foods they purchase and eat. There is not clear evidence about why people are making inappropriate choices (or indeed whether the choices are suboptimal from a welfare perspective). Maybe consumers are optimally trading off the benefits from consumption against the health costs. 
Or if not, is it because consumers are poorly informed about the nutritional composition of specific food products? Do consumers not understand the implications of poor nutrition? Do consumers lack self-control (discount the future consequences of excess consumption at “too high” a rate)? Do firms exploit these characteristics through advertising, obfuscation, and other means? There is a growing theoretical literature that suggests that all of these factors might be important. Individuals may have imperfect information about what a healthy diet looks like, or about the future health costs of eating unhealthy foods. A number of papers show that there is a strong correlation between education and nutritional choices, which is loosely suggestive of information-related issues (for example, Cutler and Lleras-Muney, 2010). Eating a healthy diet requires a balance of food types, and both which foods are the most nutritious and what the consequences of a nutrient-poor diet are vary across individuals in complex ways related to genetics and lifestyle choices. It

24 See, for example, WHO (2015), Azais-Braesco et al. (2017), Nielsen and Popkin (2004).

25 See, for example, Belot and James (2011) and Glewwe et al. (2001).

26 CDC (2016), WHO (1990), Dobbs et al. (2014).

27 Gruber and Koszegi (2004), O’Donoghue and Rabin (2006), Haavio and Kotakorpi (2011), Allcott et al. (2014).

is likely that most people, and in particular children, do not fully understand the implications of poor decisions, and so might be easily influenced to make poor choices. Consumers might not be consistent in the food choices that they make at different points in time, for example, preferring healthy foods when they are deciding the menu for future consumption but preferring unhealthy foods when choosing for immediate consumption. The marketing and economics literatures have incorporated some of the ideas from the psychology literature that capture this behavior. There are a large number of behavioral models of consumer choice; Della Vigna (2009) groups these into three broad categories: (i) models of non-standard preferences, for example, where the utility function is not time consistent; (ii) models of non-standard beliefs, for example, where consumers are incorrect in their predictions about the probability of future events; and (iii) models of non-standard decision-making, for example, where consumers use some simple rule of thumb rather than engage in maximizing behavior. An important feature of many of these models is that they result in consumers imposing costs on themselves in future that are not fully taken into account at the point of choice. Models of time inconsistency28 capture the idea that some individuals suffer from self-control problems, i.e. they value utility today more than utility tomorrow. This leads people to exhibit self-control problems; when asked today they would not want to eat a chocolate bar tomorrow, but when tomorrow arrives they decide to eat the chocolate bar. For example, Read and van Leeuwen (1998) ask participants in a study to make advance choices between healthy and unhealthy snacks to eat in a week’s time and then again asked them to choose at the time of consumption. They found that participants were dynamically inconsistent: they chose far more unhealthy snacks for immediate choice than for advance choice. Milkman et al. 
(2010) also provide evidence of this type of behavior in a study of online grocery purchases that finds that as the delay between order completion and delivery increases grocery customers spend more on products like vegetables and less on products like ice cream. Sadoff et al. (2015) observe considerable dynamic inconsistency in food choice in a within grocery store experiment.

Other models that are relevant for understanding food purchases include those in which people incorrectly and systematically expect their future preferences to be the same or close to their present preferences (see Loewenstein et al., 2003). Gilbert et al. (2002) showed that a set of grocery shoppers who were randomly given a muffin prior to shopping (thereby satisfying their hunger) made fewer unplanned purchases than a control group who were not given a muffin. Read and van Leeuwen (1998) found evidence that making decisions while hungry leads to a higher likelihood of choosing an unhealthy option. They asked office workers to choose a healthy or an unhealthy snack to be delivered a week later (in the late afternoon). Workers were asked to make this decision either when they were likely to be hungry (in the late afternoon) or when they were satiated (after lunch). In the first group, 78 percent chose an unhealthy snack, compared to 42 percent in the second group.

28 Strotz (1955), Laibson (1997), O’Donoghue and Rabin (2003, 2006).
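The preference reversal described by models of time inconsistency can be made concrete with a small sketch. It assumes the quasi-hyperbolic (beta-delta) form of discounting, which is one common way of formalizing present bias; the text does not commit to a functional form, and the payoff numbers below are purely hypothetical.

```python
def prefers_snack(beta, delta, pleasure, health_cost, periods_until_eating):
    """True if a consumer deciding `periods_until_eating` periods ahead
    chooses the unhealthy snack.

    Pleasure arrives when the snack is eaten; the health cost arrives one
    period later. beta < 1 is present bias; beta = 1 is exponential discounting.
    """
    def weight(t):
        # Quasi-hyperbolic weight on a payoff t periods away
        return 1.0 if t == 0 else beta * delta ** t

    value = (weight(periods_until_eating) * pleasure
             - weight(periods_until_eating + 1) * health_cost)
    return value > 0

# Hypothetical payoffs: pleasure of 10 now, health cost of 14 one period later
present_biased = dict(beta=0.6, delta=0.99, pleasure=10, health_cost=14)
exponential = dict(beta=1.0, delta=0.99, pleasure=10, health_cost=14)

for label, params in [("present-biased", present_biased), ("exponential", exponential)]:
    advance = prefers_snack(periods_until_eating=7, **params)
    immediate = prefers_snack(periods_until_eating=0, **params)
    print(f"{label}: advance choice = {advance}, immediate choice = {immediate}")
```

With these numbers the present-biased consumer declines the snack when choosing a week ahead but takes it when choosing for immediate consumption, while the exponential discounter declines it in both cases.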


The theoretical literature has been influential in policy, and given the level of concern and advocacy it is clear that this will continue to be an active area for policy. Well designed policy should be informed by an understanding of how markets work. Parts of the public health community argue that the private sector is to blame for the current state of affairs (in particular the tobacco, alcohol, and processed food industries), and that industry should therefore play no role in finding the solution (see, e.g., Moodie et al., 2013). It is in the interests of policy makers, firms, consumers, and all market participants that policies are efficient and effective. Policy made in the absence of an understanding of how markets work risk being ineffective and leading to unintended consequences. The quantitative marketing (and economics) literatures potentially have a lot to contribute. In order to quantify the effect of specific nutrient taxes we need to have well identified demand models that will allow us to understand how consumers will respond to changes in prices. Will some consumers respond more than others? Are the consumers that respond the ones that the policy is seeking to target, i.e. those that have the highest internalities or externalities? We also need to understand the supply side of the market – how manufacturers and retailers will respond to the introduction of a tax – because this will determine the incidence of the tax. It will allow us to quantify by how much prices of each product will increase, and potentially whether firms will respond in other ways, for example, by withdrawing products from the market or introducing new products. When conducting either ex post or ex ante analysis it is important to remember the reasons for policy interventions, and to ensure that these are appropriately reflected in the methods that we use to study the impact of the policy. 
For example, if the rationale for introducing regulations that require front of package labeling is that we believe that consumers do not have complete information about the nutritional content of the products that they are choosing between, then it probably does not make sense to study the impact of the policy in a model where we assume that consumers have complete information. The rationales for policy intervention in the area of nutrition are various, but are based largely on models of consumer decision making in the absence of complete information or where some consumers have some type of cognitive limitation, so fail to use all of the information that they have, e.g. they do not fully weight the potential future consequences of consumption, or they do not take the time to read the back of package label. Modern methods for estimating demand allow us to study consumer behavior, and to understand how a particular policy intervention will change the incentives of consumers to purchase different products, in a robust way while remaining reasonably agnostic about the exact way that consumer behavior departs from the standard full information and fully rational model (subject to all the standard caveats about identification and external validity). However, to make statements about welfare generally requires that we are much more specific about the precise model of consumer behavior and about functional forms. Similarly, to make predictions about firm behavior we typically need to make additional assumptions about the nature of strategic interactions between firms.
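A back-of-the-envelope sketch shows why both the demand side and the supply side matter for the effect of a nutrient tax. It assumes constant-elasticity demand and treats pass-through as a given number, whereas in the analysis described above pass-through would come out of a model of firm pricing; all values are hypothetical.

```python
def quantity_change(price, tax, pass_through, elasticity):
    """Proportional change in quantity after a per-unit tax.

    pass_through: share of the tax that shows up in the shelf price.
    elasticity: own-price elasticity of demand (negative), held constant.
    """
    new_price = price + pass_through * tax
    return (new_price / price) ** elasticity - 1.0

price, tax = 1.00, 0.20                                  # hypothetical price and tax per unit
groups = {"less price sensitive": -0.6, "more price sensitive": -1.2}

for pass_through in (0.4, 1.0):
    for label, elasticity in groups.items():
        change = quantity_change(price, tax, pass_through, elasticity)
        print(f"pass-through {pass_through:.1f}, {label}: {100 * change:.1f}% change in quantity")
```

Whether the consumers who respond most are also those with the largest internalities or externalities is an empirical question that this sketch cannot answer; it only shows how the pieces combine.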

4.2 Nutrient taxes

One popular policy instrument is taxation, which aims to raise the prices of less nutritious products relative to more nutritious ones. Such corrective taxes have long been used to address excess consumption of alcohol, tobacco, and other “sin” goods that arises because consumers do not take account of externalities (see Pigou, 1920, and Diamond, 1973). They are popular because they also typically raise tax revenue. More recently they have been advocated as a way to correct internalities, where consumption today imposes costs on the individual themselves in the future that the individual does not take account of at the time that they make the consumption decision. Corrective taxes have the potential to improve welfare by correcting consumption that generates internalities. The optimal design of such taxes involves trading off the welfare gains of reducing externalities or internalities against the welfare losses from a reduction in consumer surplus due to the tax.29

A common criticism of excise style taxes is that they are regressive; the poor typically spend a higher share of their income on the taxed good, and so bear a disproportionate share of the burden of the tax.30 However, if the tax plays the role of correcting an internality, then the distributional analysis is more complicated; if low income consumers also save more from averted internalities this may overturn the regressivity of the traditional economic burden of taxation (Gruber and Koszegi, 2004). These redistributive concerns become more subtle when income transfers are considered.31

Nutrient taxes of various forms have been introduced in many countries with mixed success. Alcohol taxes exist in most countries.
Taxes on soda or sugar-sweetened beverages have been introduced in a number of US localities.32 Early ex post analysis of the immediate effects of these policies suggests that the taxes in Mexico (Colchero et al., 2015) and Berkeley (Falbe et al., 2016) have led to reductions in consumption. Bollinger and Sexton (2017) find similar though more modest effects for the Berkeley tax reform, due to the very limited pass-through. Other policies have met with less success. For example, Denmark introduced a tax on foods that are high in saturated fat in 2011, but abolished the tax just over a year later over concerns that it was having little impact on consumption and was putting jobs at risk, as many of the products affected by the tax were produced in Denmark. Key to the design of effective tax policy is an understanding of how manufacturers and retailers are likely to respond to a tax, in terms of pricing and potentially other strategic decisions (see e.g. Draganska et al., 2009), and how changes in prices (and other strategic variables) will affect consumption choices over food and drink products. A number of papers in the public health literature that have studied the impact

29 See, for instance, Gruber and Koszegi (2004), O’Donoghue and Rabin (2006), Haavio and Kotakorpi (2011), Allcott et al. (2014), and Griffith et al. (2017b, 2017c).
30 For instance, see Senator Sanders’ op-ed on the Philadelphia soda tax (Sanders, 2016).
31 See e.g. Lockwood and Taubinsky (2017), Allcott et al. (2015), Allcott et al. (2018b, 2018c).
32 Including Berkeley, Oakland, San Francisco, Boulder, Philadelphia, and others, in France in 2012, Mexico in 2013, and in the UK in 2016.


CHAPTER 10 Marketing and public policy

of sugar taxes have been influential in the policy debate, but they have relied on demand models that make restrictive functional form assumptions and whose identification arguments are questionable.33 Modern methods in demand estimation and the availability of scanner data such as Nielsen Homescan and Kantar Worldpanel provide the opportunity to improve substantially on this work. Scanner data provide a rich resource that is only beginning to be exploited for public policy analysis. The advantages of these data over the standard data resources used for the analysis of public policy are that they are typically longitudinal, they are at the transaction level, and they contain well measured prices and product characteristics. In particular, being able to study demand at the individual product level, instead of at the product category level (which is more typical in public policy work and in standard consumption panels), is important when studying policy interventions such as nutrient taxes. It allows us to account for how people switch between the large numbers of disaggregate products in markets such as alcohol and many food markets, which matters because policies can affect products differently and products have different nutritional characteristics (see discussion in Griffith and O’Connell, 2009). There are also disadvantages of these data for studying nutrition. Most importantly, the data record purchases and not consumption. This concern can be overstated, however, because while the ultimate aim of policy is to reduce consumption, policies such as taxes (as well as advertising restrictions and labeling) aim to change purchase behavior in the first instance. In addition, while individual transaction level data have advantages, they do not by themselves solve the fundamental identification problem, and credible identification is a key ingredient in recovering the shape of demand. 
One important identification issue is being able to isolate the impact of changes in prices on consumer demand from other potentially confounding factors – shocks to demand are likely to be correlated with prices.34 Another issue, on which there is less empirical evidence, is how to measure internalities – the difference between the consequences that people account for when making food choices and those that they do not account for. To set optimal nutrition taxes we would need to measure these precisely, though if we know that they are positive, and have information on how they vary across the population, we might be confident that a particular tax is welfare improving, even if not necessarily optimal.35

4.2.1 The effects of taxes What do we know about the effects of taxes on prices and ultimately on consumer choices and welfare? There are some recent contributions using modern IO methods

33 See, for example, Briggs et al. (2013), Brennan et al. (2014), Purshouse et al. (2010), Brownell et al. (2009).
34 Berry (1994), Nevo (2011), Berry and Haile (2010).
35 Several papers discuss the bias that might arise from mismeasuring internalities in the context of setting taxes, including Mullainathan et al. (2012), Bernheim and Rangel (2009), Handel and Schwartzstein (2018), and Allcott et al. (2015).

4 Nutrition policy

to carry out ex ante analysis of the impact of nutrient taxes using scanner data. For example, Wang (2015) studies the impact of a soda tax and shows the importance of accounting for the dynamics in demand that arise through stockpiling. Patel (2012) and Dubois et al. (2017) show the importance of allowing for rich consumer heterogeneity in preferences and responsiveness. Dubois et al. (2017) conduct an ex ante analysis of the introduction of soda taxes in the UK, and the results suggest that a soda tax would be effective at targeting young people who consume a lot of soda, but would not be so effective at targeting older consumers who drink large amounts of soda, because they are not very sensitive to price changes. Understanding better who responds to taxes, how they respond, how firms respond, and how all this varies in different contexts are important ingredients for better policy making. Other work has shown that it can be important to consider not just a single food category, but also possible interactions across food groups, for example, if taxing sugar in soda led consumers to substitute to sugar in other forms (Dubois et al., 2014; Harding and Lovenheim, 2017). These ex ante studies are useful because we can consider the likely impact of policies that have not yet been introduced, we can study whether they target the right part of the population, and we can study the welfare implications of policy reforms. However, policy makers often find the assumptions that are required unpalatable, and in order for work in this vein to have a policy influence it is important to be able to articulate well the restrictiveness, or not, of the structural assumptions. See the discussion in Section 3.3.1 regarding the Aetna-Humana merger and the importance of ensuring that a model is based on market realities and relates to simple statistics, and that its conclusions are robust, in order to be convincing to policy decision makers. 
In some cases we can combine the advantages of ex post reduced form strategies, in particular that they are more transparent and so more palatable to the lay person and often rely on more easily explained and more credible identification strategies, with the ability of structural models to make statements about counterfactual situations and about welfare. The “sufficient statistics” approach has been widely applied in public economics.36 The basic idea is that, in order to answer some important policy relevant questions, we do not need a full structural model; all we need to know is a sufficient statistic. If we can derive a formula for the welfare consequences of policies that is a function of objects (such as elasticities) that we can estimate from ex post analysis, then these might be sufficient for us to say whether welfare will be increased or decreased if we undertook some counterfactual reform, under some assumptions. For example, under the assumptions of the Mirrlees (1971) model we can make inference about the optimal progressive income tax schedule from labor supply elasticities (Saez, 2001). This can be a powerful result and empirical work can make two important contributions. First, ex post studies can provide robust evidence on the sufficient statistics of interest. Second, ex ante studies can help us to understand whether the assumptions required to implement the sufficient statistics approach are valid. To our knowledge these insights and methods have not been widely applied to inform the setting of alcohol, soda, and other nutrient related taxes.

36 See Chetty (2009), Saez (2001), Gruber and Saez (2002), Jacobsen et al. (2016).

4.2.2 Estimating pass-through We can use standard demand methods in order to undertake ex ante analysis of the likely impact of a nutrient tax – we can estimate demand, assume supply behavior, specify the first-order conditions, invert these to recover marginal cost, and then do counterfactual analysis with and without the taxes (or at different tax rates). When undertaking ex ante analysis the functional form assumption on demand can be particularly important. The effects of a tax on prices will depend on the shape of the market level demand curve, and on the supply side and how much of the tax firms choose to pass through to consumers by raising prices. It will also depend on whether firms respond in other ways, such as entering or exiting the market, reformulating products or changing other strategies, which we will not discuss here. One general point that a number of papers have highlighted is that, in addition to the usual endogeneity concerns, the curvature of market demand is a crucial determinant of pass-through of cost shocks and taxes to consumer prices.37 For example, in the case of a single product monopolist with constant marginal costs, pass-through of a cost shock will be incomplete if and only if the monopolist faces a demand curve that is log-concave. Let the demand curve be q(p) and constant marginal cost be c; optimization implies q + p(dq/dp) = c(dq/dp). Differentiating with respect to cost and substituting yields pass-through as

$$\frac{dp}{dc} = \frac{1}{2 - q\,\dfrac{d^2 q/dp^2}{(dq/dp)^2}} = \frac{1}{1 - \dfrac{d^2 \ln(q)}{dp^2}\,\dfrac{q^2}{(dq/dp)^2}}. \tag{4.1}$$
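As a numerical check on Eq. (4.1), the following minimal sketch evaluates both forms of the pass-through formula for three textbook demand curves. The function names, the evaluation price, and the three demand curves are illustrative assumptions and are not taken from the papers cited above.

```python
import math

def pass_through(q, dq, d2q):
    """First form of Eq. (4.1): dp/dc from q, dq/dp and d^2q/dp^2."""
    return 1.0 / (2.0 - q * d2q / dq**2)

def pass_through_log_form(q, dq, d2q):
    """Second form of Eq. (4.1), written with d^2 ln(q)/dp^2."""
    d2lnq = (d2q * q - dq**2) / q**2
    return 1.0 / (1.0 - d2lnq * q**2 / dq**2)

p = 2.0  # hypothetical evaluation price; only local derivatives matter

# Each tuple holds (q, dq/dp, d^2q/dp^2) evaluated at p.
lin = (10.0 - p, -1.0, 0.0)                          # q = 10 - p, log-concave
expo = (math.exp(-p), -math.exp(-p), math.exp(-p))   # q = exp(-p), log-linear
ces = (p**-3, -3.0 * p**-4, 12.0 * p**-5)            # q = p^(-3), log-convex

for name, args in [("linear", lin), ("exponential", expo),
                   ("constant elasticity", ces)]:
    print(name, pass_through(*args), pass_through_log_form(*args))
```

Under these assumptions the three cases give pass-through of 0.5, 1 and 1.5 respectively, so a functional form restriction on the curvature of log demand pins down whether a tax is under-shifted or over-shifted.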

This expression shows that pass-through will be incomplete ($dp/dc < 1$) if and only if demand is log-concave ($d^2 \ln(q)/dp^2 < 0$); if we restrict market demand to be log-concave then this rules out pass-through of more than 100% by assumption. More generally, assuming a particular degree of concavity or convexity of log demand will place strong restrictions on the possible range of pass-through even when it does not exactly imply under- or over-shifting. In the mixed logit demand model the functional form of indirect utility and the heterogeneity that is allowed will be an important determinant of the curvature of the log of market demand, and therefore of pass-through. Griffith et al. (2017a) discuss this and show its empirical relevance in the context of studying the impact of a fat tax on demand;38 this discussion draws heavily on that paper. Consider a consumer with income $y$. The consumer makes a discrete choice from $j \in \{0, 1, \ldots, J\}$. Denote the price of option $j$ by $p_j$. Each product has an associated vector of observable product characteristics $x_j$ and an unobservable characteristic $\varepsilon_j$. There is a vector of parameters $\theta$, some of which may be random coefficients. Assume that the consumer’s indirect utility (conditional on purchasing $j$ and spending $y - p_j$ on the outside good) is given by:

$$U(y - p_j, x_j) + \varepsilon_j. \tag{4.2}$$

37 Bulow and Pfleiderer (1983), Seade (1985), Anderson (2001), Weyl and Fabinger (2013).
38 Khan et al. (2015) also conduct an ex ante study of the likely impact of a fat tax.

We assume that $U(y - p_j, x_j) + \varepsilon_j$ satisfies the properties of an indirect utility function (it is non-increasing in prices, non-decreasing in grocery budget, homogeneous of degree zero in all prices and grocery budget, quasi-convex in prices, and continuous in prices and grocery budget); consumer theory does not impose further restrictions on how $y - p_j$ enters conditional utility. We assume that $\varepsilon_j$ is independent and identically distributed across alternatives and drawn from a type I extreme value distribution, so the probability that the consumer selects option $j$ is given by:

$$P_j = \frac{\exp(U(y - p_j, x_j))}{\sum_{k \in \{0,\ldots,J\}} \exp(U(y - p_k, x_k))}. \tag{4.3}$$

Suppose utility is linear in $y - p_j$:

$$U(y - p_j, x_j) = \alpha (y - p_j) + g(x_j). \tag{4.4}$$

This means that the marginal utility of income ($y$) is constant and, importantly, that when comparisons are made across options $y$ differences out of the model, so that by assumption an increase in a consumer’s income has no impact on demand for the inside products $j > 0$. To capture the fact that consumers with different incomes are observed to make systematically different choices it is common to include $y$ in the model as a “preference shifter” (see, inter alia, Nevo, 2001; Berry et al., 2004; Villas Boas, 2007; and many other papers). For example, the parameter $\alpha$ can be allowed to vary linearly across consumers with $y$:

$$\alpha = \alpha_0 + \alpha_1 y + \nu, \tag{4.5}$$

where $\nu$ is a random coefficient. This “preference shifter” model allows researchers to capture, in a reduced form way, the empirical fact that spending patterns vary cross-sectionally with income. However, it rules out income effects at the individual level and is ad hoc in that consumer theory does not provide a theoretical explanation for why preferences should shift with $y$. Papers that allow $y - p_j$ to enter non-linearly include Berry et al. (1995), Goldberg (1995), and Petrin (2002). These papers consider demand for large budget share product categories (automobiles and mini-vans) and specify:

$$U(y - p_j, x_j) = \alpha \ln(y - p_j) + g(x_j), \tag{4.6}$$

so that the marginal utility of income is given by $\alpha/(y - p_j)$ and is therefore inversely proportional to $y - p_j$. This specification implies that households with higher income are less price sensitive.
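The contrast between the linear specification in Eq. (4.4) and the logarithmic specification in Eq. (4.6) can be seen in a small numerical sketch of the choice probabilities in Eq. (4.3). All parameter values, prices, and function names below are hypothetical and chosen only for illustration.

```python
import numpy as np

def choice_probs(utils):
    """Logit probabilities of Eq. (4.3) for a vector of utilities."""
    z = np.exp(utils - np.max(utils))  # subtract the max for numerical stability
    return z / z.sum()

# Hypothetical inputs: option 0 is the outside good with price zero.
prices = np.array([0.0, 2.0, 3.0])
g_x = np.array([0.0, 1.0, 1.8])   # assumed values of g(x_j)
alpha = 1.5                       # assumed marginal-utility parameter

def linear_spec(y):
    """Choice probabilities when utility is linear in y - p_j, Eq. (4.4)."""
    return choice_probs(alpha * (y - prices) + g_x)

def log_spec(y):
    """Choice probabilities when utility is alpha * ln(y - p_j), Eq. (4.6)."""
    return choice_probs(alpha * np.log(y - prices) + g_x)

for y in (10.0, 40.0):
    print(y, linear_spec(y).round(4), log_spec(y).round(4))
```

In this sketch the linear specification returns the same probabilities at both income levels, because income differences out, while under the logarithmic specification the higher-income consumer is less likely to choose the outside option.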


Consider the demand curve for a product in the market. Let each consumer be indexed by income and a vector of parameters, $(y, \theta)$. Normalizing the size of the market to one, the market demand curve for option $j$ is:

$$q_j(p) = \int P_j(y, \theta)\, g(y, \theta)\, dy\, d\theta, \tag{4.7}$$

where $P_j(y, \theta)$ is the individual purchase probability given in Eq. (4.3) and $g(y, \theta)$ is the joint density over the elements of $(y, \theta)$. The second derivative of the log of market demand with respect to price is given by:

$$\frac{\partial^2 \ln q_j}{\partial p_j^2} = \int \frac{P_j(y,\theta)}{q_j} \frac{\partial^2 \ln P_j(y,\theta)}{\partial p_j^2}\, g(y,\theta)\, dy\, d\theta + \left[ \int \frac{P_j(y,\theta)}{q_j} \left( \frac{\partial \ln P_j(y,\theta)}{\partial p_j} \right)^2 g(y,\theta)\, dy\, d\theta - \left( \int \frac{P_j(y,\theta)}{q_j} \frac{\partial \ln P_j(y,\theta)}{\partial p_j}\, g(y,\theta)\, dy\, d\theta \right)^2 \right]. \tag{4.8}$$

The curvature of this depends on: (i) the weighted average of the second derivatives of log individual demand, which is negative if individual level demand is log-concave, and (ii) the weighted variance of the slope of log individual level demand, which is non-negative and is positive when there is heterogeneity in individual demands. Log demand will be concave if individual demand is log-concave and if the cross-sectional variance of the slope of log demand is not too big. It will be convex if individual log demand is convex or if the variance term is large enough in magnitude. If we assume that utility is linear in $y - p_j$, i.e. Eq. (4.4), and that there is no heterogeneity in preferences, then $\partial^2 \ln q_j / \partial p_j^2$ collapses to the second derivative of the log of individual level demand:

$$\frac{\partial^2 \ln q_j}{\partial p_j^2} = \frac{\partial^2 \ln P_j}{\partial p_j^2} = -\alpha^2 P_j (1 - P_j) < 0,$$

and curvature is then fully determined by the marginal utility of income parameter, $\alpha$, and the market share. Both individual and market demand are restricted to be log-concave. If we allow for heterogeneity in $\alpha$ then individual demand is still restricted to be log-concave, but the market demand curve could be log-convex, or log-concave in some regions and log-convex in others, depending on the weighting of consumers. If we allow $y - p_j$ to enter utility in a flexible nonlinear way then this allows flexibility in both individual level and market demand. In particular, individual level demand will not be constrained to be log-concave. The second derivative of the log of consumer demand for option $j$ with respect to its own price is given by:

$$\frac{\partial^2 \ln P_j}{\partial p_j^2} = (1 - P_j) \left[ \frac{\partial^2 U(y - p_j, x_j)}{\partial (y - p_j)^2} - P_j \left( \frac{\partial U(y - p_j, x_j)}{\partial (y - p_j)} \right)^2 \right]. \tag{4.9}$$

The degree of log-concavity (or convexity) is determined by the shape of the function $U$, and therefore the flexibility of the curvature of individual demand depends on the flexibility of the function $U$. The specification of $U$ is therefore crucial for studying pass-through. There is a closely related literature in marketing that has analyzed the rate at which promotions are passed through by retailers to the final consumer.39
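The decomposition in Eq. (4.8) can be illustrated with a minimal sketch in which the integral is replaced by a sum over two consumer types, each with linear utility and a single inside product. The two types, their parameters, and the function names are hypothetical assumptions made for this illustration.

```python
import numpy as np

def logit_share(p, delta, alpha):
    """Individual purchase probability: one product plus an outside option."""
    return 1.0 / (1.0 + np.exp(-(delta - alpha * p)))

def curvature_terms(p, delta, alpha, weights):
    """Two pieces of Eq. (4.8): mean individual curvature and slope variance."""
    P = logit_share(p, delta, alpha)
    q = np.sum(weights * P)            # market demand, discrete version of Eq. (4.7)
    omega = weights * P / q            # demand weights P_j / q_j, summing to one
    slope = -alpha * (1.0 - P)         # d ln P / dp for each type
    curv = -alpha**2 * P * (1.0 - P)   # d^2 ln P / dp^2, negative for each type
    mean_term = np.sum(omega * curv)
    var_term = np.sum(omega * slope**2) - np.sum(omega * slope) ** 2
    return mean_term, var_term

# Two hypothetical types: price-insensitive and price-sensitive consumers.
delta = np.array([-1.0, 8.0])
alpha = np.array([0.2, 4.0])
weights = np.array([0.5, 0.5])

mean_term, var_term = curvature_terms(3.0, delta, alpha, weights)
print("heterogeneous:", mean_term, var_term, mean_term + var_term)

# A single type: the variance term vanishes and log demand is concave.
m1, v1 = curvature_terms(3.0, np.array([8.0]), np.array([4.0]), np.array([1.0]))
print("homogeneous:  ", m1, v1, m1 + v1)
```

With these assumed values every individual demand is log-concave, yet the variance term dominates and market demand is log-convex at the chosen price, which is the case where pass-through above 100% becomes possible.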

4.3 Restrictions to advertising Another popular policy is restrictions to advertising. The aim of policies that restrict the advertising of specific foods is to lower consumption, as is the case with widespread restrictions to advertising of alcohol and tobacco products. This is in contrast to regulation of advertising in some other markets where the aim is consumer protection or information provision (for example, the advertising of pharmaceuticals). The World Health Organization (WHO, 2010) has advocated restrictions on advertising of some foods, and recommended that the “overall policy objective [of an advertising ban] should be to reduce both the exposure of children to, and the power of, marketing of foods high in saturated fats, trans-fatty acids, free sugars, or salt.” The medical literature has called for restrictions on advertising on the basis of claims that advertising is especially effective among children (Gortmaker et al., 2011; National Academies, 2006; and Cairns et al., 2009). Many countries ban advertising of junk foods to children; an exception is the US, which to date relies on voluntary restraints. For example, the UK bans advertising of foods that are high in fat, salt or sugar (HFSS) during children’s programming (TV shows for which the primary audience is under 16), and is extending this to digital and other platforms. What impact does a ban on advertising have on consumption, and whom will it affect? How are junk foods defined and can firms game the system? These are important questions for the effective design of policy, and efficient and effective regulation is generally preferable for both firms and consumers to inefficient and ineffective regulation. Designing effective policy relies on understanding not only whether a policy works, but why it works and who it works on. Both ex post and ex ante analysis have a key role to play here.

39 See, for instance, Tyagi (1999); Sudhir (2001); Moorthy (2005); Besanko et al. (2005).

4.3.1 The mechanisms by which advertising might affect demand Advertising can affect consumer choice behavior in a number of ways. Bagwell (2007) surveys the large literature on the mechanism through which advertising affects consumer choice, and conveniently distinguishes between the persuasive, characteristic, and informative advertising traditions. It is possible to model the effects of restrictions to advertising on choice while remaining reasonably agnostic about the mechanisms through which advertising works, though see discussion below about important functional form considerations. However, in order to make welfare statements we typically have to take a stance on which of these mechanisms is dominant. For example, the welfare consequences of restricting advertising differ considerably depending on whether we take the view that advertising is a characteristic that consumers value or that advertising distorts consumers’ decision making. If it is viewed as a product characteristic that consumers intrinsically value, as in Becker and Murphy (1993), then banning it will necessarily make consumers worse off; if it is persuasive (as in Dixit and Norman, 1978) then banning it will necessarily make consumers better off. The early literature on advertising focused on its persuasive nature,40 where the purpose of advertising is to change consumer tastes. The behavioral economics and neuro-economics literatures have focused on the mechanisms by which advertising affects consumer decision making. Gabaix and Laibson (2006) consider models in which firms might try to shroud negative attributes of their products, while McClure et al. (2004) and Bernheim and Rangel (2004, 2005) consider the ways that advertising might affect the mental processes that consumers use when taking decisions (for example, causing a shift from the use of deliberative systems to the affective systems that respond more to emotional cues). Rao and Wang (2017) and Jin and Leslie (2003) highlight firms’ misleading and selective advertising, respectively, the impact these can have on consumers, and why the role of the regulator is particularly important in such settings. 
Rao and Wang (2017) also examine consumer heterogeneity in detail using scanner data (a point relevant to this Handbook) and find that exposure of firms’ deceptive activities primarily affects newcomers rather than existing consumers. Becker and Murphy (1993) and Stigler and Becker (1977) consider advertising much as any other product characteristic as something that consumers may like or dislike, and advertising might act as a complement to other goods or characteristics that enter the utility function. Advertising can also provide information to consumers – about the quality or characteristics of a product (Stigler, 1961 and Nelson, 1995), product price (Milyo and Waldfogel, 1999), or about the existence and availability of products (Goeree, 2008; Ackerberg, 2001, 2003). Firms may also have incentives to limit the informative content of adverts even when consumers are imperfectly informed (Anderson and Renault, 2006 and Spiegler, 2006), or provide false advertising (Rao and Wang, 2017).

40 Marshall (1921), Braithwaite (1928), Robinson (1933), Kaldor (1950), and Dixit and Norman (1978).

4.3.2 Empirically estimating the impact of advertising In order to quantify the effects of restrictions on advertising of particular (unhealthy) foods we need to have a well identified model of consumer demand in which advertising enters in a suitably flexible way. We also need to understand how firms will respond to the restrictions, for example, do they charge different prices, reformulate products, or change their behavior in other ways? Advertising of a brand might increase purchases of that brand by drawing new customers into the category, or it might largely act to shift purchases from a rival brand. These have different implications for consumption, and the effectiveness of restrictions to advertising will depend crucially on which of these effects dominates. Identifying the causal impact of advertising on demand can be challenging (see, for example, Lewis and Rao, 2015). Standard approaches to estimating demand in differentiated product markets impose that cross-price elasticities are positive, i.e. they do not allow products to be complements (recent exceptions include Thomassen et al. (2017) and work in the AT&T-DirecTV example discussed in Section 3.3.2). In many situations, for example, when we are modeling choice between a small number of branded products (think Coke and Pepsi) this seems like a reasonable assumption. However, sign restrictions on cross-advertising elasticities are not theoretically founded (advertising might tilt the demand curve or change the marginal rate of substitution between product characteristics, see e.g. Johnson and Myatt, 2006). Brand advertising may be predatory, in which case its effect is to steal market share from rival products, or it might be cooperative, so that an increase in the advertising of one product increases demand for other products, and there is a large literature showing empirical support for these effects across many product categories. 
For example, Rojas and Peterson (2008) find that advertising increases aggregate demand for beer, while other papers show that regulating or banning advertising has led to more concentration (Eckard, 1991, for cigarettes; Sass and Saurman, 1995, for beer; Motta, 2007, surveys numerous other studies) and, in the case of a partial ban in the cigarette industry, to more advertising (Qi, 2013). Shapiro (2018c) finds that television advertising of prescription antidepressants exhibits significant positive spillovers on rivals’ demand. Dhar and Baylis (2011) find that a ban on fast-food advertising in Quebec led to substantial reductions in fast-food consumption. In order to allow for the possibility that advertising could be predatory, cooperative or some combination, it is important to flexibly include both own brand and competitor advertising in the consumer’s decision utility. Dubois et al. (2018) provide a discussion of this point and empirical estimates for a specific market; our discussion here borrows heavily from them. Consider a model in which consumers $i$ choose between products $j = 1, \ldots, J$. Choice occasions are indexed $t$. Consumer $i$’s exposure to the advertising of product $j$ is denoted $a_{ijt}$. Denote the consumer’s payoff:

$$v_{ijt} = \Big[ \lambda_i a_{ijt} + \alpha_{2i} a_{ijt} p_{jt} + \rho_i \sum_{l \neq j} a_{ilt} \Big] + \alpha_{1i} p_{jt} + \psi_{1i} x_j + \varepsilon_{ijt} \tag{4.10}$$

where $p_{jt}$ is product price, $x_j$ are other observed product characteristics, and $\varepsilon_{ijt}$ is an i.i.d. shock drawn from a type I extreme value distribution. The terms in square brackets capture the impact of advertising on the payoff function and incorporate enough flexibility to allow for the possibility that advertising is predatory (stealing market share from competitors) or cooperative (increasing market share of competitors); that advertising leads to market expansion or contraction; and that advertising may tilt the demand curve or change the marginal rate of substitution between product characteristics. Own advertising enters directly in levels; the coefficient $\lambda_i$ captures the extent to which differential time series exposure to own advertising affects the valuation or weight the consumer places on the unobserved brand effect. Own advertising also potentially interacts with price: the coefficient $\alpha_{2i}$ allows the marginal effect of price on the payoff function to shift with own advertising (as in Erdem et al., 2008). The coefficient $\rho_i$ captures the extent to which time variation in competitor advertising affects the valuation or weight the consumer places on the unobserved brand effect. Denote the payoff to the outside option: $\bar{v}_{i0t} = v_{i0t} + \varepsilon_{i0t}$. The probability that consumer $i$ buys product $j$ at time $t$ is:

$$s_{ij}(a_{it}, p_t) = \frac{\exp(v_{ijt})}{1 + \exp(v_{i0t}) + \sum_{j'=1}^{J} \exp(v_{ij't})}. \tag{4.11}$$
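To make the roles of $\lambda_i$, $\alpha_{2i}$ and $\rho_i$ concrete, the sketch below evaluates the choice probabilities in Eq. (4.11) for one consumer, using the non-stochastic part of the payoff in Eq. (4.10), and computes the effect of a small change in one product’s advertising by finite differences. The parameter values, the three-product market, and the function names are hypothetical; the denominator follows Eq. (4.11) as printed.

```python
import numpy as np

def shares(a, p, x, lam, alpha1, alpha2, rho, psi, v0=0.0):
    """Inside shares and outside share s_0 from Eqs. (4.10)-(4.11)."""
    rival_ads = a.sum() - a                        # sum over l != j of a_l
    v = lam * a + alpha2 * a * p + rho * rival_ads + alpha1 * p + psi * x
    denom = 1.0 + np.exp(v0) + np.exp(v).sum()     # denominator as printed in Eq. (4.11)
    return np.exp(v) / denom, np.exp(v0) / denom

def ad_effects(j, a, h=1e-6, **kw):
    """Central finite-difference derivative of all shares with respect to a_j."""
    up, dn = a.copy(), a.copy()
    up[j] += h
    dn[j] -= h
    s_up, s0_up = shares(up, **kw)
    s_dn, s0_dn = shares(dn, **kw)
    return (s_up - s_dn) / (2 * h), (s0_up - s0_dn) / (2 * h)

# Hypothetical three-product market and preference parameters.
params = dict(p=np.array([2.0, 2.2, 1.8]), x=np.array([1.0, 0.5, 0.8]),
              lam=0.6, alpha1=-1.0, alpha2=0.1, psi=0.5)
a = np.array([1.0, 0.5, 0.2])

for rho in (0.0, 0.5):
    d_inside, d_outside = ad_effects(0, a, rho=rho, **params)
    print(f"rho={rho}: own={d_inside[0]:+.4f}, "
          f"rivals={d_inside[1:].round(4)}, outside={d_outside:+.4f}")
```

With these assumed values, setting rho to zero makes advertising of product 1 reduce rivals’ shares, while a sufficiently positive rho makes it raise them, so the sign of the cross effect is left to the data rather than imposed by the functional form.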

What are the effects of advertising on consumer level demands? The marginal impact of a change in advertising of one product on the individual level choice probabilities is given by:

$$\begin{aligned}
\frac{\partial s_{ijt}}{\partial a_{ijt}} &= s_{ijt} \left[ \lambda_i + \alpha_{2i} p_{jt} - \rho_i (1 - s_{i0t}) \right] \\
\frac{\partial s_{ij't}}{\partial a_{ijt}} &= s_{ij't} \left[ \rho_i s_{i0t} \right] \quad \text{for } j' \neq (0, j) \\
\frac{\partial s_{i0t}}{\partial a_{ijt}} &= -s_{i0t} \left[ \rho_i (1 - s_{i0t}) \right].
\end{aligned}$$

If advertising of one product did not directly enter the payoff of other products (imposing $\rho_i = 0$), then we require $\lambda_i + \alpha_{2i} p_{jt} > 0$ for advertising to have a positive own effect (so $\partial s_{ijt} / \partial a_{ijt} > 0$). In this case advertising would necessarily be predatory, stealing market share from competitor products ($\partial s_{ij't} / \partial a_{ijt} < 0$), and it would necessarily lead to market expansion ($\partial s_{i0t} / \partial a_{ijt} < 0$). By including competitor advertising in the payoff function we allow for the possibility that, regardless of the sign of own demand advertising effects, advertising may be predatory or cooperative and it may lead to market expansion or contraction (i.e. we do not constrain the signs of $\partial s_{ij't} / \partial a_{ijt}$ or $\partial s_{i0t} / \partial a_{ijt}$). Allowing advertising to interact with the consumer’s responsiveness to price and the nutrient characteristic allows advertising to have a direct effect on consumer level price elasticities. The consumer level price elasticities are, for any $j \neq 0$:

$$\begin{aligned}
\frac{\partial \ln s_{ijt}}{\partial \ln p_{jt}} &= \left( \alpha_{1i} + \alpha_{2i} a_{ijt} \right) (1 - s_{ijt})\, p_{jt} \\
\frac{\partial \ln s_{ij't}}{\partial \ln p_{jt}} &= -\left( \alpha_{1i} + \alpha_{2i} a_{ijt} \right) s_{ijt}\, p_{jt} \quad \text{for } j' \neq j.
\end{aligned}$$

This allows advertising to impact consumer level price elasticities in a flexible way, through its impact on choice probabilities and through its impact on the marginal effect of price on the payoff function, captured by $\alpha_{2i}$. Incorporating advertising into a demand model also raises additional identification challenges. Firms target advertising to specific consumers at specific times, and these might be correlated with the unobserved demand shocks. The increased access to big data on consumer behavior and on advertising is useful, but it is not a silver bullet; we still need to be careful about where the exogenous variation in exposure to advertising that allows us to identify the ceteris paribus effects of advertising is coming from. One approach to identification uses field experiments, as in Anderson and Simester (2013) and Sahni (2016). Sahni (2016), for example, shows that adverts placed on a restaurant search website increased sales of non-advertised restaurants. These provide very useful small scale studies with clear identification; however, we do need to worry about external validity. Another approach is to use an IV strategy, as in Hartmann and Klapper (2017), Sinkinson and Starc (2017), and Dubois et al. (2018). In addition there are dynamic considerations. Advertising affects consumers’ choices today, but also potentially in the future, introducing the possibility that when firms choose their advertising strategies they play a dynamic game. Solving such a game entails specifying precisely the details of firms’ dynamic problem and of the equilibrium concept that prevails in the market (Dube et al., 2005). This considerably complicates studying supply side responses. 
A number of papers have shown that if advertising is primarily rivalrous, then firms are likely to advertise beyond the joint profit-maximizing level, largely canceling each other out in their efforts.41 In contrast, if advertising is primarily expansionary, then there is less advertising than the joint profit-maximizing level, which raises the possibility that restrictions on advertising are welfare improving and that (some) firms might gain as well. One effect that advertising has on consumer demand is to lower consumers’ sensitivity to price. Banning advertising therefore leads to tougher price competition. Dubois et al. (2018) show that banning advertising on potato chips would lead to the (quantity weighted) average price in the market falling by 4%. This is important for understanding the impacts of the policy. Standard economic measures of welfare improve: consumer surplus rises because prices are lower, and profits do not fall by enough to offset that. However, the quantity purchased is in fact likely to increase slightly. Understanding how manufacturers and retailers are likely to respond to restrictions to advertising is key and a difficult task. One policy design issue in restricting advertising of junk foods is the definition of what is a junk food. The importance of this is illustrated by McDonald’s adverts at Christmas 2017. The UK restrictions do not allow McDonald’s to advertise their standard products during children’s TV programs because of their nutritional characteristics. To circumvent this McDonald’s released an advert that featured carrot sticks (for the reindeer); see https://www.youtube.com/watch?v=XZ2PenyNRjE. Other ways in which firms can circumvent the regulations are by advertising similarly branded products or through product reformulation.

41 von der Fehr and Stevik (1998), Bloch and Manceau (1999), Netter (1982), and Buxton et al. (1984).

4.4 Labeling There is a lot of legislation around front of package labeling and there have been many reforms in the US, UK, and other countries (see reviews in Cowburn and Stockley, 2005 and Heike and Taylor, 2013). In general the aim of nutritional labels is to provide information to consumers and to make the nutritional content more salient. Other policies, such as taxation, can also increase the salience of characteristics (see for example, Chetty et al., 2009). When considering whether labeling is effective we need to consider what the policy is aiming to achieve. The impact of labeling on choices and demand will depend on what information consumers had in the absence of labeling, and how their expectations were formed prior to labeling. If consumers have a systematic misunderstanding of the composition of specific food products, or if increased salience is the key effect of labeling, then introducing labels should shift demand. However, if consumers have on average correct perceptions about the ingredients in food, and there is just noise around that, then we would expect labeling to have different effects, for example, compressing the estimated distribution of preferences for the characteristic that is labeled. However, if the problem is that consumers have cognitive constraints in their ability to process information, or if they lack self-control, then labeling might be less effective. One interesting question is whether labels have direct effects in terms of inducing guilt or other utility-relevant emotions. Most of the literature focuses on the impact of introducing labels on quantity demanded, but these quantity effects are of course not welfare effects. Several papers make this point with respect to graphic cigarette warning labels.42 Marketers might be able to contribute substantively to this literature by providing insights here on the psychology of how labels work and how to measure the potential welfare implications.

42 For example, Cutler et al. (2015), Jin et al. (2015), Glaeser (2006), and Allcott and Kessler (2017) make related points about measuring the welfare effects of information provision and other nudges.

Empirical work has found that the introduction of more prominent nutrition labeling does lead to changes in consumption choices. For example, Bollinger et al. (2011) find that the introduction of mandatory calorie posting in Starbucks led to an average 6% reduction in calories purchased; Kiesel and Villas-Boas (2013) show that labeling affects consumer choice in an experimental setting in supermarkets, but that the impact depends on the exact design of the label and the information conveyed. However, other literature has shown that in some circumstances requirements for nutrition labeling can have unintended and perverse effects, for example, reducing the nutritional quality of products (Moorman et al., 2012). In response, Ratchford (2012) emphasizes the need for theoretical and empirical research on the supply-side reactions to various policy measures and disclosure laws, and Pappalardo (2012) highlights the role of marketing in policy design.
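The distinction drawn in this section between systematic misperception and mean-zero noise can be made concrete with a small simulation. What follows is only a stylized sketch, not a model taken from the chapter or from the papers cited: the function names `implied_tastes` and `summarize`, the normal distributions, and every parameter value are assumptions chosen for illustration. Consumers act on perceived nutrient content, while an analyst who assumes they know the true content attributes all variation in valuations to tastes.

```python
import random
import statistics


def implied_tastes(true_content, bias, noise_sd, n=50_000, seed=1):
    """Simulate the taste for a nutrient that an analyst would infer
    before and after a label reveals the true content.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n):
        beta = rng.gauss(1.0, 0.2)  # consumer's true distaste per unit of nutrient
        perceived = true_content + bias + rng.gauss(0.0, noise_sd)
        # Without a label the consumer acts on perceived content, but the
        # analyst assumes the true content is known, so the inferred taste
        # is scaled by perceived / true.
        before.append(beta * perceived / true_content)
        # With a label, perceived content equals true content.
        after.append(beta)
    return before, after


def summarize(before, after):
    """Return (mean, standard deviation) pairs before and after labeling."""
    return (
        (statistics.mean(before), statistics.pstdev(before)),
        (statistics.mean(after), statistics.pstdev(after)),
    )


# Case 1: consumers systematically underestimate content (bias, no noise).
biased = summarize(*implied_tastes(true_content=10.0, bias=-3.0, noise_sd=0.0))
# Case 2: perceptions are right on average but noisy (noise, no bias).
noisy = summarize(*implied_tastes(true_content=10.0, bias=0.0, noise_sd=3.0))

for name, (pre, post) in (("systematic bias", biased), ("mean-zero noise", noisy)):
    print(f"{name}: mean {pre[0]:.2f} -> {post[0]:.2f}, sd {pre[1]:.2f} -> {post[1]:.2f}")
```

Under these assumptions, the mean inferred taste moves when the label removes a systematic bias, which corresponds to a shift in demand. When perceptions are merely noisy, the mean is roughly unchanged and the dispersion of inferred tastes shrinks.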

4.5 Looking forward

To date, the influence of quantitative IO and quantitative marketing research on nutrition policy has been limited. There is considerable potential for future research to provide valuable input into the formulation of efficient and effective policy. This is in the interest of researchers, industry, and policy makers.

5 Concluding comments

In this chapter we discuss two areas of policy – competition policy and nutrition policy – and how quantitative marketing can impact and has impacted these areas. Although quantitative marketing has similar potential for impact in both areas, the actual impact has been quite different. In competition policy, economists have been using the models and methods of quantitative marketing and IO to influence actual policy. The effect, at least up to now, has been smaller in nutrition policy. However, in both areas there is scope for great impact from recent research. Our focus has been on how researchers can impact policy, but there is also an effect in the other direction. Marketers need to pay more attention to the policy debates for at least two reasons. First, a proper discussion of firm and consumer interaction, which is at the heart of marketing, cannot be complete without accounting for the regulatory, legal, and policy environment. Second, greater involvement with policy will help marketers shape their research in relevant and interesting directions. In order to have an impact, the discussion with policy makers has to be a two-way exchange.

References

Ackerberg, D., 2001. Empirically distinguishing informative and prestige effects of advertising. The Rand Journal of Economics 32 (2), 316–333.

CHAPTER 10 Marketing and public policy

Ackerberg, D., 2003. Advertising, learning, and consumer choice in experience good markets: an empirical examination. International Economic Review 44 (3), 1007–1040. Allcott, H., Diamond, R., Dubé, J-P., 2018a. The Geography of Poverty and Nutrition: Food Deserts and Food Choices Across the United States. Stanford Business School Working Paper No. 3631. Allcott, H., Kessler, J.B., 2017. The Welfare Effects of Nudges: A Case Study of Energy Use Social Comparisons. NBER Working Paper 21671. Allcott, H., Knittel, C., Taubinsky, D., 2015. Tagging and targeting of energy efficiency subsidies. The American Economic Review 105 (5), 187–191. Allcott, H., Lockwood, B., Taubinsky, D., 2018b. Regressive sin taxes, with an application to the optimal soda tax. The Quarterly Journal of Economics. https://doi.org/10.1093/qje/qjz017. Allcott, H., Lockwood, B.B., Taubinsky, D., 2018c. Should We Tax Soda? An Overview of Theory and Evidence. NBER Working Paper 225842. Forthcoming in Journal of Economic Perspectives. Allcott, H., Mullainathan, S., Taubinsky, D., 2014. Energy policy with externalities and internalities. Journal of Public Economics 112, 72–88. Anderson, S., 2001. Tax incidence in differentiated product oligopoly. Journal of Public Economics 81 (2), 173–192. Anderson, S., Renault, R., 2006. Advertising content. The American Economic Review 96 (1), 93–113. Anderson, E.T., Simester, D., 2013. Advertising in a competitive market: the role of product standards, customer learning, and switching costs. Journal of Marketing Research 50 (4), 489–504. Angrist, J., Pischke, J.S., 2010. The credibility revolution in empirical economics: how better research design is taking the con out of econometrics. The Journal of Economic Perspectives 24 (2), 3–30. Atherly, A., Dowd, B.E., Feldman, R., 2004. The effect of benefits, premiums, and health risk on health plan choice in the Medicare program. Health Services Research 39 (4), 847–864. Azais-Braesco, Sluik, Maillot, Kok, Moreno, 2017. 
A review of total and added sugar intakes and dietary sources in Europe. Nutrition Journal 16. https://doi.org/10.1186/s12937-016-0225-2. Bagwell, K., 2007. The economic analysis of advertising. In: Armstrong, M., Porter, R. (Eds.), Handbook of Industrial Organization, vol. 3. Elsevier, pp. 1701–1844. Bain, J., 1956. Barriers to New Competition: Their Character and Consequences in Manufacturing Industries. Harvard University Press, Cambridge, MA. Bayot, D., Hatzitaskos, K., Howells, B., Nevo, A., 2018. The Aetna-Humana proposed merger. In: Kwoka Jr., John E., White, Lawrence J. (Eds.), The Antitrust Revolution, 7th edition. Oxford University Press. Becker, G.S., Murphy, K.M., 1993. A simple theory of advertising as a good or bad. The Quarterly Journal of Economics 108 (4), 941–964. Belot, M., James, J., 2011. Healthy school meals and educational outcomes. Journal of Health Economics 30 (3), 489–504. Bernheim, D., Rangel, A., 2004. Addiction and cue-triggered decision processes. The American Economic Review 94 (5), 1558–1590. Bernheim, D., Rangel, A., 2005. Behavioral Public Economics: Welfare and Policy Analysis with Non-Standard Decision-Makers. National Bureau of Economic Research Working Paper 11518. Bernheim, D., Rangel, A., 2009. Beyond revealed preference: choice theoretic foundations for behavioral welfare economics. The Quarterly Journal of Economics 124 (1), 51–104. Berry, S., 1994. Estimating discrete-choice models of product differentiation. The Rand Journal of Economics 25 (2), 242–262. Berry, S., Haile, P., 2010. Nonparametric Identification of Multinomial Choice Demand Models with Heterogenous Consumers. Cowles Foundation DP 1718. Berry, S., Levinsohn, J., Pakes, A., 1995. Automobile prices in market equilibrium. Econometrica 63 (4), 841–890. Berry, S., Levinsohn, J., Pakes, A., 2004. Differentiated products demand systems from a combination of micro and macro data: the new car market. Journal of Political Economy 112 (1), 68–105.
Berry, S., Pakes, A., 1993. Some applications and limitations of recent advances in empirical industrial organization: merger analysis. The American Economic Review 83 (2), 247–252. Besanko, D., Gupta, S., Dube, J.P., 2005. Own-brand and cross-brand retail pass-through. Marketing Science 24 (1).

Bloch, F., Manceau, D., 1999. Persuasive advertising in Hotelling’s model of product differentiation. International Journal of Industrial Organization 17, 557–574. Bollinger, B., 2015. Green technology adoption: an empirical study of the Southern California garment cleaning industry. Quantitative Marketing and Economics 13 (4), 319–358. Bollinger, B., Gillingham, K., 2012. Peer effects in the diffusion of solar photovoltaic panels. Marketing Science 31 (6), 900–912. Bollinger, B., Karmarkar, U., 2015. BYOB: how bringing your own shopping bags leads to pampering yourself and the environment. Journal of Marketing 79 (4), 1–15. Bollinger, B., Leslie, P., Sorensen, A., 2011. Calorie posting in chain restaurants. American Economic Journal: Economic Policy 3, 91–128. Bollinger, B., Sexton, S., 2017. Local excise taxes, sticky prices, and spillovers: evidence from Berkeley’s soda tax. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3087966. Braithwaite, D., 1928. The economic effects of advertisement. The Economic Journal 38, 16–37. Brennan, A., et al., 2014. Potential benefits of minimum unit pricing for alcohol versus a ban on below cost selling in England 2014: modeling study. British Medical Journal 349. Briggs, A., et al., 2013. Overall and income specific effect on prevalence of overweight and obesity of 20% sugar sweetened drink tax in UK: econometric and comparative risk assessment modelling study. British Medical Journal 347. Brownell, K., et al., 2009. The public health and economic benefits of taxing sugar-sweetened beverages. The New England Journal of Medicine 361 (16), 1599–1605. Bulow, J., Pfleiderer, P., 1983. A note on the effect of cost changes on prices. Journal of Political Economy 91 (1), 182–185. Buxton, A., et al., 1984. Concentration and advertising in consumer and producer markets. Journal of Industrial Economics, 451–464. Cairns, G., Angus, K., Hastings, G., 2009. 
The Extent, Nature and Effects of Food Promotion to Children: A Review of the Evidence to December 2008. Technical Paper for World Health Organization. Capps, C., Dranove, D., Satterthwaite, M., 2003. Competition and market power in option demand markets. The Rand Journal of Economics 34 (3), 737–763. CDC, 2016. Cut back on sugary drinks. Centers for Disease Control and Prevention. http://www.cdc.gov/nutrition/data-statistics/sugar-sweetened-beverages-intake.html. Chetty, R., 2009. Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods. NBER WP 14399. Chetty, R., Looney, A., Kroft, K., 2009. Salience and taxation: theory and evidence. The American Economic Review 99 (4), 1145–1177. Chipty, T., 1995. Horizontal integration for bargaining power: evidence from the cable television industry. Journal of Economics & Management Strategy 4 (2), 375–397. Chipty, T., Snyder, C., 1999. The role of firm size in bilateral bargaining: a study of the cable television industry. Review of Economics and Statistics 81 (2), 326–340. Colchero, M., Popkin, B., Rivera, J., Ng, S., 2015. Beverage purchases from stores in Mexico under the excise tax on sugar sweetened beverages: observational study. British Medical Journal 352, h6704. Cowburn, G., Stockley, L., 2005. Consumer understanding and use of nutrition labelling: a systematic review. Public Health Nutrition 8 (1), 21–28. Crawford, G., Yurukoglu, A., 2012. The welfare effects of bundling in multichannel television markets. The American Economic Review 102 (2), 643–685. Curto, V., Einav, L., Levin, J., Bhattacharya, J., 2015. Can Health Insurance Competition Work? Evidence from Medicare Advantage. National Bureau of Economic Research Working Paper 20818. Cutler, D., Jessup, A., Kenkel, D., Starr, M., 2015. Valuing regulations affecting addictive or habitual goods. Journal of Benefit-Cost Analysis 6 (2), 247–280. Cutler, D., Lleras-Muney, A., 2010. 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2824018/. Dafny, L., Dranove, D., 2008. Do report cards tell consumers anything they don’t already know? The case of Medicare HMOs. The Rand Journal of Economics 39 (3), 790–821. Deaton, Angus S., 2009. Instruments of Development: Randomization in the Tropics, and the Search for the Elusive Keys to Economic Development. National Bureau of Economic Research Working Paper 14690.

Della Vigna, S., 2009. Psychology and economics: evidence from the field. Journal of Economic Literature 47 (2), 315–372. Dhar, T., Baylis, K., 2011. Fast-food consumption and the ban on advertising targeting children: the Quebec experience. Journal of Marketing Research 48, 799–813. Diamond, P., 1973. Consumption externalities and corrective imperfect pricing. The Bell Journal of Economics 4 (2), 526–538. Dixit, A., Norman, V., 1978. Advertising and welfare. Bell Journal of Economics 9 (1), 1–17. Dobbs, R., et al., 2014. Overcoming Obesity: An Initial Economic Analysis. McKinsey Global Institute Report. Draganska, M., Klapper, D., Villas-Boas, S., 2010. A larger slice or a larger pie? An empirical investigation of bargaining power in the distribution channel. Marketing Science 29 (1), 57–74. Draganska, M., Mazzeo, M., Seim, K., 2009. Beyond plain vanilla: modeling joint product assortment and pricing decisions. Quantitative Marketing and Economics 7 (2), 105–146. Dube, J-P., Hitsch, G., Manchanda, P., 2005. An empirical model of advertising dynamics. Quantitative Marketing and Economics 3, 107–144. Dubois, P., Griffith, R., Nevo, A., 2014. Do prices and attributes explain international differences in food purchases? The American Economic Review 104 (3), 832–867. Dubois, P., Griffith, R., O’Connell, M., 2017. How Well Targeted Are Soda Taxes? CEPR Discussion Paper 12484. Dubois, P., Griffith, R., O’Connell, M., 2018. The effects of banning advertising in junk food markets. The Review of Economic Studies 1 (1), 396–436. Dunn, A., 2010. The value of coverage in the Medicare advantage insurance market. Journal of Health Economics 29 (6), 839–855. Eckard, E., 1991. Competition and the cigarette TV advertising ban. Economic Inquiry 29 (1). https:// doi.org/10.1111/j.1465-7295.1991.tb01258.x. Erdem, T., Keane, M., Sun, B., 2008. The impact of advertising on consumer price sensitivity in experience goods markets. Quantitative Marketing and Economics 6 (2), 139–176. 
Falbe, J., Thompson, H., Becker, C., Rojas, N., McCulloch, C., Madsen, K., 2016. Impact of the Berkeley excise tax on sugar-sweetened beverage consumption. American Journal of Public Health 105 (11), 2194–2201. Farrell, J., Balan, D.J., Brand, K., Wendling, B.W., 2011. Economics at the FTC: hospital mergers, authorized generic drugs, and consumer credit markets. Review of Industrial Organization 39 (4), 271–296. Farrell, J., Shapiro, C., 2010. Antitrust evaluation of horizontal mergers: an economic alternative to market definition. The B.E. Journal of Theoretical Economics, Policies and Perspectives 10 (1). Gabaix, X., Laibson, D., 2006. Shrouded attributes, consumer myopia, and information suppression in competitive markets. The Quarterly Journal of Economics 121 (2), 505–540. Gilbert, D., Gill, M., Wilson, T., 2002. The future is now: temporal correction in affective forecasting. Organizational Behavior and Human Decision Processes 88 (1), 430–444. Ginsburg, D., Wright, J., 2015. Philadelphia National Bank: bad economics, bad law, good riddance. Antitrust Law Journal 80 (2). Glaeser, E., 2006. Paternalism and psychology. Regulation 29 (2), 32–38. Glewwe, P., Jacoby, H., King, E., 2001. Early childhood nutrition and academic achievement: a longitudinal analysis. Journal of Public Economics 81 (3), 345–368. Goeree, M., 2008. Limited information and advertising in the U.S. personal computer industry. Econometrica 76 (5), 1017–1074. Goldberg, P., 1995. Product differentiation and oligopoly in international markets: the case of the U.S. automobile industry. Econometrica 63 (4), 891–951. Gortmaker, S., et al., 2011. Changing the future of obesity: science, policy and action. The Lancet 378, 838–847. Gowrisankaran, G., Nevo, A., Town, R., 2015. Mergers when prices are negotiated: evidence from the hospital industry. The American Economic Review 105 (1), 172–203. Gowrisankaran, G., Rysman, M., 2012. Dynamics of consumer demand for new durable goods. 
Journal of Political Economy 120 (6), 1173–1219.

Grennan, M., 2013. Price discrimination and bargaining: empirical evidence from medical devices. The American Economic Review 103 (1), 145–177. Griffith, R., Nesheim, L., O’Connell, M., 2017a. Income effects and the welfare consequences of tax in differentiated product oligopoly. Quantitative Economics. https://doi.org/10.3982/QE583. Griffith, R., O’Connell, M., 2009. The use of scanner data for research into nutrition. Fiscal Studies 30 (3–4), 339–365. Griffith, R., O’Connell, M., Smith, K., 2017b. Corrective taxation and internalities from food consumption. CESifo Economic Studies. https://doi.org/10.1093/cesifo/ifx018. Griffith, R., O’Connell, M., Smith, K., 2017c. Design of Optimal Corrective Taxes in the Alcohol Market. CEPR Discussion Paper 11820. Gruber, J., Koszegi, B., 2004. Tax incidence when individuals are time-inconsistent: the case of cigarette excise taxes. Journal of Public Economics 88, 1959–1987. Gruber, J., Saez, E., 2002. The elasticity of taxable income: evidence and implications. Journal of Public Economics 84 (1), 1–32. Guglielmo, A., 2016. Competition and Costs in Medicare Advantage. University of Wisconsin. http://www.eief.it/files/2015/12/guglielmo-rev.pdf. Haavio, M., Kotakorpi, K., 2011. The political economy of sin taxes. European Economic Review 55 (4), 575–594. Hall, A.E., 2011. Measuring the return on government spending on the Medicare managed care program. The B.E. Journal of Economic Analysis & Policy 11 (2). Handbury, J., Rahkovsky, I., Schnell, M., 2015. Is the Focus on Food Deserts Fruitless? Retail Access and Food Purchases Across the Socioeconomic Spectrum. NBER Working Paper 21126. Handel, B., Schwartzstein, J., 2018. Frictions or mental gaps: what’s behind the information we (don’t) use and when do we care? The Journal of Economic Perspectives 32 (1), 155–178. Harding, M., Lovenheim, M., 2017. The effect of prices on nutrition: comparing the impact of product- and nutrient-specific taxes. Journal of Health Economics, 53–71. 
Hartmann, W., 2006. Intertemporal effects of consumption and their implications for demand elasticity estimates. Quantitative Marketing and Economics 4 (4), 325–349. Hartmann, W., Klapper, D., 2017. Super bowl ads. Marketing Science. https://doi.org/10.1287/mksc.2017. 1055. Hausman, J., Leonard, G., Zona, D., 1993. Competitive analysis with differentiated products. Annales d’Économie et de Statistique 34. Heckman, James J., Urzua, Sergio, 2010. Comparing IV with structural models: what simple IV can and cannot identify. Journal of Econometrics 156 (1), 27–37. Heike, S., Taylor, C., 2013. A critical review of the literature on nutritional labeling. The Journal of Consumer Affairs 46 (1), 120–156. Hendel, I., Nevo, A., 2006. Measuring the implications of sales and consumer inventory behavior. Econometrica 74 (6), 1637–1673. Ho, K., Lee, R., 2017. Insurer competition in health care markets. Econometrica 85 (2), 379–417. Horn, H., Wolinsky, A., 1988. Bilateral monopolies and incentives for merger. The Rand Journal of Economics 19 (3), 408–419. Jacobsen, M.R., Knittel, C.R., Sallee, J.M., van Benthem, A.A., 2016. Sufficient Statistics for Imperfect Externality-Correcting Policies. NBER Working Paper 22063. Jeziorski, P., 2014. Estimation of cost synergies from mergers: application to U.S. radio. The Rand Journal of Economics 45 (4), 816–846. Jia, H., Lubetkin, E., 2010. Obesity-related quality-adjusted life years lost in the US from 1993 to 2008. American Journal of Preventative Medicine 39 (3). https://doi.org/10.1016/j.amepre.2010.03.026. Jin, L., Kenkel, D., Liu, F., Wang, H., 2015. Retrospective and Prospective Benefit-Cost Analysis of US Anti-Smoking Policies. NBER Working Paper 20998. Jin, G., Leslie, P., 2003. The effect of information on product quality: evidence from restaurant hygiene grade cards. The Quarterly Journal of Economics 118 (2), 409–451. Johnson, J., Myatt, D., 2006. On the simple economics of advertising, marketing, and product design. 
The American Economic Review 96, 756–784.

Kaldor, N., 1950. The economic aspects of advertising. The Review of Economic Studies 18, 1–27. Kaplow, L., 2010. Why (ever) define markets? Harvard Law Review 124 (2). Katz, M., Shapiro, C., 2003. Critical loss: let’s tell the whole story. Antitrust (Spring), 49–56. Khan, R., Misra, K., Singh, V., 2015. Will a fat tax work? Marketing Science 35 (1), 10–26. Kiesel, K., Villas-Boas, S.B., 2013. Can information costs affect consumer choice? Nutritional labels in a supermarket experiment. International Journal of Industrial Organization 31, 153–163. Laibson, D., 1997. Golden eggs and hyperbolic discounting. The Quarterly Journal of Economics 112 (2), 443–477. Lewis, R.A., Rao, J.M., 2015. The unfavorable economics of measuring the returns to advertising. The Quarterly Journal of Economics 140, 1941–1973. Lewis, M., Wang, Yanwen, Berg, Carla J., 2014. Tobacco control environment in the United States and individual consumer characteristics in relation to continued smoking: differential responses among menthol smokers? Preventive Medicine 65, 47–61. Lockwood, B.B., Taubinsky, D., 2017. Regressive Sin Taxes. National Bureau of Economic Research Working Paper 23085. Loewenstein, G., O’Donoghue, T., Rabin, M., 2003. Projection bias in predicting future utility. The Quarterly Journal of Economics 118 (4), 1209–1248. Marshall, A., 1921. Industry and Trade: A Study of Industrial Technique and Business Organization and of Their Influences on the Conditions of Various Classes and Nations. MacMillan and Co., London. McClure, S., Laibson, D., Loewenstein, G., Cohen, J., 2004. Separate neural systems value immediate and delayed monetary rewards. Science 306 (5695), 503–507. Milkman, K., Rogers, T., Bazerman, M., 2010. I’ll have the ice cream soon and the vegetables later: a study of online grocery purchases and order lead time. Marketing Letters 21 (1), 17–35. Milyo, J., Waldfogel, J., 1999. The effect of price advertising on prices: evidence in the wake of 44 Liquormart. 
The American Economic Review 89 (5), 1081–1096. Mirrlees, James A., 1971. An exploration in the theory of optimum income taxation. The Review of Economic Studies 38 (2), 175–208. Moodie, R., et al., 2013. Profits and pandemics: prevention of harmful effects of tobacco, alcohol, and ultra-processed food and drink industries. The Lancet 381 (9867), 670–679. Moorman, C., Ferraro, R., Huber, J., 2012. Unintended nutrition consequences: firm responses to the nutrition labeling and education act. Marketing Science 31 (5). Moorthy, Sridhar, 2005. A general theory of pass-through in channels with category management and retail competition. Marketing Science 24 (1), 110–122. Motta, M., 2007. Advertising Bans. UPF Technical Report 205. Mullainathan, S., Schwartzstein, J., Congdon, W., 2012. A reduced-form approach to behavioral public finance. Annual Review of Economics 4 (1), 511–540. Nagle, T., Hogan, J., Zale, J., 2011. The Strategy and Tactics of Pricing. Pearson, NY. Nash Jr., J.F., 1950. The bargaining problem. Econometrica 18 (2), 155–162. National Academies, 2006. Committee on Food Marketing and the Diets of Children and Youth Report on Food Marketing to Children and Youth: Threat or Opportunity? Technical Report. National Academies Press, Washington, DC. Nelson, P., 1995. Information and consumer behavior. Journal of Political Economy 78, 311–329. Netter, J., 1982. Excessive advertising: an empirical analysis. Journal of Industrial Economics, 361–373. Nevo, A., 2000. Mergers with differentiated products: the case of the ready-to-eat cereal industry. The Rand Journal of Economics 31 (3), 395–421. Nevo, A., 2001. Measuring market power in the ready-to-eat cereal industry. Econometrica 69 (2), 307–342. Nevo, A., 2011. Empirical models of consumer behavior. Annual Review of Economics 3, 51–75. Nevo, A., 2014. Mergers that increase bargaining leverage. Available at https://www.justice.gov/atr/file/ 517781/download. Nevo, A., Whinston, M., 2010. 
Taking the dogma out of econometrics: structural modeling and credible inference. The Journal of Economic Perspectives (Spring), 69–82.

Nielsen, S., Popkin, B., 2004. Changes in beverage intake between 1977 and 2001. American Journal of Preventive Medicine 27 (3). O’Donoghue, T., Rabin, M., 2003. Studying optimal paternalism, illustrated by a model of sin taxes. The American Economic Review 93 (2), 186–191. O’Donoghue, T., Rabin, M., 2006. Optimal sin taxes. Journal of Public Economics 90, 1825–1849. Pappalardo, J.K., 2012. Are unintended effects of marketing regulations unexpected? Marketing Science 31 (5), 739–744. Patel, K., 2012. Obesity and the heterogeneous response to taxes. In: Essays in IO (Chapter 1). 2012 NWU PhD dissertation. Petrin, A., 2002. Quantifying the benefits of new products: the case of the Minivan. Journal of Political Economy 110 (4), 705–729. Pigou, A., 1920. The Economics of Welfare. MacMillan and Co., London. Purshouse, R., et al., 2010. Estimated effect of alcohol pricing policies on health and health economic outcomes in England: an epidemiological model. The Lancet 375 (9723), 1355–1364. Qi, S., 2013. The impact of advertising regulation on industry: the cigarette advertising ban of 1971. The Rand Journal of Economics 44 (2), 215–248. Rao, A., Wang, E., 2017. Demand for ‘healthy’ products: false claims in advertising. Journal of Marketing Research 54 (6), 968–989. Ratchford, B.T., 2012. Suggestions for further research on firm responses to NLEA and other disclosure laws. Marketing Science 31 (5), 744–747. Read, D., Van Leeuwen, B., 1998. Predicting hunger: the effects of appetite and delay on choice. Organizational Behavior and Human Decision Processes 76 (2), 189–205. Robinson, J., 1933. Economics of Imperfect Competition. MacMillan and Co., London. Rogerson, W., 2018. Economic theories of harm raised by the proposed Comcast/TWC transaction. In: Kwoka Jr., John E., White, Lawrence J. (Eds.), The Antitrust Revolution, 7th edition. Oxford University Press. Rojas, C., Peterson, E., 2008. Demand for differentiated products: price and advertising evidence from the U.S. 
beer market. International Journal of Industrial Organization 26, 288–307. Rubinstein, A., 1982. Perfect equilibrium in a bargaining model. Econometrica 50 (1), 97–109. Sadoff, S., Samek, A., Sprenger, C., 2015. Dynamic inconsistency in food choice: experimental evidence from a food desert. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2572821. Saez, E., 2001. Using elasticities to derive optimal income tax rates. The Review of Economic Studies 68 (1), 205–229. Sahni, N.S., 2016. Advertising spillovers: evidence from online field experiments and implications for returns on advertising. Journal of Marketing Research 53 (4), 459–478. Salinger, M., 1990. The concentration-margins relationship reconsidered. Brookings Papers on Economic Activity 21 (1990), 287–335. Sanders, B., 2016. Bernie Sanders op-ed: a soda tax would hurt Philly’s low-income families. Philadelphia, 24 April 2016. https://www.phillymag.com/citified/2016/04/24/bernie-sanders-soda-tax-op-ed/. Sass, T., Saurman, D., 1995. Advertising restrictions and concentration: the case of malt beverages. Review of Economics and Statistics 77, 66–81. Schmalensee, R., 1989. Inter-industry studies of structure and performance. In: Handbook of Industrial Organization, vol. 2, pp. 951–1009. Seade, J., 1985. Profitable Cost Increases and the Shifting of Taxation: Equilibrium Responses of Markets in Oligopoly. Warwick Economic Research Papers No. 260. Shapiro, B., 2018a. Informational shocks, off-label prescribing and the effects of physician detailing. Management Science 64 (12), 5461–5959. Shapiro, B., 2018b. Advertising in health insurance markets. Marketing Science. https://doi.org/10.1287/ mksc.2018.1086. Shapiro, B., 2018c. Positive spillovers and free riding in advertising of prescription pharmaceuticals: the case of antidepressants. Journal of Political Economy 126 (1), 381–437. Sinkinson, M., Starc, A., 2017. Ask Your Doctor? Direct-to-Consumer Advertising of Pharmaceuticals. 
National Bureau of Economic Research WP 21045.

Spiegler, R., 2006. Competition over agents with bounded rationality. Theoretical Economics 1, 207–231. Stewart, David, 2015. Why marketers should study public policy. Journal of Public Policy and Marketing 34 (1), 1–3. Stigler, G., 1961. The economics of information. Journal of Political Economy 69, 213–225. Stigler, G., Becker, G., 1977. De gustibus non est disputandum. The American Economic Review 67, 76–90. Strotz, R., 1955. Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies 23 (3). Sudhir, K., 2001. Structural analysis of manufacturer pricing in the presence of a strategic retailer. Marketing Science 20 (3). Thomassen, O., Smith, H., Seiler, S., Schiraldi, P., 2017. Multi-category competition and market power: a model of supermarket pricing. The American Economic Review 107 (8), 2308–2351. Town, R., Liu, S., 2003. The welfare impact of Medicare HMOs. The Rand Journal of Economics, 719–736. Tuchman, A., 2017. Advertising and demand for addictive goods: the effects of e-cigarette advertising. https://papers.ssrn.com/abstract=3182730. Tyagi, R.K., 1999. A characterization of retailer response to manufacturer trade deals. Journal of Marketing Research 36. Villas Boas, S., 2007. Vertical relationships between manufacturers and retailers: inference with limited data. The Review of Economic Studies 74 (2), 625–652. von der Fehr, N-H., Stevik, K., 1998. Persuasive advertising and product differentiation. Southern Economic Journal 65 (1), 113–126. Wang, E., 2015. The impact of soda taxes on consumer welfare: implications of storability and taste heterogeneity. The Rand Journal of Economics 46 (2), 409–441. Wang, Yanwen, Lewis, Michael, Singh, Vishal, 2016. The unintended consequences of counter-marketing strategies: how particular anti-smoking measures may shift consumers to more dangerous cigarettes. Marketing Science 35 (1), 55–72. Wei, Yanhao, Yildirim, Pinar, Van den Bulte, Christophe, Dellarocas, Chrysanthos, 2016. Credit scoring with social network data. Marketing Science 35 (2), 234–258. Weinberg, Matthew, Ashenfelter, Orley, Hosken, Daniel, 2013. The price effects of a large merger of manufacturers: a case study of Maytag/Whirlpool. American Economic Journal: Economic Policy 5. Werden, Gregory, Froeb, Luke, 1994. The effects of mergers in differentiated products industries: logit demand and merger policy. Journal of Law, Economics, and Organization 10, 407–426. Weyl, E.G., Fabinger, M., 2013. Pass-through as an economic tool: principles of incidence under imperfect competition. Journal of Political Economy 121 (3), 528–583. WHO, 1990. Diet, nutrition and the prevention of chronic disease. http://apps.who.int/iris/bitstream/handle/10665/42665/WHO_TRS_916.pdf;jsessionid=BCD67FE6CB27BA80013ACE6BD0843451?sequence=1. WHO, 2010. Set of recommendations on the marketing of foods and non-alcoholic. https://www.who.int/dietphysicalactivity/publications/recsmarketing/en/. WHO, 2015. Sugars intake for adults and children. https://www.who.int/nutrition/publications/guidelines/sugars_intake/en/.

Index A Accommodation industry, 263 Accumulated sales, 470 Acquisition costs, 510 Adjustment costs, 502, 542 Administrative costs, 521 Advertised brands, 337 price promotions, 541 pricing policies, 278 pricing restrictions, 278 promotions, 530 Advertising agencies, 446 allocations, 144 allowances, 520 auctions, 276 automobile, 481 ban, 583 benefit, 336 brand, 292, 316, 318, 319, 327, 585 budget, 501 campaign, 110, 317, 424 competition, 345 consumers, 337, 482 content, 246, 424 cost function, 329 decisions, 111, 324, 330, 445 dynamics, 424 effectiveness, 208, 273–275, 325, 326, 424 effects, 120, 275, 317, 319, 425 efforts, 445 equilibrium, 336 expenditures, 144, 321, 327, 344 expenses, 326, 443, 501 experiments, 115, 117 exposure, 71, 109, 146, 305, 422, 423, 425 goodwill, 319 goodwill stock, 318 in marketing, 422 industry, 238 inventory, 401 investments, 275, 324, 328, 348 levels, 455 message, 60, 94, 315, 445 models, 260 offline, 265 online, 273, 276, 277 pages, 514, 520

platforms, 116, 275 potential endogeneity, 246 product, 585 research, 120, 422 resources, 87 responsiveness, 106 restrictions, 578 role, 328, 338, 374 sellers, 276 signals quality, 336 space, 273 spending, 275, 324, 329, 330, 335–337, 502 strategies, 253, 305, 325, 335, 587 support, 542 targeted, 109, 445 unhealthy food, 555 Agent preferences, 485 Aided awareness, 300 Airline prices, 261 Airline pricing, 510, 536 Altruistic preferences, 58, 60 Annual promotion process, 525 Asset price, 326 Attribute preferences, 168, 170 Auto insurance market, 219 Automobile advertising, 481 brand preferences, 309 prices, 406 selling, 488 Awareness, 204, 246, 247, 299–304, 306, 347, 360, 373, 374, 423 brand, 294, 299–301, 303, 319, 424 consumers, 246, 306, 347, 374 shifter, 247

B Banner advertising, 273, 274 Banning advertising, 585, 587 Baseline brand utility, 312 Baseline sales, 462, 469 Beer brands, 35, 295 price promotion, 530 purchases, 54 Behavioral economics, 584 Binary purchase decision, 60 Bracket pricing, 521, 522


Brand advertising, 292, 316, 318, 319, 324, 327, 585 assets role, 348 associations, 300, 338 attitudes, 295, 297 attributes, 89, 156, 330 awareness, 294, 299–301, 303, 319, 424 build, 326 building, 325 capital, 293, 310, 311, 326, 348, 423 capital stock, 308, 310, 314 choice alternative, 41 coefficient, 185 command, 261 competitor, 299 consideration, 300, 301 constitute experience, 314 credibility, 303, 332 differentiation, 528 effect, 299, 306 experiences, 308, 318 extensions, 338, 339, 341, 343 health, 539 history role, 320 identifiers, 295 image, 294, 327 incremental profits, 348 influence demand, 322 informational role, 337 intercepts, 33, 183, 185, 188, 198, 226, 297, 298, 313, 328 investments, 348 knowledge, 294, 314, 318 labeling, 294 loyalty, 293, 307, 308, 311–314, 317, 344, 345, 347, 415, 416, 528 management, 325 managers, 540, 542 name, 7, 172, 180, 183, 246, 266, 293, 296, 298, 300, 308, 311, 314, 322, 327, 328, 335, 338, 344, 348, 544 parent associations, 341 performance, 348 positioning, 528 power, 529 preference parameters, 106 preferences, 2, 119, 171, 292–296, 298, 301, 302, 306, 307, 309–311, 321, 347, 504, 530 premium, 322, 330 price, 322, 328 promoted, 529, 530, 534 purchase, 299, 316 history, 305 intentions, 316

probabilities, 532 quality, 315–317, 329, 338 reputation, 304, 327, 339, 340 role, 4, 5, 293, 294, 331, 341 sales, 528, 531 specific intercepts, 102 stretching, 338 switching, 7, 25, 50, 501, 507, 509, 529, 530, 532 switching costs, 312, 313 taste parameters, 48, 49 tastes, 313, 314 umbrella, 293, 339, 341–343, 533 uncertainty, 317 valuation, 306, 321 value, 293, 296, 298, 299, 306, 311, 319, 321, 322, 324–326, 335, 348, 458 variants, 4 Branded antihistamine, 299 consumer goods, 328, 344 CPG, 319 firm, 420 goods, 299, 302, 303, 317, 327, 328, 330, 331, 347 keyword, 266 products, 306, 322, 585, 588 retailers, 335 Branding, 4, 5, 52, 292, 293, 296, 299, 303, 311, 314, 324, 327, 328, 331, 334, 335, 337, 346–348 affected choices, 303 decisions, 296 endogenous, 302 goodwill, 308 investments, 328 literature, 300, 327 market, 278 stems, 300 strategies, 299 theories, 328 umbrella, 327, 338–342 Broadcast advertising, 443 Broader commodity groups, 34 Broader market, 216 Budgeted trade spend, 514 Business markets, 502 Buying brand, 298 probabilities, 234, 236 process, 251 product, 224, 387

C Camcorder market, 234


Capital markets, 397 Car purchases, 235 Car purchases online, 280 Category captain, 541 consumers goods, 4 pairs, 532 Central role, 2, 5 Cereal brands, 422 Cereal category, 323 CES specification, 14, 15, 38 Chain brand, 279 Chemical product, 212 Chinese consumers, 309 Cigarette industry, 585 Clearance sales, 376 Close connections marketers, 558 Clothing retailer, 270 Cognitive costs, 394 Commodity group, 4, 5, 9, 20–23, 28, 29, 31, 35–40, 42, 58 Commodity market, 328 Compensation costs, 443, 477 Compensation incentives, 471 Competitor advertising, 585, 586 brand, 299 prices, 522 Condition pricing, 415 Conditional, 133 brand choices, 219, 220, 223, 234, 246, 304, 568 consideration, 246 correlation, 139 demand, 10, 12, 18 distributions, 77, 78, 99, 101 expectation, 125, 127 heteroskedasticity, 133 independence, 96, 101, 130, 139 indirect utility, 29, 51, 312 likelihood ratio statistic, 137 mean, 96, 110, 126, 130 posterior, 106 prediction, 141 utility, 581 Conduct specification, 54 Confounds regional preferences, 120 Conjoint specifications, 175, 177, 178 Console prices, 412, 413 Constant fixed costs, 324 Consumer packaged goods (CPG), 6, 499 brands, 6, 319, 320 categories, 6, 7, 44, 308, 322, 339, 346 manufacturers, 501, 523 markets, 330, 499 product, 7, 31

product categories, 6, 25 retailers marketing budgets, 499 Consumers actions, 197, 396 adoption, 5, 367, 370, 410, 426 advertising, 482, 516 associations, 318 attention, 170 attributes, 71 awareness, 246, 347, 374 base, 381 behavior, 2, 4, 6, 62, 106, 197, 247, 249, 251, 261, 281, 299, 376, 380, 385, 386, 392, 407, 415, 418, 426, 427, 500, 512, 513, 554, 555, 576, 587 beliefs, 332, 373–375, 423 benefit, 269 brand, 310, 318, 345 awareness, 306 choices, 307, 313 equity, 293, 318, 324, 327 experiences, 315 loyalty, 307 preferences, 293, 294, 313 value, 299 buzz, 543 characteristics, 198, 220, 224 choice, 24–26, 30, 33, 43, 52, 60, 61, 89, 199, 247, 293, 304, 337, 373, 533, 560, 571, 575, 578, 589 data, 71 decision, 44, 382 demand, 8, 35, 36, 61, 141, 166, 293, 302, 307, 323, 332, 347, 395, 408, 418, 423, 553, 558, 569, 573, 578, 583, 584, 587 discounts, 46, 316 diversion, 555 durable goods, 142 expectations, 363, 376–378, 380–383, 389, 398, 400, 405, 412, 414, 420, 522 face, 7, 31, 52, 314, 414, 415 facing, 447 goods, 5, 8, 25, 35, 52, 53, 300, 301, 311, 321, 334, 347, 536, 543 advertising, 337 category, 4 industry, 319, 320 markets, 348 product categories, 41 habits, 512 hardware utility, 412 heterogeneity, 5, 21, 85, 362, 382, 406, 407, 412, 426, 530, 579, 584 incentives, 576


income, 58 information, 51, 253 interest, 543 inventory, 418, 535 knowledge, 172 learning, 250, 307, 313, 316, 317, 333, 366, 418, 419 level price elasticities, 587 level utility, 89 loyalty, 344 marginal, 36 marketing programs, 540 markets, 344 models, 328 opinions, 297 packaged goods, 4, 6, 220, 499, 509 packaged goods industry, 93 packaged goods markets, 415 panel, 153 parameters, 89 patience, 376, 377, 395 perceptions, 296 population, 330 preferences, 2, 3, 71, 87, 88, 106, 153, 239, 366, 511 price, 506, 528, 580 price index, 503 pricing policy, 377 product, 142, 170, 194, 345, 348 markets, 366 preferences, 365 storability, 511 promotions, 527, 540 protection, 583 psychologists, 301, 306, 307 psychology, 295, 299, 307, 341 purchase decisions, 248, 385, 392 quality beliefs, 338 quality choice, 17 rationality, 333 responses, 3, 61, 87, 367 sales, 507 screen brands, 206 search cost, 203, 224, 261, 535 literature, 199, 205, 208, 243, 250 role, 5 searching, 210, 227, 232, 249 segments, 213, 504 shopping, 238, 252, 295 shopping behavior, 312 shopping panels, 309 side, 396 socialization, 309 spreading, 336

stockpiling, 510, 511, 534, 535 substitution, 555, 560 surplus, 577, 588 surveys, 296 tastes, 2, 62, 584 theory, 3, 581 time preferences, 396 type, 337, 384, 385, 388, 389, 392, 411 uncertainty, 43 utility, 38, 173, 298, 407, 412 valuations, 421 value, 584 view, 571 welfare, 2, 161–163, 219, 328 Consumers Bulletin, 337 Consumers Reports, 337 Contemporaneous purchase, 7 Contextually targeted advertising, 274 Controlling product, 267 Convince consumers, 399 Cooperative advertising, 544 Cooperative marketing funds, 544 Coordinated marketing, 113 Coordinated marketing activities, 542 Coordinating pricing, 401 Corporate brand, 334 Costs, 260, 269, 270, 272, 281, 306, 313, 329, 330, 363, 378, 392, 394, 411, 413, 426, 451, 456, 458, 469, 506, 520, 521, 565, 573, 574 brand extension, 343 consumers search, 203, 224, 261 distribution, 269 incremental, 329 marginal, 324, 329, 336, 344, 365, 394, 503, 539, 580 retailers, 518 transaction, 405–407, 409, 414 umbrella branding, 342 Counterfactual brand, 324 Counterfactual profits, 324 Countervailing incentives, 344 Coupon promotions, 534 Cumulative sales, 367, 369, 371–373, 473 Curtailing production, 363, 399 Customers behavior, 108, 276 demands, 277 groups, 484 heterogeneity, 93 interaction, 466 level purchase, 93 preferences, 113, 175 purchase, 470 rebates, 479


requirements, 480 segments, 455 service, 479 Cutting prices, 381

D Data envelopment analysis (DEA), 457 Database tracks purchases, 6 Dealer incentives, 488 Declining advertising, 422 future prices, 317 price, 317, 362, 385, 395, 402 price paths, 412 Deferring purchase, 383 Delegated pricing, 480 Dental product, 168 Designated market areas, 120, 319 Detergent purchases, 528 Devise marketing policies, 70 Differentiated product, 4, 164, 231, 242, 244, 247, 303, 344, 561, 562 product markets, 229, 585 retailers, 511 Digital rights management (DRM), 407 Digitization reduced automobile prices, 261 Direct to consumer (DTC), 482 advertising, 443, 482 Dirichlet process (DP), 104 Disappoint consumers, 544 Discounted prices, 505 Discounted product, 47 Discrete brand choice, 7 choice conjoint study, 164 probabilities, 304 purchase decision, 302 specification, 39 product, 21, 22, 25 product choice, 20 Distant markets, 346 Distribution costs, 269 heterogeneity, 159–161 price, 201, 209, 223, 224, 241, 242, 245 Drink industry, 574 Drug manufacturers, 478 Drug sales, 489, 499 Durable consumer goods, 317 goods, 154, 263, 366, 369, 374, 376, 386, 387, 393, 394, 405, 407–409, 417, 420, 543, 544

manufacturers, 543, 544 markets, 35, 406, 499 monopolist, 386 oligopolist, 386 pricing, 380, 420 role, 5 software goods, 426 Dutch supermarkets, 532

E eBay sales, 266 Econometric specification, 76, 77, 195 Economic price premium (EPP), 164 Economics applications, 108 literature, 123, 246, 260–262, 281, 448, 575 marketing, 153 selling, 448 Elicited price beliefs, 239 Emerging markets, 499 Employees incentives, 448 Endogeneity, 51, 72, 110, 111, 115, 123, 124, 126–128, 140–143, 145, 153, 246, 247, 486 advertising, 51 bias, 5, 52, 54, 56, 124, 126–128, 130, 142–144 concerns, 143, 461 issue, 461 marketing variables, 50 potential, 51 price, 51, 52, 55, 56, 127, 142–144, 236, 237, 375, 500 problem, 72, 127, 128, 139, 140, 144 selling effort, 461 stemming, 451 Endogenous brand choices, 315 branding, 302 prices, 51 Endogenous sunk costs (ESC), 327, 330 Energy markets, 252 Entertainment goods, 409 Entertainment industry, 264, 265 Equalization price, 297 Equilibrium advertising, 336 market, 163, 322, 323 market structure, 328 price, 53–55, 165, 174, 210, 299, 324, 344, 381, 392, 404, 407, 413, 414, 562, 563, 572 path, 402 premium, 164 vector, 165 pricing policy, 377, 381 Equity across salespeople, 489


Evaluation costs, 302, 303 Eventual purchase, 197, 275 Excess product, 519 Exogenous potential market size, 323 Extant branding, 348 Externally acquired brands, 348 Extreme value (EV), 218

F Factor prices, 53, 54 Factual profits, 324 Falling prices, 380, 412 Feature advertising, 50 Feature promotions, 143 Federal Communications Commission (FCC), 564 Federal Trade Commission (FTC), 554 Flash sales, 402 FMCG products, 417, 425 Focal commodity group, 31 Focal product, 166, 456 Food markets, 578 products, 574, 588 purchases, 575 Fool consumers, 335 Fringe brands, 300 Frozen pizza purchase, 183 Funds prices, 214

G Gasoline markets, 212 prices, 32, 57 retailer, 426 General accepted accounting principles (GAAP), 321, 348 General data protection regulation (GDPR), 278 Geographic markets, 212, 345, 346, 530 German ground coffee market, 300 Goods branded, 299, 302, 303, 317, 327, 328, 330, 331, 347 consumers, 5, 8, 25, 35, 52, 53, 300, 301, 311, 321, 334, 347, 536, 543 demand, 405 durable, 154, 263, 366, 369, 374, 376, 386, 387, 393, 394, 405, 407–409, 417, 420, 543, 544 markets, 8, 44, 299, 377, 407–409 multiple, 34 profitable provision, 269 sales, 406 stock, 407 storable, 393, 415, 417, 418 Goodwill, 318, 319, 338, 423, 424, 453, 454 advertising, 319 branding, 308 model, 423 stock, 318, 319 Grocery purchases, 32 Ground coffee category, 30, 345

H Harvesting incentives, 414 Health costs, 574 Hedonic price, 322 Heterogeneity consumers, 5, 85, 362, 382, 406, 407, 412, 426, 530, 584 customers, 93 distribution, 159–161 substantial, 479 unobserved, 312, 313, 316 Heterogeneous consumer, 208, 408 products, 215 Hierarchical model specification, 179 Historic brand experiences, 307 Historic brand market share, 310 Home production, 29 Homogeneous products, 203, 209, 244 Homothetic preferences, 31 specification, 32 translog specification, 19, 23, 40 Hotel prices, 263 Household brand switching, 308 cleaning goods, 417 production, 29, 44, 45 production function, 373 Hurt sales, 489 Hypothetical monopolist test (HMT), 560 Hypothetical purchase, 154

I Impending price freeze, 518 Imperfect price information, 50 Imperfect product, 327 Impulse purchases, 523 Incentives, 333, 334, 339, 386, 412, 414, 421, 422, 462, 463, 466, 469, 473, 478–480, 482, 500, 502, 513, 514, 518, 526, 567, 584 consumers, 576 firms, 332 intertemporal, 410 manufacturers, 526 marginal, 470


pricing, 400 promotional, 526 sales, 479, 488 salesperson, 452, 479 umbrella branding, 340 Incentivizing purchases, 509 Incremental costs, 329 profits, 322, 327 sales, 529 Incumbent hotel industry, 263 Incumbent sellers, 263, 264 Indirect utility, 17, 19, 163, 178, 213, 214, 220, 303, 306, 312, 315, 580, 581 conditional, 51, 312 function, 17, 19, 23, 40, 160, 162, 177, 216, 229, 581 Individual rationality (IR), 476 Inducing consumers, 400 Industrial market structure, 4, 327, 329 Industrial organization (IO), 555 Industry, 72, 94, 113, 119, 121, 266–268, 275, 293, 321, 324, 325, 327, 328, 335, 345, 443, 452, 456, 461, 464, 482, 560, 562, 570, 576, 589 advertising, 238 conditions, 326 consumers goods, 319, 320 pharmaceutical, 94, 446, 457, 460 research, 275 state, 325 Inexperienced consumers, 331 Inferior goods, 15, 20, 31 Informational roles, 328 Informative advertising, 314, 328, 584 Informative marketing, 317 Infrequent repeat purchase, 417 Infrequently purchased goods, 180 Insight quantitative marketing, 557 Instrumental variable (IV), 124 Insurance markets, 219, 565 prices, 261 products, 194, 565 Intangible brand capital, 326 Intangible goodwill stock, 324 Intentionally complementary category pair, 533 Intermediary offering products, 248 Internet search costs, 261 Intertemporal incentives, 410 price discrimination, 365, 376, 385, 394, 395, 403, 404, 412, 418 pricing policy, 381 Intraproduct discrimination incentives, 410

Introductory low prices, 335 Inventory costs, 417, 510 Inventory holding costs, 508, 521 Investing incentives, 416 Invoice price, 521 Irreversible marketing costs, 327

J Joint business planning (JBP), 515 Joint pricing, 409 Joint purchase, 37 Justifying brand extensions, 343

K Key marketing questions, 2 Keynesian sticky price, 498, 513

L Labor market, 115 Lagged prices, 143, 393 Lagged sales, 489 Laundry detergent purchases, 50 Learning costs, 414 Likelihood criterion, 81 function, 3, 10, 40, 51, 53, 55, 78, 79, 85, 90, 95–97, 178 marginal, 92 model, 106 price, 53 principle, 78, 80 Linear expenditure system (LES), 14 Local markets, 94, 266, 310, 535 Logit specification, 154 Lowering prices, 386, 408 Lowering search costs, 265 Loyal consumers, 507 Luxury goods, 31

M Machine learning (ML), 62, 107 Manufacturers benefit, 507 brands, 334 CPG, 501, 523 drug, 478 durable goods, 543, 544 face consumers, 507 incentives, 521, 526 promotion budget, 518 representatives, 449 Margarine purchases, 33, 313 Marginal consumers, 36


costs, 324, 329, 336, 344, 365, 394, 503, 539, 580 incentives, 470 likelihood, 92 prices, 42, 396 utility, 15, 21, 30, 43, 48, 58, 60, 77, 154–156, 178, 180, 581, 582 Markdown pricing, 402 Market basket, 532, 533 behavior, 411 branding, 278 changes, 379 concentration, 320, 327, 330, 561 conditions, 152, 471, 480 definition, 559 demand, 126, 174, 180, 580, 582 environment, 365 equilibrium, 153, 163, 322, 323 expansion, 586 for durable goods, 408 fragmentation, 330 frictions, 219 goods, 8, 44, 299 inefficiency, 219 level, 164, 567, 580 mix variables, 141 niche, 419 outcomes, 153, 194, 253, 421, 426 participants, 365, 392, 411, 576 potential, 367, 371–375, 380, 382, 384, 388, 390, 409 power, 2, 210, 292, 322, 408, 414, 512, 529, 560 prices, 153 realities, 560, 566, 567, 579 research, 377, 573 segment, 361 settings, 401, 406, 409 share, 212, 586 size, 328–330, 372, 410 state, 362, 378, 416 structure, 234, 307, 319, 320, 328, 330, 562 value, 163, 321, 326 variables, 93 volume, 539 Marketers, 267, 272, 277, 447, 545, 555, 556, 558, 588, 589 Marketing academics, 500, 536 actions, 70–72, 112, 123, 146 activities, 87, 113, 218, 219, 260, 302, 455, 458, 473, 488, 501, 542, 543 allocations, 369, 370 analytics, 108, 109, 120, 122

applications, 17, 62, 70, 72, 74, 76, 77, 94, 95, 114, 115, 122, 124, 128, 135, 141–144, 159 budget, 482, 523 campaigns, 5, 58, 61, 522 channel, 270 communications, 146, 271 community, 366, 422 concept, 360 conditions, 2, 4 considerations, 307 contexts, 72, 94, 262, 265, 281 cost function, 327, 330 data, 8, 71, 80, 143, 424 decisions, 2, 71, 94, 299, 338, 344, 455, 478, 482 documents, 194, 562 dollars, 516 economics, 153 effects, 146 efforts, 33, 216 elements, 460, 478 expenditure, 537 experiences, 299 exposures, 427 inputs, 73, 86, 109 instruments, 113, 366, 444, 450, 455, 456, 460 investments, 306, 327, 347, 348, 451, 461, 478, 482, 485, 528 literature, 2, 74, 91, 106, 108, 152, 204, 262, 268, 337, 361, 365, 446, 450, 481, 487, 512, 531 message, 60 models, 446, 502 offline, 281 plan, 543 platforms, 275 policy, 277, 360, 364 practice, 393 practitioners, 109, 377 presentations, 489 problems, 72, 93 profession, 554, 556 promotional, 8 promotions, 52 purposes, 490 questions, 268 research, 87, 108 researchers, 71, 72, 115, 146, 423, 505 resources, 94, 108 response models, 73, 123 risks, 455 role, 361, 589 scholars, 269, 273, 275, 277 science, 293, 427 statistics literature, 261


strategy, 261, 262, 270, 360, 370, 371, 376, 377, 411, 423, 426, 427 studies, 8 targeted, 5, 146 textbooks, 166 tools, 2, 461 variables, 50, 51, 55, 72, 93, 108, 109, 115, 118, 120, 362 Marketplace, 152–154, 161, 167, 170, 172, 187, 188, 360, 366, 368, 397, 404, 415, 423, 425, 542 demand, 161, 185 experiments, 425 offerings, 186, 188 predictions, 167, 184 transaction, 162, 163, 171, 180 Mature consumers, 424 Maximum empirical likelihood (MEL), 210 Maximum likelihood, 12, 13, 47, 49, 59, 78, 85, 96 estimation, 11, 12, 33, 55, 97 estimator, 11, 19, 26, 27, 51, 54, 88 Media advertising, 523, 540 Media market, 120 Medical journal advertising, 481 products, 479 salespeople, 452 Mental accounting, 5, 57, 58 Minimum advertised price (MAP), 543 Minimum price, 201 Misspecification, 454 Model goodwill, 423 likelihood, 106 sales, 451 specifications, 62, 76, 77, 92, 155, 242 Monetary incentives, 557 Monitor retailer, 519, 523 Monitoring costs, 403 Monopolist retailer, 504, 507, 508, 510 Monopoly markets, 513 price, 376, 419 pricing, 376, 404 Mouthwash category, 341 Multinomial logit specification, 27 Multiple goods, 34, 173 markets, 212 products, 35, 506, 526 Multiproduct grocery stores, 38 Multiproduct retailer price competition, 506 Music industry revenue, 268 Myopic consumer, 377, 417 Myopic consumer model, 377

N National brand, 32, 33, 300, 319, 321, 322, 334, 335, 346, 507, 543 brand manufacturers, 544 market structure, 330 Negative price coefficient, 241 Neighboring markets, 458 Neoclassical consumer, 6, 8 Net prices, 61, 522 Net product utility, 205 Newspaper industry, 330 Niche diaper brands sales, 271 market, 419 products, 265 sellers, 265 Norwegian market, 400 Nutrition policy, 554–558, 573, 589 Nutrition preferences, 15

O Obfuscate prices, 262 Obtrusive advertising, 274 Offline advertising, 265 brands, 279 marketing, 281 product assortment, 264 retailers, 270, 272 sales, 271, 275 Older consumers, 45, 167 Oligopoly pricing, 419 Online advertising, 273–277 car purchases, 280 consumer, 57, 230 market, 196, 211, 337 marketplaces, 337 price, 208, 210, 211, 262, 272, 278 pricing, 272 pricing strategies, 272 product, 265 product variety, 271 promotion, 272 purchases, 239, 531 retailers, 270, 272, 510, 538 sales, 271 search costs, 262, 265 sellers, 278 Optimal advertising strategies, 425 brand choice, 49 equilibrium pricing strategy, 393 marketing strategies, 427


price, 165, 381, 385, 391, 392, 418, 419, 421, 510, 562 path, 371–373, 395, 402, 403 policies, 421 promotion strategy, 537 pricing, 362, 363, 393 policy, 385, 412, 418 strategy, 414 profits, 488 Optimizing advertising, 109 marketing policies, 145 prices, 538 Orange juice brands, 416 Organization economics, 447

P Parsing customer, 260 Penetration pricing, 364, 372, 374, 375, 410, 412, 414 Perceived brand value, 296 Perceived product attributes, 329 Perishable goods, 401, 403, 404, 532 Permanent price changes, 50 Persistent brand preferences, 311 Persistent brand tastes, 314 Persistent heterogeneity, 407 Persuasive advertising, 318 Persuasive role, 463 Pervasive heterogeneity in marketing applications, 72 Pharmaceutical industry, 94, 446, 457, 460 marketing arena, 482 salesperson, 445 Physician Payments Sunshine Act (PPSA), 490 Pizza purchases, 180 Plain advertising, 274 Planned promotions, 523 Planned purchases, 57, 501 PLC pricing problem, 380 Pooling price, 211 Posted price, 44, 378, 402, 404 Postpone category purchases, 530 Postulated roles, 401 Potential customer firms, 457 customers, 71, 108, 278, 442, 444, 445, 452, 480 endogeneity, 5, 51, 141 market, 367, 371–375, 380, 382, 384, 388, 390, 409 Predatory pricing, 559 Preferences brand, 2, 119, 171, 292–296, 298, 301, 302, 306, 307, 309–311, 321, 347, 504, 530

consumers, 2, 3, 71, 87, 88, 106, 153, 239, 511 consumers brand, 293, 294, 313 consumers product, 365 customers, 113, 175 for CPG brands, 311 homothetic, 31 product, 363, 395 sellers, 404 Premium brands, 527 Premium promotions, 533 Price adjustment, 344, 408, 502, 536 agreements, 400 beliefs, 46 brand, 328 changes, 44, 154, 503, 525, 534, 536, 579 coefficient, 97, 99, 142, 157, 161, 177, 184 comparison, 211, 262 competition, 164, 165, 278, 344, 360, 410, 414, 505, 532 consumers, 506, 528, 580 cuts, 376, 402, 412, 414 cutting, 376 cycles, 415 data, 200, 209, 211, 212 differences, 322 discount, 44, 50, 499, 509, 534 discount period, 47 discovery, 276 discrimination, 5, 267, 276, 381, 395, 401–404, 406, 418, 500, 503–506, 508, 509, 511, 513, 536, 539 dispersion, 194–196, 208, 209, 216, 261, 262, 282 distribution, 201, 209, 210, 223, 224, 241, 242, 245 dynamics, 420 effect, 143, 563 elasticity, 5, 24, 25, 50, 81, 106, 115, 145, 160, 252, 566 endogeneity, 51, 52, 55, 56, 127, 142–144, 236, 237, 375, 500 bias, 53 problem, 500 equilibrium, 54 expectations, 5, 44, 48, 251, 376, 383, 512 flexibility, 536 for advertising, 276 functions, 385 high, 400 index, 536 inflation, 503 information, 262 instruments, 143


level, 33, 181, 414 likelihood, 53 monopoly, 376, 419 movements, 536 obfuscation, 262 optimal, 165, 381, 385, 391, 392, 418, 419, 421, 510, 562 optimization exercise, 145 paid, 42, 201, 250, 571 paths, 372–375, 377, 378, 392, 394, 419 patterns, 226, 242 potential endogeneity, 56 premium, 164, 292, 322, 331 promoted, 499, 504, 510, 513, 525, 535, 536, 541, 542, 545 promotion decisions, 500, 542 elasticities, 533 for brands, 507 for durable goods, 543, 544 in marketing, 545 planning, 513 spillovers, 542 strategy, 537 timing, 544 protection, 522 quotes, 210, 222, 227 reductions, 298, 363, 399, 534 relative, 241 retail, 208, 323, 400, 416, 506, 507, 510, 513, 518–520, 525, 526, 534 role, 363, 375, 379 search, 50 sensitive, 420, 426, 501, 511, 527 sensitivity, 218, 220, 262, 302, 395, 507, 527 sensitivity coefficient, 102 setting, 536, 571 shock, 55 signaling, 336 skimming, 317, 376, 377 skimming strategy, 374, 376 stickiness, 545 targeting, 276 temporary, 24, 47, 50 uncertainty, 222 variability, 531 variation, 144, 153, 211, 403, 410, 426, 498, 503, 506, 537 Priced products, 262, 533 Pricing authority, 480 behavior, 384, 390 choices, 380 classes, 561 cues, 480

decisions, 381, 481, 561 delegation, 480 distortions, 480 durable goods, 380 equation, 563 flexibility, 272 incentives, 400 models, 503 monopoly, 376, 404 online, 272 optimal, 362, 363, 393 patterns, 416 policy, 109, 376, 378, 379, 381, 383, 385, 392, 393, 397, 398, 403, 421 problem, 381, 385, 386, 415 promotional, 528, 530 retail, 510 strategies, 276, 376, 381, 411, 415, 543 Product adoption, 410 advertisements, 424 advertising, 585 alternatives, 43 assortment, 264, 265 attributes, 55, 152–154, 158, 170, 171, 176, 178, 180, 185, 187, 253, 295, 296, 568 availability, 208, 401, 419 characteristics, 30, 141, 142, 154, 197, 212, 229, 236, 237, 244, 294, 295, 298, 307, 323, 425, 452, 568, 578, 584–586 complementarity, 38, 62 decisions, 486 design, 3, 17, 407, 544 differentiation, 209, 211, 212, 216, 307, 329, 330, 415, 501 durability, 405 expertise, 480 features, 171, 172, 298 forms, 361 introductions, 2, 425 launches, 367 line, 341, 405, 514, 528 market, 253, 294, 323, 339, 561 market competition, 348 market shares, 168 match, 196, 244 offerings, 152, 273 pairs, 233 penetration, 168 performance, 341 positioning, 3, 153 preferences, 363, 395 price, 207, 296, 506, 584, 585 provider, 165


quality, 3, 21, 262, 279, 293, 298, 301, 313, 314, 317, 321, 322, 327, 328, 331, 336–338, 341, 343, 347, 381, 389, 390, 413, 419, 421 information, 335 provision, 341 signaling, 341 recommendations, 244 shipments, 526 strategy, 479 types, 538 usage, 166 variety, 271, 282 Productive activity, 453 effort, 447 process, 471 Productivity, 467–469 Profitable price discrimination, 377 Profitable pricing, 405 Profits, 86, 108, 113, 263, 274, 275, 321, 325, 342, 376, 377, 381, 400, 401, 408, 416, 417, 477, 480 fall, 407 from price discrimination, 403 incremental, 322 optimal, 488 optimize, 86 sellers, 407 Promoted brand, 529, 530, 534 price, 504, 510, 513, 525, 535, 536, 541, 542, 545 products, 501 products sales, 501 Promotion budgets, 539 channels, 272 duration, 516 effects, 532 event, 515–517, 524, 537 guidelines, 542 online, 272 period, 519 plan, 539 planning, 523, 538, 539, 541 profitability, 537 sales, 500 strategies, 272 timing, 541 tools, 269 Promotional, 73, 113, 116, 142, 144, 528, 543 activities, 121, 144, 195, 224 approach, 512 cues, 534

decisions, 507 decomposition, 529 discount, 529 discounting, 528 effectiveness, 528 effects, 115, 532 efforts, 143 events, 514, 540 frequency, 531 funds, 514, 534 incentives, 526 marketing, 8 price changes, 499 discounts, 511 elasticity, 529, 530 variation, 530 prices, 504, 513, 522, 536 pricing, 528, 530 pricing strategy, 542 purchases, 533, 535 sales, 527, 529, 530, 532, 535 Public economics, 579 Purchase acceleration, 25, 44, 47, 50, 417, 501, 509, 513, 529, 532 advertising, 522 assortments, 7 behavior, 3, 4, 6, 200, 228, 347, 419, 578 behavior for pricing, 415 brand, 299, 316 complementary products, 506 conditional, 228 consumers, 7, 12, 34, 71, 501 contexts, 273 customers, 470 cycles, 532 data, 71, 237, 270, 533 decision, 38, 45–47, 50, 60, 195, 223, 238, 240, 247, 250, 251, 279, 294, 302, 312, 316, 341, 382, 384, 385, 388, 392 decision problem, 46, 48, 316 environment, 299 experience, 305 frequencies, 30 fund, 213 funnel, 247, 273 history, 71, 94, 415 household, 7 incidence, 24, 41, 44 information, 310 intent, 274 local TV advertising, 120 market, 409 market shares, 239


occasion, 8, 183, 198, 301, 347, 501 outcome, 227 prices, 241, 262, 416 probability, 61, 236, 238, 532 process, 246, 249 product, 32, 168 promoted products, 530 quantity, 48, 163, 188 quantity decisions, 61 share, 320 situation, 204 stages, 247, 251 timing, 5, 44, 50, 121, 510 Purchasers, 390

Q Quant marketing, 589 Quantifying consumer search costs, 196 Quantitative marketers, 554, 556, 564 Quantitative marketing, 2, 70, 194, 260, 272, 308, 553–560, 564, 567, 573, 574, 576, 589 Quoted prices, 222, 223

R Randomized advertising, 33, 274 Ranked sales, 270 Rational consumers, 363, 379, 381, 399, 420 Rational price expectations, 239 Recalled brands, 300 Regression discontinuity (RD), 121 Relative prices, 31 Relevance selling, 449 Rental markets, 409 Rental prices, 409 Replication costs, 267 Reproduction, 281 Reproduction costs, 282 Reproduction marginal costs, 269 Reputable brand, 341 Resale market, 405–407 Resale price, 400 Resale price maintenance (RPM) policy, 400 Research studies branding, 327 Reservation price, 18, 24, 203, 332, 373, 395, 508 Reserve price, 404 Residential plumbing industry, 337 Respondent heterogeneity, 159 Restaurant accountable, 333 Restaurant industry, 330 Retail brand awareness, 540, 541 market power, 512 marketing activities, 518 markets, 499, 512

price, 208, 323, 400, 416, 506, 507, 510, 513, 518–520, 525, 526, 534 changes, 525 competition, 511 discounts, 520 promotions, 505, 506, 508, 510, 511 pricing, 510 product assortment, 544 promotions, 502, 514, 522, 533 regular price, 525 sales, 417, 528 shelf prices, 522 watermelon market, 334 Retailers activities, 537 benefit, 507 branded, 335 competition, 534 costs, 518 differentiated, 511 margins, 507 offline, 270, 272 online, 270, 272, 510, 538 performance, 522 price discounts, 520 price promotions, 507, 527 prices, 511, 522 Risk preferences, 397 Rival brands, 319, 585 Role advertising, 328, 374 brand, 4, 5, 293, 331 consumers search, 5 durable goods, 5 inventive contracts, 453 marketing, 589 price, 363, 375, 379 salespeople, 448 selling, 446, 448, 449, 458, 467, 478, 482 RTE cereal brands, 320

S Sales agents, 445, 446, 477 behavior, 361 brand, 528, 531 call, 460, 479 category, 50, 482 consumers, 507 data, 73, 94, 212, 270, 393, 471, 472 drug, 489 equation, 110, 371 goods, 406 growth, 361, 366 incentives, 479, 488


incremental, 529 management, 446, 449, 459, 484 model, 451 offline, 271, 275 online, 271 organizations, 447 outcomes, 453, 469 promotion, 500 promotional, 527, 529, 530, 532, 535 rank, 229, 233, 234 representatives, 442 response, 76, 77, 86, 87, 106, 110, 113, 114, 118, 452–454, 460, 484 function, 110 retail, 417, 528 transactions, 446, 453 Salesforce, 446, 447, 455–459, 469, 477, 478, 484–488, 490 activities, 450 allocation problems, 484 compensation, 449, 471, 487 effects, 458 management, 449, 472, 483, 490 decisions, 479 size, 455, 457, 459 structure, 485 Salespeople, 442 medical, 452 role, 448 selling, 453 selling effort, 488 Salesperson behavior, 477, 486 compensation, 469 decision, 472 effort, 466 incentives, 452, 479 level, 470, 473, 477 pharmaceutical, 445 Scan back promotions, 519 Seasonal goods, 401 Seasonal products, 520 Secondary market, 406, 407 Sellers advertising, 276 list, 270 mechanism, 404 niche, 265 online, 278 preferences, 404 profits, 407 sets prices, 421 Selling activities, 443, 444, 450, 455, 479, 514 automobile, 488

budget, 443 decision, 480 economics, 448 effects, 466 effort, 446, 447, 450, 451, 453, 455, 459, 479, 483, 484 endogeneity, 461 role, 467 elasticity, 457 element, 449 encounter, 445 function, 446, 447, 482 gas, 453 horizon, 372, 377 investments, 447, 484 mechanism, 377, 404, 409 operations, 443 perspective, 465 process, 486, 489 role, 446, 448, 449, 458, 482 salespeople, 453 season, 469 system, 490 unique, 447 Shelf prices, 522, 534 retail, 522 Signaling role, 328, 336, 343 Single brand, 7 product, 164, 562 retailer, 508 Social preferences, 57–59, 62 role, 5 Specification discrete choice, 39 error, 28, 53, 55, 141 homothetic, 32 SSNIP unprofitable, 560 Stacked chips brand, 325 Static brand, 325 Static monopoly price, 419 STAX brand, 324 Sticky price model, 502, 503 Stock market, 216, 326 Stockpiling benefit manufacturers, 501 Storable consumer goods, 511 goods, 393, 415, 417, 418 products, 44 Store brand, 334, 529, 544 promotions, 529 sales, 531 Strategic consumers, 500


Substantial financial costs, 277 heterogeneity, 479 price dispersion, 194 Supermarkets, 6, 301, 338, 531, 535, 589 data, 534 Superstar products, 264 Supranormal economic profits, 292 Surviving brands, 320 Survivor brands, 345

T Target consumers, 275 Target sales, 474 Targeted advertising, 109, 445 banner advertising, 273 marketing, 5, 146 marketing activities, 71, 72 marketing strategies, 2, 238 Targeting consumers, 535 customers, 146 Television advertising, 265, 585 Temporary price, 24, 47, 50 changes, 498 discounts, 3, 44 promotions, 143 promotions, 3 sales, 536 Ticket price, 61, 403 Tractable specifications, 29 Trade promotions, 87, 502, 507, 514, 522, 523, 526, 538 budgets, 518, 527 spend, 498, 499, 502, 514, 518, 519, 523, 537, 539, 540, 545 budgets, 514 Trade promotion management (TPM), 523 Trade promotion optimization (TPO), 525 Transaction across products, 327 costs, 405–407, 409, 414 data, 6, 30, 182–185, 187, 188, 212, 238 database, 8 dataset, 180 marketplace, 162, 163, 180 Translog specification, 20 homothetic, 19, 23, 40

Transportation costs, 260, 269–272, 281 Travel costs, 485 Truckload purchases, 521 True impact marketing, 554 True price distribution, 245 TV advertising, 120, 330

U Umbrella brand, 293, 339, 341–343, 533 branding, 327, 338–343 incentives, 340 Unaided awareness, 300 Unconditional brand choice, 25 choice probability, 223 purchase probability, 219, 223 Unfamiliar brands, 316, 317 Uninformative advertising, 335, 337 Uninformed consumers, 419 Universal product code (UPC), 6 Unobserved brand effect, 586 heterogeneity, 312, 313, 316 market characteristics, 142 product attribute, 212 product characteristics, 28, 143 promotional efforts, 143 Unplanned purchases, 57, 501, 575 Upward pricing pressure, 562–564

V Valuation consumers, 381 Variable costs, 330 Verification costs, 261, 272, 273, 278, 281 Vertically differentiated products, 212, 264 Video markets, 564, 570 Virtual prices, 17–19, 23 Volumetric purchases, 173

W Weekly brand competition, 545 Weekly price promotions, 540 Wholesale prices, 143, 212, 416, 518, 520, 539 discounts, 544 Willingness to buy (WTB), 163 Willingness to pay (WTP), 161 Withdrawing products, 576 World Health Organization (WHO), 573
