Data-driven Retailing: A Non-technical Practitioners' Guide (Management for Professionals) 303112961X, 9783031129612

This book provides retail managers with a practical guide to using data. It covers three topics that are key areas of opportunity for retailers: pricing, inventory management, and marketing.


English, 272 pages, 2022


Table of contents :
Preface
Acknowledgements
Contents
Part I Pricing
1 The Retailer's Pricing Challenge
1.1 The Potential of Data-Driven Pricing
1.2 Limitations of Traditional Economic Theory
1.3 The Shifting Objectives Behind Price
1.3.1 Price Strategy
1.3.1.1 Consistency in Pricing
1.3.1.2 Relative Price Position
1.3.1.3 Minimal Margin Rules
1.3.1.4 Competitive Price Position
1.3.1.5 Psychological Pricing Rules
1.3.1.6 Challenging Pricing Rules
1.3.2 Price Tactics During the Product Life Cycle
1.3.2.1 Product Introduction
1.3.2.2 Pricing During the Product Life Cycle
1.3.2.3 Shifting Objective over Time
1.3.2.4 End-of-Life Pricing
1.4 Escaping the Discount Trap
1.5 The Next Chapters
References
2 Understanding Demand and Elasticity
2.1 Price-Response Curve, Not Demand Curve
2.2 Measures of Price Sensitivity
2.3 A Sensible Model of Demand
2.3.1 Linear Price-Response Model
2.3.2 Constant Elasticity Price-Response Model
2.3.3 Logit Price-Response Model
2.4 Fitting Demand Curves Using Data
2.4.1 Demand and Price Indices
2.4.2 Fitting Price-Response Curves to Historical Sales Observations
2.4.2.1 Grouping Products
2.4.2.2 Scaling Product Sales for Combination
2.5 Making Forecasts
2.6 Evaluating Performance
2.7 Conclusion
References
3 Improving the List Price
3.1 Improving List Pricing
3.2 Market Conditions: Direct Versus Indirect Competition
3.3 Obtaining Competitor Price Information
3.3.1 Web Scraping
3.3.2 Transformation and Matching
3.3.3 Using Competitor Price Information
3.4 Dynamic Pricing
3.4.1 Preconditions for Dynamic Pricing
3.4.1.1 Data Availability and Quality
3.4.1.2 Variability of External Conditions
3.4.1.3 Dynamic Prices Are Socially Acceptable
3.4.1.4 Operational Feasibility
3.4.2 Types of Dynamic Pricing
3.4.2.1 Fixed Rule Dynamic Pricing
3.4.2.2 Variable Rule Dynamic Pricing
3.4.2.3 Dynamic Pricing by Means of a Learning Agent
3.4.3 Dynamic Pricing and Price Wars
3.5 Differential Pricing
3.6 Optimizing Long-Term Value
3.7 Conclusion
References
4 Optimizing Markdowns and Promotions
4.1 The Challenges of the Markdown Decision
4.2 The Traditional Markdown Process
4.3 Where the Markdown Process Fails
4.3.1 Not Making Use of Price Elasticity
4.3.2 Contaminating the Objective
4.3.3 No Anticipation of Changes in Demand Patterns
4.3.4 Time-Consuming and Error-Prone Process
4.3.5 Repeating Past Mistakes
4.4 Blueprint of an Improved Markdown Process
4.4.1 Objective
4.4.2 Portfolio Forecast and Price Selection Engine
4.4.3 Product-Level Forecast Model
4.5 Core Components of an Improved Markdown Process
4.5.1 Defining the Right Objective: Transaction Costs and Residual Value
4.5.1.1 Estimating Transaction Costs
4.5.1.2 Estimating Residual Value
4.5.2 Estimating Rotation Speed
4.5.3 Estimating Elasticity
4.5.4 Updating Elasticity
4.5.5 Satisfying Business Rules and Other Constraints
4.6 Complicating Factors
4.6.1 Operating in Multiple Markets
4.6.2 Demand Erosion
4.6.3 Combined Discount Types
4.6.4 Substitution and Cross-Price Elasticity
4.6.5 Virtual Stockouts and Low Inventory
4.7 Running Markdown Experiments
4.7.1 Single and Fixed Objective
4.7.2 A Good Split of Test and Control Groups
4.7.3 Avoid Contamination of the Control Group
4.7.4 Big Differences
4.7.5 Do Not Continue Testing Indefinitely
4.8 Promotional Discounts
4.8.1 The Purpose of Price Promotions
4.8.2 Estimating Promo Effects
4.8.3 Selecting Products for Promotional Discounts
4.9 Conclusion
4.10 Markdown Terms Glossary
References
Part II Inventory Management
5 Product (Re-)Distribution and Replenishment
5.1 Inventory Management as a Profit Driver
5.2 The Traditional Retailer's Perspective on Inventory Management
5.3 Data-Driven Inventory Management Framework
5.4 Correcting Demand to Account for Lost Sales
5.4.1 Regular and High Sales Volumes
5.4.1.1 Traditional Time Series Models
5.4.1.2 Analyst in the Loop
5.4.1.3 Causally Related Time Series
5.4.2 Low Sales Volumes
5.5 Demand Forecasting Models
5.5.1 Forecasting Without Observed Sales
5.5.2 With Limited Historical Data
5.5.3 Improved Time Series Forecasting
5.6 Evaluating Forecast Accuracy
5.6.1 Basic Forecast Performance Measures
5.6.1.1 Use of Unseen Data
5.6.1.2 Evaluating a Point Estimate
5.6.1.3 Evaluating a Time Series Forecast
5.7 Optimizing Allocation
5.7.1 Initial Distribution of Inventory
5.7.2 Redistribution of Inventory
5.7.2.1 Identification of Sources and Destinations
5.7.2.2 Solving the Allocation Problem
5.7.2.3 Possible Extensions
5.7.3 Continuous Replenishment
5.8 Inventory Management When Selling on Third-Party Platforms
5.9 Conclusion
References
6 Managing Product Returns
6.1 The Challenges Created by Returns
6.2 How to Measure the Impact of Returns
6.3 Investigating Patterns in Return Behavior
6.3.1 Estimating Return Likelihood Based on Product Properties
6.3.2 Estimating Return Likelihood Based on Product Performance
6.3.3 Estimating Return Likelihood Based on Customer Behavior
6.3.4 Estimating Return Likelihood Based on Order Properties
6.4 Taking Action to Prevent or Reduce Returns
6.4.1 Product-Based Actions
6.4.1.1 Addressing Product-Specific Causes for Returns
6.4.1.2 Adjusting the Product Assortment
6.4.2 Transaction-Based Actions
6.4.3 Customer-Based Actions
6.5 Conclusion
References
Part III Marketing
7 The Case for Algorithmic Marketing
7.1 What Is Algorithmic Marketing?
7.2 Why Algorithmic Marketing Systems Fail to Take Off
7.3 Precision Bombing, Not Carpet Bombing
7.4 Should You Focus on High-Value Customers?
7.5 Do Not Try to Beat Big Marketplaces at Their Own Game
7.6 The Low-Hanging Fruit: Get Started Without the Need for Complex Algorithms
7.7 Measuring and Experimenting
7.8 Conclusion
References
8 Better Customer Segmentation
8.1 The Purpose of Segmentation
8.2 The Problem with Traditional Segmentation
8.2.1 Segments Based on Descriptive Properties
8.2.2 RFM Segmentation
8.3 What Makes a Segment Actionable?
8.4 Customer Value Done Right
8.4.1 The Traditional RFM Approach
8.4.2 CLV-Based Customer Segmentation
8.5 From Lifetime Value to Customer Segments
8.5.1 A Simple Approximation Using Customer Groups
8.5.2 Causal Model for Variable Selection
8.5.3 From Variables to Segments
8.5.3.1 Creating Segments Using Unsupervised Clustering Techniques
8.5.3.2 Creating Segments Using Supervised Prediction Models
8.6 You Have Your Segments, Now What?
8.6.1 Product-Specific Nudges
8.6.2 Measured Incentives
8.6.3 Creating Customer Journeys
8.7 Conclusion
References
9 Anticipate What Customers Will Do
9.1 Propensity Modeling 101
9.2 The Basic Principles of Scoring Models
9.3 Using the Outputs of Scoring Models to Experiment
9.4 Pitfall: Models That Are Too Generic to Perform Badly
9.5 What Can Be Predicted Using Propensity Models?
9.5.1 Will a Customer Buy Something?
9.5.2 The Bigger Picture: Actions During Life Cycle Stages
9.5.3 Will This Customer Act on This Promotion, Action, Event, etc.?
9.6 Using Nudges to Influence Customers
9.7 What About Recommendation Engines?
9.8 Conclusion
References
10 Anticipate When Customers Will Do Something
10.1 Getting the Timing Right
10.2 Survival Modeling Basics
10.3 Churn Prediction Using Survival Models
10.4 Find the Rhythm: Predicting Renewal Purchases
10.5 Putting Models to Work
10.6 Conclusion
Part IV Conclusion
11 Conclusion
11.1 Where Is Retail Headed Next?
11.2 Three Big Forces
11.2.1 David and Goliath Will Keep Fighting
11.2.2 Environmental Impact Will Continue to Become More Important
11.2.3 Intelligent Models Will Learn to Cooperate
11.3 Being a Retailer
A Experimenting the Right Way
A.1 The Need for Experiments
A.2 The Basics of a Good Experiment
A.2.1 A Reasonable Path for Cause and Effect
A.2.2 A Good Hypothesis
A.2.3 Defining Success Measures
A.2.4 Actionable Results
A.3 Power: Estimate the Chance of a Successful Experiment
A.4 Selecting an Audience
A.4.1 Avoiding Experiment Contamination
A.4.2 To Stratify or Not to Stratify?
A.5 When Not to Experiment
A.5.1 Volatile Environment
A.5.2 When the Proof Has Already Been Delivered
References


Management for Professionals

Louis-Philippe Kerkhove

Data-driven Retailing: A Non-technical Practitioners' Guide

Management for Professionals

The Springer series Management for Professionals comprises high-level business and management books for executives. The authors are experienced business professionals and renowned professors who combine scientific background, best practice, and entrepreneurial vision to provide powerful insights into how to achieve business excellence.


Louis-Philippe Kerkhove
Roeselare, Belgium

ISSN 2192-8096  ISSN 2192-810X (electronic)
Management for Professionals
ISBN 978-3-031-12961-2  ISBN 978-3-031-12962-9 (eBook)
https://doi.org/10.1007/978-3-031-12962-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To my wife, Julie. Thank you for the never-ending love and support.

To my son, Jérôme. Who provided me with a strict deadline by being born this summer. I look forward to getting to know you.

Preface

The inspiration for this book comes from practice. Over the past seven years, I have had the good fortune to collaborate on many data-related projects for retailers throughout Europe. The past decade has been a challenging one for many retailers, with increasing competitive pressure, pandemics, and stressed supply chains. These are choppy seas to navigate, but opportunity abounds. And I believe many retailers are up for the challenge.

This book contains practical guidance for retail professionals who want to improve data-driven decision-making. The goal of this book is to be usable by practitioners, rather than being a purely academic pursuit of all the technical details. The amount of mathematics, statistics, data science, and software engineering is therefore kept to a minimum. The goal is to provide the reader with sufficient information to decide whether a certain application is relevant in their current context.

At the same time, this book aims to do more than scratch the surface. The goal is not to provide the reader with a list of newly hyped technologies and tools. The chapters revolve around key retail processes and decisions, and what can be achieved within those contexts. Important concepts are fleshed out in greater detail. This should provide the reader with sufficient knowledge to start a project, as well as to ask the right questions during a project. In short, this book aims to answer the practical question: "What should I be doing with my data, and what can I expect in return?" A pragmatic premise that I hope will resonate with what I know to be a very pragmatic crowd of decision makers.

Throughout the book I have used real – albeit obfuscated – datasets. In doing so, I hope to illustrate that the applications covered in this text have real-world significance. Moreover, while I believe there is value in didactic examples, it is also useful to give an idea of the real-world performance that can be expected from specific applications. Real data is often messy, but this should never be an excuse to forgo data-driven decision-making altogether.

The book itself consists of three parts: pricing, inventory management, and marketing. These are the three core domains where I believe the biggest opportunities lie for retailers. The parts can be read in any order, and most chapters can be read separately. Hence, this book does not need to be read cover to cover—though you are of course free to do so.


Writing is a subjective experience, and this book is no different. This text represents my opinion on where the biggest opportunities are for retailers, and how these can be grasped. But I am as fallible as the next person—if not more so. In spite of this, I have chosen to take a stance on various subjects, rather than act the politician and make equivocal statements. Because of this, I would like to extend an open invitation to get in touch with me if you feel that I have made a mistake or omission.

This book is not the final word on the topic of data-driven retail. Many of the topics in this book could be fleshed out further, and a lot of ideas remain on the cutting room floor. Projects such as this can never be fully finished. Yet, I hope to return to this project at a later date and add new discoveries to it—and likely correct some old mistakes.

I hope you find reading this book as enjoyable as I have found writing it.

Roeselare, Belgium
June 2022

Louis-Philippe Kerkhove

Acknowledgements

I have many people to thank for the experiences I write about in this book. This includes the talented group of people at Crunch with whom I have shared many joys over the past years. There are still many stones left unturned, and I hope to continue experimenting and building things for many years to come. A second group consists of the many talented people working at retail organizations we have cooperated with during the past years. Much of what is new and interesting in this book would never have come to the surface, were it not for your involvement. Lastly, I also want to thank everyone who has helped me to proofread these materials. Your comments have been extremely valuable and have greatly improved the quality of the final product.


Part I Pricing

1 The Retailer’s Pricing Challenge

1.1 The Potential of Data-Driven Pricing

Price is one of the easier variables to adjust in a retail organization, yet often no data-driven intelligence is used to set prices. Whereas changes in logistics or marketing campaigns imply changing complex processes, this is not the case for price. Changing a price can be as simple as adjusting a single field in the ERP system, which then propagates throughout all channels. This lack of attention to price is wasteful, since smart pricing is one of the fastest and most cost-effective ways to improve profitability [1]. A 1% increase in the average price typically increases operating profit anywhere between 5 and 15%. Getting this magnitude of returns from other analytics cases is harder and requires greater upfront investment in new systems. In spite of this, many retailers still rely purely on historical rule-driven decision-making to set prices.

Data-driven pricing algorithms provide a more fine-grained pricing landscape while also leaving room for strategic pricing decisions. Strategic pricing rules will always remain important, and these rules should fence off the domain wherein algorithms can set prices. Prices set by algorithms will often show greater variety and change faster than was previously possible with human decision-makers. This allows for a better approximation of the customer's willingness to pay, while also taking inventory levels into account.

Pricing algorithms are also useful as a challenge to strategic and tactical pricing rules. Many retailers use identical prices across all channels and enforce upward price rigidity.1 Algorithms are often able to quantify what these or other business rules are costing a retailer. If analysis shows that substantial margin improvements can be attained by eliminating or changing a business rule, this can be of great benefit to an organization.

1 The price of a product is not allowed to increase over time.
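To make the leverage behind the "1% price increase" claim concrete, here is a minimal back-of-the-envelope sketch in Python. All figures are invented for illustration, and it assumes that sales volume is unaffected by the price change:

```python
# Hypothetical baseline P&L (all figures are illustrative assumptions).
revenue = 100.0        # turnover at current prices
variable_costs = 60.0  # purchase and transaction costs, which scale with volume
overhead = 30.0        # fixed costs
profit = revenue - variable_costs - overhead  # 10.0

# A 1% average price increase with unchanged volume flows straight to profit.
new_profit = revenue * 1.01 - variable_costs - overhead  # 11.0

print(f"Profit uplift: {(new_profit - profit) / profit:.0%}")  # -> 10%
```

With costs at 90% of revenue, a 1% price increase lifts operating profit by 10%; thinner or fatter cost structures land elsewhere in the quoted 5 to 15% range.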


Operational limitations also influence the way prices are set. Businesses that attach physical price tags to products often face severe limits on how quickly prices can be changed. Another important limitation is the degree to which customers can be fenced off from each other, which determines whether different prices can be charged to different customers.

This chapter aims to clarify what the different objectives in the pricing process are and how they can change. Without a clear objective, it is impossible to make the right decision. This problem is exacerbated when handing over part of the control to an algorithm—which will be ruthless in maximizing any objective it is given.

1.2 Limitations of Traditional Economic Theory

Anyone who has taken a class in economics has seen demand functions showing the relation between price and quantity. The fact that the typical explanatory variable of choice is price, and not time, customer properties, marketing efforts, etc., should tell you a lot about the crucial importance of price. The issue with economic theory is that it is too far removed from the question retailers must answer in practice. Reality is messy, and the optimal price point for a perfectly competitive market with linear demand curves does not translate well to a typical retailer who has to make decisions on a large collection of products. In spite of these limitations, economic theory provides very useful concepts that can be used to shape decision-making in practice. Notably, different shapes of demand curves and models for price elasticity can be highly valuable when applied correctly in real-life settings.

The central concept behind price is value. Setting a price means that you are guessing the value customers attach to a certain product and its associated services. Improving pricing often boils down to getting better at estimating the value customers attach to a given product, and being able to do so on a more granular level. This improved granularity allows you to ask the right price of the right customer.2

An example of this is the difference between online and offline customers. A customer who is already at your store looking at a product must expend more effort to compare prices than someone who is browsing your website online. It might therefore make sense to price more aggressively online than in the physical stores. The opposite might be true for grocery shoppers who are willing to pay a premium to have their produce delivered to their doorstep. A variation of this dynamic arises when retailers decide to be active on online marketplaces such as Amazon, Zalando, ASOS, Wehkamp, etc. The consumers who visit marketplaces are different from your brand-loyal customers, and depending on the number of alternatives they find on these websites, they might be less willing to pay for products.

2 In economic literature, this concept of value is often dubbed the reservation price of the consumer: the highest price they are willing to pay for a product.


One of the main reasons why economic theories have failed to escape the classroom is that few authors suggest pragmatic approaches to using these models in real situations. Much of economic theory assumes that demand functions are somehow known. Data available to retailers in practice is often considerably sparser (i.e., fewer data points) than the examples presented in the classroom. If a product has only been sold at a single price since its introduction, it is impossible to fit even a simple linear demand function using only that information.3

3 This is basic algebra; if you only have a single point, there is no way to estimate the slope of a demand curve.
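To make the sparsity problem tangible, the following minimal sketch fits a straight-line demand function with NumPy. The observations are invented; the point is that price variation is what identifies the slope, so a sales history at a single price level leaves the slope unidentifiable:

```python
import numpy as np

# Hypothetical weekly observations: prices charged and units sold.
prices = np.array([19.99, 19.99, 17.99, 15.99])
units = np.array([120, 112, 150, 185])

# Ordinary least squares fit of: units = intercept + slope * price.
slope, intercept = np.polyfit(prices, units, deg=1)
print(f"demand ≈ {intercept:.1f} {slope:+.2f} * price")

# With only one distinct price there is no variation on the x-axis:
# infinitely many lines pass through a single point, so no slope
# (and hence no elasticity) can be estimated from the data alone.
single_price_history = np.array([19.99, 19.99, 19.99])
assert len(np.unique(single_price_history)) < 2
```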

1.3 The Shifting Objectives Behind Price

The price of a product is always a compromise between the different objectives of a retailer. The price that maximizes total turnover is unlikely to be the same one that maximizes gross margin or sell-through rates. The optimal price may differ between stores and between channels. Moreover, the optimal price for a single transaction may not be the price that optimizes the complete lifetime value of a customer. Likewise, setting the optimal price for a single product is unlikely to guarantee that the complete product portfolio is priced in a way that makes sense.

The complexity of these conflicting objectives often results in a lack of data-driven decision-making in the pricing decision. Getting to a clear definition of the objective is hard, and because of this many retailers never arrive at one, reverting to rules of thumb instead. Most frequently, these rules of thumb are inspired by the idea not to rock the boat—keeping policies in line with what has been commonplace historically. Even if things go wrong, this prevents fingers being pointed at decision-makers, who can prove that their methods are consistent.

The aim of the following sections is to help break this status quo and show how clear and objective goals can be defined for different pricing decisions. To this end, three big stages in the life cycle of a product are considered: the introduction, the intermediate phase, and the phase-out of a product. This does not mean that there is a single cookie-cutter solution for all retailers, but for most retailers, getting to clearly defined objectives should be feasible.

Pricing algorithms deal with tactical and operational pricing decisions—i.e., deciding prices for specific products and customers at specific points in time. Naturally, it is also important to consider the boundaries of the playing field for these algorithms, which is why Sect. 1.3.1 first goes into more detail on the aspects of pricing strategy. Next, Sect. 1.3.2 focuses on tactical considerations and specific objectives that can be used in the act of setting prices, as well as setting the right parameters for algorithms.



1.3.1 Price Strategy

The price positioning of the company is one of the most important strategic decisions for a retailer. In broad terms, the choice is often between a low-price strategy and a high-price strategy: the former aims to provide the best possible value for money, while the latter aims to provide premium quality that is hard to imitate. Common management wisdom dictates that the middle ground between these two strategies is a risky place to be [2]. In spite of this, there are players who venture into this no man's land. Oftentimes, these are companies that experience severe competitive pressure in their native strategy. This could be a premium brand whose competitors offer better quality at similar prices, or a discounter that does not succeed in reaching the absolute lowest price. Most companies who venture there fail in the mid to long run. The middle ground is rarely viable.

A premium strategy does not mean that a product has to be the most expensive in the market. There is still a place for brands such as Mercedes and BMW, in spite of the fact that Bentley and Rolls Royce are more expensive and deliver higher quality. Rather, the aim is to provide the right premium features for a specific customer segment. The customer segment who can afford a €50k car is not the same as the segment happy to put down €250k for a car.

Investigating the success of companies following a value or premium strategy shows that—on average—it is easier to succeed when following a premium strategy [3]. While there are often many different ways to create superior value in products and associated services, being the cheapest often implies being the biggest and most efficient player in the market. The market for a value player therefore has a much more winner-takes-all nature.

The objective of the strategic price decision is to maximize the company value in the mid to long term. This implies that total profits are maximized over a long horizon. This perspective is often at odds with tactical pricing decisions, which are concerned with short-term gross margin maximization at the level of individual products. This type of low-level maximization cannot be expected to yield optimal decisions across the complete product portfolio or for the company in the long run. The objective of the price strategy is to set the right boundaries wherein the tactical pricing algorithms can operate in this “greedy” fashion.

Pricing strategy goes far beyond the basic decision of going for value or premium prices. Most retailers also adhere to other strategic pricing rules. Some of these may be inspired by operational limitations, legal requirements,4 or positioning toward customers. The next paragraphs highlight some of the most frequently employed strategic rules in retail organizations.

4 For example, when advertising discounts of up to 50%, there is often a legal obligation to have a minimal amount of products and inventory to support this claim.

1.3.1.1 Consistency in Pricing

Retailers often use a single company-wide price per product, rather than different prices in different channels and markets. This can be necessary when there are high levels of price transparency across channels, or when a large fraction of the customer base actively compares prices. In reality, this rule is also often in place because of operational limitations that make it hard to sell products at different prices in different channels.

One of the drivers behind this strategy is the increased adoption of omnichannel strategies [4]. A situation where product availability and visibility are maximized makes it much easier for consumers to compare prices. Price comparison websites facilitate this to an even greater extent.

If this rule is motivated by constraints rather than a conscious choice, it is often beneficial to try to lift the constraint. This opens up substantially greater freedom in price setting, which can meaningfully improve the overall gross margin of a retailer.

1.3.1.2 Relative Price Position

Retailers often carry a multitude of different products in a specific category. When this is the case, it can be important to guard the consistency of pricing within categories. For example, prices of premium products should always be higher than those of more basic alternatives in the assortment, even if the production costs of both products are not significantly different. Car manufacturers, for example, will adhere to a strict hierarchy in their models and the prices asked for specific models. Selling a top-line model at a price equal to a lower-segment model is unlikely to be desirable, even if there were excess production capacity for the former. Doing this would erode the strategic positioning of these models and is unlikely to be profitable in the long run.

1.3.1.3 Minimal Margin Rules

Many countries impose regulations stating that products must never be sold below cost. This legislation is often put in place to prevent predatory pricing [5] strategies. Companies employing such a strategy start off with a phase of predation—during which a dominant firm sells products below cost, soaking up all demand until all competitors are defunct. Next, once a monopoly is obtained, prices are increased beyond the starting level in order to recuperate the losses incurred during the predation phase. Oftentimes, there are exceptions to this rule for seasonal products at the end of a sales season, in the context of clearance sales.

Retailers often go one step further and require a minimal margin when selling a product. This margin is calculated by also accounting for the variable cost of a sale, as well as a percentage representing a contribution toward general overhead cost. On the face of it, this strategy makes sense, but it is liable to the sunk-cost fallacy when applied to products that have a limited lifespan. Once a certain amount of product has been purchased and cannot be returned, what was originally paid for the product should not factor into the pricing decision—with the notable exception of the erosion of the perceived value of a product. Luxury brands will generally be highly reluctant to drop prices on their products and will at times prefer destroying products, rather than risking their premium brand perception.5

5 Fortunately, public opinion is driving change in this context, and France has even passed a law prohibiting this wasteful practice [6]. This all fits into overarching trends where consumers attach value to environmental awareness in the brands they purchase.

1.3.1.4 Competitive Price Position

Few retailers are able to provide a product that has no substitutes. This implies that the price of products will have to be positioned in a range that makes sense when compared to the alternatives a customer considers. This can be a complicated matter and give rise to game-theoretical quandaries. For seasonal products in the fashion industry, prices are often fixed at the point of ordering products, implying that all parties involved make a decision at the same time, without knowledge of what competitors have decided. For other companies, price positioning can lead to price wars in which two companies continually undercut each other. This can often spell the doom of either—or both—of these companies.

1.3.1.5 Psychological Pricing Rules

Humans do not think in mathematical equations to calculate the utility they will derive from purchasing a specific product. Price carries an emotional and psychological component as well. Because of this, retailers will often have rules that dictate the types of prices that are and are not allowed. These rules might relate to anchoring [7] or to specific price thresholds, such as crossing into the three-digit (≥ €100) range. This may also apply to the discounts themselves, which can be limited to a fixed set of percentages. Likewise, some organizations prefer to show discounts in monetary value (a €10 discount), whereas others put more belief in percentages.

Some of these decisions may have to be taken into account by automated pricing rules. Often, this will result in a degree of rounding that happens after the optimization of prices, as sketched below. Decision-makers should be aware that being too strict in the adoption of these rules will significantly limit the freedom of a pricing algorithm. This in turn can have a significant effect on overall profitability. Ideally, the validity of these psychological rules is tested for the specific context of the retailer (see Appendix A for guidelines on how to conduct such experiments). Moreover, any rules that are tested should be grounded in research suggesting a reasonable chance that they apply to the retailer's situation. A good introductory book on this subject is Decoded by Phil Barden [8], which deals with the most commonly accepted factors that influence the buying decision of a consumer.
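As one possible interpretation of such a post-optimization rounding step, the sketch below maps an algorithmically optimized price onto an allowed psychological price point. The two rules it encodes (only .99 endings, and never rounding up across a €100 threshold) are hypothetical examples, not rules prescribed by the book:

```python
import math

def to_psychological_price(optimized: float, threshold: float = 100.0) -> float:
    """Snap an optimizer's price to a .99 ending without exceeding it,
    and keep it below a hypothetical three-digit threshold."""
    candidate = math.floor(optimized) + 0.99
    if candidate > optimized:         # never charge more than the optimized price
        candidate -= 1.0
    if optimized < threshold * 1.05 and candidate >= threshold:
        candidate = threshold - 0.01  # pull 101.30 down to 99.99, for example
    return round(candidate, 2)

for p in (14.20, 99.40, 101.30):
    print(f"{p:7.2f} -> {to_psychological_price(p):7.2f}")
```

Comparing the forecast profit at the rounded and unrounded prices quantifies what the psychological rule costs, which is exactly the kind of challenge to business rules advocated above.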



1.3.1.6 Challenging Pricing Rules

Because of their widespread implications, strategic price decisions should not go unchallenged. The goal should be to create a set of rules that is as simple as possible, but no simpler. Defining a set of rules that is too big and too detailed will always result in conflicts between different pricing rules. Situations such as this can cause substantial amounts of damage.

Suppose a retailer has two rules: (i) products must never be sold in situations where the price does not cover the variable cost of making the sale,6 and (ii) discounts are always permanent and cannot be turned back. What happens if external factors cause the cost of shipping to increase significantly, so that some prices no longer adhere to the first rule? Even in a situation as simple as this, formulating two simple rules creates a deadlock: one or the other will have to be broken. A way of mitigating this in practice is to define a hierarchy between rules, stating that rule (ii) may only be broken where required to comply with rule (i). This is no panacea, as the complexity of such a hierarchy increases exponentially with the number of strategic pricing rules.

The role of a pricing team in an organization should be directed toward discussing and challenging these rules, rather than the detailed analysis of the price point of every product. The latter employs human intelligence for the job for which it is worst suited—i.e., solving complex combinatorial puzzles. The end goal should be a simple set of rules that contains as little conflict as possible. After this, algorithms can receive free rein.

6 For example, handling, shipping, and possible return costs.
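One pragmatic way to encode such a hierarchy is to apply the rules in ascending order of priority, so that a higher-priority rule can override the outcome of a lower-priority one. The sketch below encodes the two-rule example from this section; the function names and structure are illustrative assumptions, not the book's implementation:

```python
def no_price_increase(price: float, previous_price: float) -> float:
    """Rule (ii), lower priority: discounts are permanent,
    so the price may never rise above its previous level."""
    return min(price, previous_price)

def cover_variable_cost(price: float, variable_cost: float) -> float:
    """Rule (i), highest priority: the price must always
    cover the variable cost of making the sale."""
    return max(price, variable_cost)

def resolve_price(proposed: float, previous: float, variable_cost: float) -> float:
    # Apply the lower-priority rule first, then let rule (i) override it.
    # This mirrors the hierarchy: rule (ii) may be broken to satisfy rule (i).
    price = no_price_increase(proposed, previous)
    return cover_variable_cost(price, variable_cost)

# Shipping costs rise: the old price of 9.99 no longer covers the
# variable cost of 11.50, so rule (i) forces a price increase.
print(resolve_price(proposed=9.99, previous=9.99, variable_cost=11.50))  # 11.5
```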

1.3.2 Price Tactics During the Product Life Cycle

The tactical pricing domain is concerned with setting the prices of individual products at different moments in time. These decisions are made within the boundaries that have been set out by the price strategy.7 This is the realm where useful data-driven algorithms can be created and deployed.

7 Ideally, the implications of these strategic rules are quantified, and the strategic pricing rules are challenged.

The success or failure of a pricing algorithm depends on having an accurate objective. The right objective depends on the circumstances in which pricing decisions are being made and changes over time.

The high-level goal of tactical pricing is to search for the price that maximizes the profits generated by the sale of a certain product. The profit of a retailer can be expressed quite simply as the gross margin generated by sales, minus the fixed overhead costs. These overhead costs are the costs required to pay for employees, warehouses, stores, energy, etc.

Profit = Gross margin − Overhead    (1.1)

Gross margin can be further split up into the price that has been paid for the product, minus the cost of the product and any costs that need to be made to complete the transaction. This might include the shipping costs to get the product to the customer, as well as a provision for the fraction of products that gets sent back as product returns.

Profit = Turnover − Purchase cost − Variable cost − Overhead    (1.2)

This expression is incomplete because a retailer is playing an infinite game [9]. There is no future point in time when a retailer can plan on being “done,” having sold all products and closing up shop. This is reflected in the fact that a retailer will always have inventory. Two situations in which identical profits have been generated but final inventory levels differ are not equivalent. To this end, the value of the final inventory is added to the equation. Note that the value of inventory is positive,8 but must also account for possible costs of shipping inventory back to warehouses, storage costs, costs of capital, etc. In extreme cases where inventory has low intrinsic value, the inventory value may even be negative.

Profit = Turnover − Purchase cost − Variable cost − Overhead + Inv value    (1.3)

8 This can be a point of contention. Leftover inventory is at times viewed as a cost because it represents what has been purchased in excess. This is only true when evaluating the initial purchasing decision, not subsequent changes that have been made to the price of a product.

A more mathematically sound way of expressing this can be obtained by introducing a simple shorthand. This notation assumes that a specific time period has been defined and that the set of products for which decisions have to be made has also been determined.

P       List price/initial price
I       Initial amount of inventory at the start of the period considered
C_pur   Purchase cost
C_var   Variable costs associated with a sale
D       Total demand in number of units sold
M       Discount applied to the product value
V       Inventory value at the end of the time period
F       The fixed overhead costs for the complete period

8 This can be a point of contention. Leftover inventory is at times viewed as a cost because it represents what has been purchased in excess. This is only true when evaluating the initial purchasing decision, not subsequent changes that have been made to the price of a product.


Table 1.1 Differences in controllable dimensions in assortment decisions throughout the life cycle of a product

Decision          Introduction   Intermediate                Phase-out
Quantity          Yes            Yes (usually only adding)   No
List price        Yes            No                          No
Markdown/promo    No             Yes                         Yes

This results in the expression shown in Eq. 1.4. There is no need to worry if you are less mathematically inclined; the discussion that follows and the subsequent chapters are equally legible if you stick to the more worldly expression of Eq. 1.3.

Profit = (P − M − C_pur − C_var) · D + (V − C_pur) · (I − D) − F   (1.4)
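To make the shorthand concrete, here is a minimal Python sketch of Eq. 1.4. The function and all numbers are illustrative assumptions, not taken from the book; it follows the equation exactly and therefore assumes demand does not exceed inventory (later equations introduce the min(I, D) cap).

```python
def period_profit(P, M, C_pur, C_var, D, V, I, F):
    """Eq. 1.4: profit over one period, using the shorthand defined above.

    Sold units earn the (possibly discounted) price minus purchase and
    variable costs; unsold units are worth their residual value minus the
    purchase cost already paid; fixed overhead F is subtracted once.
    Assumes D <= I.
    """
    margin_on_sales = (P - M - C_pur - C_var) * D
    leftover_inventory = (V - C_pur) * (I - D)
    return margin_on_sales + leftover_inventory - F

# Illustrative numbers: 800 units bought at €25, sold at €70 without discount.
print(period_profit(P=70, M=0, C_pur=25, C_var=7.5, D=800, V=10, I=800, F=20000))
```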

Depending on the point in time when pricing decisions are made, the elements from this equation that can be influenced will change. At the outset, both the quantity (I) and the price (P) can be controlled; at the end of the product life cycle,9 only the promo price can be controlled. Table 1.1 presents a brief overview of the three main phases and the aspects that can or cannot be controlled for a product.

The next sections go into more detail about how Eq. 1.4 can be translated into a true objective for pricing decisions during different stages in the product life cycle. This exercise also serves as a framework to determine what type of pricing decision is being made. Once the type of decision and the associated objective are both clear, more detailed tools and algorithms can be applied. The next chapters in this part of the book will go into more detail about the specifics of these tools and algorithms.

9 The concept of product life cycle is seen within the context of a retailer. The start of the product life cycle is the point at which a product is added to the assortment; the end of the product life cycle is the moment when the product is removed from the assortment.

1.3.2.1 Product Introduction

For most retailers, adding a product to the assortment implies setting a list price for the product. If the retailer is also the manufacturer, this price often goes by the name of the MSRP: the manufacturer's suggested retail price. Especially if a product is sold through a variety of resellers, this price point can be of great importance.

Retailers who themselves are primarily resellers of products can be obliged to follow this MSRP exactly. In this case, there is effectively no pricing decision at the point of a product introduction. The decision changes into one of whether to add a product to the assortment and, if so, how much inventory to purchase. For other types of retailers, ceiling prices might exist: the MSRP is then the highest allowable price at which to sell a product. This is often seen in European fuel markets or online pharmacies. In this case, there is still a question as to how much below the MSRP the retailer might want to sell the product.

12

1 The Retailer’s Pricing Challenge

Another facet to consider in this decision is the product lifespan and the ability to purchase additional inventory at later points in time. Fashion retail is an extreme scenario where most if not all of the inventory often has to be purchased beforehand, prior to being able to observe the sales of the product. Other types of retailers, such as supermarkets, will be more concerned with restocking rhythms rather than the size of the initial batch of the product. All this influences the degrees of freedom in Eq. 1.4. For a fashion retailer, making the decision will require estimating the strength of the demand for a product—either based on historical data or using intuition—as well as the possible residual value. Residual value will depend on the type and style of product, as some products can be re-introduced in subsequent years or can easily be sold off to outlets, while others cannot.

Regardless of the type of retailer, introducing a new product is done with the intention of positively contributing to total profits. If at a later point in time a product proves not to have been profitable, it is the purchasing decision that has failed. Specifically, the demand at a certain price point was incorrectly estimated. It is important to note that underestimating demand can also have a profound negative impact. Missing sales due to lacking inventory also negatively influences earnings. However, this is often much less visible since there is no pile of unsold inventory to testify to the fact that the demand estimate was inaccurate.

Figure 1.1 illustrates the decision that has to be made on a product level at the point in time when a new product is introduced. For this example, a simple linear demand curve is used.10 Such a linear curve translates into a concave curve that indicates where the total gross margin is maximized. For this scenario, the gross margin is simply expressed as the turnover minus total purchase cost, minus the variable fulfillment costs per transaction.

Comparing this reasoning to Eq. 1.4, there are a number of things that are conspicuously absent. Firstly, the fixed costs are disregarded. Given the fact that these remain a constant overhead and cannot be changed by means of product selection, there is no need to include them in the analysis at this point. Moreover, discounts are also not considered at this point, since the decision-maker is mainly concerned with selling at full price. Depending on the way of working of the organization, it can be desirable to include planned discounts in order to get the volume of the product to the right level. However, more often than not, either promotions are used to get rid of superfluous products of which too much was ordered, or promotions are decided after the creation of the first purchasing plan. At this point, specific products that are likely to respond well to sales promotions are selected, and their ordering quantity is increased—in anticipation of specific sales actions.

10 Contrary to economic conventions, the price is shown on the horizontal and not the vertical axis.


Fig. 1.1 The essence of the pricing decision when introducing a new product. The goal is to estimate the demand curve as accurately as possible and then calculate the optimal price point where the gross margin is maximized. The chart also shows a minimal inventory quantity; orders below that level are undesirable since they cannot be distributed efficiently. Where this minimum is situated depends on the nature of the store network and products. A possible driver could be that a certain quantity is needed to provide an attractive display. Furthermore, a purchase price of €25 and a variable fulfillment cost of €7.50 are assumed. The demand curve in this case is a linear function: D = 1500 − 10P

Finally, this decision also sets aside the residual value of the product, the reason being that the goal of the purchasing decision is to order exactly the right amount of inventory to cover demand. Effectively, this implies that the expected residual inventory would be zero, regardless of the quantity ordered (assuming that the price is adjusted accordingly). It is often only after demand has been observed and found to be lower than the available inventory that the residual value of inventory is taken into account.

A variation where the final stock should be accounted for during the purchasing decision is that of retailers who work with a dedicated outlet system. In this case, it may be desirable to estimate the demand in both outlet channels and primary channels. At this point in time, the goal should still be not to have any superfluous inventory, but this is now taken to mean no more inventory than can be sold in the traditional channels in combination with the outlet channels.


A simplification in this example is that the purchase price is assumed to be constant. Often, there will be a certain volume discount applied as the number of products ordered starts to increase. This may cause slight shifts in the optimal quantities, but does not intrinsically change the nature of this analysis.

At this point, the decisions of price and quantity are intertwined. If a certain quantity is chosen, the optimal price is a given, and vice versa. The optimal decision for this simplified example is a quantity of 800 at a price of €70, resulting in a gross margin of €30,000. If it had been possible to order smaller batches of product, the optimal quantity would have been lower, and the products would have been offered at a higher price point.

In real settings, this situation is further complicated by the fact that decisions have to be made in a situation of scarcity. The total purchasing budget is not infinite, and even if it were, there is a limited amount of available space to store and display inventory. Moreover, the overall customer demand may be limited, and the reality will be that certain products will be substitutes for each other. This implies that there will often be a purchasing budget for each category of product. Under these conditions, all feasible purchasing options for products should be listed and considered jointly. The example shown in Table 1.2 contains three feasible price/quantity combinations: (50, 1000), (60, 900), and (70, 800). When combining this with the possible options for all other products, the end result is a variation on a traditional problem in combinatorics: the knapsack problem—albeit a two-dimensional variant [10]. For most practical problem sizes, it should still be doable to create a model that can find the optimal solution to this problem using integer programming. Fortunately, there is often no need to create such complex systems, and simple heuristics are usually sufficient to find a solution close to the optimal.

Table 1.2 New product introduction price decision, continuing the example shown in Fig. 1.1

Sales price   Expected demand   Gross margin
€50           1000              €17,500
€60           900               €24,750
€70           800               €30,000
€80           700               €33,250
€90           600               €34,500
€100          500               €33,750
€110          400               €31,000
€120          300               €26,250
€130          200               €19,500
€140          100               €10,750
€150          –                 –
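The gross margin column of Table 1.2 follows directly from the assumptions in Fig. 1.1; a few lines of Python reproduce it (the figures mirror the example, nothing here is new data):

```python
# Reproduce Table 1.2 from the Fig. 1.1 assumptions: D = 1500 - 10P,
# purchase cost €25, variable fulfillment cost €7.50 per unit sold.
C_PUR, C_VAR = 25.0, 7.5

def demand(price: float) -> float:
    return max(1500 - 10 * price, 0)

def gross_margin(price: float) -> float:
    return (price - C_PUR - C_VAR) * demand(price)

for price in range(50, 151, 10):
    print(f"€{price}: demand {demand(price):4.0f}, gross margin €{gross_margin(price):,.0f}")
```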


Assuming a situation where there are both limited funds and limited shelf space available, a simple heuristic could be based on the return per unit of inventory.11 For example, a purchase of 1000 products at a sales price of €50 would return an average of €17.50 in gross margin per unit of inventory (GMUI). This could be used in a simple greedy heuristic:

Greedy heuristic for buying decisions
Step 1: Remove all options that exceed the remaining space
Step 2: Remove all options that exceed the remaining budget
Step 3: If no options remain, STOP
Step 4: Select the option with the highest GMUI
Step 5: Remove all other options for the selected product
Step 6: Remove all options that exceed the remaining budget
Step 7: Update the remaining space and budget
Step 8: Continue back to Step 1
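As a sketch, the heuristic might be coded as follows. The Option type, the example products, and the assumption that every unit occupies one unit of space (see footnote 11) are illustrative choices, not prescriptions from the book:

```python
from dataclasses import dataclass

@dataclass
class Option:
    product: str
    quantity: int          # units to purchase at this price point
    gross_margin: float    # expected total gross margin for this option
    unit_cost: float       # purchase cost per unit

    @property
    def gmui(self) -> float:
        # Gross margin per unit of inventory: the greedy ranking criterion.
        return self.gross_margin / self.quantity

def greedy_buy(options: list[Option], space: int, budget: float) -> list[Option]:
    """Greedy buying heuristic following the eight steps above."""
    chosen = []
    while True:
        feasible = [o for o in options
                    if o.quantity <= space and o.quantity * o.unit_cost <= budget]
        if not feasible:
            return chosen                              # Step 3: stop when nothing fits
        best = max(feasible, key=lambda o: o.gmui)     # Step 4
        chosen.append(best)
        options = [o for o in options if o.product != best.product]  # Step 5
        space -= best.quantity                         # Step 7
        budget -= best.quantity * best.unit_cost

# One option per feasible price/quantity combination, e.g. jeans from Table 1.2:
options = [Option("jeans", 800, 30_000.0, 25.0),
           Option("jeans", 600, 34_500.0, 25.0),
           Option("shirt", 500, 7_500.0, 12.0)]
print(greedy_buy(options, space=1200, budget=40_000.0))
```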

The end result of this simple heuristic is a set of price and quantity combinations that should provide a reasonable utilization of the purchasing budget. Naturally, this simple heuristic can provide no guarantee of being optimal.

In reality, this decision process will be restricted further, since there are other aspects required to build an attractive product collection. A fashion retailer might observe that blue jeans are the product with the best return relative to occupied shelf space. In spite of this, it would be an unwise decision to buy nothing but denim pants, as this would be unlikely to provide an attractive storefront. Effectively, minimal thresholds will be set for specific product categories. This situation could still be solved with a heuristic that is only slightly more complex than the example that was just presented.

The process that was just described is most applicable to retailers whose product collections evolve rapidly: either because of traditional seasonal variations, such as in fashion, or simply because the products sold have very short life cycles. Retailers who offer a fixed assortment will typically be more focused on optimizing replenishment rhythms, as this is what prevents them from having to invest in large inventory buffers.

11 This assumes that all products take up similar space in stores and warehouses.

1.3.2.2 Pricing During the Product Life Cycle

Product prices are also subject to change during the product life cycle. The motivations for this can be grouped into two big categories. The first is an adjustment to an incorrect price point, where the current price no longer reflects the optimal

balance between supply and demand from the perspective of the retailer. The second reason is to use price as a marketing instrument, launching promotions to increase traffic to specific sales channels.

Adjusting the price to changed circumstances is often defined as dynamic pricing. This is in stark contrast with the world of static pricing, where a single product retains its price for its complete lifespan. Adjustments of the price in this case imply a change of the list price of the product. A simple introduction to the world of dynamic pricing can be found in the book The Extinction of the Price Tag by Sharda [11].

When used as a marketing instrument, the price change will often not be a change of the list price, but rather a promotional discount that is applied to a product. These discounts will often be temporary in nature, as opposed to the discounts that are applied during markdown periods. The motivation of these discounts is also different in that they serve as general advertisement for the retailer's sales channels as much as they are meant to have an impact on the sales of an individual product. Next, the objectives behind these two types of mid-life-cycle price changes are explored.

Dynamic Pricing

Adjusting list prices has never been easier. The continued adoption of online retail as well as digital price tags has made price changes much more accessible. This in turn has sparked increased interest in the underlying methods that can be applied to optimize prices. Even for products where price changes still imply a significant amount of labor, there is a greater willingness to adjust prices. The aim of these price changes is no different from other price decisions: to maximize overall company profits.

Airlines and the hospitality industry have long been champions of this practice. A situation where capacity is fixed forces heads to turn toward other variables to manipulate—specifically price. The prices for airline tickets and accommodation take into account a myriad of variables in order to be as close to the willingness to pay of the consumer as possible. This all takes its most extreme form in the shape of dynamic pricing. Here, the base price of a product is continually changing to reflect inventory as well as demand. The most obvious examples can be found on websites like Amazon, where it is not unlikely to see products at exorbitant prices because the algorithms have identified them as almost running out of inventory.

The reality is that many retailers also face a situation of relatively constant supply and varying demand.12 Whereas demand can change instantaneously, it is harder to

12 At the time of writing, this is exacerbated by geopolitical events that are negatively impacting global supply chains—specifically the aftermath of the COVID pandemic and the Russian invasion of Ukraine—further reducing the flexibility of supply. Beyond their short-term impact, these shocks have revealed that global supply chains can easily be disrupted, with far-reaching effects on the supply of goods in consumer markets around the world.


Fig. 1.2 Implication of a higher than expected demand level on the objective of a retailer. The fine lines represent the retailer's initial estimate of the demand level and the associated gross margin. The thick lines show an upward shift in the customer demand and the implications for the associated gross margin at different price points

ramp up the supply of products quickly. As a result, it is becoming more commonplace to use dynamic pricing strategies.

A change in demand implies that demand is either larger than anticipated or lower than anticipated. Both situations pose challenges and have implications for the objective of a retailer. Both scenarios will now be discussed in turn.

Figure 1.2 shows the implications of demand being higher than initially anticipated. For this simplified scenario, the original optimal decision for the retailer would be to purchase an inventory of 600 products and offer these products at a price of €90. This would have resulted in a gross margin equal to €34,500. However, under the new situation, the optimal point shifts. The optimal decision under this level of demand would be to offer 700 units at a price of €105, resulting in a total gross margin of €50,750. This is nothing more than a simple re-evaluation of the basic profit equation of the retailer (Eq. 1.4). This kind of re-evaluation is possible for products that are staples in the assortment and are continually replenished. Depending on the type of retailer, this can be a large or small fraction of the product assortment.

18

1 The Retailer’s Pricing Challenge

However, by the moment this information becomes available, it may no longer be possible to order additional units of inventory. Under these conditions, the objective shifts. The question now becomes how to maximize the total profits of the retailer, given the fact that the retailer has already committed to 600 units of inventory. This scenario is a mirror image of a typical markdown decision, where there is superfluous inventory and prices are typically adjusted downward.13 The objective of the retailer under these conditions can now be expressed as shown in Eq. 1.5. This is simply the total contribution, taking into account that the customer demand is a function of the asking price and the fact that no more than the available inventory can be brought to market.

Objective = f(P) = (P − C_var) · min[I, D(P)]   (1.5)
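A quick sketch of this capped objective in Python. The upward-shifted demand curve D = 1750 − 10P is an assumption chosen to be consistent with the numbers in the text (700 units at €105, and demand equal to the 600 available units at €115); the grid search itself is illustrative:

```python
# Eq. 1.5 with a fixed inventory of 600 units: purchase cost is sunk, so only
# the variable cost and the inventory cap enter the objective.
I, C_VAR = 600, 7.5

def demand(price: float) -> float:
    return max(1750 - 10 * price, 0)   # assumed upward-shifted curve

def objective(price: float) -> float:
    return (price - C_VAR) * min(I, demand(price))

best_price = max(range(50, 176), key=objective)
print(best_price, objective(best_price))  # -> 115: raise the price until demand equals inventory
```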

For the example shown in Fig. 1.2, this implies that the price will be increased to €115, at which point the demand is exactly equal to the 600 units of inventory that are available. This is basic economics, and it is heavily simplified. The purpose here is purely to illustrate the objective and the degrees of freedom that are still available to a retailer. Specifically, the choice of the retailer is reduced to an adjustment of the price that maximizes the total income generated. The sunk cost made to invest in product can be completely disregarded at this point.

A variation of this scenario is also possible where the option to increase inventory exists. Whereas the inventory that has already been purchased is fixed, suppliers may still offer the option to purchase additional inventory. If the demand is higher than expected, this simply equates to the re-evaluation of the objective as shown earlier—resulting in bringing 700 products to market at a price of €105.

The situation is different when demand is lower than anticipated and inventory levels have already been committed to. Figure 1.3 shows how the new objective of the retailer does not coincide with the new gross margin calculation. Assuming that the inventory level is fixed, the only variable that is left to control is the price at which inventory is offered. The objective can still be expressed as shown by Eq. 1.5. For the example shown here, the optimal price point will be €65, which results in all products being sold.

In situations where transaction costs are significant relative to the value of the product, the optimal price point can be higher than the price that equates to selling all inventory. Similarly, more realistic demand models do not take a linear shape where the demand for a product keeps increasing as the price is lowered. In reality, the demand will flatten out below a certain price point. At that point, it is detrimental to lower prices further, as the reduction in price no longer yields a net positive by means of a sufficient increase in demand.

This is a simplified scenario, and tactical and strategic considerations can enter into the equation at this point.

13 This assumes that demand is price elastic; if demand is not elastic, the best option is not to change the price, as this would only decrease total revenue.


Fig. 1.3 The retailer's pricing problem in case demand is lower than initially anticipated. Under these conditions, the inventory that has been purchased is often greater than what would have been purchased had the demand been estimated accurately. The right method of decision-making is not to search for the new maximum of the total gross margin, but rather to calculate the marginal contribution, which accounts for the fixed amount of inventory and for the variable transaction costs

One aspect may be that a retailer does not want to erode the reservation price of consumers by pricing too low. This could result in a definitive reduction of the willingness to pay and eat into the long-term profits of the retailer. Likewise, there is also the option to keep inventory at hand in anticipation of end-of-season sales or outlet channels. To what extent this is desirable will depend on the volume that these sales channels can handle and the average price point that can be expected from them.

Substitution of products is also a valid concern at this point. If the product for which demand is lower than expected can serve as a substitute for others, this can be a valid reason to keep prices higher. This will result in more consumers buying products at a higher price, rather than steering some high-paying customers to heavily discounted alternatives.

These arguments should however not be used haphazardly. Ideally, statements such as those above should be backed by data. It is important to be aware of the price point that can be expected from outlet channels, as well as the volume that these channels can generate. Likewise, if substitution is a


concern, there must be a reasonable manner of estimating which products are and are not affected. The danger here is often to do too little for fear of making mistakes, whereas not changing prices can also be a costly mistake.

A variation on this dynamic arises when a company is selling a product indefinitely—meaning longer than a single season or a set period of time. Under these conditions, there may also be a reduction in demand that causes current supplies to be excessive. This often implies that this inventory is taking up scarce capital, inventory space, or shelf space that other products could make better use of. Under such conditions, it may be desirable to have a temporary decrease in the price to increase the speed at which inventory reaches the new optimal level for continual renewal—i.e., the steady-state process that uses something resembling the economic order quantity model [12] (a minimal worked example follows at the end of this section).

For such cases, it is important to avoid using accounting valuations to make pricing decisions. It may be that goods placed in inventory are depreciated over time by accounting. This may lead to products being discounted solely because they have been in inventory for longer. However, it does not make sense to sell a product at a heavy discount that would not result in a reasonable margin if the product is to be replaced by a new unit of inventory. Naturally, this only holds for products that have an indefinite shelf life. An example would be a furniture store that sells a cabinet at a price below cost because it has been sitting in inventory for a certain period of time, only to replace it with an identical product whose cost is not covered by the price paid for the discounted cabinet. Such situations can often arise if the sell-through of product is a key management KPI: profits are sacrificed just to improve the sell-through KPI. This highlights the importance of clearly defining what the objectives of the company are and of making sure that the KPIs that are reported provide the right direction.

The objective of a retailer wishing to employ dynamic pricing is determined by the ability or inability to change the inventory positions. If the inventory can still be changed, which is most frequently the case for non-seasonal products, the objective is profit maximization. If the inventory position has been locked in, the question is reduced to a maximization of contribution (accounting for transaction costs). Naturally, this can be nuanced depending on peripheral aspects of a retailer's situation.

Dynamic pricing boils down to estimating customer demand as accurately as possible while understanding what aspects can still be controlled at the present moment in time. A key limitation in this context is the inherent noisiness of the observed data. It is essential that retailers are capable of separating the signal from the noise—adjusting future expectations of demand only when this is warranted. Dynamic pricing is no excuse for starting a wild goose chase. Excessive price fluctuations can create uncertainty in the mind of consumers and give rise to the idea that there is a degree of unfairness linked to product pricing.
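For reference, the classic economic order quantity mentioned above can be computed in a couple of lines; the demand and cost figures below are purely illustrative:

```python
from math import sqrt

def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    """Classic economic order quantity [12]: the order size that balances
    fixed ordering costs against inventory holding costs."""
    return sqrt(2 * annual_demand * order_cost / holding_cost)

# E.g. 5000 units/year, €40 per order, €2.50 holding cost per unit per year.
print(eoq(5000, 40.0, 2.5))  # -> 400.0 units per order
```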


Promotional Pricing

Mid-season promotional price changes can have two different objectives. The first is product-specific: for some products, the demand will inevitably have been overestimated. For such products, it may make sense to start reducing prices by means of promotions before the end of the expected lifespan has been reached. Note that this differs from dynamic pricing in that the adjustment is presented as a promotion rather than an adjustment of the base price. This often elicits a more powerful response from consumers than a mere adjustment of the base price.

The second reason is the usefulness of promotions as a tactical instrument to increase traffic to specific sales channels. Advertising a promotion for a specific product is expected to elicit a response that is broader than the demand for only the single product. If this is the objective of a promotion, it may be that products that are selling very well are discounted, even to the point that the total profit generated by the product is lower due to the discount. Under these conditions, it would be expected that this is made up for by higher traffic and the additional products that are purchased by visitors of the store or website.

Product-specific motivations for promo pricing are the simplest to dissect. Yet again the goal is to maximize profits for the organization, and a major element in the decision-making process is whether or not the inventory position can still be controlled, as was discussed in the sections on dynamic pricing. A key difference is that the expected demand has to be re-estimated. Once a product is displayed as having a promotional discount, the nature of the demand changes. Generally, the response of the demand will be stronger when a product is advertised as having a promotional discount, compared to merely reducing the list price of a product.

Rather than estimating a completely new demand curve for each product, it is easier to estimate the promo price elasticity. This elasticity will usually be greater (i.e., a more significant response to a promotion) when compared to the regular price elasticity. It can be determined by collecting data from past promotions and simply measuring elasticity as the relative response of demand to the relative change in price (Eq. 1.6).

ε = (ΔD/D) / (ΔP/P)   (1.6)
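Measured on past promotion data, Eq. 1.6 amounts to a one-line calculation; the helper below and its numbers are illustrative:

```python
def promo_elasticity(base_demand: float, promo_demand: float,
                     base_price: float, promo_price: float) -> float:
    """Arc elasticity per Eq. 1.6: the relative demand response divided by
    the relative price change."""
    rel_demand = (promo_demand - base_demand) / base_demand
    rel_price = (promo_price - base_price) / base_price
    return rel_demand / rel_price

# E.g. a 20% promo discount that lifted weekly sales from 100 to 180 units:
print(promo_elasticity(100, 180, 50.0, 40.0))  # -> -4.0
```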

The caveats of measuring price elasticity will be discussed at length in later chapters. Depending on the amount of available data and the nature of the products, different methods may be preferred. As always, it is better to measure the performance of a simple model before investing in more complex models.

There is an interplay between the decisions on the product level and the decisions on the assortment level. Promo price elasticities are measured using historical data. In doing so, this implicitly assumes that the situation wherein decisions are going to be made will be similar to the past situation. It may be the case that, when following the optimal decisions on a product level, the aggregate is significantly different from


the historical situation. A product that is discounted will stand out less if all other products are also discounted. As such, it is important to consider the overall picture on the level of the product portfolio—in a similar manner as was discussed for markdown prices at the end of the product lifespan.

The objective of the retailer then becomes a trade-off between additional sales during the promotion and sales potential after the promotional period, as shown in Eq. 1.7. The demand during the promotional period is estimated using the price elasticity, which allows the quantification of the demand for different promo price levels.

Objective = Promo contribution + Post-promo contribution
          = (P_promo − C_var) · min(I, D_promo)
          + (P − C_var) · min(I − D_promo, D_post-promo)
          + V · (I − D_promo − D_post-promo)   (1.7)
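A minimal sketch of Eq. 1.7 evaluated over a few candidate promo prices. The elasticity-based demand response and every number here are illustrative assumptions layered on top of the equation:

```python
# Trade-off of Eq. 1.7: contribution during the promo, contribution after it,
# and the residual value of whatever inventory remains.
P, I = 100.0, 500              # list price and available inventory
C_VAR, V = 5.0, 20.0           # variable transaction cost, residual unit value
BASE_DEMAND, POST_DEMAND = 120, 150   # expected unit sales during/after the promo window
ELASTICITY = -4.0              # promo price elasticity, e.g. measured via Eq. 1.6

def promo_objective(promo_price: float) -> float:
    rel_price = (promo_price - P) / P
    d_promo = min(I, BASE_DEMAND * (1 + ELASTICITY * rel_price))
    d_post = min(I - d_promo, POST_DEMAND)
    leftover = I - d_promo - d_post
    return ((promo_price - C_VAR) * d_promo
            + (P - C_VAR) * d_post
            + V * leftover)

for price in (70.0, 80.0, 90.0):
    print(f"promo price €{price:.0f}: objective €{promo_objective(price):,.0f}")
```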

Most often, this type of objective is combined with a strategic/tactical decision on how many products can be discounted during a promotion. This automatically translates into a combined problem of product selection and promo price setting. The simplest way to go about this is to calculate what the additional contribution for each product would be if it were to participate in the promotion and then sort products accordingly. Next, the top products can be selected until the quota of products or inventory is met. A nuanced quota that requires sufficient promotions in each of the product categories changes little about this process.

Taking this argument further naturally leads to the second objective of sales promotions: increasing traffic. Traffic is an ambiguous term and can be used to mean different things. One interpretation could be the desire for a spike in turnover—freeing up capital that is required to invest in new products. Another objective could be to increase the number of customer transactions as much as possible. Yet another variation is the desire to promote products that increase transaction size.

If the goal is to increase turnover, the objective from Eq. 1.7 can simply be replaced by turnover or the demand measured in units. Often, it will still be desirable to investigate the trade-off in this case. Specifically, there is likely to be a reduction in the contribution. This is caused by offering promotions on products that are in relatively high demand relative to their availability. These items would still have generated substantial revenue. A heuristic to achieve a good trade-off would be to calculate which product offers generate the highest revenue per unit of future contribution margin that is sacrificed.

A more nuanced analysis can be conducted by using information on customer and basket value. As is explained in Chap. 8, it is possible to identify products that are likely to be responsible for increased customer lifetime value. Knowing this, these products become interesting candidates for promotional offers, as customers who purchase these products are likely to be more profitable in the future. This type of causality cannot be purely inferred from the data, and therefore it is advisable to


test these hypotheses using well-constructed experiments (see Chap. A for guidelines on experimentation). Environments in which there is no clear picture of lifetime value may still conduct analyses on the level of the shopping basket. The core idea here is to analyze the types of products that are often purchased jointly. This type of analysis can then be constructed as a search for products that cause an uplift in the purchase of high-value (high-margin) products within the same transaction. This kind of analysis goes by the name of association rule mining, and an excellent introduction to it can be found in the sixth chapter of Introduction to Data Mining [13].
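As an illustration, one possible way to run such a basket analysis is with the apriori implementation in the open-source mlxtend library; the toy basket matrix below is invented for the example:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot basket matrix: one row per transaction, one column per product.
baskets = pd.DataFrame({
    "jeans": [1, 1, 0, 1, 0],
    "belt":  [1, 1, 0, 0, 0],
    "shirt": [0, 1, 1, 1, 1],
}).astype(bool)

frequent = apriori(baskets, min_support=0.2, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)

# Rules with high lift point to products that pull other (ideally high-margin)
# items into the same transaction.
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```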

1.3.2.3 Shifting Objective over Time

The pricing decision is complex because it is a combination of multiple decisions. These decisions are often made with very different amounts of information available, as well as vastly different amounts of freedom in making peripheral decisions (i.e., how much of a product to purchase).

Generally speaking, the objective of the decision-maker shifts from one of long-term profits to a perspective that accounts much more for short-term turnover maximization, or even the freeing up of cash to pay for new inventory. Arguments will still be voiced during later decisions that relate to the initial purchase price of a product. However, these must be treated very carefully. A sunk cost can never be a valid argument in an economic discussion, but it may be the case that going too low in price could erode the perception of your brand among your customers. Alternatively, discounting too deeply may also stress your relationship with suppliers. These are valid strategic concerns and should be treated as such. However, more often than not, these arguments are raised simply from a perspective of loss aversion [14].

1.3.2.4 End-of-Life Pricing

When a product nears the end of its lifespan, the perspective changes. At this point in time, a retailer has a residual amount of inventory that should be cleared within a reasonable period of time. This is in order to free up space and working capital for products that have a better return on capital employed than the current product.

Depending on the type of retailer, the context for this can be different. A retailer selling white goods may phase out certain models over time, clearing shelf and display space for newer models—in order to comply with new standards regarding energy efficiency. This is typically a relatively slow and continuous trickle of products that are phased out. Fashion retailers, on the other hand, will have entire collections that near the end of their lifespan simultaneously. This often implies that they have two or more high-intensity markdown periods during which a large amount of products are cleared, meaning that a much greater number of decisions have to be made at the same time.

Knowing what the objective is for the pricing decision in this stage of the product life cycle is of paramount importance. By far, the most common misconception is wrongfully focusing on objectives that contain aspects that are outside of the control of the decision-maker at the moment of making this decision. Whereas in general a


Table 1.3 Two scenarios to illustrate the markdown decision. When making the decision, the starting inventory, the purchase price, and the full retail price are known. The decision to be made is at what discounted price the product is going to be sold. This results in an expected number of units sold, as well as an associated total turnover. Common rules of thumb will often favor the result in scenario A, whereas the total value for a retail organization is maximized under scenario B

                      Scenario A   Scenario B
Available inventory   100 pcs      100 pcs
Purchase price        €50          €50
Full retail price     €100         €100
Discounted price      €60          €80
Units sold            98 pcs       74 pcs
Remaining inventory   2 pcs        26 pcs
Turnover              €5,880       €5,920

retailer should strive to maximize the gross margin rather than just revenue, during this stage in the product life cycle maximizing the turnover is the only objective that truly maximizes the value to a retailer. Too often retailers are still concerned with the original cost of an item, while the initial investment is a sunk cost at this point. This originates from not making a clear distinction between the collection planning efforts and the markdown efforts. Part of the cause for this is that it is often the same teams who are responsible for markdowns as well as collection planning. This makes it harder to "kill your darlings" and admit that products you have chosen did not perform as well as expected. It leads to markdowns that are less aggressive than they should be to maximize value to the retailer.

Inversely, the opposite also happens when the sell-through is taken as the objective of a markdown exercise. This will cause decision-makers to favor markdowns that do not maximize the financial value to the retailer. Aiming for a specific final inventory level—which may or may not be greater than zero—has the same effect.

This is best illustrated using a simple example. Table 1.3 shows a situation in which a choice must be made between two different pricing strategies: a markdown of €40 or €20. Assuming that the decision-maker is able to predict what the response to a specific markdown is going to be,14 the number of units sold15 as well as the total turnover can be estimated. When a retailer aims to maximize sell-through or minimize the final inventory,16 scenario A will be preferred over scenario B. The root of this misconception can

14 This is often not the case, as discussed in detail in Chap. 4.
15 Also take note that in real scenarios, it might be required to use a different definition of a stockout than zero inventory. The reality might be that specific sizes of an item are hard to sell and that sales will fall to zero when there is still a fraction of inventory left. Where this level lies exactly depends on the nature of the retailer and should be determined based on past inventory values.
16 Both these objectives effectively have the same real-world meaning.


often be found in the accounting-driven perspective on calculating profits at this point. Equation 1.4 can be used to illustrate this. In its entirety, this equation includes the following big components:

• The gross margin generated during the transaction
• The value of the residual inventory, minus the investment in inventory
• The fixed overhead

These components paint a correct and complete picture of the profitability of a retailer. However, not all of the variables in this equation can be controlled during markdown decisions (see Table 1.1). When products are phased out, both the list price and the purchased quantity are a given. It could be stated that these are constants and not variables: regardless of the decisions made in the markdown process, these numbers will not change. In economic terms, this is saying nothing more than that the investment in inventory is to be considered a sunk cost. Such a cost should no longer be accounted for when making decisions on what actions to take.17

Effectively, the decision is reduced to a maximization of the total turnover, plus the residual inventory value, minus the variable costs associated with completing a transaction. Returning to Eq. 1.4, this knowledge can be used to make some simplifications. First, the constant F can be dropped from the equation, resulting in the following expression:

Objective′ = (P − M − C_pur − C_var) · D + (V − C_pur) · (I − D)   (1.8)

Next, the purchasing cost of the inventory can be isolated from this equation. In doing so, the following expression is created:

Objective″ = (P − M − C_var) · D + V · (I − D) − C_pur · (D + I − D)   (1.9)
           = (P − M − C_var) · D + V · (I − D) − C_pur · I   (1.10)

17 Returning to the domain of mathematics and optimization, it can be stated that the point at which a maximum is obtained will not change if a constant is added to the equation. The maximum value of y = −x² and y = −x² + 100 will both occur at x = 0. The additional constant does not change this.

Again, the same situation is created where a constant is present at the end of the equation. The final term represents the sunk cost of the inventory that has been purchased in the past. No actions can currently change this amount,18 and thus this final term can also be dropped from the equation, creating a simpler form of the objective during the markdown process:

Objective‴ = (P − M − C_var) · D + V · (I − D)   (1.11)

18 An exception may be re-negotiating prices with suppliers for products that have performed badly—something which at times is accepted. In the same vein, it may be possible to return a fraction of unsold inventory to suppliers. But this is by no means a common practice.

Returning to the example in Table 1.3, it can be determined that scenario B is to be preferred over scenario A. Assuming that there is a variable cost of €5 associated with each transaction and a residual value of €20 for each unit of inventory, scenario A results in an objective of 5,430 (Eq. 1.12), and scenario B results in an objective of 6,070 (Eq. 1.13).

Objective‴(Scenario A) = (100 − 40 − 5) · 98 + 20 · (100 − 98) = 5,430   (1.12)
Objective‴(Scenario B) = (100 − 20 − 5) · 74 + 20 · (100 − 74) = 6,070   (1.13)
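The same comparison as a minimal Python sketch, using only the figures from Table 1.3 and the €5/€20 assumptions above:

```python
# Eq. 1.11 for the two markdown scenarios of Table 1.3.
FULL_PRICE, INVENTORY = 100.0, 100
C_VAR, RESIDUAL = 5.0, 20.0

def markdown_objective(discount: float, units_sold: int) -> float:
    contribution = (FULL_PRICE - discount - C_VAR) * units_sold
    leftover_value = RESIDUAL * (INVENTORY - units_sold)
    return contribution + leftover_value

print(markdown_objective(40, 98))  # Scenario A -> 5430.0
print(markdown_objective(20, 74))  # Scenario B -> 6070.0
```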

Note that the absolute amount of the transaction cost is of no importance to the preference between these two scenarios—since there are always more transactions in scenario A and the amount will always be greater than or equal to zero. Likewise, the value of the residual inventory does not have an impact on the preference between the two scenarios, unless this amount is negative. This can be the case in scenarios where the cost to remove or destroy inventory is substantial. However, it must be noted that this only applies to an out-of-pocket expense—no depreciation or initial purchasing cost is to be taken into account at this point.

It makes no difference to the end result if the longer form in Eq. 1.4 is used rather than the shorter form expressed in Eq. 1.11. However, in practice, the shorthand is to be preferred for multiple reasons. Firstly, it is beneficial not to be distracted by numbers that cannot be changed and that could potentially introduce errors into the total amount. Secondly, the shorter equation focuses only on the things that can be changed at the time when markdowns are being set. Taking this perspective avoids painting a very negative picture where only less bad outcomes can be attained. Finally, when comparing different algorithms, only the shorthand form will result in accurate relative performance comparisons (i.e., method X is 50% better than method Y for making markdown decisions).

The most complex factor in defining the correct objective for this process is attaching a correct valuation to the residual inventory. A complicating factor is that residual value cannot be calculated purely on the level of an individual product, because the residual value will depend on the leftover inventory. If there are so many items left that they cannot easily be sold within a reasonable time frame, the residual value may be lower than expected. Likewise, the residual value of products will also be affected by the total amount of leftover


inventory. The various channels that are available to sell superfluous inventory can only be expected to handle a certain number of products. If the amount of leftover inventory is excessive, this might cause problems. The opposite can also occur, where the residual inventory is too small to be a sufficiently sizeable body of inventory for certain outlet channels—for example, because the shipping and handling costs for a limited amount of product exceed the possible value that could still be derived from it. All this implies that determining residual value accurately can at times be quite an intricate exercise.

None of this is a sufficient argument to use the sell-through percentage as an objective in the markdown exercise. Doing so may significantly reduce the overall profit, especially if products are discounted too aggressively to reach a predefined sell-through target. All things being equal, a situation with higher gross margins should be preferred to a situation with an ideal sell-through rate. Superfluous inventory can always be donated to charity or sold to another party. The final sell-through percentage will be a result of the optimization, not an optimization objective. The basis for correct decision-making is to be found in an accurate measurement of the residual value of products and groups of products.

1.4 Escaping the Discount Trap

Many retailers have become over-reliant on discounts to drive sales. This has resulted in customers who purposefully delay purchases until the next discount period [15]. Overall, this has proven to be a net negative for the retail industry as a whole.

This problem is a typical example of the tragedy of the commons. While it would be much better for the industry as a whole not to discount so heavily and frequently, individual retailers will often be better off being more aggressive when it comes to discounting and promotions. All this is conducive to the race to the bottom that many retail sub-sectors are experiencing. Consumers have grown used to low-price strategies, and frequent promotions are causing shoppers to delay purchases, awaiting the next big sale. This is eating away at the margins of retailers and is giving an advantage to larger retailers who have greater buying power.

However, this does not mean that nothing can be done; there are many ways to be innovative using pricing—many of which are feasible even on a smaller scale. Some retailers are launching campaigns that go against traditional consumerism, the most well-known and early example being Patagonia [16], which actively boycotted Black Friday as a form of social activism. This is of course not pure corporate altruism; it aims to position the brand as environmentally conscious. Even so, if companies are getting on this wave because of consumer demand, they are at least heading in a more fruitful direction.

A better understanding of what constitutes value in the mind of the customer will be essential to escaping this situation. This will allow retailers to align their offering with the wishes of the consumer. Employing smart pricing strategies will


be essential to recapturing healthy and sustainable profit margins. In combination with consumers who have an increased awareness of the ecological and societal implications of their consumption, this should point companies in worthwhile directions. This returns to the beliefs of Adam Smith, who stated that companies exist to serve the consumer, not just as profit-maximizing entities—even if that is the proximate objective that they use to make decisions.

1.5 The Next Chapters

The next chapters present methods to improve price decisions during the lifespan of a product. The aim is not to go into great technical depth, but rather to focus on the big patterns and motivations behind these approaches. For technically minded readers who want to know everything there is to know about pricing, The Oxford Handbook of Pricing Management is strongly recommended as a starting point [17]. A more easily digestible tome is Confessions of the Pricing Man by Hermann Simon [18].

References

1. Phillips, R. L. (2005). Pricing and revenue optimization. Stanford University Press.
2. Porter, M. E. (2011). Competitive advantage of nations: Creating and sustaining superior performance. Simon and Schuster.
3. Raynor, M. E., & Ahmed, M. (2013). Three rules for making a company truly great. Harvard Business Review, 91(4), 108–117.
4. Van Ossel, G. (2014). Omnichannel in retail: Het antwoord op e-commerce. Lannoo Meulenhoff-Belgium.
5. Ursic, M. L., & Helgeson, J. G. (1994). Using price as a weapon: An economic and legal analysis of predatory pricing. Industrial Marketing Management, 23(2), 125–131.
6. Cernansky, R. (2021). Why destroying products is still an everest of a problem for fashion. https://www.voguebusiness.com/sustainability/why-destroying-products-is-still-an-everest-of-a-problem-for-fashion. Accessed 19 Apr 2022.
7. Chandrashekaran, R., & Grewal, D. (2006). Anchoring effects of advertised reference price and sale price: The moderating role of saving presentation format. Journal of Business Research, 59(10–11), 1063–1071.
8. Barden, P. P. (2013). Decoded: The science behind why we buy. John Wiley & Sons.
9. Sinek, S. (2019). The infinite game. Penguin.
10. Caprara, A., & Monaci, M. (2004). On the two-dimensional knapsack problem. Operations Research Letters, 32(1), 5–14.
11. Sharda, S. (2018). The extinction of the price tag: How dynamic pricing can save you. New Degree Press.
12. Harris, F. W. (1990). How many parts to make at once. Operations Research, 38(6), 947–950.
13. Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Addison Wesley.
14. Novemsky, N., & Kahneman, D. (2005). The boundaries of loss aversion. Journal of Marketing Research, 42(2), 119–128.
15. Fang, Z., Gu, B., Luo, X., & Xu, Y. (2015). Contemporaneous and delayed sales impact of location-based mobile promotions. Information Systems Research, 26(3), 552–564.
16. Collins, M. (2020). Resisting Black Friday: REI and Patagonia's stances on consumerism. Elon Journal, 27.
17. Özer, Ö., & Phillips, R. (2012). The Oxford handbook of pricing management. OUP Oxford.
18. Simon, H. (2015). Confessions of the pricing man. Springer.

2 Understanding Demand and Elasticity

2.1 Price-Response Curve, Not Demand Curve

The demand curve is one of the cornerstones of economic theory. It represents the basic relationship between customer demand and the price charged for goods or services. Specifically, as prices rise, customers are likely to purchase less, and vice versa. A solid understanding of such curves is an essential element for any retailer looking to improve the way in which prices are being set.

The methods described in this chapter take the perspective of a single retailer. Unlike most economists, the individual retailer is not concerned with macroeconomics and the creation of aggregated demand curves for the complete market. Rather, she is concerned with modeling demand for the products and channels where she is actively selling (or planning to sell) products. This in turn has implications for the type of demand models that are suitable for this purpose.

d = f(p)   (2.1)

In economics textbooks, demand is typically placed on the horizontal (x) axis and the price on the vertical (y) axis.1 In the context of pricing decisions, the assumption is often that demand responds to price and not the other way around (Eq. 2.1). This implies that the traditional demand curve goes against the mathematical convention of placing the dependent variable on the vertical axis. For this reason, this book uses the price-response function rather than the demand function as the basis to model customer demand (see Fig. 2.1).

1 This practice is mainly historical, but can also make sense for analyses where quantity is the controlled variable, as well as for calculating things like the consumer surplus—which takes the form of an integral if the demand is on the horizontal axis.


Fig. 2.1 The demand curve as used in traditional economics and the price-response function, which inverts the x- and y-axes. The way in which the curve is presented does not change anything about the underlying relation, but it is important to note that this book does not follow the economics convention when comparing models to other textbooks


There is no such thing as "the demand function" or "the price-response function" as an invariable law of nature. The relationship between price and demand depends on the time period and the context in which the relationship is investigated. An important context is the manner in which the price is presented: as either the sticker price or a discounted price—regardless of the absolute price that is being asked. As time progresses, products may lose value in the eyes of consumers, who may start to delay purchases in anticipation of newer versions or lose interest because the season for the product is drawing to a close. Marketing exposure, either through traditional campaigns or by investing in specific products using technology such as Google Merchant Center, also affects demand for products. All these conditions imply that the price-response curve will take a different shape.

The list of factors that can have a potential impact on demand is very long, but a retailer making a decision has to be pragmatic. Creating a model that accounts for all variables is impossible in practice, because of both the implied complexity of the model and the immense data requirements to adequately determine all the parameters in such a model. Therefore, it must be decided what level of detail is adequate for the decisions that are going to be made using the model.

This chapter presents basic theory and concepts that will be used in the chapters on pricing, but which can also be relevant in marketing and operations contexts. The focus lies on aspects that are of immediate practical relevance to a retailer making


decisions. First, the concept of elasticity is explored (Sect. 2.2), as this is the basis for much of the pragmatic understanding of the impact of price changes. Next, this knowledge is used to analyze functions for demand that can be useful in practice (Sect. 2.3). Finally (Sect. 2.4), this chapter deals with how demand models can be fitted.

2.2 Measures of Price Sensitivity

Elasticity is a measure of how strongly demand responds to a price change. Most often, this is talked about as if elasticity were an intrinsic property of a certain product or category. Statements such as "the consumption of tobacco is inelastic" are frequently uttered. This is then taken to mean that increased prices of tobacco—by means of taxation—do not result in proportionate decreases in consumption. This implies that governments who raise prices on tobacco do not do so purely as a measure to discourage tobacco consumption; it is also an ideal way to raise government income.

In reality, elasticity is not a constant for a given product. Elasticity differs depending on the size of the price change, as well as the context wherein the price change is conducted. This can be illustrated by considering an extreme case:² Suppose that the price of tobacco were to increase 100-fold overnight, a single pack of cigarettes now being priced at €1.000, or €50 for a single cigarette. It is clear that this would likely have an extreme impact on demand, going against the supposedly inelastic nature of tobacco.³ A less extreme case is the fact that a price decrease of 5% and one of 50% are likely to result in very different relative responses. The former may result in an increase in demand of a couple of percentage points; the latter may cause the complete inventory to be sold out in a matter of minutes.

This is reflected in the basic formula for price elasticity (Eq. 2.2), which shows how elasticity is calculated when the price is changed from p1 to p2. This expression correctly implies that the value of elasticity is dependent on both the original price point and the new price point. The notation here uses d(p) to represent the demand for a price p, and ε is shorthand for elasticity.

\varepsilon(p_1, p_2) = \frac{\big(d(p_2) - d(p_1)\big)/d(p_1)}{(p_2 - p_1)/p_1} \qquad (2.2)

² This is a very valuable tool when thinking analytically; this principle and many others can be found in the book Maxims for Thinking Analytically [1].
³ This would likely also be a big incentive for illegal sales of tobacco and may not be the greatest public policy idea.

Under normal conditions, the result of this calculation is less than or equal to zero. This implies that a price decrease will never result in a demand decrease, something which holds true under normal conditions.⁴ Often, the absolute value of the elasticity will be reported. This should not be a cause for confusion since, under normal conditions, there will not be a situation where demand responds in the same direction as the price change (i.e., a price decrease will always result in greater than or equal demand and vice versa).

In practice, it is often still desirable to have a way to express the elasticity of a product in a single number. To this end, the point elasticity of a product can be calculated. This is nothing more than the value of the elasticity for an infinitesimal change in the price. To understand this, it is easiest first to consider the manner in which the slope of the price-response curve can be calculated at a specific price p1. As shown in Eq. 2.3, this is nothing more than calculating the derivative (d′) of the price-response function.⁵

\delta(p_1) = \lim_{x \to 0} \frac{d(p_1 + x) - d(p_1)}{x} = d'(p_1) \qquad (2.3)

The same derivative can be used to calculate the point elasticity, as is shown in Eq. 2.4. This can be a useful proxy to estimate the elasticity of a product, but it must be remembered that this manner of calculation only holds true for very small changes in price.

\varepsilon(p_1) = \frac{d'(p_1) \cdot p_1}{d(p_1)} \qquad (2.4)
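To make the two notions concrete, the short sketch below implements both the arc elasticity of Eq. 2.2 and the point elasticity of Eq. 2.4 for an arbitrary price-response function. The demand function used at the bottom is purely illustrative, and the helper names are my own.

```python
# A minimal sketch of the arc and point elasticity calculations
# (Eqs. 2.2 and 2.4). The demand function is illustrative, not a fitted model.

def arc_elasticity(d, p1, p2):
    """Arc elasticity (Eq. 2.2) for a price change from p1 to p2."""
    return ((d(p2) - d(p1)) / d(p1)) / ((p2 - p1) / p1)

def point_elasticity(d, p1, h=1e-6):
    """Point elasticity (Eq. 2.4), approximating d'(p1) numerically."""
    slope = (d(p1 + h) - d(p1 - h)) / (2 * h)  # central difference
    return slope * p1 / d(p1)

# Illustrative linear demand: d(p) = 1000 - 25p
d = lambda p: 1000 - 25 * p

print(arc_elasticity(d, 20, 18))  # elasticity for a price cut from 20 to 18
print(point_elasticity(d, 20))    # elasticity for an infinitesimal change at 20
```

For a linear function both calls return the same value (−1.0 at a price of 20), which is exactly the property the next sections show does not carry over to more realistic curves.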

Retailers often play fast and loose with the concept of elasticity. Arc elasticities are observed, but then used as if these are a product’s point elasticities—in combination with the implicit assumption that all price changes that are investigated are sufficiently small to warrant the use of a point elasticity to make a good estimate of demand. These assumptions are not made explicit and often originate from insufficient proficiency with the concept of elasticity.

For the reader who wants more detailed insight into elasticity, the third chapter of Robert Phillips’ Pricing and Revenue Optimization [2] provides an excellent starting point. The 18th chapter of The Oxford Handbook of Pricing Management [3] provides more technical detail, as well as useful references.

⁴ There are exceptions to this rule, such as cases of conspicuous consumption like the infamous "I Am Rich" app [4]. Other examples include Giffen goods and situations where price is used as an indicator of quality.
⁵ Depending on the nature of the price-response curve, this may be hard to calculate using traditional calculus, and an approximation strategy may be more appropriate.

2.3 A Sensible Model of Demand

Elasticity is often used as a synonym for a price-response or demand curve. The latter is however much more encompassing and altogether more useful when making pricing decisions. The price-response curve gives insight into the full relationship between price and demand, as opposed to only allowing relative statements. The argument can be raised that it is easier to determine elasticity than to create a full model for demand. This argument does however confound two different concepts. While it is true that it is easier to measure elasticity when observing sales, this is not the same as predicting elasticity. The latter would mean that the response to every possible price change can be predicted (see Sect. 2.2).

Using a model for the price-response curve implies that assumptions are made about the nature of the relationship. The technical name for such a model is a parametric model. This is contrasted with non-parametric models, which rely on large datasets to determine the right structure of the equation that predicts a certain outcome. The latter tend to be much harder to explain, typically taking on the form of black box models. Such models can however outperform parametric models, as they can uncover very complex relationships and are not as easily affected by possibly incorrect initial assumptions. In practice, however, these models require vast amounts of data to reach this potential, and they are much more prone to overfitting⁶ in situations where there is insufficient data. For this reason, it is often preferable to stick to parametric models such as those that will be discussed now.

Choosing a form for this model is always a compromise. On the one hand, the formulation should be complex enough to be a realistic representation of the true nature of demand. On the other hand, the formulation should be simple enough to be fine-tuned using the data at hand. Moreover, a simpler formulation often allows for much easier and quicker calculation and optimization. The ability to conduct a sanity check on the demand model is also an important argument to keep things as simple as possible, avoiding black box models. Three different demand models will now be discussed in increasing order of complexity. The first two models come with significant caveats and cannot be assumed to reflect the true nature of demand accurately. In spite of their simplicity, they can be very useful when making rapid prototypes. Understanding these models and being aware of their limitations are essential to making correct data-driven decisions.

⁶ Overfitting is a situation where a model relies too much on what has happened in the past, not being able to make any sensible predictions about the future because it has lost its ability to generalize.

2.3.1 Linear Price-Response Model

The simplest of all demand functions is the linear demand function, as expressed in Eq. 2.5. In this equation, the constant C can be taken to mean the maximal possible demand for a product, equated to the demand assuming the price is zero.⁷ The slope of the demand curve is expressed as s > 0.

d(p) = C - s \cdot p \qquad (2.5)

⁷ In reality, this will often not correspond to a realistic value, as the relevant price point is significantly removed from zero. The value of C will be the result of fitting to current observations of price and demand combinations, rather than an estimate of the demand at a price of zero. Nevertheless, this would be the theoretical meaning of the constant C.

The advantage of the linear demand curve is that it requires very few parameters to be estimated. The main disadvantage is that demand can often only be accurately approximated using linear curves if the price range under investigation is narrow. If the objective of the exercise is a potentially large change in price, such as for promotions and markdowns, a linear approximation is often not advisable.

The price elasticity (ε) of a linear price-response curve depends only on the starting price (p1) from which a price change is considered. The relative magnitude of the change in demand will be the same regardless of the new price p2 (see Eq. 2.2). How the elasticity evolves as the starting price is varied is shown by the dotted line in Fig. 2.2. Lower prices equate to lower elasticities, whereas higher prices equate to higher elasticities.

Fig. 2.2 A linear price-response function d(p) = 1.000 − 25p. The chart shows both the demand and the elasticity for a given point p1 on the demand curve

A common misconception is that if demand is estimated by observing historical elasticities, linear demand curves are created. However, it is clear from this example that this is not the case and that elasticity varies greatly depending on the starting price. There is however a class of demand functions that does satisfy this assumption; this type of demand function will be covered next.
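The varying elasticity of a linear curve is easy to verify numerically. The sketch below uses the illustrative function from Fig. 2.2, d(p) = 1.000 − 25p, and prints the point elasticity at several starting prices; the helper names are my own.

```python
# Point elasticity of a linear price-response curve (illustrative sketch).
# For d(p) = C - s*p, Eq. 2.4 gives e(p1) = -s*p1 / (C - s*p1).

C, s = 1000, 25  # parameters of the example curve from Fig. 2.2

def linear_demand(p):
    return C - s * p

def linear_point_elasticity(p1):
    return -s * p1 / linear_demand(p1)

for p1 in [10, 15, 20, 25, 30]:
    print(f"p1 = {p1}: demand = {linear_demand(p1)}, "
          f"elasticity = {linear_point_elasticity(p1):.2f}")
# The elasticity grows (in absolute value) with the starting price,
# even though the curve itself is a straight line.
```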

2.3.2 Constant Elasticity Price-Response Model

A constant elasticity price-response function has the same elasticity regardless of the current price point (p1). This can be achieved using the simple expression shown in Eq. 2.6. This expression assumes that the elasticity is a negative number (ε < 0).

d(p) = C \cdot p^{\varepsilon} \qquad (2.6)

The constant C in this equation is equal to the expected demand for a price equal to 1. The price-response function will display the most realistic behavior around this point of the curve. For this reason, it can be useful to add a scaling factor to this equation that allows the current price point and associated demand to be set as the base point for the price-response curve, rather than having to estimate demand at a price point that is unrealistic. A scaled variation of the constant elasticity model is shown in Eq. 2.7. Assuming a scenario in which the current price is equal to €25, and 375 units are sold in every period, this would imply setting p1 = 25 and C = 375. For different values of elasticity, this results in the price-response curves shown in Fig. 2.3. This shows that arc elasticities around the current sales price do not show extreme jumps, as would be the case for the non-scaled variation of the constant elasticity price-response model.

d(p) = C \cdot (p/p_1)^{\varepsilon} \qquad (2.7)
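The scaled form of Eq. 2.7 is straightforward to implement. The sketch below uses the numbers from the text (p1 = 25, C = 375) and also computes the arc elasticity for a larger price change, illustrating the gap between point and arc elasticity that Fig. 2.4 describes; the elasticity value is illustrative.

```python
# Scaled constant elasticity price-response model (Eq. 2.7), using the
# illustrative base point from the text: current price 25, demand 375.

C, p1, eps = 375, 25.0, -2.0  # eps is the (constant) point elasticity

def demand(p):
    return C * (p / p1) ** eps

def arc_elasticity(pa, pb):
    """Arc elasticity (Eq. 2.2) between two price points."""
    return ((demand(pb) - demand(pa)) / demand(pa)) / ((pb - pa) / pa)

print(demand(25))              # 375.0 at the base point, by construction
print(arc_elasticity(25, 26))  # about -1.89 for a small 4% change
print(arc_elasticity(25, 35))  # about -1.22 for a 40% increase
```

Even though the point elasticity is constant at −2.0, the observed arc elasticity drifts away from it as the price change grows, which is exactly the dynamic discussed next.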

Fig. 2.3 Constant price elasticity price-response models, for different elasticity values and a current price at p1 = 25

The form of this function is more realistic than the linear price-response function. Still, demand never drops to zero regardless of how high the price is raised. Likewise, as the price approaches zero, demand will approach infinity. Neither of these situations is a realistic reflection of demand.

It is important to note that the expression "constant elasticity" indicates that the point elasticity of the price is constant. For relatively large price changes, the arc elasticity (Eq. 2.2) will still take on different values. This again makes it clear that this type of demand model is mainly suited to modeling relatively small price changes. This is illustrated in Fig. 2.4, which shows that if an arc elasticity of 2.5 (the vertical axis) is observed for a price increase of 20% (the horizontal axis), this would actually be realistic for a constant price elasticity curve with a point elasticity of 3.5. This means that the point elasticity is underestimated when the arc elasticity of a price increase is used as a proxy. The larger the price increase, the larger this difference becomes. Again, this corroborates the fact that this curve—while more realistic than linear demand curves—is only well suited for relatively small price changes.

Misuse of constant elasticity price-response functions is common in spite of this problem. When a retailer uses measures of arc elasticity as if they are point elasticities, this implies that a constant elasticity price-response function underpins the demand dynamic. Moreover, the implicit assumption is made that all investigated price changes are small enough not to be affected by the dynamic shown in Fig. 2.4. The end result is a fairly unrealistic model of demand, which can have very detrimental effects, as this is one of the cornerstones of many decisions taken by the retailer.

Fig. 2.4 The observed arc elasticity on a constant elasticity price-response curve, for different sizes of a price increase. The chart shows that as the size of the price increase grows, the difference between the point elasticity and the arc elasticity increases

At this point, the discussion of this topic may seem quite technical and far removed from practice. The main thing to remember is that both linear and constant elasticity models have significant limitations when it comes to larger price changes and are often quite complex to link to elasticity. The logit price-response model that is treated next solves a number of these issues.

2.3.3 Logit Price-Response Model

An answer to the limitations of the linear and constant elasticity models is the logit price-response model. In spite of being lesser known, this price-response model has been shown to work well in practice [5, 2]. Equation 2.8 shows the mathematical expression that constitutes this model.

d(p) = \frac{C \cdot e^{-(a+bp)}}{1 + e^{-(a+bp)}} \qquad (2.8)

In this equation, three parameters have to be estimated. C > 0 can be interpreted as the size of the total market (similar to the linear model). The parameters a and b do not have as straightforward an interpretation as C. What can be stated is that larger values of b imply greater price sensitivity; the value of b should always be greater than zero. Moreover, the inflection point of the demand curve is equal to p̂ = −a/b, where p̂ often takes on the meaning of the current market price. However, defining a market price is not required to make effective use of these models.

The initial purpose of this model was to reflect the situation where there is a certain market price for a product. Variations close to this market price result in fluctuations of demand, but as soon as the price becomes significantly higher or lower than the market price, there are no longer marginal changes to demand. This reflects the situation where demand drops to zero when the price is too high, or where all customer demand is already being captured when the price is significantly lower than the market value. Even without the explicit assumption that there is a specific market price, this model works well when modeling customer demand. Its main advantage is that it does not suffer from the same problems as the constant elasticity and linear models, which paint a very unrealistic picture for larger price changes.

Figure 2.5 visualizes the logit price-response curve. The advantages of this model for larger price changes are clear: there is more realistic behavior for high and low prices. The point elasticity also shows more natural behavior for lower prices, up until the price significantly exceeds the inflection point (i.e., the market price, depending on the interpretation). When the price becomes too high, demand will drop to zero. For very low prices, the demand will max out the market capacity, rather than continuing to rise indefinitely.

Fig. 2.5 A logit price-response function with C = 1.000 and a market price p̂ = 25

As shown in Fig. 2.5, the elasticity remains close to zero for very low prices: all the available demand is still being captured. As the inflection point nears, the elasticity begins to creep upward, implying that a change in price starts to have a greater relative response in terms of demand. By this point, the total volume of demand has already been significantly reduced, and relative responses have a smaller absolute impact. At prices significantly above the inflection point, however, the elasticity continues to increase linearly. This is caused by demand approaching zero, not by continually greater price sensitivity among consumers. This highlights an issue when working with elasticities to estimate demand. Because elasticity is an expression that uses relative magnitudes, it inherently takes on erratic values when the basis for comparison approaches zero. This is an argument for using price-response functions rather than elasticity measures as a basis for decision-making.

A minor disadvantage of this model is that there is one more parameter to estimate, making this curve slightly harder to fit than the linear and constant elasticity models. In practice, this means that more observations of demand at different price points are needed to get an accurate read of the price-response curve.
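A minimal sketch of the logit price-response function of Eq. 2.8, loosely following Fig. 2.5: C = 1.000, with the inflection point placed at 25. The exact a and b values are my own assumption, chosen so that −a/b = 25.

```python
import math

# Logit price-response model (Eq. 2.8). Parameter values are illustrative:
# b is an assumed sensitivity, and a = -25*b places the inflection point
# (the "market price") at 25.
C = 1000.0
b = 0.5
a = -25.0 * b

def logit_demand(p):
    z = math.exp(-(a + b * p))
    return C * z / (1 + z)

def point_elasticity(p, h=1e-6):
    slope = (logit_demand(p + h) - logit_demand(p - h)) / (2 * h)
    return slope * p / logit_demand(p)

for p in [10, 20, 25, 30, 40]:
    print(f"p = {p}: demand = {logit_demand(p):7.1f}, "
          f"elasticity = {point_elasticity(p):6.2f}")
# Demand saturates near C for low prices and falls toward zero for high
# prices; the (absolute) elasticity keeps growing above the inflection
# point because the demand in the denominator approaches zero.
```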

2.4 Fitting Demand Curves Using Data

Most literature on pricing will assume that the price-response (or demand) curve is a given. The reality for a retailer is that getting an accurate measure of the price-response curve is one of the most challenging problems. Nor is this a problem that can simply be fixed by data science techniques, which start from the premise that models will be shaped by the data. Retailers often have large swathes of data, but prices for individual products are often fixed for long periods of time. Moreover, the context wherein a retailer operates is continually changing. This makes fitting a price-response curve to observed data points hard to achieve.

This section illustrates an approach for fitting price-response models using data as it is typically available to a retailer. As is done in other chapters, real but anonymized data will be used to illustrate the approach. Specifically, the data originates from a mid-sized fashion player, as this is often one of the most challenging contexts to work in, given that products have such limited lifespans. Consequently, transferring this approach to other types of retailers should be relatively straightforward.

2.4.1 Demand and Price Indices

Changes in the context wherein a product is sold are one difficulty when fitting price-response models. There are periods in the year where there is more demand than usual, just as there are periods where demand is unusually weak. This can be caused by inherently unpredictable forces such as weather conditions, or by more predictable events such as the start of a sales period that causes a spike in overall demand. Similarly, there will be variation in the price point of the product portfolio. If the complete assortment of products is discounted, an individual discount on a product will have a smaller effect, and vice versa. This is a second cause of noise in the data that is used to fit price-response curves.

To deal with this, index values can be constructed that reflect the way in which external factors have influenced demand and prices in the past. For every time period t, a demand index DI_t and a price index PI_t can be calculated. If this index takes on a value lower than unity, this implies that demand is negatively influenced by external factors. A factor greater than unity means the opposite: demand is stronger than average due to external factors.

DI_t = \frac{\sum_{i=1}^{I} d_{it}}{\frac{1}{t}\sum_{\tau=1}^{t}\sum_{i=1}^{I} d_{i\tau}} \qquad (2.9)

Equation 2.9 shows how a simple index could be calculated. This index uses the demand for product i at time t in units as the key variable. Note that this index does not use turnover, since turnover is also influenced by the price point at which products are being sold. This calculation assumes that a reasonable time period can be defined over which demand can be averaged (τ ∈ [0, t]). In a strongly seasonal market, an index value can only be calculated after some time in the season has elapsed.

A variation of this formula could account for the fact that not all products are introduced at the same time. Assuming that a group of products has been purchased for the summer season, it is usually the case that these products become available gradually and not all at once. This allows for processing in warehouses and distribution centers as well as handling in physical stores. It may also include the creation of product information and pictures for online channels. In such cases, the demand index can be corrected by weighting demand against the available number of SKUs. This may present a truer representation of market demand at different points in time.

One important caveat is that some products may behave differently than the average product. Just like there are non-cyclical shares in stock markets, there may be products that are purchased more when demand falters on average. An example of this could be the sales in a garden center during winter months. The top-line revenue is certain to drop during this period, but there will be some products that are purchased more than others, such as bird feeders and mulch to protect plants against strong frost. This can be mitigated by using multiple indices and creating baskets of products that represent a certain type of customer demand. Products should not be grouped just based on historical correlations, as this may result in overfitting on patterns that are unlikely to repeat in the future.

PI_t = \frac{\frac{1}{I}\sum_{i=1}^{I} p_{it}}{\frac{1}{t}\sum_{\tau=1}^{t}\frac{1}{I}\sum_{i=1}^{I} p_{i\tau}} \qquad (2.10)

Equation 2.10 shows a similar method for creating a price index value. This is especially important for retailers that offer significant discounts at specific moments in time. In this expression, p_it represents the price asked for product i at moment t in time. For some retailers, it may also be important to weight prices based on the inventory levels of products. This is the case when demand for products is strongly dependent on the available inventory, such as when a relatively low level of inventory has to be spread across multiple physical store locations.

There is however an important caveat when using price indices in automated pricing systems. When setting prices for the complete assortment, the price index is not independent of the prices that are set for individual products. This poses a risk of circular reasoning, where a product's price needs to be lower than the "market rate," but the market rate continues to drop as the prices decrease. Because of this, the price index must be used with caution, but it can still be a relevant instrument at specific points in time when there are big discounts for the complete assortment, such as the Black Friday sales period.

Next, these indices can be used to calculate simple corrected figures for the demand and price of products. This is as simple as dividing the observed demand or price by the previously calculated indices, as shown in Eqs. 2.11 and 2.12.

d_{it}^{*} = \frac{d_{it}}{DI_t} \qquad (2.11)

p_{it}^{*} = \frac{p_{it}}{PI_t} \qquad (2.12)
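As a concrete illustration, the sketch below computes DI_t, PI_t, and the corrected series d*_it and p*_it (Eqs. 2.9–2.12) from a small table of weekly sales. The column names and the pandas-based layout are my own assumptions, not the book's.

```python
import pandas as pd

# Toy weekly sales data: one row per (product, week). Column names are
# illustrative assumptions.
sales = pd.DataFrame({
    "product": ["A", "B", "A", "B", "A", "B"],
    "week":    [1,   1,   2,   2,   3,   3],
    "units":   [10,  20,  12,  25,  30,  55],
    "price":   [20., 30., 20., 30., 10., 15.],
})

# Demand index (Eq. 2.9): total units in week t divided by the average
# weekly total over the weeks observed so far.
weekly_units = sales.groupby("week")["units"].sum()
DI = weekly_units / weekly_units.expanding().mean()

# Price index (Eq. 2.10): average price in week t divided by the average
# of that quantity over the weeks observed so far.
weekly_price = sales.groupby("week")["price"].mean()
PI = weekly_price / weekly_price.expanding().mean()

# Corrected demand and price (Eqs. 2.11 and 2.12).
sales["d_star"] = sales["units"] / sales["week"].map(DI)
sales["p_star"] = sales["price"] / sales["week"].map(PI)
print(sales)
```

In this toy example, week 3 combines deep discounts with a demand spike; the corrected demand d* deflates that spike so that it is not attributed to the individual products' price changes alone.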

An important footnote when using demand curves that have been fitted on the adjusted price and demand values (d*_it and p*_it), rather than the raw observations (d_it and p_it), is that the resulting forecasts will have to be adjusted to the expected value of the index in the future. This is most important in situations where the index is used to correct a systematic trend rather than random noise. If there is only random noise, the expected value for the index in future periods should simply be 1. In cases where there is a significant upward trend in sales over time, it is reasonable to assume that the demand resulting from the price-response curve will have to be corrected for this.

Fig. 2.6 Real data-based example of the application of a demand index DI_t to the sales pattern. This specific product has been discounted heavily (−50%) in week 23

The use of this demand index can be illustrated using a real anonymized dataset.⁸ Specifically, the dataset from a mid-sized fashion retailer has been used. A subset of 4.000 unique products and the sales during the summer season have been used to this end. This constitutes around 90.000 transactions in total, which are of course skewed. Figure 2.6 shows the demand index as well as the observed (d) and corrected demand (d*) of a single product. The index itself shows that the strength of demand varies substantially throughout the season. At its peak, overall demand is twice as large as the average demand. For fashion players such as this one, these high peaks typically coincide with discount seasons. Here, it is clear that demand is relatively weak at the start of the season and then starts to increase gradually. The first big discounting period starts in week 22. A slight lull in demand is visible prior to this, as customers start to postpone their purchases in anticipation of discounts.

⁸ It would have been possible and easier to use artificial data to create examples for this chapter; however, this runs the risk of being too clean to translate well into practical applications. The goal of this chapter is to paint a realistic picture of what a retailer can expect when looking at data. At times, this can give rise to graphs and analyses that are noisier than typical classroom examples.

The start of two big discount waves is clearly visible as peaks. This example also shows the usefulness of the demand index for correcting demand when fitting price-response functions. For this product, only a single price reduction has been effected, specifically in week 23. However, the observed sales also clearly show a big peak in week 26. During that week, a second big discount period started, but the price of this specific product remained unchanged. For this example, the corrected demand correctly identifies the peak in week 26 as seasonal and corrects the demand back to a more reasonable level.

Even after the correction, there are still substantial variations in demand in subsequent weeks where no price changes have been made. Especially at a moderate scale, such variability is unavoidable, and fitting demand curves implies creating models that succeed in being right on average. The smaller the sales volume for a single product, the noisier the signal is likely to be (in relative terms).

Toward the end of the season, demand starts to decrease, as shown by the demand index. However, the single product investigated here declines significantly more steeply than the index. In this case, this is due to the fact that inventory for this product is starting to be depleted in multiple sales channels. Once there is no longer sufficient supply, the observations are no longer useful for modeling customer demand. Hence, these observations should be dropped.

Not all products should be corrected using the overall demand index. A typical example in a fashion context is the sales of swimwear. Figure 2.7 compares the global demand index to the demand index for swimwear in particular. This more specific index shows that there is a much greater peak for these products once the summer holiday season starts. Investigating how many different indices are relevant for the product portfolio is a first step in fitting the price-response curves. For the next step, it is assumed that this exercise has taken place and that the corrected demand (d*) has been calculated with an adequate index.

2.4.2 Fitting Price-Response Curves to Historical Sales Observations

Lack of price variation is another key challenge when constructing price-response curves. It is not unusual for products to have been offered at only a single price for their entire sales history. A single point is insufficient to fit a price-response curve in which at least three parameters have to be estimated. To mitigate this, information from multiple products can be combined in order to create a reasonable estimate of the shape of the price-demand curve. This raises the complex question of which products can be grouped, as well as how observations from such a group of products can be used to create product-specific price-response functions. The right approach will depend on the nature of the retailer and the retailer's product portfolio. While there is no one-size-fits-all approach, the following sections present a possible approach illustrated with real datasets.

Fig. 2.7 Comparison of a global demand index and the index for the swimwear category. The chart shows that the seasonal peak during summer months is much more pronounced for this category

Naturally, this practice is only relevant if there is indeed insufficient price variation for a single product. If product prices have already varied significantly in the past, combining products might even have detrimental effects. For some types of retail, such price variations are very common, a typical example being FMCG retailers.

2.4.2.1 Grouping Products

The goal of this step is to create groups of products that are likely to respond similarly to price changes. For retailers with a relatively narrow assortment and products with long lifespans, this can be a purely data-driven exercise. Other retailers, who have broad assortments and shorter product life cycles, will need to take a more meticulous approach and formulate and test explicit assumptions.

The basic principle can be illustrated with an example: an electronics retailer who sells a particular brand of smartphones. Every year, a new flagship is introduced, and last year's model remains on offer at a reduced price. Here, the assumption could be made that once the new product is launched, the behavioral patterns that have been observed in the past will be repeated in the future. That is, the downgraded flagship of this year will react to (relative) price changes similarly to the flagship that was downgraded the year before.


There are however a lot of retailers who do not have the luxury of such simple systematic patterns in their product portfolio. These retailers have products that have short lifespans and are not always replaced on a like-for-like basis. A typical example here is a fashion retailer who re-builds the product collection every year. The advisable approach here is to start from common sense rather than from a purely data-oriented approach. Not doing so can result in a classical case of "if you torture the data long enough, it will confess to anything." In theory, it would be possible to just list every single property of a product as a variable and then calculate which variables appear to correlate with similar responses to price changes. However, this is very likely to result in spurious correlations, as there will be a great many of these variables and often only a limited set of products and observed sales.

In practice, this implies that a decision-maker makes assumptions about causal relationships that cause products to be similar in ways that are relevant to customer behavior. A starting point for this can be the basic product properties that make products closer substitutes (e.g., the products are both t-shirts). Another point of view may be that products are being sold to the same customer segments (e.g., both are premium products in our segment). The specific options will vary, but some typical perspectives include the following:

• Categories and sub-categories: Products that reside in the same position in the product hierarchy.
• Brands: In close proximity to a product category is the brand of the product. When selling power tools, brands aimed at professionals will tend to be less price elastic than brands that are aimed at DIY customers.
• Price point: The base price at which a product was introduced can also be telling as to the nature of the product. This is especially important if a retailer caters to different customer segments who buy at different price points. A cosmetics retailer may offer a line of basic products for daily use, but also top-of-the-line French brands that can cost a small fortune.
• Historical sales patterns: Two products that seem similar may in reality be very different. Again, fashion is a context where this can hold true. A product can be identical in all respects but the color. Yet this difference can be sufficient to see the sales of the product drop or rise to levels on a different order of magnitude. Having a measure of the historical success of a product can therefore be highly relevant to estimating how a product is likely to respond. Naturally, this is only a feasible strategy for price changes, not for setting the introduction price of a product.
• Inventory position: The current or historical inventory position can also be used to group products. In cases where the availability of inventory and a wide array of sizes are an important driver of sales, this may have a significant effect on the price-response dynamic. It may be the case that products for which only limited sizes are available are much less visible in stores, causing negative pressure on demand. Also, the initial inventory purchased can convey information about the beliefs that the buying department holds (or held) about the product.
• Marketing visibility: If products are visible in store windows, or if they are featured in advertising campaigns or billboards, this is likely to have an impact on customer demand. This is often one of the hardest pieces of historical information to collect, since it is often not recorded in a structured fashion. Hence, it may require a significant amount of manual work to codify.

Products that have been clustered together will be modeled using a similarly shaped price-response function. This means that there is a clear trade-off between creating more clusters to better capture detailed nuances in customer demand and the desire to have models that generalize well. Striking this balance requires experimentation. This experimentation should take place using holdout data. This is a traditional concept in data science where not all the historical observations are used to create the model, but some data points are left out. A data point in this case is an observed combination of price and the corrected demand (d*; see Sect. 2.4.1) for a specific product. The model is then asked to make predictions about these data points, which have not been used to train the model but are known with certainty since they have been observed in practice. By doing so, the optimal size of product clusters can be determined. Moreover, this approach can also be used to determine the type of variables that are used to create these clusters. As stated earlier, the choice of candidate variables is based on a belief in a causal connection, but the final usage of these elements is to be validated using data.

The creation of good clusters is especially valuable for products with little to no sales history. Being able to approximate the shape of the demand curve before a new product introduction can greatly improve profitability—even when the estimate is not precisely correct. This holds especially in situations where it is hard to change list prices due to operational, strategic, or legal constraints.⁹ A product that fits in a group that is relatively price inelastic can then be introduced at a higher price relative to a similar product in a highly price-elastic group of products.

Traditional clustering algorithms simply group products that are similar based on any kind of information that is provided about them. For this kind of application, it is more useful to provide a specific objective to the clustering procedure, striving to identify differences that are as relevant as possible to the way in which a product responds to price changes—i.e., aspects that are relevant to the price-response function that is fitted for a product.¹⁰

Scoring variables in this regard can easily be done by using the variables as predictors in a regression model. Equation 2.13 shows an example of what such a model looks like. The dependent variable is the corrected demand d*_t, which a model will attempt to explain by a number of variables X that are candidates for creating a product group. It is also important to include some other independent variables that are obviously predictive for the demand in period t: the demand observed in the previous period d*_{t−1} and the price point of the product during this period as well as the preceding period (p_t and p_{t−1}).

d_t^{*} = f(d_{t-1}^{*}, p_t, p_{t-1}, X) \qquad (2.13)

⁹ A typical example is a printed price tag attached to a product.
¹⁰ See Chap. 8 for a similar line of reasoning for customer segmentation.

Table 2.1 The resulting importance scores for the example dataset. These scores reflect the relative impact of different variables when using them to predict future sales

Independent variable    Importance score
d*_{t−1}                0.586
p_t                     0.201
p_{t−1}                 0.103
Women's fashion         0.013
Shorts                  0.024
Sweatshirts             0.013
Coats                   0.013
Suits                   0.009
T-shirts                0.008
Dresses                 0.005

A wide range of models can be selected for the purpose of making these estimates. Solid and convenient candidates are a random forest regressor [6] and AdaBoost [7], which have low training times and can handle nonlinear relationships. If large amounts of data are available, neural networks are also an option, although they are likely to be more complex than required for these purposes. For the purpose of clustering, the overall performance of the prediction of this model is not extremely important. The most important piece of information from this exercise is which variables carry predictive power and which ones do not. This can be expressed as feature importance [8] or SHAP values [9]. The precise workings of these techniques go beyond the scope of this book, but both essentially attach importance scores to variables that reflect how influential these variables are.

For the example dataset, it is assumed that the relevant variables are limited to the target gender, as well as the main categories of products.¹¹ The variable importance resulting from running a random forest regression on the sample dataset is shown in Table 2.1. It is clear that the most important predictors of future sales for an individual product are the past sales and the relative and absolute price of the product. This is to be expected and is the primary reason why these variables are included as predictors. This makes sure that the variables (X) are not simply a proxy for the price point or historical sales volume of a product.

The importance scores of product categories are an order of magnitude smaller, but are not zero. There are also clear differences in the importance of different variables. It is clear that the shorts category tends to respond differently from other categories. The target group (men vs. women) also appears to be important. Fifteen other categories were also tested, but did not result in a significant feature importance score.

An important remark here is that the impact of smaller categories might be underestimated when using this approach. Because the dataset is inherently unbalanced, variables that apply to larger groups will automatically be perceived as more important. There are simple techniques to mitigate this. The smaller categories could, for instance, be over-sampled in the dataset. Alternatively, rather than running a single large model, multiple small models could be used. These smaller models would then each contain the observations from one category, as well as a random sample of other observations, in a balanced fashion. For the sake of brevity, this approach has not been explored fully in this example.

This approach is certainly not "by the book," and very valid objections can be raised against this way of predicting demand.¹² Moreover, multiple variations are possible in the chosen predictors as well as the dependent variable that is predicted. It is equally feasible to try to predict the relative change in demand, for instance. The expected value of the demand index can also be used as an independent variable, etc. In spite of these valid critiques and ambiguities, it can be assumed that this way of validating which variables are important is superior to selecting variables without any basis.

Once variables have been identified as being relevant in this context, they can be used to create groups of similar products. The goal here should be to create groups that are as homogeneous as possible while maintaining a sufficient number of observations to be able to fit a price-response curve. If there are many historical price changes for a single product, there may be no need to add this product to a basket containing multiple products. Homogeneity of products can be measured as the absence of large differences in the variables that have been identified as important. Multiple techniques can be used to create these groups; clustering algorithms in particular can be useful at this point. For relatively small-scale datasets, it can be sufficient to use a common sense approach and create product groups manually.

For instance, in the example dataset, it has now been observed that the t-shirt category appears to be distinct in the manner in which it responds to price changes. Moreover, the historical sales volume and price point have also been shown to be relevant to the price response of a product. Hence, creating groups can be as simple as selecting all products from the t-shirt category and creating a number of groups based on the historical volume of demand and the price point of the product. Table 2.2 shows the product count for such a simple grouping of the t-shirt group in the sample data. The purpose of this table is purely to validate whether there are a sufficient number of observations in each of the groups that are created in this fashion.

¹¹ The example dataset used in this chapter is on the small side, so the approach illustrated here is based on a situation where initial assumptions are made about what is important—not a purely data-driven approach.

¹² The most important perhaps being that caution needs to be applied when modeling time series: training and testing sets should be carefully selected in a manner that prevents the training data from being contaminated with information that should only be present in the testing data [10].
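A minimal sketch of this variable-scoring step using scikit-learn, under the assumption that a table with one row per product-period is available. All column and file names are illustrative, and the model mirrors Eq. 2.13 with a random forest regressor.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumed input: one row per product-period with the corrected demand,
# its lag, current and previous prices, and candidate grouping dummies.
df = pd.read_csv("sales_periods.csv")  # hypothetical file name

candidates = ["womens_fashion", "shorts", "sweatshirts", "t_shirts"]
features = ["d_star_lag", "p_t", "p_t_prev"] + candidates

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[features], df["d_star"])

# Importance scores, analogous to Table 2.1: high scores for the lagged
# demand and the prices are expected; the interesting signal is which of
# the candidate grouping variables carry any importance at all.
importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
```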

Table 2.2 Product groups created based on the observed feature importance for the example dataset. Specifically, the t-shirt category is further subdivided based on the price point and the historical sales success of the product

                    Sales category
Price category    High      Mid       Low
High              42        273       147
Mid               609       987       1.470
Low               2.121     3.066     2.541

The bounds between high, mid, and low categories are defined simply by looking at the distributions of the price and the sales, and ensuring that cutoffs are set at natural points or at points that guarantee roughly equal-sized groups. For larger datasets, it may be valuable to search for specific cutoff points where behavior tends to change. For example, a price above €100 may cross a mental barrier for a customer, resulting in different behavior. However, given the limited size of the dataset, a simple approach has been taken here, as sketched below.

Once these groups have been created, the next challenge is to fit price-response curves using historically observed sales. These price-response curves should be product-specific, but can use data from products in the same groups to be estimated more accurately.
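To make the grouping concrete, a small pandas sketch of the equal-sized-bucket variant, assuming a product-level table with price and sales columns (all names are illustrative):

```python
import pandas as pd

# Hypothetical product-level table for the t-shirt category.
products = pd.read_csv("tshirts.csv")  # columns: product_id, list_price, units_sold

# Split both dimensions into three roughly equal-sized buckets, mirroring
# the high/mid/low grouping of Table 2.2.
products["price_cat"] = pd.qcut(products["list_price"], 3,
                                labels=["Low", "Mid", "High"])
products["sales_cat"] = pd.qcut(products["units_sold"], 3,
                                labels=["Low", "Mid", "High"])

# Product counts per group, to check that every group is large enough
# to fit a price-response curve on.
print(products.groupby(["price_cat", "sales_cat"]).size().unstack())
```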

2.4.2.2 Scaling Product Sales for Combination

Assuming that a reasonable grouping of products has been created, the next step is to fit the price-response functions themselves. Specifically, the goal is to fit the logit models that were introduced in Sect. 2.3.3, the main challenges being that products are sold at very different prices and that the volume of demand for different products can be orders of magnitude apart. This means that fitting a curve involves more than just collecting the price-demand combinations and running a curve-fitting routine.

The approach presented here makes use of the inherent structure of the logit price-response model, specifically the fact that the market size is given by the parameter C and that the parameters a and b determine the sensitivity of the price-response function. The approach exploits this property by determining the market size on an individual product level and the price sensitivity (i.e., the shape of the price-response curve) on an aggregated level using relative figures.

Estimating Market Size (C)
The market size is simply the upper bound of the logit price-response function. This is clearly shown on the left side of Fig. 2.5, where, as soon as the price drops below 20, the demand starts to get close to the market demand of 1.000. The concept of market size is not used in the strict sense, as the complete market for a certain product can be impossible to cover by a single retailer. Still, the term is used because it makes the parameters less abstract.


The market size (C) is best estimated separately for each product. In spite of products having been identified as belonging to similar groups, it is likely that the absolute magnitude of demand will be very different. Even within the group of low-priced and high-selling t-shirts, for instance, the sales volume varies between 299 and 1.303 units. This implies that the market size for the latter product is substantially larger than for the former.

It is important to note that historical data is unlikely to contain many products that actually sell the maximal possible amount. Some products may run out of inventory, causing sales to be lost.¹³ Other products will never have been sold at sufficiently low prices to sell to their fullest potential, as it may be more lucrative to charge higher prices and sell lower volumes. Because of this, the approach to estimating C is to determine a reasonable upper bound, rather than to forecast the highest future sale that is likely to happen. The objective is to determine what the highest possible magnitude of demand could be, in the case where the price is low and inventory is plentiful.

In practice, it is not uncommon to observe retailers who implement extreme price reductions to get rid of superfluous inventory. Oftentimes, these extreme price reductions fail to have the desired effect of eliminating all inventory. The nature of the logit model, and specifically the parameter C, explains why this is the case. There is an upper bound to market demand, and even if prices are very low, the number of people who are interested in purchasing a certain product is not infinite.

A simple approach is to determine the market size as an expression relative to the historical sales of a product. To do this, the sales data must be analyzed in order to find a good summary measure that can serve as a reference point. The effectiveness of such a summary measure was tested for the example dataset, assuming that the market size is being estimated after observing sales for the first 6 weeks after a product's introduction. The goal is to use these first 6 weeks to estimate the market size parameter for the remainder of the lifespan of the product (i.e., the remainder of the summer season). Three straightforward summary measures have been tested: the average weekly sales, the maximal weekly sales, and the average weekly sales plus two standard deviations.

The degree to which these measures are appropriate is then measured based on how easy it is to construct an upper bound for the subsequent (unobserved) weeks. This corresponds to an adequate value for C, which effectively sets the upper bound for demand. A good upper bound is one that is as low as possible.¹⁴ An easy and visual way to gauge this is to compare the fraction of observations that are covered with the height of the bound. A lower average value of the bound for the same coverage indicates a higher-quality upper bound. Figure 2.8 shows what this looks like for the sample dataset. Three different bases for the market value are tested alongside each other. The chart is limited to the likely relevant range upward of 90% coverage.

¹³ It may be best to exclude such observations when training price-response models.
¹⁴ This is just common sense; positive infinity is always a possible upper bound—but not a very good one since it adds no information.
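A sketch of how such a coverage comparison could be run, assuming a long table of weekly sales per product. The three measure definitions follow the text; everything else (names, file, layout) is my own assumption.

```python
import pandas as pd

# Hypothetical input: weekly unit sales per product over a season.
sales = pd.read_csv("weekly_sales.csv")  # columns: product_id, week, units

train = sales[sales["week"] <= 6]   # first six weeks after introduction
test = sales[sales["week"] > 6]     # remaining weeks to be covered

# The three candidate bases from the text, computed per product.
g = train.groupby("product_id")["units"]
bases = pd.DataFrame({
    "avg": g.mean(),
    "max": g.max(),
    "mean_plus_2std": g.mean() + 2 * g.std(),
})

# For each basis: which fraction of later weekly sales stays under the
# bound (coverage), and how high the bound is on average.
for name in bases:
    bound = test["product_id"].map(bases[name])
    coverage = (test["units"] <= bound).mean()
    print(f"{name}: coverage = {coverage:.3f}, avg bound = {bound.mean():.1f}")
```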

Fig. 2.8 Evaluation of different bases for estimating market size: comparing the coverage with the average value for different bases of comparison. Lower values are better, since these give equal coverage with lower average values

Based on this figure, it appears that a combination of the historical mean and the historical standard deviation tends to perform best for the interval between 0.95 and 0.975.

B_i = \hat{\mu}_{i,t \le 6} + 2 \cdot \sigma_{i,t \le 6} \qquad (2.14)

This means that this quantity can be selected as the basis (B_i) for the estimate of the market size C for each product i (Eq. 2.14). This alone does however not provide sufficient information to set a definitive value for C. While the basis for this estimate should be adequate at identifying later sales peaks, the exact level of the market size still has to be calibrated. Effectively, the desired coverage as illustrated in Fig. 2.8 should be selected based on the level that provides the best possible approximation of the observed sales and elasticities. A simple and practical manner of doing so is by calibrating a multiple α, as shown in Eq. 2.15.

\hat{C}_i = \alpha \cdot B_i \qquad (2.15)


This calibration must however happen after the price sensitivity parameters have been estimated. It may be the case that a slight underestimation of the true market size results in a better-fitting curve to explain the observed price-demand patterns. Especially for products where the price variations are unlikely to ever approach the extreme low end of the spectrum, this may be favored. For products that have yet to be introduced, using historical data is of course not possible. In this case, an estimate for C could be made using a nearest neighbors approach. A nearest neighbor heuristic takes the n products that have historically been most similar to the product and averages the market size for these products. Such an estimate is of course likely to be much less accurate than one based on historical data, and it is advisable to correct this initial estimate as soon as sales data have been observed. Estimating Price Sensitivity The price sensitivity of the logit model is determined by the parameters a and b. The major challenge in determining these parameters is that there are usually too few observations for price changes of an individual product. To mitigate this, retailers often resort to making calculations using presumably constant elasticity values. This does however result in inaccurate estimates, especially for larger price changes. Making the observations of multiple products comparable requires re-scaling the prices and the demand. Depending on the context, different approaches to re-scaling are possible. If the products have a MSRP, this can provide a natural hinge, with the MSRP for all products being re-scaled to unity. The average demand at this price level can then also be re-scaled to unity to provide a natural base coordinate of (1,1). It is not an absolute requirement that products have actually been sold at exactly this price to use this approach. Likewise, if there is a strong competitor in the market who is price-setting, the price charged by this company can be taken to be the unity value for the re-scaling. Alternatively, if there are a lot of price variations over time, the average price at which a product has been sold can be an adequate candidate. At this point, it is very important to consider the context wherein a company operates in order to avoid selecting an inadequate candidate. It is unavoidable that the end result of this calibration will still be noisy, and outliers can be expected. However, for most contexts and under the assumption that an adequate grouping of products has taken place, the end result is sufficiently accurate for the purpose of fitting a price-response curve. Another scenario that will now be explored using the example data is that of a fashion retailer fitting a price-response curve for discounting. For these purposes, the list price of a product can be used as the unity value for re-scaling. This assumes that these points have been chosen somewhat sensibly on average, meaning that the retailer is not pricing itself out of the market nor is undercutting the market. Even if there are some products that are priced insensibly, the fact that products are grouped will avoid that single outliers have too big an impact on the end result. The standardization of the demand is more difficult than the price. Even after correcting for the demand index, there can still be substantial variations in the levels

2.4 Fitting Demand Curves Using Data

55

of demand over time. This means that the standardized price 1 can correspond to different levels of demand. Because the main purpose of the price-response curve is evaluating what the possible effect of a price change will be, it is also an option to use relative rather than absolute changes in demand. This often results in a much less erratic behavior. The disadvantage of using relative prices is that the demand at the starting point is assumed to be known. If the list price is reduced by 20% (i.e., from a standardized price of 1.0 to a standardized price of 0.8), the new data point can only be collected if the demand at the original price has a value. For the base pair (1.0, 1.0), this is the case, but for other points, this is not automatically the case. This is mitigated by using an iterative procedure that loops over starting prices. The process starts at the base pair (1.0, 1.0) and records all price changes from that point. For the example dataset, this only contains price reductions since we are dealing with data from a fashion player who only executes price reductions relative to the list price, but will never increase the list price of a product during the season. Once all price changes departing from this point have been observed, the observed price changes to every new price are counted. The process then moves to the price point that has the largest number of observations. These observations are then used to make an estimate of the re-scaled demand at that price point. For example, if the greatest number of price changes starting from price 1.0 is observed to a price of 0.8 and the average relative response is an increase in demand of 30%. In this case, the demand at a price point will be estimated at 1.3. This will then be the starting point for a next series of observations to be collected, as all the price changes starting from the price point of 0.8 are collected. This continues until all relevant price changes have been recorded. Note that it may be required to round prices in order to make this process manageable. One advantage of this method is that only actual price changes are taken into account. This avoids the price-response curve from being influenced too much by inherent variability of demand. The focus lies on the average response of demand in reaction to a price change. The solid line in Fig. 2.9 shows the results of this process when applied to the low-priced and high-volume t-shirt group in the example dataset. Up until prices that are a 50% discount, the resulting curve looks like it would be expected. Once discounts go deeper however, it appears that demand actually decreases as price increases. This is something that is not unusual when working with real datasets. The reasons for such observations are typically inventory being close to sold out or severely unpopular products being heavily discounted. This results in the apparent inverse relation between price and demand. This is not however behavior that a retailer should wish to model in the price-response curve for decision-making. The observed and re-scaled observations can be used to fit a price-response curve, as is also shown in an example on Fig. 2.9. A lot of software tools are capable of fitting a curve to data points.15 When fitting such a curve, it can make sense not to

15 Examples

are the SciPy package in Python, MATLAB, or a solver in Excel—among many other candidate solutions.


[Figure: re-scaled price (x-axis, 0.2 to 1.0) versus re-scaled demand (y-axis, 1 to 7); solid line: average of re-scaled price-response combinations; dotted line: fitted price-response curve]

Fig. 2.9 Example demand curve fitted to low-priced, high-volume selling t-shirts in the example dataset. The solid line shows the averaged transformed observational data; the dotted line shows the demand curve that is fitted to the data

Additional Improvements There are several ways in which this approach can be further refined, depending on the nature of the retailer. One such improvement is attaching a higher relative weight to the sales of the individual product when fitting a price-response curve, for example, by adding each observation of the product itself multiple times to the set of observations used to train the model.
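To make the curve-fitting step concrete, the sketch below uses SciPy's curve_fit (one of the tools mentioned in the footnote) to fit the logit price-response model to re-scaled observations. The observation arrays are hypothetical stand-ins for the (price, demand) pairs produced by the procedure above, not values from the example dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

def logit_response(p, C, a, b):
    """Logit price-response model: d(p) = C * exp(-(a + b*p)) / (1 + exp(-(a + b*p)))."""
    z = np.exp(-(a + b * p))
    return C * z / (1 + z)

# Hypothetical re-scaled observations; (1.0, 1.0) is the base pair
prices = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
demand = np.array([1.0, 1.5, 2.2, 3.1, 4.6, 5.9, 6.6])

# p0 holds rough starting guesses for C, a, and b to help the optimizer converge
(C, a, b), _ = curve_fit(logit_response, prices, demand, p0=[8.0, -4.0, 6.0])
print(f"C = {C:.2f}, a = {a:.2f}, b = {b:.2f}")
```

Feeding in every individual re-scaled observation instead of the averages, and repeating a product's own observations to weight them more heavily, fits naturally into this setup, since curve_fit treats each (price, demand) pair as one data point.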

2.5 Making Forecasts

At this point, a curve has been fitted for a single product group. For the sake of this example, the parameters have been rounded to the following values: C = 8.0, a = −4, and b = 6 (Eq. 2.16). This results in the curve shown in Fig. 2.9.

d(p) = \frac{C \cdot e^{-(a+bp)}}{1 + e^{-(a+bp)}} = \frac{8 \cdot e^{-(-4+6p)}}{1 + e^{-(-4+6p)}}    (2.16)

Of course, this is a re-scaled function that is valid for a group of products, and it cannot be used as-is to make predictions for an individual product. Before doing so, the curve needs to be scaled back to the level of demand of the specific product.

Table 2.3 Illustrative example for the calculation of expected demand based on the observed demand in the current time period, in combination with the fitted price-response curve presented in Eq. 2.16

          List price   Discount   Price     Demand   DI     d*
Current   €24.99       10%        €22.49    19       0.85   22.31
Next      €25.99       40%        €15.59    60       0.86   69.71

The simplest way in which the function can be scaled back for use with a specific product is by calculating the arc elasticities—which are expressed in relative terms—and using these to calculate the expected response of demand. This means that it is not required to estimate new values for C, a, and b for every product. This process is easiest to illustrate using an example. Table 2.3 shows two observations of the price and demand during two subsequent time periods. Assume that, during the current period, an estimate of the likely demand in the next period has to be made. The logit price-response function that has been fitted to this group of products can be used to this end. First and foremost, it has to be taken into account that the demand being estimated by this model is the demand that has been corrected for the demand index, d*, and not simply the observed demand in units. This means that the demand index (DI) has to be estimated for future periods. In practice, this is typically feasible for 4 to 6 weeks into the future, with decreasing accuracy the further into the future the forecast has to be made. For the calculation here, it will be assumed that the demand index can be accurately estimated for the next period. To calculate demand, the price-response curve (Eq. 2.16) will be used to calculate the arc elasticity (Eq. 2.2) for the specific price change. The starting price p_1 is simply set to 0.9, and the new price p_2 is set to 0.6. This corresponds to the manner in which prices have been standardized under the current approach.16 Calculating demand for these prices using the fitted function results in a demand of 1.58 and 4.78, respectively, as shown in Eqs. 2.17 and 2.18.

d(p_1) = \frac{8 \cdot e^{-(-4+6 \cdot 0.9)}}{1 + e^{-(-4+6 \cdot 0.9)}} = 1.58    (2.17)

16 Depending on the manner in which standardization has taken place, there may be extra steps required here.


d(p_2) = \frac{8 \cdot e^{-(-4+6 \cdot 0.6)}}{1 + e^{-(-4+6 \cdot 0.6)}} = 4.78    (2.18)

These numbers can easily be combined to determine the expected relative response to a price change. Equation 2.19 shows how an estimate of the corrected demand in the next period (\hat{d}^*_{next}) can be created. Getting to an estimate of real demand requires multiplying this quantity by the demand index of the next period.17 This means that the expected demand in the next period is 58 units.

\hat{d}^*_{next} = \frac{d(p_2)}{d(p_1)} \cdot d^*_{current} = \frac{4.78}{1.58} \cdot 22.31 = 67.52    (2.19)

17 As mentioned earlier, it is assumed here that this can be estimated accurately.

To evaluate the accuracy of the forecast, it is best to compare the actual relative demand d^*_{next} to the forecasted demand \hat{d}^*_{next}. Different error measures are possible, but often the mean percentage error (MPE) and the mean absolute percentage error (MAPE) are good and intuitive candidates to express the quality of the forecast. As always, it is important to be careful when interpreting such summary measures, as long tails and very slow-selling products may skew this or other measures significantly. For this single example, the MAPE is equal to only 3%—a very good result. This is of course only anecdotal, and performance should be evaluated at scale to ensure that the price-response function can provide reasonably accurate predictions of demand. The next section will illustrate this process—again using the example dataset to ensure that a realistic picture is painted of attainable performance.
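The worked example above can be reproduced in a few lines. This is a minimal sketch using the rounded parameters and the Table 2.3 values; the variable names are illustrative, and the demand index for the next period is assumed to be known, as in the text.

```python
import math

def logit_demand(p, C=8.0, a=-4.0, b=6.0):
    """Fitted group-level price-response curve from Eq. 2.16."""
    z = math.exp(-(a + b * p))
    return C * z / (1 + z)

d_star_current = 22.31                             # demand corrected for the demand index (Table 2.3)
ratio = logit_demand(0.6) / logit_demand(0.9)      # 4.78 / 1.58 (Eqs. 2.17-2.18)
d_star_next_hat = ratio * d_star_current           # ~67.5 (Eq. 2.19)

di_next = 0.86                                     # demand index, assumed known for the next period
units_forecast = d_star_next_hat * di_next         # ~58 units

d_star_next_actual = 69.71                         # observed corrected demand in the next period
ape = abs(d_star_next_actual - d_star_next_hat) / d_star_next_actual  # ~0.03, i.e., 3%
```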

2.6 Evaluating Performance

Before adopting a price-response model in practice, it is important to evaluate its performance in a structured fashion. The most important thing to remember is that the data on which the accuracy is tested should be different from the data on which the model has been calibrated. Continuing the example of the high-selling t-shirts, a good setup would be to tune a model based on the summer season of 2 years ago and to use the data from last year's season to validate that the model performs adequately. This avoids problems with overfitting, which could make accuracy appear higher than it is in reality. Only observations where the price has been changed should be included in the performance measurement. Besides looking at summary metrics such as the previously discussed MAPE, it is often useful to look at the data in detail. A scatter plot that compares the actually observed values with the forecasted values is a quick and simple method of doing so, as shown in Fig. 2.10. Evaluating this scatter plot means observing three things: (i) How close the points are to the diagonal; the closer the points are, the more accurate the forecast is.


[Figure: scatter plot of forecasted demand (x-axis) versus observed demand (y-axis)]

Fig. 2.10 A scatter plot as a tool to evaluate the quality of demand forecasts. The comparison is between the estimate for d^*_{next} and the actually observed value. This assumes that there is a parallel process for estimating the evolution of the demand index DI

(ii) Substantial differences in noise for different demand levels; while it can be expected that absolute differences will increase as the volume of demand increases, a broadening of the relative error may indicate that products should be split into more groups. (iii) Systematic over- or under-forecasting; it can be the case that the majority of points lie below or above the diagonal. This implies that demand is systematically over- or underestimated, respectively.

Extremely high accuracy scores can never be expected in a context such as the example dataset. At times, there can be discussions about how good a forecast should be, aiming for a certain threshold of a statistical metric such as R² or MAPE. However, this is rarely a fruitful discussion, as it is often impossible to achieve very high values for these traditional measures. The goal should be to get as close to what is feasible as possible and to make sure that there are no systematic mistakes in forecasts (i.e., over- or underestimations). Accuracy measures should be compared to a realistic benchmark (e.g., the current system that uses rules of thumb), not against an arbitrarily chosen value of a performance measure.

A good performance evaluation can also be used to construct confidence intervals around estimates. This can be an essential tool in decision-making. This confidence interval is likely to be significantly broader for smaller product categories and narrower for categories that have a lot of data to support it. These intervals can be very useful to evaluate statements such as "at this price, it is 90% likely that this product will sell out in 10 weeks' time."
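One simple, distribution-free way to build such intervals is to collect the relative forecast errors from a validation period and read off empirical quantiles. The sketch below assumes a hypothetical list of actual-to-forecast ratios; with more data per category, the interval narrows.

```python
import numpy as np

# Hypothetical actual/forecast ratios collected during a validation season
ratios = np.array([0.92, 1.05, 0.88, 1.10, 0.97, 1.21, 0.85, 1.02, 0.95, 1.08])

# Empirical 80% interval: scale a point forecast by these factors
low, high = np.quantile(ratios, [0.10, 0.90])
point_forecast = 58.0
print(f"80% interval: {point_forecast * low:.0f} to {point_forecast * high:.0f} units")
```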

2.7 Conclusion

This goes to show that fitting demand curves can be more intricate in practice than introductions to economics might lead one to believe. Especially for a retailer who offers a broad assortment, and for products that have limited lifespans, getting to an accurate approximation of the price-response curve can be challenging. In spite of this, an accurate price-response function is a conditio sine qua non for price optimization. This chapter has shown how this condition can be satisfied in a realistic setting, accounting for the sparsity of sales data and a wide product range. While useful, these techniques will not work for every retailer, and adjustments are likely to be needed to get adequate results.


3 Improving the List Price

3.1 Improving List Pricing

The list price, sticker price, or base price is the most important anchor for product value. In spite of its visibility and unquestioned importance, the list price often has weak foundations. Frequently, there is more discussion about the quantity to order than about the price at which to sell. This leaves a lot of potential untapped. This chapter explains how list pricing can be improved. The core idea is that the list price changes from something static into something variable.1 This variation can express itself as a change over time or as a change depending on a certain context, customer, or external influence. The price becomes conditional on something else, rather than a static property of a product. The variables that constitute this "something else" can be legion, but they have in common that they represent a departure from tradition and gut feeling.

The most straightforward improvement that can be made to the list price has been discussed in Chap. 2, where demand models were introduced. By using such models, price is based on the willingness to pay of consumers, rather than the cost of products. This basic premise will not be reiterated in this chapter. Rather, this chapter discusses nuances and more advanced techniques that can be useful in setting list prices. The mathematics behind many of these techniques can be quite complex and has therefore been omitted in its entirety. For the reader who wants to know more, some key reference works are mentioned.

This chapter discusses three topics. First, the implications of the competitive landscape for list pricing are discussed (Sect. 3.2). Depending on these conditions, it may be very valuable to obtain competitor price information, which is the next topic of discussion (Sect. 3.3).

1 Another term for this might be dynamic pricing, but this concept is often defined more narrowly in the literature. Dynamic pricing itself is discussed later in this chapter and is seen as a possible part of variable list pricing—but the two are not synonymous.


The final and largest part of this chapter discusses more advanced techniques for intelligent pricing. This is placed under the umbrella of dynamic pricing, but aims to cover the complete spectrum of algorithms, ranging from simple to highly complex (Sect. 3.4).

3.2 Market Conditions: Direct Versus Indirect Competition

Most retailers are not so fortunate as to be monopolists and face some form of competition. This competition can be either direct or indirect. Direct competition means that there are other retailers who sell identical products. Typical examples include resellers who carry well-known brands, as well as supermarkets selling FMCG products. Alternatively, some retailers operate in markets with indirect competition. In this case, competitors are selling substitutes, but not identical products. These retailers are often selling their own brand of products and may or may not be vertically integrated and have control over the manufacturing process as well.

Operating in a market with direct competition often means that customer demand is more elastic. All other things being equal, a customer will choose to buy a product from the vendor offering the lowest price. Naturally, there are other things, such as the shopping experience and accompanying services, that can make a customer still willing to pay a premium to shop at a more expensive retailer. A second implication of direct competition is that retailers often have to relinquish control of list prices in order to be allowed to sell certain brands. A retailer who is reselling fashionable sneakers or perfume will have to follow the prices suggested by the manufacturer. If she fails to do so, it is likely that she will no longer be allowed to sell these products in the future. The result is that price competition takes place in the domain of discounts, rather than list prices.

Retailers operating under indirect competition can take advantage of lower price elasticity, but have to cope with greater uncertainty. The reason for this uncertainty is the challenge of determining the perceived market value of their products. Substitutes are by definition different, which raises the question of how much better or worse the retailer's products are in the mind of the consumer.

Market conditions have an effect on the way in which demand can be modeled. Retailers under direct competition rightfully attach more value to the prices of competitors, which can be used as an integral part of the logit models introduced in Chap. 2. Conversely, those who experience indirect competition are often much more reliant on their own historical data to model demand. Naturally, this is not a pure dichotomy, and many retailers will fall somewhere in between: selling some products that are identical to products sold by other retailers and other products that are unique to their assortment. Knowing and understanding the type of competition is key to understanding customer demand.

3.3 Obtaining Competitor Price Information

Competitor prices can be a key piece of information in the pricing decision. Understanding how demand reacts to current and historical price differences can give an unquestionable edge to retailers. In spite of this, competitors are often unwilling to share this information, even after asking nicely. This has given rise to a complete industry that sells price information. Prices are collected by anonymous shoppers who visit competing stores. This pricing information is then structured and sold. Web scraping is also an increasingly popular method to collect price information. This has resulted in an arms race between companies trying to make it difficult to scrape websites and others who circumvent these defenses [1]. Contrary to what might be expected, scraping has not made manual recording of prices obsolete. The rise of more advanced pricing strategies has resulted in many retailers offering different prices across channels and geographical locations—often at a very granular level. Because of this, it can still be relevant to collect information using mystery shoppers. The remainder of this section will focus on the challenges of price scraping, which is the only cost-effective way of obtaining price information for most retailers. The scraping process itself consists of two steps: the scraping and the transformation of data into a usable format. These two steps will now be briefly discussed.

3.3.1 Web Scraping

The act of scraping is preceded by crawling: finding the pages that contain relevant information. In a retail context, this often means that all the product detail pages have to be found. Depending on the size of the website being scraped, this can be a sizeable undertaking. Scraping is the subsequent step of downloading relevant information from these pages once they have been found. Depending on the technology used to create the website, this can be very simple or might require taking the long way round. In the simplest cases, it may be possible to find the underlying API that provides the information in an already structured format. On the other end of the spectrum, it may be required to render the full webpage and simulate a human user browsing before information can be extracted.

A retailer who wants to collect information by scraping has a choice of different technologies. For low data volumes, tools are available where users can manually input pages of interest and indicate what information they want to collect and at what velocity. Moderate data volumes are likely to require some programming; a good introduction can be found in the book Web Scraping with Python [2], which introduces Python staples such as Scrapy and Beautiful Soup. If vast amounts of data need to be scraped, more advanced online services are often required. These services often use smart tricks such as rotating IP addresses to trick websites into believing that traffic is being generated by human users and not by robots.

The act of scraping websites has long been—and is likely to remain—a legal gray area. A famous case is that of eBay versus Bidder's Edge [3], where the latter was sued for systematically scraping information and selling this data to consumers. Essentially, this practice monetized eBay's data by selling it to bidders who wanted an advantage. In the process, extremely high loads were generated on eBay's servers, representing a not insignificant part of the total server cost of the company. The conclusion of this trial was that scraping itself was not illegal, but that putting unreasonable stress on other companies' infrastructure was. More than two decades later, this is still the status quo in most legal jurisdictions, but your mileage might obviously differ.2

If you decide to scrape information from competitors, a decision will have to be made on the velocity at which products are scraped. More frequent measurement will be more expensive in terms of computing power and more likely to be noticed by the company operating the website being scraped. Often, it can make sense to add some intelligence here and to make the scraping speed variable for different products: products that have been observed to have frequent price changes are scraped often; data on other products is collected less frequently.

Scraping systems are by nature extremely prone to breaking down. The websites being scraped can change important parts of their infrastructure without warning, which can cause the entire scraping process to come crumbling down. Depending on how crucial this information is for your own processes, it can be worthwhile to set up an active monitoring system and to have programmers on stand-by. A good monitoring system includes not only checks that data is still coming in but also sanity checks on the nature of the incoming data—as well as periodic checks by humans to see if the data can be corroborated on the websites themselves. Even so, uptime promises are hard to keep in a web-scraping context.

2 I am not a lawyer; please do not take legal advice from me. When in doubt, consult a legal professional.
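As a minimal illustration of the scraping step, the sketch below fetches a single product page and extracts a price with the requests and Beautiful Soup libraries mentioned above. The URL, CSS selector, and price format are hypothetical placeholders; a real scraper would also need crawling, throttling, and the monitoring discussed above.

```python
import requests
from bs4 import BeautifulSoup

def scrape_price(url: str, css_selector: str) -> float | None:
    """Fetch a product page and extract the price found at css_selector.

    Returns None when the request fails or the selector no longer matches,
    so a monitoring job can flag a broken scraper instead of crashing.
    """
    response = requests.get(url, headers={"User-Agent": "price-monitor/0.1"}, timeout=10)
    if response.status_code != 200:
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    node = soup.select_one(css_selector)
    if node is None:
        return None  # site layout changed; raise an alert
    # Normalize a string like "€ 24,99" (assumes a European decimal comma)
    text = node.get_text(strip=True).replace("€", "").replace(",", ".").strip()
    try:
        return float(text)
    except ValueError:
        return None

# Hypothetical usage; both arguments are placeholders
price = scrape_price("https://www.example-competitor.com/p/12345", "span.product-price")
```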

3.3.2 Transformation and Matching

Data generated by scraping systems often needs a substantial amount of pre-processing before it is usable. An important part of this is cleaning and transforming the data into a uniform format. This might seem surprising, but consider the vast number of ways in which prices can be presented on websites. Different retailers will list discounts in different manners, and at times, discounts can get quite creative, with things such as "buy 3 get 1 free"-style offers. Also, there are no guarantees that every possible manner in which a retailer lists prices is active at the time the scrapers are being built. Other pieces of data that have to be pigeonholed are product descriptions. These are often entered in a number of fields, but the nature of these fields and the categories used will tend to differ between retailers.

The fact that product collections are always changing adds further complexity to this exercise. The processing system needs to be capable of handling new products, preferably without human interaction. This should result in a relatively simple standardized data structure that stores static properties of products in a single table, and volatile information such as the price and inventory level in a table with timestamps for each observation.

Making data usable typically requires processing by a matching engine. This process links identical (or suspected identical) products together. Such matching is easiest in environments with direct competition, where truly identical products can be found. At times, product information may even include unique identifiers that are identical across different suppliers, as is the case for online pharmacies. Not everyone is so lucky. Retailers operating under indirect competition face a bigger challenge in matching products. Here, so-called fuzzy matching techniques are often relied upon [4]. These calculate a probability of two products being identical. Most often, this uses structured information such as the name of the product, as well as the product properties. In some cases, image processing can also be useful to match products. Regardless of the method being used, there will always be an error margin. This results in a trade-off: requiring a high degree of certainty that a product is indeed a match will result in a large fraction of products remaining unmatched. The opposite will result in a large number of matched products, but will inevitably contain a number of erroneous matches. The right answer depends on the manner in which this information will later be used.

Some systems that give the impression of being fully automated are in reality Mechanical Turks [5]: human operators who confirm the matching of products. This is often done in tandem with an algorithm that speeds up the process by suggesting a number of likely matches.
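A toy sketch of the fuzzy matching idea is shown below, using only the product name and Python's standard-library SequenceMatcher. The product names and the 0.8 threshold are illustrative; production systems such as the one cited in [4] combine multiple fields, dedicated string metrics, and sometimes image features.

```python
from difflib import SequenceMatcher

def match_score(own_name: str, competitor_name: str) -> float:
    """Crude fuzzy-match score in [0, 1] based on normalized string similarity."""
    a, b = own_name.lower().strip(), competitor_name.lower().strip()
    return SequenceMatcher(None, a, b).ratio()

own = "Nike Air Zoom Pegasus 39 Men Black"
candidates = [
    "NIKE Air Zoom Pegasus 39 (Men, Black)",
    "Nike Revolution 6 Men Black",
]
# Rank candidate products from a competitor's feed by similarity
scores = sorted(((match_score(own, c), c) for c in candidates), reverse=True)
best_score, best_match = scores[0]
# The threshold embodies the precision/coverage trade-off described above
if best_score >= 0.8:
    print(f"Matched: {best_match} ({best_score:.2f})")
```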

3.3.3 Using Competitor Price Information

The main use for competitor price information is to investigate cross-price elasticity: in other words, how demand is affected by the price position relative to your main competitors. If this proves to have a significant effect, it can be used as a basis for re-scaling the demand model. In such a situation, the price will be expressed relative to the price of a competitor or relative to a weighted basket of competitor prices.

Competitor prices can also be used to investigate the dynamics of the market. Who is the price leader for a given product, and who is responding to price changes in kind? It is likely that different players are experimenting with intelligent pricing. Analyzing how they respond to price changes can reveal a lot of information about the tactics they are using. This information in turn can be highly useful for adjusting the employed strategy.
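Re-scaling against a weighted basket can be as simple as the sketch below; the competitor names and weights are hypothetical, and the weights would in practice reflect each competitor's relevance (e.g., market share).

```python
def relative_price(own_price: float, competitor_prices: dict, weights: dict) -> float:
    """Own price relative to a weighted basket of competitor prices (1.0 = at market)."""
    basket = sum(weights[name] * price for name, price in competitor_prices.items())
    return own_price / basket

# Hypothetical example: priced roughly 3% below the weighted market level
rp = relative_price(24.99,
                    {"competitor_a": 26.99, "competitor_b": 23.99},
                    {"competitor_a": 0.6, "competitor_b": 0.4})
print(f"Relative price index: {rp:.2f}")
```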


It must be noted that competitor price information is not a precondition for intelligent list pricing. A lot can be achieved based on information available in a retailer's own systems. The observed demand can already be a solid basis for fitting price-response curves. Moreover, even if the cause of a shift in the willingness to pay (e.g., a price change at a competitor) is unknown, the effect can still be detected and acted upon. The cost of obtaining competitor information must be weighed against the advantages that are likely to be gained from it. Especially for smaller retailers, these costs may be very significant, and approaches that do not rely on competitor prices may be preferred.

3.4 Dynamic Pricing

Fitting a price function to historical data and calculating the optimal price level is a big step forward. But it does have one big limitation: the real world is not static. The nature of the price-response curve will change over time, and so might costs and objectives. Hence, it is desirable to have a method of pricing that can act on these changes. This is where dynamic pricing enters the picture.

Dynamic pricing is the practice of adjusting prices based on changes in supply or demand, the purpose being to maximize profits of the retailer. As a consequence, prices of a product will change over time, reflecting current market conditions. Related concepts include revenue management, algorithmic pricing, surge pricing, demand pricing, time-based pricing, and contextual pricing. Dynamic pricing will be used as an umbrella term that covers all these variations.

The variables used to trigger price changes are very diverse and depend on the type of retailer. A well-known example of dynamic pricing can be found in the travel and hospitality industry: the price paid for a hotel room or an airplane ticket depends on a number of variables. These include how far in advance the purchase is made, the expected popularity at the given time, and the fraction of tickets or rooms that have already been sold, among others.

The mathematics behind dynamic pricing gets quite complex quite quickly. For this reason, this chapter focuses on the basic principles and motivations behind dynamic pricing, rather than the algorithms. The interested reader who wants to know more about the mathematics is referred to the 23rd chapter of The Oxford Handbook of Pricing Management [6]. Another excellent, but somewhat technical, work is Revenue Management and Pricing Analytics by Gallego and Topaloglu [7]. For the more advanced techniques applied as part of dynamic pricing, Sutton and Barto's Reinforcement Learning: An Introduction is also recommended reading [8].

3.4 Dynamic Pricing

3.4.1

67

Preconditions for Dynamic Pricing

Implementing dynamic pricing implies making an investment. Prior to making this investment decision, it is best to validate that the preconditions for successful dynamic pricing are satisfied. This is not an all-or-nothing decision, and it is perfectly possible to apply dynamic pricing only to the part of the product portfolio where it is most relevant.

Relevance does not automatically mean the top-selling products. It may be the case that these top-selling products are strategic to the positioning of the brand in the market, in which case prices are likely determined by human decision-makers. In such situations, it may be valid to create dynamic pricing systems for the long tail products, which cannot be evaluated with sufficient velocity by human decision-makers. Depending on the context, the opposite might also be true. It may be the case that the products in the long tail have very low sales volumes and therefore cannot generate sufficient data for dynamic pricing systems. Hence, the choice may be made to apply true dynamic pricing models only to the top-selling products. Everything depends on the context.

3.4.1.1 Data Availability and Quality
The first precondition for dynamic pricing is data availability. Dynamic pricing policies work best if good estimates can be made of the price-response function. This can only be done if there is good-quality information on the product and its properties, as well as accurate information on historical sales. In cases where the product price has not been changed often, data from related products (i.e., products of the same type or category) can be used to model demand.

Arguably more important than the quantity of data is data quality. Dynamic pricing implies that prices will be changed based on signals in the data. If these signals are not recorded correctly, this can cause systems to go haywire. This often means that the manner in which costs are calculated must be carefully evaluated. Special attention must be paid to the way in which inventory is valued,3 which can be a complex undertaking. Possible issues are too numerous to discuss, but suffice it to say that the quality of the foundations must be above reproach.

One pitfall in offline retail is knowing when prices are actually visible to consumers. There can be considerable time between the adjustment of the price in the system and the moment when the price change has permeated to all stores and products. Often, the only surefire way to confirm that a price has changed is the observation of a transaction. This constraint may have to be addressed if the velocity of price changes is increased. Otherwise, demand models may be based on incorrect information—i.e., demand at a price level other than the one in the centralized system.

3 This relates to FIFO-/LIFO-style calculations: is the value of the inventory the cost to replace it, or should it be valued at the original purchase cost?


There is no hard and fast rule that states how much data is needed to implement a dynamic pricing system. In theory, it is possible to start with dynamic pricing without any historical data, basing the initial moves on best guesses of the price point. The system is then left to its own devices and allowed to change prices in order to find the optimal price point, the downside of this of course being that initial estimates may be far removed from the true optimal price point. Hence, it can take considerable amounts of time before the system converges to a close-to-optimal state.

3.4.1.2 Variability of External Conditions
A second precondition is the existence of variations in supply or demand over time. The core idea of dynamic pricing is to adjust prices to a new optimal price point as conditions shift. If there are no shifts in either supply or demand, the optimal price point is simply constant. A change in demand can be as simple as decreasing demand toward the end of a season or toward the end of the lifetime of the product. It may also be much shorter term, such as demand driven by a heat wave or a cold snap. As discussed earlier, competitor prices can also have a significant influence on the demand observed by a retailer and may cause changes in the pricing policy. Changes in supply can be driven by changing prices for raw materials, which may or may not be passed on to customers. There may also be issues with supply altogether, causing situations where there are insufficient products to satisfy demand at the current price level. Opposite forces may of course cause prices to decrease.

3.4.1.3 Dynamic Prices Are Socially Acceptable
Social acceptance of dynamic pricing is an important third condition. Depending on the nature of the product and whom the product is sold to, dynamic pricing may or may not be perceived as acceptable. One such example is the price of concert tickets [9]; while it may be perfectly possible to increase revenue and profits—not to mention decreasing scalping—by applying dynamic pricing, there are loud voices opposed to this practice. These include famous artists such as Bruce Springsteen [10] and bands like Pearl Jam [11] who stress that they want their concerts to remain affordable for all fans.

There can also be a fine line between dynamic pricing and price gouging. Uber has received flak on multiple occasions for raising prices during natural disasters and emergencies [12]. Likewise, various retailers raised prices on remote working equipment such as webcams during the COVID lockdowns, when working from home became mandatory in many countries [13, 14]. Dynamic pricing under such conditions has a bitter aftertaste. Preventing this means thinking carefully about what is acceptable and what is not, given the targeted customer segment. These constraints should then be imposed on the dynamic pricing system to avoid damaging the brand image of the retailer.


3.4.1.4 Operational Feasibility
A final precondition is the need to take into account operational limitations with regard to changing prices. Especially retailers operating an offline store network may face severe restrictions on the velocity at which prices can be changed. Processes may require employees to change price tags by hand, which is cumbersome and error-prone when done too often.

In the same vein, arguments can be made in favor of reasonable price stability. If left to their own devices, mathematical models are likely to suggest frequent price changes. Each time a relevant event is recorded (e.g., the sale of a product), the optimal price is likely to shift—albeit by a small amount most of the time. It is of course not desirable that products change price before the customer has been able to reach the register. As such, it is important to ensure that there is a layer of logic that prevents superfluous volatility of prices. Price changes should be effected at acceptable time intervals, and changes should only be applied if they are sufficiently significant.4

4 This is of course relative, and a small price change may be highly relevant if large volumes of a product are sold.

3.4.2 Types of Dynamic Pricing

Without getting lost in terminology, this section goes into more detail on the different techniques that can be employed to make prices dynamic. The different approaches to dynamic pricing are listed in increasing order of complexity, ranging from static rules to reinforcement learning models. In essence, this could represent the natural progression of a retailer who starts out with simple improvements and continually increases complexity to reap additional benefits. As in the rest of this chapter, the focus lies on aspects that are relevant to a retailer, rather than on great technical detail about the underlying mathematics.

3.4.2.1 Fixed Rule Dynamic Pricing
The simplest form of dynamic pricing is the adoption of fixed pricing rules. Prices are conditional, but the conditions under which prices change are certain and can be anticipated by both customers and the retailer.

An example of fixed pricing rules is the congestion tax that is becoming more and more common in big cities. Depending on the time and day of the week, a bigger or smaller tax is applied automatically. These amounts are fixed, based on the timing of typical rush hour traffic. The best known examples can be found in Stockholm and London. In both cases, these taxes were first met with skepticism, but have in retrospect been found to be effective at reducing congestion in these busy city centers.

Another well-known example is happy hour in a bar. The moment in time is strategically chosen to entice people to visit the venue earlier than they otherwise intended to. This increases overall revenue and makes the venue seem more attractive to passers-by, as there are more people present.

A key condition for fixed rule pricing is that it can only be effective if there is no real possibility for arbitrage, hoarding, or delayed purchasing. For the happy hour example, this is indeed the case, as it is undesirable to order all your drinks at once during happy hour. The fact that this rule is unchanging, and that customers anticipate this moment, is not a bug but a feature: customers adjusting their behavior in line with the pricing rule is the desired outcome. The same holds true for the congestion tax, where the goal is to encourage changes in behavior.

Fixed rules make it possible to reap some of the benefits of dynamic pricing without making large investments. Another advantage is that rules can be communicated openly to customers. This can improve the degree to which price changes are viewed as reasonable and fair. Customers who want to pay less know under which conditions prices are lower and can adjust their behavior accordingly.

There are several ways in which a retailer can adopt such pricing rules. The most self-evident can be categorized as "special events": moments in time when a certain deal is available. A typical example is Cyber Monday and Black Friday sales. While precise discounts may not be advertised beforehand, customers are aware of what to expect. Another fixed rule can be a discount offered to people who subscribe to a newsletter—as a one-time incentive. This too is continually advertised, and customers understand what actions are required to reap the benefits.

Within a retail context, fixed rules run the risk of opening the door to certain kinds of arbitrage or delayed consumption. Customers will postpone purchases in anticipation of Black Friday sales—whereas the goal of the latter was to increase overall sales, not to shift sales in time. Such behavior is a key motivation to adopt more complex variations of dynamic pricing. Within this context, dynamic pricing has been presented as a key solution to the discount trap [15] that pesters many retailers.
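As an illustration, a fixed, openly communicated rule can be expressed in a few lines. The specific rule below (a weekday morning discount, in the spirit of a happy hour) and its numbers are hypothetical.

```python
from datetime import datetime

def fixed_rule_price(base_price: float, when: datetime) -> float:
    """Fixed rule: 20% off on weekday mornings before 11:00; full price otherwise."""
    if when.weekday() < 5 and when.hour < 11:  # Monday=0 ... Friday=4
        return round(base_price * 0.80, 2)
    return base_price

print(fixed_rule_price(10.00, datetime(2022, 6, 7, 9, 30)))   # Tuesday morning -> 8.0
print(fixed_rule_price(10.00, datetime(2022, 6, 11, 20, 0)))  # Saturday evening -> 10.0
```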

3.4.2.2 Variable Rule Dynamic Pricing
Variable rule dynamic pricing makes the price dependent on variables that are not automatically known in advance. A side effect is that the set of rules becomes more complex, but this complexity is not a prerequisite—and simple strategies are still perfectly possible. This increase in complexity often coincides with decreased transparency toward customers, one reason being that the additional complexity is by definition harder to explain and might lead to discussions. Another motivation for increased secrecy is that these pricing rules are often employed to give the company a competitive edge. Communicating openly what the pricing tactics are opens up too many possibilities for competing retailers to game the system.

Uber's surge pricing rules mentioned earlier are an example of this kind of dynamic pricing. Depending on the level of demand in relation to supply, prices for rides increase or decrease. The argument is made that this guarantees that people who really need a ride can always catch one. Naturally, the more base economic motive that this increases overall profits obviously also factors into the adoption of this practice.

Another common example can be found in food retail, where discounts are applied to products that are nearing their sell-by dates. This increases the likelihood that products will be sold rather than having to be destroyed. A possible downside of this practice is that customers who purchase a discounted product might be substituting it for another product for which they would have been willing to pay full price. Conversely, it can also be argued that there is marketing/brand value in this practice, improving the environmentally conscious image of a retailer. This rule differs from the fixed pricing rules discussed earlier, because it cannot easily be predicted what products will be discounted at a given moment in time.5 While the concept behind this dynamic pricing rule is easily understood by customers, it cannot be predicted which products will have excess inventory and at what times, making arbitrage substantially harder. A simple sketch of such a rule is given at the end of this subsection.

Applying markdowns to products that are nearing the end of their lifespan is also a kind of variable rule dynamic pricing. Depending on the expected demand and the available supply, it may be beneficial to present customers with a discount. The precise way in which this discount is calculated is not openly communicated to customers. An in-depth discussion of this practice can be found in Chap. 4.

The price of competitors is especially important for retailers who are in direct competition (see Sect. 3.2). As such, the price placed on products is likely to be dependent on the price of one or more competitors. As soon as multiple competitors in a market adopt this practice, a game-theoretical problem arises (see Sect. 3.4.3).

One of the undisputed champions of dynamic pricing is of course Amazon. The way in which Amazon has positioned itself as a platform for sellers means that it does not even have to design many of the pricing algorithms itself, as many sellers on the platform employ their own pricing algorithms [16]. Amazon does however provide sellers access to recent price information through a multitude of APIs, encouraging sellers to compete for the lowest price. Retailers on the platform often go to extreme lengths to determine what can give their product an edge, trying to guess the systems that Amazon employs under the hood to decide how much visibility a product receives. Moreover, there is much reason to believe that sellers who employ these kinds of tactics are significantly more successful in winning sales.

The loudest voices on dynamic pricing tend to focus mainly on online markets. This goes both for research and for off-the-shelf software tools. The reason for this is obvious: dynamic pricing is easiest to implement and easiest to study in an online environment, where fetching price information has comparatively low costs and fast-paced price adjustments are easily feasible. Dynamic pricing in offline markets does however have very significant potential, not in the least because much of what happens online focuses on new customer acquisition—more often than not by means of offering the lowest price for a product. Brick-and-mortar environments on the other hand have a much lower threshold for experimenting with raising prices and increasing margins for certain products, the downside being that there are often severe operational limitations to changing prices.

5 Assuming that the retailer is not in the habit of consistently overstocking, meaning that there is always superfluous inventory.
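To make the sell-by example concrete, here is a minimal sketch of a variable discount rule. The thresholds and the inputs (remaining shelf life, stock, expected sales) are hypothetical assumptions, not values from the text.

```python
def expiry_discount(days_to_sell_by: int, stock_units: int,
                    expected_daily_sales: float) -> float:
    """Variable rule: the discount grows when remaining stock is unlikely to sell in time."""
    days_needed = stock_units / max(expected_daily_sales, 0.1)
    if days_needed <= days_to_sell_by:
        return 0.00   # on track to sell out at full price
    if days_to_sell_by <= 1:
        return 0.50   # last day: a deep discount beats destroying the product
    if days_to_sell_by <= 3:
        return 0.30
    return 0.15

# 40 units left, 3 days of shelf life, selling ~8 units/day -> 30% off
print(expiry_discount(days_to_sell_by=3, stock_units=40, expected_daily_sales=8.0))
```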


[Figure: schematic of a learning agent that takes actions in an environment and receives observations and rewards in return]

Fig. 3.1 The basic dynamic of a reinforcement learning algorithm. The intelligence resides in the learning agent, who interacts directly with the environment in order to maximize long-term rewards


3.4.2.3 Dynamic Pricing by Means of a Learning Agent
The next evolution of dynamic pricing is the inclusion of true artificial intelligence. This implies that the applied rules are no longer thought up by human decision-makers, but are derived from the data observed by the system that decides on prices. Depending on the context, it may be desirable that the reasoning used by such a system can be understood, or this level of understanding may not be required at all.

The branch of AI that is most applicable here is reinforcement learning. This branch of artificial intelligence focuses on systems that learn by interacting with a live environment. This is opposed to traditional systems that have to be presented with many relevant historical examples before being able to make predictions and decisions. Within a pricing context, this type of algorithm will adjust prices either to gain a better understanding of demand or to reap benefits from the current understanding of the nature of demand.6

A basic schematic explaining the workings of a reinforcement learning algorithm is shown in Fig. 3.1. A learning agent observes the environment; in this case, this would mean the demand for a product under certain conditions, as well as peripheral information such as inventory levels. Based on this input, the algorithm takes action.

6 This type of balancing between exploration and exploitation is typical for many algorithms. At some points in time, it is more beneficial to learn about the environment; at other points, it is more appropriate to use existing knowledge. Examples of this can be found in optimization algorithms, which often have elements that diversify the solution, such as the mutation operator in an evolutionary algorithm. At the same time, many algorithms also employ local search techniques to improve incrementally on existing good solutions.


The action in this case would be to adjust the price (or keep the price the same). The algorithm will then observe what the result of this action has been and learn from what it has observed. It is important to note that reinforcement learning algorithms are tuned to optimize long-term rewards, not just short-term profits. It may be the case that decreasing the price to become the cheapest supplier would increase revenue in the next period. However, if this means that inventory runs out prematurely, it is not an optimal strategy over the long run. Naturally, it is key to define clearly what the long run should be.

An important downside to learning from practice is that there may be a considerable cold start problem. If the algorithm is not primed with some type of information, the first guesses it makes will likely be wildly inaccurate, and it may take a considerable amount of time before the algorithm converges. Because of this, it is important to prime the algorithm with some basic kind of logic or strategy. One way of doing this is to let the algorithm learn in a simulated environment that makes some simple assumptions about the behavior of demand.

This principle is a radical departure from variable pricing rules defined by a human decision-maker. The role of the human decision-maker changes into defining the right objectives for the algorithm, rather than defining the algorithm itself.
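As a minimal illustration of the learn-by-interacting loop in Fig. 3.1, the sketch below implements an epsilon-greedy bandit over a handful of candidate price points. This is a deliberate simplification of full reinforcement learning: it ignores state (such as inventory) and long-term effects, and all price points and numbers are hypothetical.

```python
import random

class EpsilonGreedyPricer:
    """Toy learning agent that balances exploring prices and exploiting the best one."""

    def __init__(self, price_points, epsilon=0.1):
        self.prices = list(price_points)
        self.epsilon = epsilon
        self.counts = {p: 0 for p in self.prices}
        self.avg_profit = {p: 0.0 for p in self.prices}

    def choose_price(self):
        # Explore with probability epsilon; otherwise exploit the best-known price
        if random.random() < self.epsilon:
            return random.choice(self.prices)
        return max(self.prices, key=lambda p: self.avg_profit[p])

    def record_outcome(self, price, profit):
        # Incrementally update the running average profit observed at this price
        self.counts[price] += 1
        self.avg_profit[price] += (profit - self.avg_profit[price]) / self.counts[price]

pricer = EpsilonGreedyPricer([19.99, 22.49, 24.99])
price = pricer.choose_price()
# ... observe sales during the next period at this price, then feed back the reward:
pricer.record_outcome(price, profit=132.50)  # hypothetical realized profit
```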

3.4.3 Dynamic Pricing and Price Wars

Dynamic pricing that acts on price changes at competitors must be treated with caution. Especially if multiple actors have implemented dynamic pricing systems, it is possible that these will inadvertently start price wars. Parallels can be drawn with flash crashes in the stock market [17], where interactions between many automated systems caused very undesirable end results. The advice here is to tread with caution and to safeguard systems against runaway pricing decisions.

In effect, the game being played resembles a repeated prisoner's dilemma. Each player has a short-term incentive to undercut the other. However, because the game is repeated, assuming a more cooperative attitude is generally a better long-term strategy. A classic computational experiment by Axelrod and Hamilton [18] showed that when pitting multiple such strategies against each other, the best performing strategy was a tit-for-tat approach. This approach was essentially cooperative and would not make an aggressive move of its own accord (i.e., undercutting competitor prices); however, when a competitor made such a move, it responded in kind and adjusted prices to undercut. The simple playground analogy is that you do not want to be the bully who attacks other children; at the same time, you do not want to be the pushover who gets bullied and takes no counteraction. While this might not be solid parenting advice, research shows that it is sensible in the context of dynamic pricing policies.

Realistic strategies will of course at times try to undercut competitors to gain market share. This is an unavoidable market dynamic, but not one that should be employed across the board for all products by most retailers. As was discussed in Sect. 1.3.1, a low-price strategy is typically one that can only be carried out successfully by the largest player in a market. Hence, the tactic is to compete on price for a select few products or customer types that are a specific focus of the retailer in the competitive landscape.

Readers who want to know more about the most common game-theoretical models relevant to this situation are referred to the 19th chapter of The Oxford Handbook of Pricing Management [6], which provides an excellent introduction to some of the basic models that can be applied. For a more in-depth look at the specifics of game theory, Game Theory by Fudenberg and Tirole is likely the best starting point [19].
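A toy version of such a guarded tit-for-tat policy is sketched below. It matches rather than undercuts an aggressive competitor, and the drift-back factor and hard floor are hypothetical safeguards of the kind discussed above, not a recommendation from the literature.

```python
def tit_for_tat_price(own_price: float, competitor_price: float,
                      list_price: float, floor_price: float) -> float:
    """Cooperative by default; aggression is answered in kind, bounded by a floor.

    The floor_price guard is what prevents two such systems from racing
    each other toward zero in an automated price war.
    """
    if competitor_price < own_price:
        # Competitor made an aggressive move: match it, but never below the floor
        return max(competitor_price, floor_price)
    # No aggression observed: drift back up toward the regular list price
    return min(own_price * 1.02, list_price)

# Hypothetical round: competitor undercuts to 21.99; we match, respecting a 19.99 floor
print(tit_for_tat_price(own_price=24.99, competitor_price=21.99,
                        list_price=24.99, floor_price=19.99))  # -> 21.99
```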

3.5 Differential Pricing

While dynamic pricing is concerned with changing prices to adjust to changing conditions over time, it is also possible to charge different prices at the same moment in time. This hinges on the basic premise that different customers have a different willingness to pay for certain products. This can be exploited by charging more to customers with a higher willingness to pay, as well as by offering lower prices to customers who would otherwise not have made a purchase. The practice of doing this is called differential pricing.

Differential pricing is the practice of charging different prices for the same product. This practice relies on some form of customer segmentation as well as methods to limit price transparency. Related concepts include price segmentation, price discrimination, personalized pricing, menu pricing, and yield management.

Employing a different price point for different types of customers can greatly improve the profitability of a retailer. If a product is brought to market at a given price, there will always be customers who would have been willing to pay more for it. Likewise, there will be customers for whom the retailer has priced itself out of the market. In both cases, this means a loss for the retailer. This could be solved if it were possible to identify these customers and adjust prices accordingly.

The key challenges for implementing differential pricing are twofold. Firstly, customers with a different willingness to pay need to be identified. Secondly, there needs to be a mechanism that prevents customers with a greater willingness to pay from paying the lower prices offered to another segment. Without such fencing, there is sure to be arbitrage.

The identification of customers or customer segments with different willingness to pay can be achieved with market research or a data-driven approach. Ideally, a combination of both is used. A data-driven approach should focus primarily on customer behavior, as opposed to descriptive properties of customers. How this can be achieved is explained in detail in Chap. 8. Depending on the context, it can be of interest to estimate a price-response model for each segment (see Chap. 2).

Even when segments and their associated demand models have been identified, it is not trivial to implement differential pricing. It needs to be possible to fence off these different groups of customers from each other. Not doing so effectively can lead to dissatisfied customers who feel that they have been wronged by the prices they are paying. Some examples of how this can be achieved are detailed below.

• Geographical segmentation: One of the most traditional forms of price discrimination is based on geography. This can be done on a micro level, for example, stores in city centers using different prices compared to those in more rural areas. Likewise, on a macro level, prices can also be differentiated across countries. This type of differentiation is also possible in an online setting when selling physical products, since customers can easily be segmented based on the location where products have to be delivered.

• Time-based segmentation: The moment when a transaction takes place is also an easy way to create segments and use differential pricing. Examples include typical early-bird specials or even pre-orders of certain products at a discount. Recent examples can be found on crowdfunding platforms such as Kickstarter: the first customers who pledge take a greater risk and are rewarded by paying a lower price for the product. For traditional retail, this often takes the shape of a skim pricing strategy, where prices for a product start off high and decrease over time. Likewise, some products can be priced higher when there is active demand and lower when intrinsic demand is low. Examples are garden furniture, motorbikes, and convertible cars—all of which are typically more expensive in spring and summer.

• Associated services: It may be the case that some customer segments attach greater or lesser value to certain services associated with the product. One example is an installation service for products such as a home theater. Another can be next-day fulfillment or more flexibility in the time when certain goods are delivered or provided. Customers who attach more value to premium services will segment themselves and be willing to pay a premium.

• Personalized discounts: While it may not be easy to adjust list prices, it is much easier to offer personalized discounts to specific consumers. This is often done as part of a loyalty program, where customers qualify for certain rewards. The latter comes at the risk of giving away discounts to top customers who do not need such incentives to make more purchases. Hence, it is important to model the uplift created by such programs. Specific rewards and nudges can also be offered outside the context of a loyalty scheme. This makes it impossible to game the system and returns control to the retailer, who can choose to use certain nudges only when a positive uplift has been proven or is expected. The disadvantage is that this may decrease the perceived fairness of the pricing policy and might rub some customers the wrong way.


• Membership cards: Some retailers implement membership schemes that come with special advantages and discounts. Famous examples include Amazon Prime and Costco supermarkets. It must however be noted that these schemes are intended more to maximize vendor lock-in than to be specialized vehicles for dynamic pricing. In spite of this, these tools can be usable instruments to put up fences between customers.

• Personalized customer portal: Some (pseudo-)retailers use websites that only show prices after a customer has logged in. This is especially prevalent in B2B sales, where customers often have to request a login to prevent competitors from obtaining exact price information. Such platforms make it easy to differentiate prices on a case-by-case basis. Within a B2C environment, this type of practice is much less commonplace, but is feasible for some retailers. A precondition here is that a single customer has a relatively sizeable value and is motivated enough to create an account.

• Bundling: Creating product bundles is another way to differentiate pricing. These bundles can be constructed from identical products, essentially working as volume discounts. Alternatively, they can be constructed from different products that are complementary in the value they provide to customers. A more extensive overview of how bundling works can be found in Sect. 4.6.3.

• Branding: Branding can also be used to practice differential pricing. Well-known examples include so-called value brands or own labels that many supermarkets use to compete on price [20]. These products often offer comparable quality, but at lower prices. Something similar can be seen in large car concerns such as Volkswagen, which positions the Audi brand as its luxury line and Skoda as its best-value alternative. This allows for effective price differentiation, even if many of the components are shared between vehicles.

• Device information and cookies: If sales take place via an online platform, there are many possibilities to adjust prices to an individual user. In the past, Amazon has used cookies to differentiate prices, but received considerable flak when this was discovered [21]. Another far-reaching example is again provided by Uber, who have discovered that people with low battery levels on their devices are much less price sensitive (but who claim not to use this information for pricing purposes [22]).

Some of these concepts are of course far-fetched for retailers. They are listed here as food for thought, not as techniques that are readily applicable for all retailers. For the majority of retailers, geographical, temporal, and coupon-based segmentation will be the most feasible options. Depending on the nature of the product being sold and a retailer's clients, more advanced tactics may be feasible or too far-fetched.

3.6 Optimizing Long-Term Value

The preceding topics have focused on optimizing the profitability of a single sale, but it can be worthwhile to sacrifice short-term profits for longer-term gains. This value can take the form of customer lifetime value or of other products purchased in the same transaction. This practice unites marketing techniques (see Chap. 8) with price optimization. The goal is to determine whether the purchase of certain products is likely to result in higher long-term value. Discovering these products requires a combined perspective of data-driven discovery and product knowledge. One example could be a water pump for a pool. Such a purchase may give rise to annual renewal purchases for filters, as well as other purchases related to the installation and upkeep of a swimming pool. Especially if it is also discovered that the pump is a product with high search volumes online, this might be a valuable angle for new customer acquisition. Attractive pricing for this product may then be an important part of the marketing mix.

3.7 Conclusion

List price optimization is a broad domain, but one rife with possibilities. As a retailer, it is key to question whether current prices are set at least in part based on data-driven inputs. Moreover, much potential can be found in making prices increasingly variable and personalized. Doing so enables retailers to capture a greater part of the consumer surplus, resulting in higher profits.

References

1. Foote, A. (2018). Scraper bots and the secret internet arms race. https://www.wired.com/story/scraper-bots-and-the-secret-internet-arms-race/. Accessed 25 May 2022.
2. Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O'Reilly Media Inc.
3. Chang, E. W. (2001). Bidding on trespass: eBay, Inc. v. Bidder's Edge, Inc. and the abuse of trespass theory in cyberspace-law. AIPLA QJ, 29, 445.
4. Hosseini, K., Nanni, F., & Ardanuy, M. C. (2020). DeezyMatch: A flexible deep learning approach to fuzzy string matching. In Proceedings of the 2020 conference on empirical methods in natural language processing: System demonstrations (pp. 62–69).
5. Buhrmester, M., Kwang, T., & Gosling, S. D. (2016). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality data? American Psychological Association.
6. Özer, Ö., & Phillips, R. (2012). The Oxford handbook of pricing management. OUP Oxford.
7. Gallego, G., & Topaloglu, H. (2019). Revenue management and pricing analytics (Vol. 209). Springer.
8. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
9. Rushton, M. (2020). Pricing the arts. In Handbook of cultural economics (3rd ed.). Edward Elgar Publishing.
10. Knopper, S. (2017). Inside Bruce Springsteen and Taylor Swift's war on scalpers, ticket bots. https://www.rollingstone.com/pro/news/inside-bruce-springsteen-and-taylor-swifts-war-on-scalpers-ticket-bots-201770/. Accessed 26 May 2022.


11. Reed, R. (2020). Pearl Jam prevent scalping with ticket sales for 'Gigaton' tour. https://ultimatepearljam.com/pearl-jam-scalping-tickets-gigaton-tour/. Accessed 26 May 2022.
12. David, J. (2016). Uber hammered by price gouging accusations during NYC's explosion. https://www.cnbc.com/2016/09/18/uber-hammered-by-price-gouging-accusations-during-nycs-explosion.html. Accessed 27 May 2022.
13. Welch, C. (2020). Webcams have become impossible to find, and prices are skyrocketing. https://www.theverge.com/2020/4/9/21199521/webcam-shortage-price-raise-logitech-razer-amazon-best-buy-ebay. Accessed 27 May 2022.
14. Porter, J. (2020). Amazon sold items at inflated prices during pandemic according to consumer watchdog. https://www.theverge.com/2020/9/11/21431962/public-citizen-amazon-price-gouging-coronavirus-covid-19-hand-sanitizer-masks-soap-toilet-paper. Accessed 30 May 2022.
15. Anderson, E. T. (2013). Escaping the discount trap. Harvard Business Review, 91(9), 121–123.
16. Chen, L., Mislove, A., & Wilson, C. (2016). An empirical analysis of algorithmic pricing on Amazon marketplace. In Proceedings of the 25th international conference on World Wide Web (pp. 1339–1349).
17. Kirilenko, A., Kyle, A. S., Samadi, M., & Tuzun, T. (2017). The flash crash: High-frequency trading in an electronic market. The Journal of Finance, 72(3), 967–998.
18. Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(4489), 1390–1396.
19. Fudenberg, D., & Tirole, J. (1991). Game theory. MIT Press.
20. Dwivedi, A., Merrilees, B., Miller, D., & Herington, C. (2012). Brand, value and relationship equities and loyalty-intentions in the Australian supermarket industry. Journal of Retailing and Consumer Services, 19(5), 526–536.
21. BBC. (2000). Amazon's old customers 'pay more'. http://news.bbc.co.uk/2/hi/business/914691.stm. Accessed 30 May 2022.
22. Vedantam, S., & Penman, M. (2016). This is your brain on Uber. https://www.npr.org/2016/05/17/478266839/this-is-your-brain-on-uber?t=1653914641812. Accessed 30 May 2022.

4 Optimizing Markdowns and Promotions

4.1 The Challenges of the Markdown Decision

Product life cycles are a reality for most—if not all—retailers. For some products, this is more obvious than for others: clothing goes out of fashion, new smartphones are introduced at a regular pace, food approaches its sell-by date, and so on. This causes the value of a product to decrease over time. Depending on the nature of the product, this decrease can be slow and constant, continually accelerating, or concentrated in a single moment at which products lose a substantial amount of value.¹ Retailers can act on this erosion by means of price decreases in the form of markdowns: discounts that are applied to the list price and that are more often than not accompanied by a marketing message, such as "winter sales with discounts of up to 60%". These discounts are designed to increase the demand for products that are at risk of losing value. The chosen discounts should strike a balance between the value lost by charging a lower price and the value destroyed when eroding consumer demand leaves inventory unsold.

In spite of the importance of markdowns, the process used to decide these discounts is often labor-intensive and far from data-driven. Typically, everything starts with a ceremonial scroll through all products. Experienced eyes observe sales velocities and inventory levels before deciding on the right markdown price for each product. More often than not, this process is repeated a number of times throughout the sales season, with adjustments at each iteration. This process rightfully feels archaic, but change can be an arduous endeavor.

¹ For example, plane or concert tickets.


A markdown is a permanent decrease in the price of a product. A markdown differs from a promotional price because of this permanence,² as promo prices are offered temporarily, after which a product returns to its original price.

² While the discount remains for the complete lifetime of the product, the magnitude of the discount can at times be increased or decreased.

The traditional markdown process is fraught with weakly supported rules of thumb. The core decision to assign a price to a specific product is often nothing more than a guess based on anecdotal experience. This can result in prices that fail to serve the purpose of the retail organization. Incorrect prices can take many forms: waiting too long to decrease prices causes a loss of turnover, while decreasing prices for inelastic products does not produce an appropriate increase in sales volume, again resulting in decreased profits for the retailer. The two main shortcomings of the traditional markdown process are the absence of a clear objective and the lack of good estimates of how demand is likely to respond to a price change. The former has already been touched on in Chap. 1 and mainly relates to avoiding the sunk-cost fallacy. The latter is related to the economic concept of price elasticity—quantifying how demand is likely to respond to a change in price.

Depending on the nature of the current process, data-driven markdown algorithms can yield up to 10% more contribution during the sales season.³ This chapter provides more insight into the dynamics at work within such a system. The objective is not to go into detail on the technical aspects of how to realize this, but rather to provide an overview of the major components of a data-driven markdown solution. The chapter starts with a brief overview of the traditional markdown process (Sect. 4.2) and highlights where mistakes are made during this process (Sect. 4.3), before presenting a blueprint of the key elements of an algorithmic markdown approach (Sect. 4.4). The nature of these building blocks is further explored in Sect. 4.5. Next, this basic blueprint is extended with more complex elements that can be present in a markdown setting (Sect. 4.6). Finally, some pointers are presented on how to test markdown algorithms in practice (Sect. 4.7).

³ This number is based on experiments carried out by the author at eight European fashion retailers.

4.2 The Traditional Markdown Process

Commonly used rules of thumb contrast the current pace of sales with the available inventory of a product. The pace of sales is often referred to as the rotation speed of a product: the number of units a product is selling over a period of time—most often a week. Dividing the inventory level by this velocity results in a rough estimate of the time needed to sell all available inventory. If this period is longer than the period during which the product can still be sold to customers, the price of the product will have to decrease (a minimal sketch of this rule of thumb follows the list below).

Traditional approaches also include business rules specific to the organization. These are used to represent strategic or tactical beliefs held by a company. Examples include the manner in which a retailer positions itself versus its competitors or psychological pricing rules that it wants to apply. Alternatively, business rules can represent operational limitations faced by a retailer; it may be physically impossible to change more than X prices per week. Some examples of business rules are the following:

• Maximal frequency for price changes
• Identical prices on every channel
• Use of price ladders (e.g., always with 20% increments)
• Minimal and maximal price reductions (for specific categories, brands, etc.)
• Consistency rules (e.g., same price for every color)
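The rule of thumb described above can be captured in a few lines of code. The sketch below is a minimal illustration with made-up field names and example values; a real implementation would layer the business rules from the list on top of it.

```python
# Weeks-of-cover rule of thumb: flag products whose inventory will
# outlast the remaining selling season at the current rotation speed.
# Field names (inventory, weekly_rotation) are illustrative.

def needs_markdown(inventory: int, weekly_rotation: float,
                   weeks_remaining: int) -> bool:
    if weekly_rotation <= 0:
        return True  # no sales at all: certainly a markdown candidate
    weeks_of_cover = inventory / weekly_rotation
    return weeks_of_cover > weeks_remaining

# Example: 900 units left, selling 100 per week, 6 weeks to go
# -> 9 weeks of cover > 6 weeks remaining, so a markdown is indicated.
print(needs_markdown(inventory=900, weekly_rotation=100, weeks_remaining=6))
```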

Companies also tend to adhere to a historical pace of markdowns for the product assortment. This pace often follows a progression from lower discounts to higher discounts as time progresses and less time remains to sell products. The prices should strike a balance between having an attractive offering for consumers and not giving away too much value to customers who would at that time be willing to pay higher prices.

To quantify the current or historical markdown pressure, a number of summary measures can be useful. The simplest is the average markdown, as shown in Eq. 4.1. This metric is as simple as it sounds: it just takes the average of the markdowns (m_i) applied to all n products, using the index i to signify an individual product. An individual product in this case is a stock keeping unit (SKU⁴). This is an important measure that represents the visibility of discounts for visiting customers, especially on online sales channels where the visibility of products is not strongly affected by available inventory.

$$\text{Average markdown} = \frac{\sum_{i=1}^{n} m_i}{n} \tag{4.1}$$

⁴ A stock keeping unit, often represented as a unique product or barcode in a system.

This figure can however be skewed if different products have very different inventory levels; there may only be a few units of inventory left of a product that has a deep discount. Because of this, it can be valuable to look at the inventory-weighted markdown percentage for the products on offer. This simply accounts for the number of inventory units that remain. Equation 4.2 shows how this can be calculated, using the notation I_i to represent the units of inventory that are left of product i.

$$\text{Weighted average markdown} = \frac{\sum_{i=1}^{n} m_i \cdot I_i}{\sum_{i=1}^{n} I_i} \tag{4.2}$$

Another important angle to consider is the value of the products that are discounted. Often, there will be substantial variation in the sales prices of a retailer's products. As such, it can be important to consider the monetary value that the offered discounts represent. Again, this is done to make sure that this quantity falls within a range that is deemed reasonable from a tactical perspective. Equation 4.3 shows how this can be calculated by simply taking into account the sales price of a product (P_i).

$$\text{Total discounted value} = \sum_{i=1}^{n} m_i \cdot P_i \cdot I_i \tag{4.3}$$

Because these KPIs are often calculated to ensure that the products on offer are attractive from the perspective of a consumer, it makes sense to use the list price of a product rather than the cost. There may be significant differences in the relative margin for different products, but this does not detract from the value derived from a product—and therefore a discount—from the perspective of a consumer.

For some retailers, it may also be relevant to account for differences in the size of the assortment across years. This may be the case for retailers who are experiencing strong growth, resulting in a much bigger body of inventory than in previous years. Alternatively, certain macroeconomic shocks may result in big differences in the size of the product collection across years. To account for this, the total discounted value can be expressed relative to the value of the total product assortment, as shown in Eq. 4.4.

$$\text{Relative discounted value} = \frac{\sum_{i=1}^{n} m_i \cdot P_i \cdot I_i}{\sum_{i=1}^{n} P_i \cdot I_i} \tag{4.4}$$
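All four measures are simple aggregations. The following sketch computes them directly from Eqs. 4.1–4.4; the input lists and example values are illustrative.

```python
# Summary measures for markdown pressure (Eqs. 4.1-4.4).
# m: markdown fractions, inv: remaining inventory units, price: list prices.

def markdown_kpis(m, inv, price):
    n = len(m)
    avg = sum(m) / n                                                   # Eq. 4.1
    weighted = sum(mi * ii for mi, ii in zip(m, inv)) / sum(inv)       # Eq. 4.2
    total_value = sum(mi * pi * ii
                      for mi, pi, ii in zip(m, price, inv))            # Eq. 4.3
    relative = total_value / sum(pi * ii for pi, ii in zip(price, inv))  # Eq. 4.4
    return avg, weighted, total_value, relative

avg, weighted, total_value, relative = markdown_kpis(
    m=[0.2, 0.5, 0.0], inv=[100, 10, 300], price=[50.0, 120.0, 30.0])
print(f"avg={avg:.2%}, weighted={weighted:.2%}, "
      f"value=€{total_value:,.0f}, relative={relative:.2%}")
```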

Finally, it may also be valuable to plot the distribution of markdowns. A single summary metric can never paint a complete picture, and situations with very similar summary statistics can have radically different distributions of discounts. This is illustrated in Fig. 4.1.

The process itself is characterized by a multitude of iterations, especially when determining the initial markdowns. The decision-maker iterates over every product and decides on a markdown before moving on to the next. Once a pass has been completed, the complete picture is evaluated by looking at summary metrics such as the ones above. Notably, this often includes sub-group analyses that investigate how different product categories are represented in the discount landscape. This evaluation is often a comparison to preceding years, making it possible to confirm that the overall discount levels are in line with past years.


Fig. 4.1 A comparison of three scenarios with identical average markdown value of 30%. On the horizontal axis are different markdown values ranging from 0% markdown up to 50% markdown. The vertical axis shows the number of products that are offered for a certain markdown percentage (out of a total of 120 products for this example). In spite of the identical average markdown value, these three scenarios are significantly different and will likely have a different effect on consumer demand

If the result shows that the global situation is not yet optimal, the decision-maker starts a new iteration over all prices, cherry-picking products whose prices should change to move the needle in the right direction. Needless to say, this is a time-consuming process that is prone to inconsistent decision-making. This is aggravated by the fact that these decisions often have to be made under significant time pressure: as soon as sales figures are available, the new prices have to be set. Delayed decisions imply that products are being offered at sub-optimal price levels, bleeding profits. As it stands, the markdown season is often not something that teams look forward to.

4.3 Where the Markdown Process Fails

The traditional markdown process is not only cumbersome; it is unlikely to result in an optimal or adequate solution. This section goes into more depth about why this is the case, before an outline is sketched of a superior approach.

4.3.1 Not Making Use of Price Elasticity

A first shortcoming of this way of working is that there is no objective way of determining the right size of a markdown. More often than not, the value of the markdown is guessed, and only the fact that a markdown has to be increased is based on data. Moreover, this guess is likely to be skewed by the perspective of the person making the decision. It is not unlikely that this person has been involved in the purchasing decision or has an opinion as to what constitutes a good or a bad product. This can result in significant inconsistencies in the pricing policy.


The answer is to introduce a sufficiently accurate measure of price elasticity. This allows the price change to be calculated based on objective assumptions, rather than being based on a hasty guess. The use of elasticity is especially valuable for products that have a low elasticity value and do not respond significantly to price changes. This allows the decision-maker to steer away from total inventory liquidation as an approximate objective.

Price elasticity is defined as the relative response of demand to a relative change in price. In other words: by what fraction does demand increase if the price is decreased by a certain percentage? If the absolute value of this ratio is greater than 1, we say that a product is elastic;⁵ a value smaller than 1 indicates that a product is inelastic. Generally speaking, a price decrease for an inelastic product will be a bad idea, since the increase in volume does not make up for the loss in turnover due to the lower price. Inversely, a price decrease for an elastic product may be a good idea, since the additional volume does make up for the loss due to lower prices, the caveat here being that there may be variable costs associated with a sale that must also be covered. As such, it is often the case that a product should have an elasticity greater than unity before a price decrease is warranted. By how much depends on the nature of the variable costs associated with a transaction.

⁵ This is a simplification, since the elasticity also depends on the magnitude of the price change.

4.3.2 Contaminating the Objective

Another common misconception is wrongfully focusing on objectives other than turnover maximization during this phase in a product's life cycle. Whereas in general a retailer should strive to maximize profits rather than revenue, during this stage in the product life cycle, maximizing turnover is the only objective that truly maximizes value to the retailer. The traditional way of working implicitly assumes that a retailer should strive to have no inventory left at the end of the sales season. While this may be a reasonable assumption at the moment when the products are first purchased by the retailer, it no longer holds once inventory has been bought and is inherently fixed.

The origin of this mistake often lies in not making a clear distinction between the collection planning decision and the markdown decision. This happens because it is often the same team that is responsible for the collection planning and for setting markdowns. This is compounded by the fact that the person doing the collection planning is often evaluated based on the sell-through rate of the products in the collection. This person therefore has a very clear incentive to favor more aggressive discounts that reduce overall inventory levels, rather than opting for a maximization of contribution.

4.3.3 No Anticipation of Changes in Demand Patterns

The traditional logic assumes that the current rotation will remain constant. In reality, demand is likely to vary over time. For most products, there will be a gradual decline in demand as the remaining lifespan decreases. Other products may still experience peaks at the end of their lifespan. This can be a random occurrence, such as sales of road salt during a cold spell late in the season. Alternatively, it can be a natural pattern, such as sales of shorts and sandals during the hottest summer months. Either of these scenarios can carry some level of predictability and could therefore be taken into account during the decision-making process.

4.3.4 Time-Consuming and Error-Prone Process

The current process is also limited in the maximal velocity it can attain. Trudging through endless spreadsheets is time-consuming and prone to mistakes if done too hastily. This inevitably results in a focus on the most important products first. While this is justified, it means that many products go long periods without their prices being re-evaluated. Especially in combination with retailers offering wider assortments to cater to long-tailed demand, this can be a dangerous combination.

Note that the main reason for adopting a data-driven system for markdowns should not be just to save time. Such time savings often do not materialize in the financial results of a company: employees still have to be paid, and markdown management is typically not the sole occupation of an employee, given the seasonal nature of this exercise. Moreover, working with new tools also takes time, and ideally a significant amount of time is spent adjusting parameters and evaluating results.

4.3.5 Repeating Past Mistakes

Another risk factor is that the process is inherently aimed at making things as similar to past years as possible. This assumes that what was done in previous years was good—which is a dangerous assumption to make. If structural mistakes have been made in the past, they are likely to be propagated indefinitely. Naturally, it may be desirable not to stray too far from the path in a single iteration, but if models indicate that it may be beneficial to be more or less aggressive with markdowns in certain areas, it is often worth experimenting to see if this is indeed the case. Past truths should not be left unchallenged.


In this context, it can be useful to consider Hitchens's razor: what can be asserted without evidence can also be dismissed without evidence. A statement made based on nothing other than personal beliefs can in principle be countered by nothing more than an opposing belief of someone else. The clear instruction is to collect proof to support a statement.

4.4 Blueprint of an Improved Markdown Process

Creating an improved markdown process requires a number of components. A schematic overview of these components is shown in Fig. 4.2. Not all of these elements need to be perfect in order for the markdown process to work, but all components should be present. Often, a project that aims to improve the manner in which markdowns are set starts with some of these components in a simplified form, adding complexity as soon as the basic architecture has been proven to work. This is generally a good idea, as it avoids long and expensive projects that only provide added value at the very end.

4.4.1 Objective

As was detailed in Chap. 1, the primary element in a good system is clearly defining what the goal of the optimization is. In the case of markdowns, this can often be reduced to an optimization of the total turnover while taking into account transaction costs and residual value.

Fig. 4.2 Schematic blueprint of the key components of a markdown system or algorithm. Components shown: the objective (fed by transaction costs and residual value), the portfolio-level forecast model with its summary statistics and updated constraints, the price selection engine, and the product-level forecast model (rotation forecast and elasticity estimate), all built on historical and recent transaction and product data


Depending on the nature of the retailer's processes, it may be desirable to use some type of model to calculate or estimate an accurate residual value and to estimate the transaction costs. Shipping and return costs can be significant as well—especially in an online setting. These costs are likely to differ across platforms, customers, and products. Having an accurate estimate of these costs will translate into better decision-making.

4.4.2 Portfolio Forecast and Price Selection Engine

The objective is used by two other components: the portfolio-level forecast model and the price selection engine. The former estimates the value of the objective function based on the markdown prices applied to different products. The price selection engine selects prices for individual products. In doing so, it aims to select the combination of prices that is likely to result in the maximal value of the objective while still satisfying the constraints that have been imposed. This set of constraints represents a number of business rules. Some of these will come from the organization; others will be inspired by the output of a preceding iteration of the price selection engine.

As shown in Fig. 4.2, there is an iterative process between the price selection engine and the portfolio-level forecast model. The reason for this is that the price selection engine alone cannot provide an accurate estimate of the aggregated objective. Because products are not sold in a vacuum, the sum of the estimated sales of individual products cannot be expected to add up to a reasonable expectation of the total sales for the complete product portfolio.

The sum of the estimate of individual product sales is unlikely to result in an accurate estimate of the total sales.

This is best explained using an extreme example. Assume a retailer who sells a portfolio of 100 different seasonal products. Last year, these products were divided into five discount buckets: 10%, 20%, 30%, 40%, and 50%. Each bucket contained 20 products, resulting in an even spread of discounts across this range. For the current year, decisions are made at the level of individual products, maximizing the expected contribution of each individual product. This results in 90 of the 100 products landing in the 50% bucket. This represents a much more aggressive discounting strategy than last year's, and a product in the 50% bucket must now compete for attention and sales with a much greater number of products that are also heavily discounted. For this reason, a discount of 50% in this situation can be expected to have a substantially smaller effect than an identical discount in the preceding year.

This interplay between products is the reason why a portfolio-level forecast model is required for decision-making.


However, using markdown optimization models does not mean that such a portfolio-level forecast has to be a mathematically complex undertaking. In its simplest guise, it takes the form of sanity checks by human decision-makers. These decision-makers analyze the aggregated suggestions and judge whether the situation is desirable or has to be adjusted. If it needs to be adjusted, they formulate new business rules (constraints) and re-run the price selection engine. This shows strong similarities with the existing process, with the difference that the heavy lifting of iterating over all products is now performed by an algorithm that can be faster as well as more accurate.

4.4.3 Product-Level Forecast Model

To make decisions on a product level, the price selection engine needs information on how products are likely to react to different markdown prices. This task is performed by the product-level forecast model. In turn, this model consists of an estimate of the product's sales under current conditions, as well as an estimate of the price elasticity—i.e., the sales under different conditions. These estimates are created based on historical information from previous seasons, as well as recent sales figures for the current set of products. Both types of information are crucial. The sales of the specific product are always most informative for the product itself, but often contain too little price variation to make claims about how the product will respond to a given price change. Historical price changes and associated demand patterns, on the other hand, contain information on how products with similar properties are likely to respond to a price change, but can rarely provide sufficient accuracy without being combined with recent sales patterns.

4.5 Core Components of an Improved Markdown Process

The next sections go into more detail on how these components can be constructed. This may seem complex, but simple variations of most of these components can easily be created. Which of these components should first be upgraded to a more complex variation depends on the context wherein a retailer operates.

4.5.1 Defining the Right Objective: Transaction Costs and Residual Value

As was discussed in Sect. 1.3.2.4, the objective pursued by a retailer in a markdown context is calculated as the turnover, minus the variable costs for conducting a transaction, plus the residual value of the leftover inventory (Eq. 4.5), where P is the list price, M the absolute markdown, C^{var} the variable cost per transaction, D the expected demand, V the residual value per unit, and I the available inventory.

$$\text{Markdown objective} = (P - M - C^{var}) \cdot D + V \cdot (I - D) \tag{4.5}$$
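As an illustration, Eq. 4.5 translates directly into a small function. The figures in the usage example are invented; demand would in practice come from the forecast models discussed later in this chapter. Note that the sketch caps demand at the available inventory, an assumption Eq. 4.5 leaves implicit.

```python
# Markdown objective (Eq. 4.5): contribution from units sold at the
# marked-down price, net of variable transaction costs, plus the
# residual value of whatever inventory is left over.

def markdown_objective(list_price: float, markdown: float, var_cost: float,
                       demand: float, inventory: float,
                       residual_value: float) -> float:
    units_sold = min(demand, inventory)  # cannot sell more than stock
    leftover = inventory - units_sold
    return (list_price - markdown - var_cost) * units_sold \
        + residual_value * leftover

# Example: €80 list price, €20 markdown, €5 shipping per unit,
# expected demand of 120 units against 150 units of inventory,
# leftover units worth €10 each in an outlet channel.
print(markdown_objective(80, 20, 5, 120, 150, 10))  # -> 6900.0
```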


Ensuring that the process is aiming in the right direction requires that transaction costs and residual value are estimated with a sufficient level of accuracy. Ballpark estimates on the level of the full product portfolio can serve as a starting point, but creating more nuanced estimates that differentiate based on the product and sales channel can be beneficial insofar as substantial differences are present.

4.5.1.1 Estimating Transaction Costs

Depending on the sales channel and the type of product being sold, transaction costs can be significant. Especially for products that are sold at low prices, the transaction cost can outweigh the turnover once significant discounts are offered—especially when products are shipped to consumers. The most important transaction costs to consider are:

• Packaging: The materials used to wrap and transport products.
• Shipping costs: These can be paid to a parcel service or take the shape of a fulfillment fee paid to a platform on which the products are sold.
• Sales commissions: These can be payable to the retailer's own staff in situations where customers are actively accompanied during the sales process, as is typically the case for car sales. They can also take the form of a commission paid for tools that are used for online sales, such as Google Ads pointing to your products. Some webshop platforms also work on commission, taking a cut of every sale made on the platform as a form of progressive pricing. There may also be costs for payment service providers.
• Return costs: The costs associated with shipping a product back to warehouses, as well as handling costs, quality inspection, and possible loss of products due to damage.

Notably, this does not include sunk costs such as the purchase cost of the product, and it also excludes general overhead costs such as facilities and labor. The latter assumes that store staffing is independent of the magnitude of the discounts being offered; for the majority of retailers, this is a reasonable assumption.

Most of these costs are known beforehand, with the exception of the return cost: it is impossible to know in advance exactly which products will be returned. The best way of dealing with this uncertainty is to investigate the major drivers behind product returns. Likely candidates are geographical markets, which often show big differences in overall return rates. Product properties are also relevant: some types of products, such as medication, are very unlikely to be returned, whereas women's fashion has very high return rates on average. Looking at the complete product portfolio, it is likely that different product categories show clear patterns. Moreover, given that markdowns are typically applied to products that have been sold for some time, the historical returns of the specific product are likely to be predictive of future returns. There may also be systematic patterns of increased or decreased returns during sales periods. Decreased returns are most visible in physical stores, which often advertise that products purchased with a markdown cannot be returned, with the exception of damaged products.


The same no-return policy can often not be enforced online, and there may even be a significant increase in product returns because of customers going on shopping binges during sales periods. For large sales volumes, it may be possible to create predictive models that estimate the return probability of individual products. For the majority of retailers, a set of descriptive statistics and simple rules is likely to provide a sufficiently nuanced picture, as sketched below. In situations where return costs are significant, it can be interesting to investigate possible ways of reducing these costs. Chapter 6 presents an analysis of how this can be done in practice.
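Such descriptive statistics require no more than a simple aggregation over historical orders. A minimal sketch using pandas, with illustrative column names:

```python
import pandas as pd

# Estimate return rates per category and market from historical orders.
# Column names (category, market, returned) are illustrative.
orders = pd.DataFrame({
    "category": ["dresses", "dresses", "medication", "shoes", "shoes"],
    "market":   ["DE",      "BE",      "DE",         "DE",    "BE"],
    "returned": [1,         0,         0,            1,       0],
})

# The mean of the 0/1 return flag is the observed return rate per segment.
return_rates = (orders.groupby(["category", "market"])["returned"]
                      .mean()
                      .rename("return_rate"))
print(return_rates)
```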

4.5.1.2 Estimating Residual Value

The fact that a product reaches the moment when it can no longer be sold in the retailer's sales channels does not imply that it is worthless. For some products, the option exists to re-enter the product in the assortment in the next year, to sell it in outlet channels, or simply to sell it in bulk to specialized companies. Especially for products where the leftover stock is likely to be significant, this can be an attractive option compared to an aggressive liquidation.

Determining residual value is easiest for products whose complete inventory is sold to a third party. For products that are re-introduced next year or sold through outlet channels, estimating the value can be more complex. The modeling approach for such an analysis is discussed next.

A clear distinction must be made between products that are candidates for re-introduction in next year's assortment as special deals⁶ and staple products that are re-purchased every year, such as white t-shirts or road salt. Staple products do not need to be introduced at lower prices in the next year, which should be positively reflected in their residual value. Even if there is no explicit devaluation of the product, there are still costs that influence the inventory value. Specifically, there may be handling and storage costs for these products, shipping them back to centralized warehouses to free up shelf space in stores. There is also a cost of capital that must be accounted for, since funds that are locked up in these products for a set period of time cannot be used to earn returns elsewhere.

The residual value should paint a true picture of what a product is worth at the moment a decision is made that influences how much product will be transferred to the next year. The accounting value is rarely a good proxy for this, even if certain global depreciation factors are applied to products. Decisions should be made based on a realistic estimate of the economic value; the original cost or the introduction price of a product is not relevant.

⁶ For example, in an in-store outlet.


Not being able to use book value implies the need for an alternative. Such an alternative can be found by looking at historical sales figures and products. Based on the success of re-introduced products in previous years, the value of a product in the current year can be estimated. Like many other applications outlined in this book, this comes down to estimating some type of demand function. This estimate of future demand should provide a realistic view of the volume of demand, as well as the price point at which the product can be sold. A typical demand model represents the trade-off between these two variables, but certain business rules (e.g., a fixed MSRP) may imply that the point on this trade-off cannot be freely chosen. Given the long time frame over which this estimate is being made—often 6 months or more—an exact estimate is hard to make. In this context, it is often valuable to measure the uncertainty of an estimate, working with confidence intervals rather than point estimates. This makes it possible to estimate the probability of selling a certain quantity at a certain price during the next season.

Pinpointing an accurate inventory value also requires taking into account the cost of capital. A retailer is usually in a position where available capital can be invested in products in order to earn a certain return. Capital that is stuck in products kept in storage for re-introduction does not earn a return, and the cost of this missed opportunity should be accounted for when making decisions. This is as simple as calculating the average return on capital over the period that transferred products are going to be kept in inventory. The opportunity cost of capital should be applied to the funds that could be extracted from inventory at the point in time when the decision is made, not to the residual value of the products. The former is the cash that could be freed up by selling the inventory at a discounted price right now—not the possible value that products would have in the next season. This is the capital that could be reinvested to earn a return during the period that products would otherwise be kept in inventory.

An example shows how this dynamic results in a trade-off between selling inventory now and storing it to sell in the future. Figure 4.3 shows the effect of different markdown levels on a fictitious product. If no markdown is offered, a total of 600 units is expected to be sold, resulting in a leftover inventory of 900 units. As the markdown is increased, demand rises. The total turnover reaches a peak when a markdown of 32.5% is applied, resulting in a total expected turnover of €99,984 for the six remaining weeks. Under these conditions, there is a leftover inventory of 315 units.

For a situation where the residual value of products is zero, the decision can be made purely based on the current expected turnover. However, when there is the opportunity to sell products in a subsequent season, the value of these sales needs to be estimated. Figure 4.4 shows an example of such an estimate, assuming that the product will be re-introduced at a slightly lower list price than in the current year. Based on historical observations of re-introduced products, as well as the performance of the product in the initial year of its introduction, an estimate was made of the magnitude of demand for the product in the next year.


Fig. 4.3 The markdown decision for a product with an elasticity of 3.0, a list price of €125, and an inventory of 1,500 units. It is assumed that the product is currently selling 100 units per week and that there are 6 weeks remaining in this sales season. To limit the complexity of this example, a number of simplifying assumptions have been made: (i) there is no demand erosion during the remaining sale period; (ii) there are no variable transaction costs; (iii) the demand is defined by an isoelastic function, implying that elasticity can be defined as a single value. The horizontal axis shows the applied markdown; the left and right vertical axes show turnover (in €1,000) and demand in units, respectively
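The figures quoted in this example can be reproduced with a simple grid search over candidate markdowns. The sketch below approximates the price response linearly as D₀·(1 + ε·m), which matches the quoted numbers (1,185 units sold and roughly €99,984 of turnover at a 32.5% markdown); this linear form is an assumption of the sketch, as the exact functional form is not spelled out beyond the caption.

```python
# Reproduce the markdown trade-off of Fig. 4.3: elasticity 3.0,
# list price €125, 1,500 units of inventory, current rotation of
# 100 units/week, 6 weeks remaining. The response of demand to a
# markdown m is approximated linearly as D0 * (1 + elasticity * m).

LIST_PRICE, ELASTICITY, INVENTORY = 125.0, 3.0, 1500
BASE_DEMAND = 100 * 6  # 100 units/week for the 6 remaining weeks

def season_outcome(m: float):
    demand = BASE_DEMAND * (1 + ELASTICITY * m)
    units = min(demand, INVENTORY)       # cannot sell more than stock
    turnover = LIST_PRICE * (1 - m) * units
    return turnover, INVENTORY - units   # turnover, leftover units

markdowns = [i * 0.025 for i in range(33)]  # 0% .. 80% in 2.5% steps
best = max(markdowns, key=lambda m: season_outcome(m)[0])
turnover, leftover = season_outcome(best)
print(f"best markdown {best:.1%}: turnover €{turnover:,.0f}, "
      f"{leftover:.0f} units left")         # 32.5%, €99,984, 315 units
```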

This estimate is assumed to be stochastic, meaning that both the mean and the standard deviation are reported. Assuming the observations of demand follow a normal distribution, this allows the likelihood of selling incremental quantities of product to be calculated. This means that transferred products do not all have the same value: the value of a product depends on the number of products that have already been transferred—i.e., the marginal value of an additional product decreases.

Based on this information, bounds can be established for the rational range of products to be transferred. The lower bound of this range is defined by the peak turnover shown in Fig. 4.3, which results in 315 units being transferred. Transferring fewer units would mean that a higher discount would have to be offered, but at this point, the increase in volume would no longer make up for the loss in sales price. This equates to the retailer actually having to spend money in order to get rid of inventory, which can only be a sensible decision if the costs associated with destroying superfluous inventory exceed the intrinsic residual value of the products. For the example presented here, inventory has a positive value, so it is not sensible to decrease turnover in order to transfer fewer than 315 units to the next season.
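The decreasing marginal value of transferred units follows directly from the normal-demand assumption. A sketch using the example's parameters (expected next-season demand of 500 units, standard deviation of 100, price point of €100): the Q-th transferred unit only earns its price if next season's demand reaches at least Q units.

```python
from scipy.stats import norm

# Parameters from the running example: next-season demand is assumed
# normal with mean 500 and standard deviation 100, at a price of €100.
MU, SIGMA, NEXT_PRICE = 500, 100, 100.0

def marginal_value(q: int) -> float:
    # The q-th transferred unit earns its price only if next-season
    # demand reaches at least q units: value = price * P(demand >= q).
    return NEXT_PRICE * norm.sf(q, loc=MU, scale=SIGMA)

for q in (300, 500, 800):
    print(f"unit #{q}: marginal value €{marginal_value(q):.2f}, "
          f"P(demand >= {q}) = {norm.sf(q, loc=MU, scale=SIGMA):.3f}")
# The 800th unit (3 sigma above mean demand) is worth almost nothing,
# which is why roughly 800 units forms the upper end of the range.
```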


Fig. 4.4 Continuing the example, this plot shows a retailer's estimate of the likelihood that a product will be sold in the future, based on the assumption that the product will be re-introduced at a slightly lower price point of €100. This price point would result in a likely demand of 500 units, with a standard deviation of 100 units. As the number of units transferred increases, the probability of selling all products gradually decreases. Once more than 800 units are transferred, the likely marginal turnover generated by additional products drops to zero, indicated by the expected revenue curve reaching its limit. The rational decision range for the number of products to transfer is indicated by the vertical red lines. When interpreting this graph, it is important to note that the direction of the axis is reversed: transferring more products implies that the markdown is decreased and higher prices are charged during the current season. The markdown decision remains the only variable that is actively controlled by the decision-maker at this point; the number of products transferred is a consequence of the chosen markdown. The horizontal axis shows products transferred; the left and right vertical axes show cumulative revenue (in €1,000) and the probability of selling out, respectively

Likewise, an upper bound of 810 units can be identified. This is simply the moment when the amount of inventory exceeds the amount that can be expected to be sold in the next year. It can still be the case that more inventory is left than this upper bound, but a retailer should never actively strive to transfer more than this amount to the next year, even if products have to be sold at absolute bottom prices in the current season.⁷

⁷ Strategic considerations may come into play here: pricing a product too low can be undesirable because it causes the perceived value of the product or brand to be eroded in the future.


Fig. 4.5 The trade-off decision made when transferring products to the next year. As the number of products transferred increases, future turnover increases, and more turnover is lost in the current year. For this example, it is assumed that there is a seasonal return on capital invested (ROCI) of 20%. This represents the possibility of purchasing new products with turnover generated in this season and earning a return in the next. For the sake of simplicity, this is also assumed to incorporate inflation, which may cause future revenues to have to be discounted. The scale of the horizontal axis is limited to the range between the lower and upper bounds determined in Fig. 4.4. The objective as expressed here is simply the expected turnover in the next year, minus the lost turnover in the current year—including the ROCI. Curves shown: turnover y+1, lost turnover y, lost turnover y including ROCI, and the objective

The optimal decision can be derived from the trade-off shown in Fig. 4.5. Based on the upper and lower bounds, the rational range for the markdown lies between 10% and 35%, representing a transfer of 810 and 315 products to the next season, respectively. The optimal point is the one that results in the highest combination of current and future value generated with the available inventory. In Fig. 4.5, this is quantified as the expected turnover in y + 1, minus the turnover that is sacrificed in the current season, including the cost of capital. For this example, this results in a markdown of 22.5%, a turnover of €97k in the current season, and €33k in the next season. This result makes intuitive sense when compared to the initial optimal markdown of 32.5%, which assumed that leftover inventory had no residual value.

The horizon of this exercise can often be limited to a single year. Most retailers will never aim for a situation where a product has to be kept in inventory for more than 2 years before being sold.


As such, it is often not a desirable strategy to keep extending the shelf life of a product, as this takes up capital and space that could be used by products offering a better return on capital invested. One exception is retailers who specifically deal in the long tail of demand, such as retailers who offer spare parts. The breadth of the assortment and the immediate availability are part of the service offered in this line of business. This also means that such retailers will typically have substantially greater profit margins on an individual product or transaction: the price paid by the customer covers not only the expense of a single product but also the cost of keeping a considerable assortment of products at hand.

In summary, determining the residual value of products must account for a number of different factors. Based on historical precedents, a reasonable estimate must be made of the strength of demand for a product in the next season. Ideally, this estimate captures the strength of demand at different price points—if the price is also a variable that can be controlled at the moment the product is re-introduced—and is based on product-specific properties rather than a single rule of thumb applied across the complete product portfolio. Finally, the retailer must also account for the cost of capital for products that will be tied up in inventory for a certain period of time.

There may also be a holding cost associated with keeping goods in storage. This cost can represent the rent of warehouse space, but also the handling cost and possible losses due to damage when products are transported. These holding costs may also be added to the previous analysis. Doing so would imply that they are subtracted from the future expected turnover of products. In Fig. 4.5, this would shift the turnover y + 1 curve downward, and the optimal number of products to transfer would decrease. Ceteris paribus, this means that the markdown in the current season is likely to increase.

A point of contention may be what to do about fixed overhead costs. Products that are returned make use of the logistical infrastructure of a retailer, but much of this cost does not vary with the number of products being handled. As such, it could be argued that these costs should not be accounted for when making decisions on what to do with superfluous inventory.

4.5.2 Estimating Rotation Speed

The current demand is of key importance to judge if a product’s price needs to be adjusted. In this context, demand is often expressed as the rotation speed: the number of units that are being sold per time period—often 1 week. Like all other predictions, this can never be 100% accurate. This section explores simple and more complex ways of estimating the rotation speed of a product.


The rotation is simply the number of products that are sold per time period— assuming no change is made to the price. In the context of markdown decisions, the rotation is often used to make a short-term demand forecast.

The simplest approximation of rotation speed is to assume that the next period will be identical to the current one. This can be made more robust by taking an average of the last n periods, preventing a recent outlier from skewing the estimate. The parameter n should be chosen so as to maximize the accuracy as measured on historical data.

A simple extension of this logic is to use exponential smoothing, where rather than weighing the last n periods equally, more recent periods are given a greater weight. As the name suggests, this is done using an exponential function that defines the appropriate weight. The additional complexity is limited to the addition of a smoothing parameter α, resulting in only two parameters to estimate in total (n and α). Equation 4.6 shows how the rotation for period t (R_t) can be estimated based on the sales of the preceding period (S_{t−1}) and the rotation estimate for the preceding period (R_{t−1}). A large value of α will place more focus on the short-term observations—and vice versa.

$$R_t = \alpha S_{t-1} + (1 - \alpha) R_{t-1}, \qquad \alpha \in \left]0, 1\right], \; R_0 = 0 \tag{4.6}$$
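A minimal sketch of this recursion, including a simple way to pick α against a historical series. The sketch seeds the recursion with the first observation rather than R₀ = 0, which avoids a long warm-up; the series and candidate grid are illustrative.

```python
# Exponential smoothing of the rotation estimate (Eq. 4.6):
# R_t = alpha * S_{t-1} + (1 - alpha) * R_{t-1}

def smoothed_rotation(sales, alpha):
    # Seed with the first observation instead of R_0 = 0 (a deviation
    # from the formula as printed, to avoid a long warm-up period).
    r = sales[0]
    for s in sales[1:]:
        r = alpha * s + (1 - alpha) * r
    return r

# Choose alpha by minimizing the one-step-ahead forecast error
# on a historical series (week 4 here is an outlier).
history = [100, 95, 110, 40, 105, 98, 102]

def one_step_error(alpha):
    r, err = history[0], 0.0
    for s in history[1:]:
        err += abs(s - r)  # error of forecasting this week with the last estimate
        r = alpha * s + (1 - alpha) * r
    return err

best_alpha = min((a / 10 for a in range(1, 11)), key=one_step_error)
print(best_alpha, round(smoothed_rotation(history, best_alpha), 1))
```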

Fitting the parameters is best done using a full historical dataset: taking the full period of a preceding year and observing how well an exponential smoothing estimate continues to work for different values of α. No complex algorithms are needed to do this; simply testing a number of combinations in a spreadsheet, or using spreadsheet solver capabilities, is likely to be more than sufficient.

An important caveat is that historical rotation speeds can be influenced by changes in prices or markdowns. As the rotation speed is used to estimate the scenario where nothing is changed, observations where a price has been changed should not be used as training data. This can be difficult when retailers enact price changes en masse at the start of a discount season. Especially in environments where there is an explicit start of the sales season, there will be patterns to account for. Often, there is a slowdown of sales prior to the start of the sales season, as well as an uptick in global demand as soon as the sales period has started. This is complex to measure, since it coincides with a change in price for many products, making measurement at the level of an individual product convoluted. It is often more effective to measure the fluctuations of demand as an aggregate in order to provide an adjustment to the expected rotation speed.

A more powerful tool that can be applied here is the Prophet algorithm [1].


This algorithm, developed at Facebook, is widely considered one of the best ways of predicting time series. A particular strength is that it allows the domain knowledge of the user to be combined with various analytical forecasting techniques. If sufficient data is at hand, this is one of the best ways to predict global sales patterns in the mid term. An important caveat is that it assumes that past patterns remain relevant: big events in the past (e.g., store lockdowns as observed during the COVID pandemic) may contaminate patterns, and if the markdown strategy is completely different from past years, aggregated predictions are unlikely to be successful.

In a similar fashion, there may be external factors that influence demand at specific points in time. The most frequently encountered is weather. Agreeable or disagreeable weather will have an effect on sales in general, as well as on sales for specific product categories. If the weather is very bad, people will be less inclined to go shopping, but might increase their online spend. Likewise, if the weather is extremely nice, this may cause an uptick in demand for categories such as pools or swimwear. There may also be product-specific demand patterns that cause demand to erode or increase over time. This is typically the case for products that are purchased for specific events, such as chocolate for Christmas. If products have such specific seasonal patterns, it is often valuable to include these aspects in the model that estimates future rotation.

Depending on the context, it can be crucial to account for these forces, as they might otherwise result in erroneous inputs for price setting. If demand has been unusually slow due to weather conditions, this is no reason to assume that the same will hold true for the next time period. Mistakenly assuming so is likely to result in unwarranted price decreases.
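Prophet is available as an open-source Python package. The sketch below shows the general shape of its API on placeholder data: the ds/y column names are required by the library, and the temperature regressor illustrates how an external factor such as weather can be added.

```python
import pandas as pd
from prophet import Prophet

# Daily sales history; Prophet requires the columns 'ds' (date) and 'y'.
# Both series here are placeholders for real data.
history = pd.DataFrame({"ds": pd.date_range("2021-01-01", periods=365)})
history["y"] = [100 + 20 * ((i % 7) == 5) for i in range(365)]  # Saturday bump
history["temperature"] = [10 + 15 * (i / 365) for i in range(365)]

model = Prophet(yearly_seasonality=True, weekly_seasonality=True)
model.add_regressor("temperature")  # external driver such as weather
model.fit(history)

future = model.make_future_dataframe(periods=28)  # 4 weeks ahead
future["temperature"] = 20.0  # a weather forecast would be used here
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```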

4.5.3 Estimating Elasticity

Arguably, the most crucial component in a markdown optimization system is the way in which the expected response of demand to a price change is estimated. This is often dubbed the price elasticity of a product, but this is somewhat of a misnomer, since the true pursuit is the price-response function. Because of its importance, this has been treated in depth in Chap. 2. Typically, the demand for all possible price changes can be pre-calculated, making it easy to analyze different scenarios for an individual product or a group of products.

Creating a good price-response function for a markdown environment requires training on a realistic dataset. It is not advisable to use all data from all past promotions and price changes to estimate the response of demand to a price change: markdowns create a specific context and are often offered during somewhat distinct sale seasons.⁸

⁸ Depending on local laws, these periods can be officially determined or can be freely chosen by retailers. Regardless, there are often clear conventions in different countries as to when the sales season broadly starts and ends. Much of this is also inspired by seasonal demand for different types of products.


While the correct terminology for the model that estimates demand is the price-response curve, this chapter will use elasticity interchangeably. This is in accordance with the terminology commonly used by retailers and economists alike.

4.5.4 Updating Elasticity

Initial estimates of elasticity are created using historical data, but it can be important to adjust these estimates based on observed demand. Because a single product has often not been sold at a wide variety of prices, the models that are created often have to be based on similar products. Because of this, it is important to immediately include newly observed data points in the price-response curve. At the same time, care has to be taken not to respond too violently to outliers. First and foremost, the observations may truly be outliers; especially when the absolute volumes sold of a product are relatively low, outliers are common. Secondly, it is important that re-calculations do not cause extreme price volatility. Physical stores often face limitations that make it difficult to effect high-velocity price changes, and customers may become agitated if they observe that prices are changing rapidly and significantly.
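One simple way to blend new observations into an existing elasticity estimate without overreacting is a dampened update that caps how far any single observation can move the estimate. This is a generic sketch, not a prescription; the learning rate and cap are illustrative parameters that would need tuning.

```python
# Dampened update of an elasticity estimate: blend in each new
# observation with a small learning rate, and clip the implied
# per-update change to avoid chasing outliers.

def update_elasticity(current: float, observed: float,
                      learning_rate: float = 0.2,
                      max_step: float = 0.5) -> float:
    step = learning_rate * (observed - current)
    step = max(-max_step, min(max_step, step))  # clip extreme moves
    return current + step

est = 2.0
for obs in [2.4, 8.0, 2.2]:  # the 8.0 is likely an outlier
    est = update_elasticity(est, obs)
    print(round(est, 3))     # 2.08, then 2.58 (clipped), then 2.504
```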

4.5.5 Satisfying Business Rules and Other Constraints

As shown in Fig. 4.2, a balance must be struck between what is optimal on a product level and what is optimal on a company level. This implies that rules have to be formulated and that the optimal compromise has to be searched for. To this end, optimization algorithms have to be created. If the problem size allows for it, a simple mixed integer programming model can be used to search for this optimum. The rules that are formulated can be inspired directly by the business or can be the result of an automated correction based on the portfolio-level forecast model, the latter of course being indirectly inspired by a business objective such as a certain demand level or a certain total gross margin that must be earned by a certain point in time.

While the term "mixed integer programming model" can sound daunting to those who are not familiar with it, the concept is reasonably simple. In essence, these models boil down to formulating what you want to maximize and what conditions you need to impose. Within this context, the objective has already been formulated (see Eq. 4.5). The constraints are simply mathematical expressions of the logical rules that have been formulated. A full introduction to linear programming goes beyond the scope of this book, but the reader who wants to know more is referred to basic texts on operational research; a good suggestion is Introduction to Operations Research by Hillier and Lieberman [2].
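As an illustration of how small such a model can be, the sketch below uses the open-source PuLP library to pick exactly one ladder price per product, maximizing expected contribution subject to a cap on the number of price changes. The contribution figures are invented inputs that would, in practice, come from the product-level forecast model.

```python
from pulp import (LpProblem, LpMaximize, LpVariable, lpSum,
                  LpBinary, PULP_CBC_CMD)

# Expected contribution per (product, ladder markdown); in practice
# these figures come from the product-level forecast model.
contribution = {
    ("A", 0.0): 500, ("A", 0.2): 640, ("A", 0.4): 610,
    ("B", 0.0): 300, ("B", 0.2): 290, ("B", 0.4): 260,
}
products = sorted({p for (p, _) in contribution})
current = {"A": 0.0, "B": 0.0}  # markdowns currently in effect
MAX_CHANGES = 1                 # operational cap on price changes

prob = LpProblem("markdown_selection", LpMaximize)
x = {(p, m): LpVariable(f"x_{p}_{int(m * 100)}", cat=LpBinary)
     for (p, m) in contribution}

# Objective: total expected contribution of the selected prices.
prob += lpSum(c * x[key] for key, c in contribution.items())

# Exactly one ladder price must be chosen for every product.
for p in products:
    prob += lpSum(x[(q, m)] for (q, m) in contribution if q == p) == 1

# At most MAX_CHANGES products may move away from their current price.
prob += lpSum(x[(p, m)] for (p, m) in contribution
              if m != current[p]) <= MAX_CHANGES

prob.solve(PULP_CBC_CMD(msg=False))
chosen = [key for key, var in x.items() if var.varValue == 1]
print(chosen)  # e.g. [('A', 0.2), ('B', 0.0)]
```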


4.6 Complicating Factors

The preceding sections have briefly discussed the basic components that are required for a markdown system. A system that already covers these basics is likely to perform adequately, but there is often still room for improvement. Some key areas where improvements can be obtained will now be discussed.

4.6.1 Operating in Multiple Markets

Larger retailers typically sell their wares in a multitude of markets. The definition of what constitutes a market can differ, but the essence is that conditions are sufficiently different to warrant another approach. For some retailers, different markets may represent different countries in which they operate. Other retailers may consider their online and platform channels different from brick-and-mortar stores. Retailers with very dense store networks may even distinguish different markets within a country or state.

The consequence of this is that there is not one single demand curve to be defined. Each market should have a separate demand curve fitted, representing the willingness to pay of customers in that market. This in turn may result in very different prices being charged depending on the market in which products are being sold. Different markets may also represent different variable and fixed costs. Just as differences in demand may result in different prices being charged, differences in operational costs can also have an impact on what constitutes the optimal price.

Cases where inventory is shared between multiple markets can give rise to additional complexity, specifically if there is less inventory than would be required to cover the demand in each of the markets. In such situations, it can be desirable to sell at sub-optimal prices in regions that cannot generate the same profits from selling the unit of inventory. In some situations, it may even be desirable to pull products from specific markets completely, moving them toward markets that are more profitable and not even offering the opportunity to purchase certain products in less profitable markets. Again, this situation can give rise to conflicts between what is optimal on a product level and what is optimal on a company level. A certain market may be less profitable than another, but pull too many products from the shelves, and the brand image may be damaged because of empty shelves.

4.6.2 Demand Erosion

Another complicating factor in the markdown decision is that some products tend to lose value quickly during the sales season. Demand itself often displays clear patterns during the markdown season, as was discussed in Chap. 2. For some product groups, these patterns can be more pronounced and systematic.


The major consequence of this in light of the markdown decision is that current demand cannot be extrapolated over the remaining lifetime of a product. For products that do not experience demand erosion, it may be assumed that demand will remain at the current level for the remaining lifetime. For products that are very seasonal, however, demand deteriorates much faster, and this must be taken into account. In cases where the product responds to price changes in an elastic fashion, it may be beneficial to lower prices even if a linear projection shows that inventory is going to run out before the end of the season is reached.
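For illustration, a minimal sketch contrasting a flat projection with an eroding one; the weekly erosion rate is a hypothetical input that would in practice be estimated from historical seasonal patterns.

```python
# Compare a naive flat projection with one that assumes demand erodes
# at a constant weekly rate.


def projected_sales(current_weekly_demand, weeks_left, weekly_erosion=0.0):
    """Total expected unit sales over the remaining lifetime."""
    total, demand = 0.0, current_weekly_demand
    for _ in range(weeks_left):
        total += demand
        demand *= (1.0 - weekly_erosion)
    return total


stock = 400
flat = projected_sales(50, 8)             # 400 units: looks just enough
eroding = projected_sales(50, 8, 0.15)    # roughly 242 units: overstock looms
needs_markdown = eroding < stock          # True -> consider discounting earlier
```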

4.6.3 Combined Discount Types

Not all discounts are simply a reduction in the price of a single item. Some retailers prefer to create, or allow customers to create, groups of products which are then sold at a discount. Typical examples include "buy one get one"-style discounts, also known as BOGO. This style of discount creates two problems. Firstly, it raises the question of which products are fit to be combined into such an offering. Secondly, the price for such a bundle also has to be defined.

The first of these problems has been researched in the economic literature as the bundling problem. A common example of this in practice is option packages at car dealers. These have not been constructed for the convenience of the customer. Rather, these combinations of options have been constructed to convince as many customers as possible to spend as much as possible. This is also what a retailer wants to achieve by offering this style of discounts. Bundling is often most advisable in situations where there is a strong negative correlation between customers' willingness to pay for products. Why this is the case is easiest to explain using an example.

Table 4.1 shows a small example where a fashion retailer is trying to decide on the right price for dress shirts and sweaters. The reservation prices represent the value that a customer attaches to a given product: the first customer is willing to pay a maximum of €100 for a dress shirt and €50 for a sweater. If the asking price does not exceed this amount, the customer will buy the product. The table also shows the reservation price for a bundle containing a dress shirt and a sweater. The willingness to pay for such a bundle is simply the sum of the willingness to pay for the individual products for each customer.

Assuming that this scenario takes place during the markdown season and that the cost of products can be considered a sunk cost, the total value is accurately represented by the turnover.9 The optimal price in this case is simply the price that maximizes the total turnover. As always, this price strikes the balance between the value of a single transaction and the total volume of transactions. In the example, selling products separately results in a total turnover of €170 + €200 = €370.

9 Assuming there are no transaction costs, for the sake of this simplified example.

Table 4.1 Product bundling example with negatively correlated reservation prices

                 Reservation price
Customer         Dress shirt   Sweater   Bundle
1                €100          €50       €150
2                €85           €60       €145
3                €50           €95       €145
4                €40           €90       €130
Optimal price    €85           €50       €130
Turnover         €170          €200      €520

Table 4.2 Product bundling example with positively correlated reservation prices

                 Reservation price
Customer         Dress shirt   Sweater   Bundle
1                €100          €90       €190
2                €85           €95       €180
3                €50           €60       €110
4                €40           €50       €90
Optimal price    €85           €50       €180
Turnover         €170          €200      €360

However, the bundling option is clearly the better choice, resulting in a total turnover of €520. This assumes that bundling is executed as a "pure" strategy: the products can only be purchased as a bundle (or can only be purchased outside the bundle at considerably higher prices, perhaps even at the price of the complete bundle, making the bundle the de facto only option). This makes intuitive sense: customers who would not normally buy some products are enticed to do so by creating smart combinations. The cause of this effect is the negative correlation of the reservation prices for individual customers. Customers who are willing to pay relatively high prices for one product attach relatively lower value to other products.

In situations where the reservation prices are positively correlated, bundling can be a bad strategy, as is shown in Table 4.2. The exact same reservation prices have been used here, but they have been spread across customers to make them positively correlated. Hence, there is no difference in the total turnover for the products when selling them separately, which still results in a total turnover of €370. The bundling price strategy has however become much less attractive, yielding a total revenue of no more than €360.

Naturally, in practice, the reservation prices for all products are not a known quantity. Depending on the nature of the retailer, there are several options for approximating the reservation price. The conceptually simplest way of doing so is simply to ask customers. The technique for conducting such surveys and getting to reservation prices is called conjoint analysis [3], the specific details of which go beyond the scope of this book.
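The reservation-price logic behind Tables 4.1 and 4.2 is simple enough to verify directly. The following minimal sketch uses the numbers from Table 4.1 and reproduces its optimal prices and turnover; the restriction of candidate prices to observed reservation values is a standard simplification.

```python
def best_price(reservations):
    """Pick the candidate price (among observed reservation prices)
    that maximizes turnover: a customer buys if their reservation
    price is at least the asking price."""
    def turnover(price):
        return price * sum(r >= price for r in reservations)
    return max(reservations, key=turnover)


shirt = [100, 85, 50, 40]
sweater = [50, 60, 95, 90]
bundle = [s + w for s, w in zip(shirt, sweater)]   # [150, 145, 145, 130]

for name, res in [("shirt", shirt), ("sweater", sweater), ("bundle", bundle)]:
    p = best_price(res)
    print(name, p, p * sum(r >= p for r in res))
# shirt 85 170, sweater 50 200, bundle 130 520
```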


Conjoint analysis is not always feasible, especially for retailers who have quickly changing assortments that need to cater to ever-changing trends in customer demand. For them, the time and associated expense of conjoint analysis cannot always be justified. In such situations, an approximate method can be to calculate the correlation of customer demand in different product categories or for products with different properties. Any negative correlations that are found in this manner are possible candidates for bundling.

These candidates are not automatically suitable of course, since some product bundles are nonsensical. For example, it is likely that demand for ladies' shoes is negatively correlated with demand for men's jeans. However, this does not mean that this would make for a sensible product bundle. Hence, it is required to vet these combinations using the product and customer knowledge that should be present in any retail organization.

This approach enables the identification of potential bundles, but does not automatically suggest a price that should be placed on these bundles. The challenge here is that the elasticity/demand/price-response models have modeled a market as an aggregated whole. These are not models of the individual customer, and the latter would be required to set prices as was done in the simple example presented above. This is a problem that cannot be solved without experimentation. The right course of action is to make sensible assumptions about the right price range for a bundle based on the constituent products and to vary the bundle price over time or across markets to collect information on demand. Using this demand information, a true demand model can be constructed for bundled products. Once demand for bundled products has been observed, it may be possible to construct models that can make reasonably good estimates for new product bundles.

At this point, it is important to note that the bundling problem cannot readily be solved without looking at customer data. All other analyses that have so far been conducted have only used product and transaction data. However, to construct bundles of products that work well, customer data has to be analyzed in order to create sensible combinations. This does not automatically mean that personal information about customers needs to be known, but individual customers have to be identifiable.

The discussion so far has focused on bundling as an instrument for cross-selling: pushing customers toward products that are different in some way from those they are already purchasing. It is also possible to use bundles as an instrument for convincing customers to buy more of the same product. Typical examples are promo packs that contain greater quantities of the same product, the motivation being to increase customer spend in a single transaction. This is often advantageous because the variable transaction cost is incurred as soon as a single product is purchased. Shipping one t-shirt or shipping two t-shirts to a customer does not alter the shipping cost. As such, a lower price on the second product is warranted since the transaction cost is already offset by the first product.

For readers who want to know more about how to optimize product bundles, the book Optimal Bundling: Marketing Strategies for Improving Economic Performance by Fuerderer et al. is highly recommended [4]. The example in this section was inspired by the examples presented in that text. There is no more recent text that surpasses it in completeness on the subject.
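Returning to the correlation screening mentioned above, the following is a minimal sketch of what that calculation could look like, assuming transactions carry an identifiable customer id. The column names and data are illustrative.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 4],
    "category":    ["shirts", "sweaters", "shirts", "shoes",
                    "sweaters", "shoes", "shirts"],
    "units":       [3, 1, 2, 2, 4, 1, 1],
})

# Customer x category demand matrix, then pairwise category correlations.
demand = transactions.pivot_table(index="customer_id", columns="category",
                                  values="units", aggfunc="sum", fill_value=0)
corr = demand.corr()

# Negatively correlated pairs are candidate bundles, pending a human
# sanity check that the combination actually makes sense.
pairs = corr.stack()
candidates = pairs[(pairs < 0) & (pairs.index.get_level_values(0)
                                  < pairs.index.get_level_values(1))]
```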

4.6.4 Substitution and Cross-Price Elasticity

The pricing of other products can also have an impact on demand. This is another reason why it may be optimal to deviate from what is optimal on the level of an individual product.10 The main challenge is understanding to what extent products are complements or substitutes. The cross-price elasticity of product A with regard to a change in price of product B is calculated as shown in Eq. 4.7. Unlike the normal product price elasticity (see Sect. 2.2), cross-price elasticity can be both positive and negative. If the value of η_AB is positive, the products are substitutes: a price decrease for product B will result in fewer purchases of product A, and vice versa. For negative values of η_AB, the products are complements of each other: a price decrease in one will also cause the other to be purchased more often.

\eta_{AB} = \frac{\Delta d_A / d_A}{\Delta p_B / p_B} \qquad (4.7)

10 The first reason being company-level objectives and constraints, as has already been discussed extensively.
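A minimal worked example of Eq. 4.7, under the assumption that demand for A is observed before and after a price change of B with all else equal; the numbers are invented.

```python
def cross_price_elasticity(d_a_old, d_a_new, p_b_old, p_b_new):
    """Eq. 4.7: percentage change in demand of A divided by
    percentage change in price of B."""
    pct_demand_change = (d_a_new - d_a_old) / d_a_old
    pct_price_change = (p_b_new - p_b_old) / p_b_old
    return pct_demand_change / pct_price_change


# B is discounted from 40 to 30 (-25%); weekly sales of A drop 80 -> 68
# (-15%). The positive value of 0.6 suggests A and B are substitutes.
eta_ab = cross_price_elasticity(80, 68, 40.0, 30.0)   # 0.6
```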

Within the context of markdowns, the main concern is the identification of substitutes. Complementary products can also be relevant when creating bundles (Sect. 4.6.3), but this is a less common concern. Another potential use for substitute products is list pricing, especially in situations where the goal is to increase market share and to focus on customer lifetime value. In such a context, the identification of complements can be an interesting strategy for new customer acquisition, after which efforts are made to further increase customer value by means of a long-term customer relationship.

While substitution effects are undoubtedly present, they are very hard to quantify for a typical retailer. The number of product-to-product comparisons grows quadratically with the number of products. Even for moderately sized product collections, this means that a "brute force" approach that calculates all cross-price elasticities will result in many spurious results. A different approach is therefore required to determine whether products are substitutes or not. To this end, it is possible to use product properties as a way of identifying products that are arguably close neighbors. Two pairs of white sneakers in the same price category can be clear substitutes of each other based on such properties. This is a typical context where clustering and nearest neighbor algorithms can be relevant. The main goal is to greatly decrease the number of possible substitute candidates, resulting in fewer spurious observations of positive cross-price elasticities.

Another method of comparing substitutes makes use of behavioral data. By tracking users on a webshop, the products they have compared can be identified. This may be because users have viewed product detail pages or even have added



multiple products to their shopping basket before buying only a subset of them. Another source of information can be product returns, specifically in the case where customers buy multiple products to try out and send back what they found to be the less attractive options upon closer inspection.

While these techniques make it possible to form clear hypotheses on which products are substitutes, this information is not sufficient to serve as definitive proof of such a relationship. Rather, these results should inspire experimentation where the aim is to learn true cause-effect relationships about product substitution. While it may never be possible to gather timely information about individual product interactions, it may be possible to derive specific substitution effects for specific product categories, types, and properties.

Once such information is available, the next step is to adjust the markdown optimization algorithm. This means that a single price no longer influences a single product, but that a single price has an effect on a multitude of different products. This will unavoidably add to the complexity of the optimization model and may lead to a need for different heuristics. Alternatively, the knowledge of cross-price elasticity can be used to formulate additional business rules. The latter is a less intrusive way of adjusting markdown price optimization systems.

4.6.5 Virtual Stockouts and Low Inventory

The nature of demand for a product can change before the product is completely sold out. This can be because small quantities of products are not as visible as large groups of products. It can also be the case that the most popular sizes or colors of a product have already sold out, causing a corresponding decrease in demand. Likewise, products can be sold out in some locations, increasing the time required to get the product to customers, which in turn raises the threshold for making a purchase. Situations such as these are often dubbed "virtual stockouts": there is still some inventory available, but demand is being influenced by a lack of inventory regardless. Not taking these effects into account can cause markdown systems to go haywire.

Depending on the context and the strategy of a retailer, different responses may be appropriate. For some products, it can be preferable to launch very big discounts to clear out products as soon as this slowdown starts. A motivation for this is clearing out shelf space for products that can earn a better return than the current product. Along the same vein, this can be done to avoid disappointing customers who might see an interesting product but can no longer find it in their size.

Alternatively, it can be undesirable to discount these products just because they are running low. At times, these can be highly successful products, which only remain in stores that are performing sub-par. Under these conditions, it can be normal that products need some more time to be sold, and this may not require an additional discount to spur demand. Along the same vein, it can be the case that a retailer does not want to discount a product because only the sizes on the extreme end of the spectrum remain, the idea


being that these customers, though fewer in number, are not necessarily willing to spend less on products. Hence, the implicit assumption is that the demand of these consumers is relatively price inelastic, which implies that lowering the price of such a product is ineffective. Due to low data volumes at the moment of virtual stockouts, a data-driven answer to what the right tactic is can be hard to come by. As a result, the decision on what to do for these products often comes down to a judgement call. Much depends on the opportunity costs for space, the brand image, as well as the overall breadth of the assortment.

4.7 Running Markdown Experiments

When a new markdown system is introduced, it is often desirable to benchmark the new approach against the current best practices. While this is a valid idea, making mistakes in the experimental design can have undesired effects. This section elaborates briefly on the chief risk factors when conceiving such an experiment. More general guidelines for setting up a good experiment can be found in Chap. A.

4.7.1 Single and Fixed Objective

The first thing that has to be locked in is the objective of the system. This often requires some in-depth discussion, because human decision-makers tend to pursue a substantial number of proximate objectives jointly. These can include ideas about a reasonable amount of leftover inventory at the end of the season or a specific sell-through that has to be achieved for certain products. Other alternatives include a certain minimal margin to be achieved on transactions for a certain product category. Algorithms, on the other hand, need a single clear-cut objective to optimize. The latter objective can be a combination of many elements, but these need to be weighted in a sensible fashion. On this front, there needs to be agreement to make sure that the end results of the system can be compared. For more information, the reader is referred to Chap. 1, where various objectives are discussed in depth.

At times, it may become apparent that the initially formulated objective was incomplete. Perhaps some of the costs have not been calculated correctly or even included at all. Changing the objective during the course of the experiment is however highly likely to invalidate the experiment. It may still be possible to draw some conclusions by analyzing the data, but statements about cause and effect are likely to be conjecture.

4.7.2 A Good Split of Test and Control Groups

An experiment will require that products be divided into test and control groups. This can be done in a number of ways, but some methods are superior to others. In essence, the process should be random, and it should result in two groups of products that are as comparable as possible.

Selecting product categories to be part of either the test or the control group is often not a good way of creating groups. These groups are by definition different in nature. This means that it is likely that some groups are easier to discount than others and that some are also easier to improve than others. Creating an experiment where one group comprises kids' fashion and another women's fashion is unlikely to yield generalizable results. Assuming that the objective is to maximize turnover, it will be hard to compare the final turnover achieved in women's fashion versus that in kids' fashion.

For similar reasons, a geographical split of the test and control groups is not desirable. Such a setup will always be subject to external effects that cause differences in performance and that cannot be corrected for without risk of error. In this context, it may even be preferable to run a smaller experiment in a single region, rather than a larger experiment that has a geographical split between the test and control group.

The overall best practice is typically to randomly assign single products to either the test or control group. In doing so, the groups can be stratified (see Chap. A) to make sure that important differences are equally represented in both groups. It may be desirable to limit the size of the test group to lower the risk of the experiment. When doing so, it is advisable to estimate the power of the experiment (see Sect. A.3) to ensure that the group is still large enough to draw relevant conclusions.
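As an illustration, a minimal sketch of stratified random assignment follows; the strata (here, product category) and field names are invented for the example.

```python
import random

random.seed(42)  # reproducible assignment

products = [
    {"sku": f"P{i:03d}", "category": cat}
    for i, cat in enumerate(["shirts", "shirts", "shoes", "shoes",
                             "sweaters", "sweaters", "shirts", "shoes"])
]

# Group products per stratum, shuffle within each stratum, and split,
# so that both groups mirror the overall category mix.
strata = {}
for p in products:
    strata.setdefault(p["category"], []).append(p)

test, control = [], []
for members in strata.values():
    random.shuffle(members)
    half = len(members) // 2
    test.extend(members[:half])
    control.extend(members[half:])
```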

4.7.3 Avoid Contamination of the Control Group

In the case where the benchmark is a human decision-maker, the price changes proposed by the algorithm should not be shown to this decision-maker. By seeing what the algorithm is doing, the decision-maker can consciously or unconsciously change their decision-making behavior. This again is a major risk to the validity of the experiment. In practice, this often proves difficult. One manner of dealing with this risk is to automate the rules of thumb of the human decision-maker. This has the additional advantage of making the precise reasoning explicit and is likely to uncover certain business rules that are of importance.

4.7.4 Big Differences

When conducting the experiment, it can be useful to define certain limits to the freedom of the algorithms and decision-makers for both groups. Assume, for example, a situation where a new algorithm is much less aggressive in setting discounts, because it has calculated that lower discounts are currently preferable. The human benchmark, on the other hand, decides on a much more aggressive set of discounts. For most types of retailers, this will shift demand from the test group to the control group because of the much greater discounts. If the complete collection had received lower discounts, customers might have purchased an equal amount of product at higher prices, but when presented with a cheaper alternative, customers will prefer that over the more expensive option.

To avoid this, it is best to set boundaries on the global levels of discounts in either group. This has the additional advantage of decreasing the risk of leaving pricing decisions to an untested system. Typical bounds include a minimal and maximal number of discounted products, as well as a target for total discounted value (see Sect. 4.2). If relevant, some bounds can also be set on a category level, indicating, for example, that basic products should never receive large discounts.

A disadvantage of setting such bounds is that it places arbitrary constraints on the potential of a markdown algorithm. It may be the case that operating with much lower discounts is better, but if the algorithm is forced by a constraint to give deeper discounts, this potential is left untapped. In spite of this, a good algorithm should be able to outperform human decision-making even within these constraints. Relaxing such artificial constraints is something that can be tried in a next iteration of the system.

4.7.5 Do Not Continue Testing Indefinitely

A final risk in markdown experiments is the danger of continuing to test the same thing. Once an algorithm has been proven to work as well as or better than the current solution, the next step should not be to run the same experiment again. Rather, the goal could be to test multiple variations of the algorithm against each other. This can be inspired by data analysis performed on the results of former experiments. These results are likely to provide good input for formulating hypotheses that can be tested and improved in subsequent iterations.

4.8 Promotional Discounts

A standalone chapter on promotional pricing would include much that has already been covered in the preceding discussion on markdown pricing. Specifically, the general architecture, predictive models, and experimentation methodology are


highly similar in nature. Hence, to avoid repetition, this section focuses on what is different about promotional discounts and what adjustments and extensions might be needed to the methods that are used for markdown management. This section consists of three parts. First, the purpose of promotion prices is discussed and positioned relative to other pricing decisions. Next, the extensions to the models used to estimate the demand response are elaborated. Finally, the challenge of selecting a set of products to be part of a promotion campaign is discussed.

4.8.1 The Purpose of Price Promotions

Price promotions can be viewed as a middle ground between dynamic list pricing (see Chap. 3) and markdown prices. Unlike markdown prices, a promotional discount is impermanent; after a certain amount of time, the discount is stopped, and the product is again sold at normal price levels. This fleeting nature is similar to the changing nature of dynamic pricing systems. However, a promo price differs in that no change is made to the sticker price of an item. This has a dual purpose. Firstly, the goal is to prevent erosion of the perceived value of the product in the eye of the consumer, i.e., the reservation price. Secondly, advertising a product as discounted has a presumably positive effect on demand. Especially given the temporary nature of a discount, it can be expected that customer demand will be higher than would be the case for a decrease of the list price of the product.

Promo prices also serve the purpose of encouraging global demand, beyond the positive effect that they give to individual products. Many retail stores are in the habit of launching promotional campaigns throughout the year. These are typically timed to coincide with natural lulls in demand. For fashion products, this is the period after the new collection has been stocked for a few weeks, while the discount season is still a while away. Modeling this dynamic is best done using an architecture similar to the one presented in Sect. 4.4. Hence, promo prices should be set both in a manner that is sensible on the level of an individual product and in a way that creates an attractive offering to attract groups of customers. The expectation is often that such campaigns will increase overall footfall in stores, resulting in sales of items that are not promoted as well as items that are. The true effect is of course dependent on many factors, and it is often worth testing these claims.

While inherently distinct from markdowns, promo prices can at times be a precursor to the markdown step. Specifically for products that are underperforming, it may become clear quite quickly that inventory levels will be too large to sell by the end of the product's lifetime. Under such conditions, these products are excellent candidates for promo pricing, as there is a much lower opportunity cost to charging reduced prices for them.

There is also some overlap between differential pricing (Sect. 3.5) and promotional discounts. When using tools such as coupons, a discount price can be an effective tool to implement differential pricing tactics. The form of promotional


pricing discussed in this section is however more concerned with price promotions in a broader context, which are typically broadcast to all customers.

4.8.2 Estimating Promo Effects

Deciding on a promo price implies estimating the effect of a price on demand. To this end, predictive demand models have to be created. In order for these models to make accurate forecasts, they have to be provided with information on historical promo prices. Simply re-using generic models that have been trained on regular price changes and markdown prices is unlikely to provide the desired level of accuracy. On the other hand, it can be limiting to only use data from historical price promotions, as there may not be a sufficient number of observations.

As such, it is often most interesting to create a hybrid model. Such a model uses all available data on transactions, but adds a number of variables that measure the impact of price promotions. Intuitively stated, such a model would be capable of understanding that certain products are inherently very responsive to price promotions based on historical sales of the product or similar products. Likewise, the model is also capable of estimating the additional effect that a promotional price typically has on a price-elastic product. Combining these two lines of reasoning should lead to a superior model when compared to a model that only falls back on sparse historical data from previous price promotions.

To create a good-performing model, the information provided on promo prices has to be presented in a structured fashion. Simply indicating which transactions were conducted using a promo price typically does not result in good performance.11 Exactly what information is required depends on the specific context of a retailer. The list below can be used as a starting point for the feature engineering process:12

• Applied promo price: While often not sufficient to make adequate estimates about the price response, the exact discount remains important and should be included in the set of independent variables. Depending on the situation, this can be reported as a percentage or as an absolute value.

• Global promo activity level: The amount and size of active promo prices in the rest of the portfolio is also important. This represents two opposing forces. On the one hand, a larger number of promoted products is likely to yield a bigger increase in overall footfall and demand. Inversely, if many other products are discounted, there are many other products to compete with for attention, which may have a detrimental impact on sales. It can be a modeling choice to also



include this in the model that predicts the aggregated demand for the next periods (see Sect. 2.4.1).

• Advertising visibility: Retailers will often advertise promotions using print advertising, display ads, television commercials, and other types of media. The global spend in this area is likely to influence demand and footfall and should be included in the estimates of the global demand. On a product level, it is often also important to account for the visibility of a specific item in such campaigns, as this also impacts sales. A product that is displayed on the cover of a promotional folder is likely to experience rising demand. The level of visibility should therefore be quantified when making estimates. Creating adequate features can often be challenging, because of both a lack of historical data and the complexity of designing good workable features.

• In-store visibility: Closely related to the visibility in marketing campaigns is the visibility of the product in stores themselves. For physical stores, this can imply that a product is present in the display windows facing the street. There may also be dedicated displays of the product within the stores, and the manner in which the product is stored is also likely to have an impact. Within a supermarket context, for example, products that are stocked at eye level sell in much larger quantities than products in other locations. This effect is reinforced when using promo prices: promo prices that are highly visible will have a larger effect. Likewise, in a digital store, measures of visibility can be created. Some products may be visible on the homepage or immediately at the top of a page when selecting a product category or when searching for common terms.

• Shopping ads: Many retailers will also spend a certain budget to make a product more visible on services like Google Shopping or on platforms where they are selling products. The amount that is spent here should also be taken into account when modeling the impact on demand.

Creating good inputs for predictive models is one of the most challenging tasks for a data science team. This process often requires careful consideration of what could plausibly have an impact, as well as how this information can be structured in a workable way for a predictive model. An excellent resource on this topic is the book Feature Engineering and Selection by Kuhn and Johnson [5].

11 In other terms: simply adding a dummy variable that indicates whether a transaction is a promo price transaction is not sufficient.
12 The process of selecting and formatting the independent variables of a predictive model.
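To make the list more tangible, the following is a minimal sketch of the kind of feature matrix such a hybrid model could consume. All column names, flags, and values are illustrative assumptions, not a prescribed schema.

```python
import pandas as pd

sales = pd.DataFrame({
    "sku": ["A", "A", "B", "B"],
    "week": [1, 2, 1, 2],
    "units": [20, 55, 10, 12],
    "list_price": [40.0, 40.0, 25.0, 25.0],
    "paid_price": [40.0, 30.0, 25.0, 25.0],
    "on_folder_cover": [0, 1, 0, 0],   # advertising visibility flag
    "eye_level_shelf": [0, 1, 1, 1],   # in-store visibility flag
})

features = sales.assign(
    # Applied promo price, expressed as a relative discount depth.
    promo_depth=lambda d: 1 - d["paid_price"] / d["list_price"],
    is_promo=lambda d: (d["paid_price"] < d["list_price"]).astype(int),
)
# Global promo activity level: share of SKUs on promo in the same week.
features["global_promo_share"] = (
    features.groupby("week")["is_promo"].transform("mean")
)
```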

4.8.3 Selecting Products for Promotional Discounts

Promo price optimization often cannot be viewed separately from the selection of products to promote. The general outline of the promotion campaign will often be determined by the marketing or sales department of a retailer. Examples include a campaign focusing on winter coats that is timed to coincide with the first cold spell, or a campaign focusing on back-to-school products at the outset of a new academic year. This theme will often be combined with preconditions such as the total number of products that should be discounted in order to form a sufficiently attractive value proposition to customers.


Which products end up being selected, and at what prices they are going to be sold, can be based on a more analytical approach. Essentially, this is a combinatorial problem: which products should be selected, and at what prices, in order to maximize total expected profits? The methods for solving such problems are outside the scope of this book, but the interested reader is referred to introductory works on linear programming [6] and metaheuristics [7]. The precise elements that have to be included in this optimization problem differ for each retailer, but typically include the following (a small sketch of the selection step follows below):

• Expected additional revenue on a product level: The promo price-response model for each product can make an estimate of the likely sales of the product at any given price level. This information is to be used to make the right combination of products that tends to maximize overall earnings. Note that this does not mean calculating the optimal value for each individual product, as compromises are likely necessary to satisfy the general properties of the promotional campaign that have been set out.

• Future lost sales due to limited inventory: Specifically for cases where inventory cannot (easily) be replenished, there may be a loss of revenue because products are sold at a discount, while they would likely have been sold at full price at a later point in time.

• Lost revenue due to substitution: There is also the possibility of substitution effects. Assuming two similar products, one of which is promoted, demand is likely to shift to this product from the product that is not being promoted. This is often one of the hardest effects to quantify.

• Lost revenue due to demand shifts: Promotions can also cause demand to shift over time, by customers making purchases earlier than intended or by hoarding products that are not perishable. This effect works against the intention of promo prices, which essentially aim to increase demand, not shift it over time.

It is important to note that this assumes that only the product selection decision is made by this model. The overall magnitude of the promo and the impact that the promo is likely to have on overall footfall are assumed to be external to this model.
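As a minimal sketch, the selection step can be written as a binary program, again using PuLP. The expected profit lift per promoted product (net of cannibalization and pull-forward effects) is assumed to be pre-computed; the products, numbers, and the substitution rule are invented.

```python
import pulp

lift = {"coat_a": 900, "coat_b": 650, "boots": 700, "scarf": 150}
max_promos = 2  # campaign precondition: at most two promoted products

prob = pulp.LpProblem("promo_selection", pulp.LpMaximize)
pick = pulp.LpVariable.dicts("pick", lift, cat="Binary")

# Maximize total expected profit lift from the selected products.
prob += pulp.lpSum(lift[p] * pick[p] for p in lift)
prob += pulp.lpSum(pick[p] for p in lift) <= max_promos

# Substitution rule: the two coats compete, so promote at most one.
prob += pick["coat_a"] + pick["coat_b"] <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = [p for p in lift if pick[p].value() == 1]   # ['coat_a', 'boots']
```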

4.9 Conclusion

Most retailers have a love-hate relationship with promotions and markdowns, understandably so, as these tools can wreak havoc when wielded haphazardly. Still, the competitive landscape is often such that their use cannot be avoided. Like it or not, most customers simply love a bargain or at least a perceived bargain. If promotions are inescapable, the next best thing is to be smart about how these tools are used. The goal of this chapter was to introduce tools and techniques to this end, enabling retailers to employ these tools to the greatest possible benefit of the organization.

4.10 Markdown Terms Glossary

The field of markdowns tends to come with its own set of lingo. The list below presents an overview of commonly used terminology in this context.

Markdown: The end-of-life discount offered on a product, expressed as a percentage of the original sales price of the item.

Inventory price value: Inventory valued at the full sales price of articles.

Inventory cost value: Inventory valued at the purchase cost (excluding VAT).

Discounted inventory value: Inventory valued at the current discounted sales price.

Rotation speed: The current sales velocity of a product as a number of units per time period.

Remaining lifetime: A product is typically linked to a specific lifetime or sell-by date; this indicates the latest possible period in time that the product can be sold. Calculations typically assume that demand beyond this period is reduced to zero.

Expected overstock: The number of units that are expected to remain at the end of the period during which the product can be sold. This amount is typically calculated assuming the current strategy is left unchanged and is based on the current level of inventory and the rotation speed.

Value at risk: The expected overstock expressed in the currently listed sales price, that is, the price minus the markdown. This measure allows for sorting products based on sales value rather than on unit values, which can be important when working with product portfolios that have substantial differences in price.

Expected turnover: The total turnover over the complete lifetime of a product that can be expected when markdown values are not changed.

Velocity change: The difference in sales in absolute numbers between the previous two periods. Effectively, if a product sold 50 units in the previous week and 100 units in the week before that, the velocity change is −50 (minus fifty), expressing that sales are slowing down for this product. This measure is typically expressed in absolute numbers, since percentage figures would channel disproportionate attention to very-slow-selling products, polluting visualizations that are created with that measure.

Total discount value: A measure that quantifies how much overall discount is given to clients by calculating the total monetary value offered as a discount. This is defined as sales price × markdown × current inventory in units. Calculating discounted value in this manner gives a better understanding than merely averaging overall markdown values, since the latter might be heavily distorted by products that have lower sales prices and/or lower stock values.
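To make a few of these measures concrete, a minimal sketch with invented numbers for a single product:

```python
list_price = 60.0        # original sales price
markdown = 0.30          # 30% end-of-life discount
inventory_units = 120
rotation_speed = 8       # units sold per week at the current price
weeks_remaining = 10     # remaining lifetime

# Expected overstock: units left if the current rotation speed holds.
expected_overstock = max(0, inventory_units - rotation_speed * weeks_remaining)

discounted_price = list_price * (1 - markdown)
value_at_risk = expected_overstock * discounted_price
total_discount_value = list_price * markdown * inventory_units

# expected_overstock = 40 units, value_at_risk = 1680.0,
# total_discount_value = 2160.0
```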

References

1. Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37–45.
2. Hillier, F. S., & Lieberman, G. J. (2019). Introduction to operations research (11th ed.). New York, NY: McGraw-Hill.
3. Rao, V. R. (2014). Applied conjoint analysis. Springer.
4. Fuerderer, R., Herrmann, A., & Wuebker, G. (2013). Optimal bundling: Marketing strategies for improving economic performance. Springer Science & Business Media.
5. Kuhn, M., & Johnson, K. (2019). Feature engineering and selection: A practical approach for predictive models. CRC Press.
6. Vanderbei, R. J. (2014). Integer programming. In Linear programming (pp. 345–362). Springer.
7. Gendreau, M., & Potvin, J.-Y. (2010). Handbook of metaheuristics (Vol. 2). Springer.

Part II Inventory Management

5 Product (Re-)Distribution and Replenishment

5.1 Inventory Management as a Profit Driver

Allocating inventory to the right location is a core task for a retailer. The essence of retail is getting products to customers who are willing to purchase them, ideally generating the largest possible profit while doing so. The retailer is rewarded for expertise in selecting the right products, dealing with suppliers, and getting products to the right location. Today, it is hard to envision a reality without trade, where everything needed to cover human wants would have to be created from scratch. The standard of living to which modern man has grown accustomed is not achievable without this middleman. In spite of this, the importance of distribution is often undervalued, not least by retailers themselves.

Inventory management covers the decisions that take place after the purchasing decision. This can be further subdivided into three problems. The first is the initial distribution of products to specific sales channels, before any sales have been observed. Depending on the nature of the product, there can then be a need for replenishment: buying more products from suppliers to replace products that have been sold. The challenge here is to search for the right balance between service levels and costs of inventory. Perishable or seasonal products on the other hand are not replenished, but are often redistributed across the store network. This act is motivated by the limited lifespan of products, which means that inaction in the face of superfluous stock destroys value.

The objective of the inventory allocation decision is simple: place products in the location where the expected value they are likely to realize is maximized. The expected value of a product is the probability of making a sale multiplied by the gross margin generated in a sale. This is not a difficult principle. Difficulties do however arise when translating this objective into specific actions; i.e., what product should go where? This chapter provides a frame of reference to facilitate the process of getting from this objective to improved decisions.

In keeping with the theme of this book, this chapter focuses on how inventory management can be improved by leveraging data. This is not to say that traditional ways of managing inventory have become useless. There are situations where the impact of advanced data-driven techniques is limited. These situations are characterized by comparatively simple store networks, a small number of products with long life cycles, and high and stable demand. The most well-known traditional technique is the economic order quantity, which is almost a century old at the time of writing [1].

This chapter starts by sketching the traditional retailer's perspective and metrics for inventory management (Sect. 5.2). Next, a simple framework is provided for thinking about inventory management problems (Sect. 5.3), which makes it clear that traditional inventory management only covers part of this space effectively. Next, some attention is spent on discussing estimates of lost historical sales (Sect. 5.4), before continuing to the two major components of data-driven inventory management: the sales forecast (Sect. 5.5) and the optimized allocation (Sect. 5.7) in light of this forecast.

5.2 The Traditional Retailer's Perspective on Inventory Management

Retailers often create visibility on the health of inventory using some basic KPIs. Products that are purchased once per season are evaluated by their sell-through rate (Eq. 5.1). This ratio represents the fraction of the initial purchase that has been sold. This value is often compared to benchmarks at different points in time. As an example, the sell-through of a product that has been available for 4 weeks should be at least 20%:

\text{Sell-through} = \frac{\text{nb products sold}}{\text{nb products purchased}} \qquad (5.1)

A similar metric for products that are continually restocked is inventory turnover (Eq. 5.2). This metric quantifies how often the available inventory is sold over a given time period, often 1 year. Depending on the type of retail environment, there are often clear benchmarks for this value. If inventory turnover is too low, it means that the average inventory level should be reduced, if possible. Alternatively, it might be a reason to eliminate the product from the assortment:1

\text{Inventory turnover} = \frac{\text{sales in period}}{\text{average inventory in period}} \qquad (5.2)

1 This is formulated quite bluntly, and there may be many reasons to keep a slow-rotating product in the assortment. One might be wanting to provide a certain level of service to customers by being a one-stop shop. Alternatively, it may also be the case that a single sale is highly profitable, making it interesting to keep the product in the assortment even if sales are few and far between.
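To make both metrics concrete, a minimal sketch with invented numbers:

```python
# Sell-through (Eq. 5.1) after 4 weeks of availability.
units_purchased = 500
units_sold_4wk = 110
sell_through = units_sold_4wk / units_purchased   # 0.22 -> above the 20% benchmark

# Inventory turnover (Eq. 5.2) over a year, measured in units.
annual_sales_units = 1200
avg_inventory_units = 150
inventory_turnover = annual_sales_units / avg_inventory_units   # 8 turns per year
```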

Both these metrics provide useful information about the inventory position of a retailer. If sell-through or inventory turnover is too high or too low at a specific location, action can be taken to add or remove inventory from that location. However, there are some areas where these traditional measures fall short. These shortcomings are telling of the general approach taken by retailers, and of the inefficiencies and blind spots that this creates.

Firstly, these metrics are mainly suitable for products that have high sales volumes. This contradicts the reality of many retailers today who are catering to long-tailed customer demand by providing broad assortments. This difficulty is augmented by an increase in the diversity of sales channels. Today's retailers are selling products in physical stores, on their own website, as well as on marketplaces. Each of these often requires an inventory position to be associated with it. This fragmentation implies that there are more small sources of demand that have to be accounted for, which in turn erodes the relevance of traditional metrics and algorithms.

Secondly, these measures are mainly suited for reactive inventory management. There is no support for the initial guess as to what the right inventory level should be at a given location. The focus lies on correcting imbalances, rather than preventing them. This is fine for products that have long life cycles, but for products with short life cycles, there may not be sufficient time to correct the imbalance.

Thirdly, it is often difficult to separate the performance of the purchasing decision from that of replenishment decisions. Bad performance of a product can be due to incorrect volumes of products being purchased just as much as incorrect allocations of products. A popular analogy likens inventory to a high water level that hides problems beneath the surface. If there is too much inventory overall, it will not matter that products are not allocated correctly relative to demand. These problems can however become painfully visible if purchasing is improved by better forecasting.

A fourth and final criticism is that while the direction of actions can be clear, the optimal decision is often not. It may be the case that one location has a bad (low) sell-through, while another has an excellent (high) sell-through. This implies that it may be valuable to move product from the former location to the latter. This is the point where many retailers resort to rules of thumb, for example, moving the



amount of product that results in an equal sell-through at all locations. This is not always sensible and may have severe downsides. In spite of performing worse at a location, it may still be the case that all inventory will be sold at full price (in the case of a generally popular product at an underperforming location). Moreover, there are also costs associated with moving products that should not be ignored; this includes the expense of moving, but also the possible loss of revenue during the period spent in transit.

The main conclusion is that managing inventory based on these measures is likely to fall short, especially when competing in a faster-paced environment. The approaches discussed in this chapter mainly aim at addressing the four limitations that have just been highlighted. This is done by taking a more granular and data-driven approach to decision making.

5.3 Data-Driven Inventory Management Framework

The task of inventory management starts after the buying process, though there may be some consideration given to logistics even during the buying process. However, since there is most often a certain lead time between buying and delivery of products, it is valuable to consider the inventory management process separately in order to use as much progressive insight as possible. In other words, do not waste knowledge gained between the moment of ordering and the moment when products are delivered.

Different models can be applied in the context of inventory management. The right model depends on the task being performed and is likely to differ across products as well as over time. A simple frame of reference can be provided by considering the three major contexts in which inventory allocation decisions have to be made:

• Initial inventory distribution: This is the first delivery of new products to sales channels. At this point, decisions often have to be made based on historical data, often using products that have similar features to estimate the strength of demand at a specific location. This decision might also include holding back some part of inventory in a centralized location to serve as a buffer.

• Inventory redistribution: Once a product has been selling, the problem changes in nature. Demand patterns for the specific product become available, which allows for better estimates of future demand to be made. At this point, decisions can also be made to pull products from specific locations and move them to other locations. This can however come at a significant cost to the retailer, both in terms of transportation cost and the loss of time a product spends on shelves where it can potentially be purchased.

• Continuous replenishment: Products that can be reordered, and which are present in the assortment for longer periods of time, need to be replenished continuously. There may still be a need for redistribution if the inventory buffers at certain locations are excessive, in cases where the costs of redistribution are low when compared to the loss of revenue due to offering a promotion on a product to encourage freeing up shelf space.


Table 5.1 A simple framework to understand the type of model that is relevant for different inventory-related decisions

Demand forecast
• Initial distribution: Predict confidence interval based on historical performance of similar products (Sect. 5.5.1)
• Redistribution: Adjust previous forecasts based on observations of demand using Bayesian updating (Sect. 5.5.2)
• Replenishment: Use time-series-specific models to forecast demand mainly based on historical data (Sect. 5.5.3)

Inventory allocation
• Initial distribution: Analogous to the news-vendor problem, balance over- and understocking costs (Sect. 5.7.1)
• Redistribution: Use a graph representation of the store and warehouse network to solve a variation of the shipping problem (Sect. 5.7.2)
• Replenishment: Variations of traditional EOQ models (Sect. 5.7.3)

Each of these subproblems requires a way of estimating demand, and a method for optimizing the allocation of products. This duality can be used to create a simple framework to categorize the models to support retail inventory management, as shown in Table 5.1. A retailer wanting to improve the use of inventory should consider in which of these areas the greatest potential for improvement can be found. This will depend on the nature of the product assortment, as well as the currently employed techniques.

5.4 Correcting Demand to Account for Lost Sales

Retailers are well aware of superfluous inventory. Problems in this area are clearly visible in financial statements. Moreover, a heap of unsold products is physically visible in warehouses or stores. This visibility often leads to sufficient management attention being spent on preventing or dealing with excess inventory.

Inventory levels that are too low are often much less visible. Products have flown off the shelves and a solid profit has been made. Often this is celebrated as a success, without considering the amount of money that is left on the table because more inventory could have been sold. The hard question is: how much additional inventory would have been sold had more products been available for consumers to buy?

The invisibility of lost sales can cause significant problems for both purchasing and inventory management processes. Naive approaches tend to copy the distribution of last year, possibly with an added growth objective. The result of this is that channels that have been underserved in the past are likely to continue to be underserved in the future.


When algorithms and automations are used, the need for a solution to this problem increases. Human decision makers will continue to think critically about the number of products that are purchased and allocated. In cases where they believe that a store should be able to sell more of a certain product, they will increase the allocation accordingly. Algorithms are not capable of this kind of lateral thinking. Historical demand must therefore be corrected upward for every product and channel in order to correspond as closely as possible with the true customer demand, as opposed to the fulfilled customer demand. This corrected demand is then what the forecasting models in the top row of Table 5.1 are going to predict. Without this correction, it is inevitable that historical underestimates will continue to propagate.

As always when creating such models, it is important to validate that they function correctly. The simplest manner of doing this is by testing the models on products that did not experience a stock-out. Comparing the model's predictions with the realized demand provides a good measure of model performance.

Techniques that can be used to model this corrected demand will now be briefly discussed. A distinction is made between models that are suitable for products with high sales volumes (Sect. 5.4.1) and products that have low sales volumes (Sect. 5.4.2).
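As an illustration of the kind of correction involved, the following is a minimal sketch of one simple heuristic: scale observed sales on periods with partial availability up to an estimate of unconstrained demand. This is an illustrative approach under invented data, not a prescribed method.

```python
import pandas as pd

days = pd.DataFrame({
    "units_sold": [10, 8, 0, 12, 4],
    # Fraction of the day the product was actually available to buy.
    "availability": [1.0, 1.0, 0.0, 1.0, 0.4],
})

# Baseline demand estimated from fully available days only.
baseline = days.loc[days["availability"] == 1.0, "units_sold"].mean()  # 10.0


def corrected(row, baseline):
    if row["availability"] >= 1.0:
        return row["units_sold"]
    if row["availability"] == 0.0:
        return baseline                               # full stock-out: impute baseline
    return row["units_sold"] / row["availability"]    # partial day: scale up


days["corrected_demand"] = days.apply(corrected, axis=1, baseline=baseline)
# corrected demand: [10, 8, 10.0, 12, 10.0]
```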

5.4.1 Regular and High Sales Volumes

Corrected demand is easiest to predict for products that have stable and high sales volumes. This implies that there is demand in every observed time period and that demand is not overly erratic. The latter can, for example, be caused by very big customers skewing the demand for a specific product with very large orders. Systemic effects due to weather or holidays are often not problematic, since these can be foreseen and the impact of these can be quantified. The right model to use depends on the context. Three commonly used approaches will now be discussed briefly. The objective of this book is not to discuss the mathematical details of these models but rather to focus on when a certain approach is applicable or not. For the reader who wants to know more about these techniques, references are included.

5.4.1.1 Traditional Time Series Models
Complexity is not always required to obtain a good result. For some products it may even be sufficient to use the historical average as a prediction for future demand. In situations with no external shocks and a product with a very long shelf life, this may be perfectly adequate. Traditional time series models go one step beyond this, and are capable of handling trends and seasonality when making predictions. The most widely used techniques include Holt-Winters' approach for smoothing and accounting for seasonality.


Along the same vein, autoregressive models can be used; these techniques apply a regression on past values to predict future values. Explanations of these techniques can be found in most statistics textbooks. A good reference specific to the domain of data science is Practical Time Series Analysis by Nielsen [2]. This book also includes code examples, making it very easy to test out these techniques.
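To make this concrete, the short sketch below fits a Holt-Winters model with the open-source statsmodels package. It is a minimal sketch of the technique, not a production system: the weekly demand series is synthetic and purely illustrative.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing

    rng = np.random.default_rng(42)
    weeks = pd.date_range("2019-01-07", periods=156, freq="W-MON")
    # Synthetic weekly demand: a mild trend, yearly seasonality, and noise.
    t = np.arange(156)
    demand = 100 + 0.3 * t + 15 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 5, 156)
    series = pd.Series(demand, index=weeks)

    # Additive trend and seasonality; 52 weekly periods form one season.
    model = ExponentialSmoothing(series, trend="add", seasonal="add",
                                 seasonal_periods=52).fit()
    print(model.forecast(8).round(1))  # demand forecast for the next 8 weeks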

5.4.1.2 Analyst in the Loop
While generally still useful, traditional techniques have two key drawbacks. Firstly, they can be hard to implement at scale. If there is a large portfolio of products, tuning a forecasting model to each product can be very computationally (and labor) intensive. Secondly, these models typically do not deal well with artifacts in the data. An artifact in this case is a type of event that likely changes demand, but which is not present in the time series itself. An example could be a temporary store closure for renovation works that took place last year; a simple model could view this as a seasonal pattern that can be expected to repeat itself in the current year. A popular solution to this problem has been suggested by research teams at Facebook in the form of the Prophet algorithm [3]. The approach combines scalability with the possibility for analysts to make relevant adjustments to the process, without the need for a deep background in statistics. Key inputs that can be provided by the analyst are:

• Market size: Knowledge about the overall magnitude of demand for a certain product, which can be specific to a certain period in time. For example, an analyst can estimate that it is unlikely that more than 5000 white dress shirts are going to be sold over the summer months.
• Change points: This is a form of data enrichment, where analysts can indicate points in time after which demand is likely to have changed structurally. This could, for example, be a new version of a product being released. Another example could be a renovation of a storefront, which is likely to attract more customers.
• Holidays and special events: This is another form of data enrichment that allows analysts to highlight temporary effects. The simplest example is a bank holiday, but this may also include things such as forced store lockdowns, as was seen across the world during the COVID pandemic.

The ability to provide these inputs often results in a performance increase when compared to traditional techniques. Whether this increase is sufficiently relevant to justify investing in this kind of analysis depends on the context. It must be noted that this algorithm is not unique and that many other variations and possible approaches exist. The advantage of this algorithm is that it is widely used, well documented, and available as open-source software in the most popular programming languages.
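The sketch below shows how these three analyst inputs map onto the open-source Python version of Prophet. The file name, the change point date, the lockdown dates, and the cap of 5000 units are all hypothetical placeholders.

    import pandas as pd
    from prophet import Prophet

    df = pd.read_csv("weekly_sales.csv")  # hypothetical file with columns ds, y

    # Analyst input 1: market size, expressed as a saturating cap for logistic
    # growth (e.g., "no more than 5000 white dress shirts will sell").
    df["cap"] = 5000

    # Analyst input 2: known structural change points, e.g. a storefront renovation.
    changepoints = ["2021-03-15"]

    # Analyst input 3: holidays and special events, e.g. forced store lockdowns.
    events = pd.DataFrame({
        "holiday": "lockdown",
        "ds": pd.to_datetime(["2020-03-20", "2020-11-02"]),
        "lower_window": 0,
        "upper_window": 14,  # assume the effect lingers for two weeks
    })

    model = Prophet(growth="logistic", changepoints=changepoints, holidays=events)
    model.fit(df)

    future = model.make_future_dataframe(periods=8, freq="W")
    future["cap"] = 5000  # the cap must also be provided for future rows
    forecast = model.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(8))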


Fig. 5.1 Example of the principle of using related time series to reconstruct sales after an out-of-stock event

5.4.1.3 Causally Related Time Series
Especially for relatively complex sales patterns, an alternative is the use of related time series to reconstruct what would have happened if there had been sufficient inventory. This approach does not require as many manual inputs as the "analyst in the loop" approach; the main limitation is that correlated time series must be present, and the correlation must be strong enough to warrant using it for predictions. Figure 5.1 illustrates the basic principle behind such models. The sales of two products are shown. For product B the inventory is depleted after the 20th time period, after which sales drop to zero (a simplification, as demand is likely to fall back before inventory drops to zero). It can however be seen that the sales of product A are historically strongly correlated with the sales of product B. Hence, the observed sales of product A can be used to reconstruct the sales for product B. The correlated time series does not have to be the sales of other products. Things like marketing spend, weather conditions, macroeconomic conditions, and many others can be included as predictors. This comes with the usual warning against overfitting. A solid experimental design, testing these forecasts on historical series that did not really experience stock-outs, is needed to ensure that the model works correctly.


Readers who want to know more are referred to the work by Google researchers Brodersen et al. [4]. This paper is accompanied by an excellent implementation available for immediate experimentation: CausalImpact.
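As a minimal sketch, the snippet below applies the Python port of CausalImpact to the situation of Fig. 5.1: product B stocks out after period 20, and its correlated sibling A is used to reconstruct what B would have sold. The synthetic data and the specific package choice (the causalimpact Python port) are illustrative assumptions.

    import numpy as np
    import pandas as pd
    from causalimpact import CausalImpact

    rng = np.random.default_rng(7)
    n = 30
    sales_a = 100 + rng.normal(0, 5, n)            # product A: never out of stock
    sales_b = 0.8 * sales_a + rng.normal(0, 3, n)  # product B tracks A closely
    sales_b[20:] = 0                               # B runs out after period 20

    data = pd.DataFrame({"y": sales_b, "x": sales_a})
    pre_period, post_period = [0, 19], [20, n - 1]

    # The model learns the pre-stock-out relationship between B and A, then
    # predicts what B "would have" sold; the gap is the estimated lost demand.
    ci = CausalImpact(data, pre_period, post_period)
    print(ci.summary())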

5.4.2 Low Sales Volumes

Predicting lost sales for low sales volumes and erratic demand is challenging. Models designed for use on time series will tend to perform badly. If there are frequent periods with no demand, a model often performs well on paper by predicting that a product is never sold. Erratic and clumped demand patterns are often impossible to predict. The most fruitful route in this case is to keep models as simple as possible. This implies that demand is aggregated across multiple time periods: rather than looking at weekly demand, demand is examined on a monthly basis, for example. Demand can also be aggregated across channels to create more stability in the sample. Often this means a departure from traditional time series models. The goal is no longer to predict the sales for each subsequent period. Rather, the objective simply becomes to estimate the total demand for the remaining time horizon. This aggregation reduces the number of estimates that have to be made, and generally improves the accuracy of the result. This makes intuitive sense, since it is easier to predict how many times a product will be sold in a full year than to get it right for every week of the year. Even so, the best performance is often obtained by keeping the models as simple as possible. Using advanced machine learning models will likely only lead to overfitting. It must be understood that this kind of demand is not highly predictable. A retailer's portfolio will often consist of both high and low sales volume products. Hence, it is likely that different models will have to be combined in order to correct demand for the complete product portfolio. The best approach is usually to invest more into the models for high-volume products and test some simple heuristics for the low-volume products. Where the line should be drawn between these two categories should be determined experimentally.
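A small illustration of this aggregation idea with pandas is shown below; the input file and its column names (date, product_id, units) are hypothetical.

    import pandas as pd

    daily = pd.read_csv("sales.csv", parse_dates=["date"])

    # Aggregate over time: monthly totals per product are far less erratic
    # than daily or weekly figures.
    monthly = (daily.set_index("date")
                    .groupby("product_id")["units"]
                    .resample("M")
                    .sum())

    # Aggregate over channels as well: a single total per product for the
    # whole horizon, rather than one noisy series per channel.
    total_per_product = daily.groupby("product_id")["units"].sum()
    print(monthly.head())
    print(total_per_product.head())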

5.5 Demand Forecasting Models

Demand forecasting is often the first application to be cited when thinking about data science in a retail context. The reality is that a blind application of data science models only works in situations with large sales volumes and a good amount of sales history, luxuries that most retailers do not have. Creating models that work requires awareness of the retailer's context, as well as of the manner in which the output of the models will be used in practice. The most important piece of context is the kind and amount of data that is available. Hence, in the context of replenishment, there is never a single demand forecasting system. Rather, the system is adjusted depending on the application for which the result is used.

5.5.1 Forecasting Without Observed Sales

The hardest context in which to make demand predictions is a new product introduction. Under these conditions there is no historical sales data for the product being added to the assortment. This does not mean that the sole option is to revert to guesswork. A good analogy can be found in the domain of real estate. Since no two properties are identical, a realtor faces the challenge of pinning a price on a property, effectively trying to guess the demand for this specific property under current market conditions. Realtors in such situations can often be found discussing the "comps": realtor speak for comparable properties that have recently been sold on the market. Naturally, these properties will differ in some aspects, and the impact of these differences must be estimated to get to a reasonable estimate. An example could be a slightly better location, a more recent renovation, or the fact that there is a pool. The approach to forecasting products in a context with no historical data is in many ways similar. Since there is no historical data for the new product, the method will be to look at historical sales for products that show a certain degree of similarity to the products being introduced. What constitutes similarity depends heavily on the product the retailer is selling. In general this boils down to a selection of categorical product properties (color, brand, audience, ...), in combination with the price at which the product is offered. These product properties can be provided to a forecasting model to predict the likely demand for a product.

To illustrate this, the example used in Chap. 2 will be reused. This is a dataset from a moderately sized fashion retailer, who is confronted with new product introductions on a regular basis. Specifically, the focus will be on one of the bigger categories: T-shirts. The experiment is set up using data from two consecutive years. The first year will be used to train the model; the second year will be used to test the quality of the predictions. This setup is important to ensure the validity of the results. A training and test group that uses a split of products within the same season will not be able to avoid data contamination. This is a technical term signifying that information about the test set is already included in the training data. In this case the demand for T-shirts is likely to be correlated during the same season, resulting in a risk of overestimating the quality of the model. (Depending on the context, this can sometimes not be completely avoided, because there is insufficient data available or some change has occurred that makes historical data no longer representative. This does not automatically mean that an approach such as the one presented here cannot be used. While such modeling may draw criticism from statisticians and data scientists, it is often still better than pure guesswork. The main thing to watch out for is that the models used are not too complex in nature; this is a simple, but imperfect, safeguard against overfitting.) Predictions will be made for the first 8 weeks of the sales season, using no other information than that which is available prior to the introduction of the product.

Table 5.2 Base table showing the independent variables used to predict demand. The columns indicated by "B" represent major brands carried by the retailer. The last column indicates if the target group is men or women

Product  Black  Blue  Grey  Other color  White  B1  B2  B3  B4  B5  BX  Price  Men
1061     0      0     0     1            0      0   0   0   0   0   1   19.95  1
375      0      0     0     0            1      0   0   1   0   0   0   14.99  1
106      0      0     0     1            0      0   0   0   0   0   1   44.95  0
1497     0      0     0     1            0      0   0   0   0   0   1   33.00  1
245      0      0     0     1            0      0   0   1   0   0   0   17.99  1

The demand being predicted has been corrected to account for lost sales due to stock-outs (see Sect. 5.4). To make this prediction, a very limited set of independent variables will be used: the product color, brand, price, and main target group. A small sample of these variables is shown in Table 5.2. These variables are passed to a regression-style model that uses this information to estimate the total sales. At this point it is important to select a model that can be parameterized to limit its complexity. The latter is a key tool to limit overfitting of the model. The expectations for future sales should be based on multiple similar products, not the single product that is the most similar. To use a popular adage, anecdote does not equal data. For this example a decision tree regressor was used. This technique uses a decision tree to make white-box predictions about the provided dataset. A key benefit of this method is that overfitting can easily be controlled by requiring that all the leaf nodes (the outer ends of the decision tree) contain a minimum number of observations. For this example the minimum number of observations was set to 10. The performance of this model on the training data yielded an R² value of 0.42. On the test group, an R² of 0.39 was measured. This indicates that there is no excessive overfitting taking place, and the model can be assumed to generalize (i.e., work in new situations) reasonably well. Naturally, the value of these performance measures is not very high, but this is to be expected. Based on limited descriptive properties, it is highly unlikely that very high R² values can be attained. If very high values were reported here, this would even be suspect: an indication of severe overfitting, a very big problem with the experimental design. A low R² value does not mean that the forecasts are useless. Even if the forecast is noisy, it is likely to be more correct on average than a forecast based purely on rules of thumb. The purpose of using a forecasting algorithm is not to be exactly correct in the predictions being made, but rather to be significantly better on average than the closest benchmark.
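The sketch below reconstructs the spirit of this experiment with scikit-learn's DecisionTreeRegressor. The file names and column names mirror Table 5.2 but are assumptions for illustration; the book's actual dataset is not available.

    import pandas as pd
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import r2_score

    train = pd.read_csv("tshirts_year1.csv")  # hypothetical base table, cf. Table 5.2
    test = pd.read_csv("tshirts_year2.csv")   # the following season, kept unseen

    features = ["black", "blue", "grey", "other_color", "white",
                "b1", "b2", "b3", "b4", "b5", "bx", "price", "men"]
    target = "corrected_demand"  # demand corrected for lost sales (Sect. 5.4)

    # min_samples_leaf=10 forces every prediction to be based on at least ten
    # comparable products, a simple guard against overfitting.
    model = DecisionTreeRegressor(min_samples_leaf=10, random_state=0)
    model.fit(train[features], train[target])

    print("train R2:", r2_score(train[target], model.predict(train[features])))
    print("test  R2:", r2_score(test[target], model.predict(test[features])))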


A second, and arguably more important, argument is that forecasting models can also be a useful instrument to measure uncertainty. Rather than just assuming the best guess to be the truth, it becomes possible to understand how (un)certain the performance of a product is. This makes more nuanced decision making possible: it becomes possible to calculate the probability that a certain quantity of product will be sold when it is allocated to a certain location.

Warning: At this point we run the risk of opening a statistical can of worms. The next few paragraphs get a little bit technical, so do not be afraid to skip them. Just understand that the end result is not just a point estimate (= a single value) but rather that the aim is to create a statistical distribution that accurately describes what can be expected of a certain product.

This variability can have one of two causes: either we do not have all the information that the model would need to make a better prediction, or the underlying process is inherently variable and cannot be perfectly predicted. Naturally, in real-world applications, variability is caused by both these sources. In this context the reality is that the origin of this variability does not matter too much. The decision maker just needs to be aware that there is a certain variability associated with the forecast. Uncertainty can be represented with a prediction interval. This is different from a confidence interval, which represents uncertainty due to sampling. If we want to estimate the average height of a group of people, we take a random sample. This sample may be skewed (e.g., we accidentally favored tall people), causing uncertainty about the estimate being made. This uncertainty is what is represented by a confidence interval. The prediction interval on the other hand measures uncertainty around a single estimated value coming from a model. The latter is the more practically useful in the context of estimating demand. It is important to note that a prediction interval will be wider than a confidence interval. When such intervals are created using traditional statistics, the residuals are often used. For this to be valid, however, the residuals need to be normally distributed. In practice, this is often not the case. For the example dataset, the hypothesis that the residuals are normally distributed can be rejected with a p-value of 1.1e-14. In other terms, they are not normally distributed. This will be the case for most if not all retailers. Fortunately, there are handy tools coming from the domain of nonparametric statistics. Specifically for this case a resampling technique called "bootstrapping" can be used to construct the intervals. By drawing random samples (= resampling) from the existing dataset, the variability that would otherwise be described by a parametric distribution can be simulated. The basic steps of the process are simple:

1. Take random samples from the data used to create the model (in this case the base table of which a fragment is shown in Table 5.2).


Typically this is done with replacement, meaning that the same observation can be selected twice. The size of the sample also often exceeds the size of the original set of observations.
2. Fit a regression model to the random sample that has been collected.
3. Draw a random residual from the model and add this to the prediction. This represents the individual prediction uncertainty.

This process is repeated a large number of times, averaging the results. Intuitively this can be compared to repeating the experiment a large number of times and investigating how often the conclusions change. Normally, such a thing would require the collection of an impossible amount of data, but this is handled by using the resampling technique. The interested reader who wants a gentle introduction to this practice is referred to Practical Statistics for Data Scientists by Bruce et al. [5]. To determine the prediction intervals for the example dataset, a slightly more advanced technique called Jackknife+ was used, which provides better coverage [6] (a technical detail; do not worry about this too much, it is only included for the sake of completeness).

This information about the expected value and the uncertainty can be used to fit a statistical distribution. Specifically, the gamma distribution is used in this case. The gamma distribution has a number of interesting properties that make it well suited for this purpose. Firstly, it is restricted to the positive domain, which means that negative values are not included. This is different from the traditional go-to distribution, the normal distribution. Normally distributed values can dip below zero without any problem, and this can cause problems when using the normal distribution to model a variable that can only take on positive values (as is the case here). Secondly, the gamma distribution is right-skewed. This is realistic for most retail contexts when a new product is introduced: there are a lot of average performing products, and there are some products that are outliers and sell considerably more. Thirdly, the distribution is more flexible than the closely related exponential distribution, specifically in the control of its variability. This allows for a more realistic coverage of a wide range of demand patterns. Fourthly, the gamma distribution has interesting properties that allow it to be updated easily when presented with new information (see Sect. 5.5.2 for more on that).

The gamma distribution is defined by two parameters: α and β (Eq. 5.3). These parameters can be used to calculate the mean and the variance (Eqs. 5.4 and 5.5). These expressions can easily be inverted to obtain the α and β parameters from a given mean and variance (Eqs. 5.6 and 5.7):

X ∼ Gamma(α, β)    (5.3)
E[X] = α / β    (5.4)
VAR[X] = α / β²    (5.5)
α = (E[X])² / VAR[X]    (5.6)
β = E[X] / VAR[X]    (5.7)
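Expressed in code, Eqs. 5.6 and 5.7 make it trivial to go from a forecasted mean and variance to a usable demand distribution. The sketch below uses scipy and the values quoted for distribution A in Fig. 5.2; note that scipy uses a shape/scale parameterization, so the rate β is inverted.

    from scipy import stats

    mean, var = 487.0, 247.0          # E[X] and VAR[X] for a product/channel
    alpha = mean ** 2 / var           # Eq. 5.6
    beta = mean / var                 # Eq. 5.7 (the rate parameter)

    # scipy parameterizes the gamma by shape and scale, where scale = 1 / rate.
    demand = stats.gamma(a=alpha, scale=1 / beta)
    print(demand.mean(), demand.var())    # recovers 487 and 247
    print(demand.ppf([0.05, 0.5, 0.95]))  # demand quantiles for planning
    print(1 - demand.cdf(500))            # P(demand exceeds 500 units)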

The resulting gamma distributions for two random products from the sample dataset are shown in Fig. 5.2. Distribution A has a higher expected value, but also significantly more uncertainty, as represented by the heavier tails of the distribution. Conversely, distribution B has a lower expected value but is quite narrow, indicating that there is less uncertainty. The distributions also show that a demand lower than zero is considered impossible. Such distributions can be created for each product/channel combination. The distributions shown in Fig. 5.2 could represent the expected demand for the same product in different channels. Rather than just having a single-value estimate of 487 products in channel A and 152 products in channel B, the decision maker can now estimate how probable it is that demand will exceed, or fall short of, these values.

Fig. 5.2 Two gamma distributions representing the expected sales for a product in a specific channel, based on sales from comparable products in the past. Distribution A is a product/channel combination with E[X] = 487 and VAR[X] = 247; distribution B represents values of E[X] = 152 and VAR[X] = 49


These purely data-driven predictions can be further augmented by human decision makers. It is unavoidable that there are important influences on demand that cannot be captured by this forecast alone. These may be related to expected fashion trends that lead decision makers to believe that demand for certain products will be stronger than in the past. It may also be the case that certain products or assortments will be more heavily advertised than in the past, which is also likely to result in more demand. In the branch of Bayesian statistics, this practice of using expert opinions is called "eliciting priors." Effectively, experts are asked to estimate what they think is a likely distribution for demand. Estimates from various experts, as well as from the data-driven method, can then be combined in order to form a new distribution of the expected demand (i.e., the prior). It is generally a good idea to avoid cherry picking when eliciting priors. Rather than adjusting the distribution for individual products, ask about general trends and expectations. If this is infeasible, it is often worth consulting multiple experts and combining their estimates, rather than relying on a single expert to make statements. The Delphi method [7] can be a useful source of inspiration when constructing a framework for consulting a group of experts and consolidating the results.

5.5.2 With Limited Historical Data

As soon as the first sales of a product are observed, the context for making a forecast changes. The proof of the pudding is in the eating, and in spite of prior expectations, the product may over- or underperform. Naturally this information should be included to adjust the future expectations for the product. But how? Traditional time series models, often the backbone of inventory management software, cannot yet be expected to work. If only a handful of data points have been collected, it is not yet possible to adequately gauge trends or seasonality. While there is no exact rule, a typical rule of thumb states that approximately 50 observations are required for time series models to work adequately [8]. Your mileage may differ, but if only a handful of observations have been made, it is unlikely that a time series-based model will perform adequately. Likewise, the mere fact that sales have been observed does not mean that prior expectations should be abandoned entirely. The sales being observed may be outliers. Moreover, there may be external factors that cause the first sales to be abnormal. The product may only have been partially stocked, or product information may not have been entirely visible. In an online environment, the product might not be indexed in all the usual search engines, putting downward pressure on the sales. Conversely, it may be that newly added products are overly visible on the homepage, or in a new-in category. All this may result in the first sales being abnormal. The course of action is clear: a retailer needs to weigh her prior beliefs against the newly observed sales. A simple reflex might be to just calculate an average, but this runs into difficulties quite quickly.


Averaging the observed value with the expected value does not yield a new probability distribution. Moreover, if something is very likely to be an outlier, it may be desirable to attach less importance to that value. This conundrum can be solved by applying a technique called Bayesian updating. This is based on Bayes' theorem (Eq. 5.8), which is part of high school math curricula. The notation Pr(A|B) is taken to mean the chance that A takes place, given that B is true (a conditional probability):

Pr(A|B) = Pr(B|A) · Pr(A) / Pr(B)    (5.8)

Since this might have been quite a while ago, the basic principle can quickly be recapped with an example. Assume that you are running a direct advertising campaign to boost sales. However, you would like to better understand the actual impact this campaign is having on sales. In more exact terms, you want to know what the chances are that a customer who buys the product has done so as a consequence of the campaign. This can be expressed as the conditional probability Pr(A: Seen ad | B: Purchase). This quantity can be calculated using some other probabilities and Bayes' theorem. Specifically, it needs to be investigated what the probability is that someone will make a purchase after seeing the ad: Pr(B: Purchase | A: Seen ad). In the case of direct advertising, this can often be measured quite easily as the fraction of contacted customers who make a purchase, which can be tested on a small scale. Moreover, the overall probability of seeing the advert, Pr(A: Seen ad), should be determined. This is simply the fraction of the total customer base that is being contacted. Finally, the underlying probability of purchasing the product needs to be determined: Pr(B: Purchase). This is also easily observable by dividing the number of purchases made by the total customer base:

Pr(A: Seen ad | B: Purchase) = Pr(B: Purchase | A: Seen ad) · Pr(A: Seen ad) / Pr(B: Purchase)    (5.9)

Assume that 40% of customers who are contacted with the advert make a purchase, that 25% of the total customer population is contacted with this message, and that 15% of the customer base purchases this product. With this information and Bayes' rule filled out for this example (Eq. 5.9), it can be calculated that 66.7% of purchases are due to the advertising campaign. Table 5.3 provides a summary containing all probabilities.

Table 5.3 A small numerical example of Bayes' theorem

                     No purchase   Purchase
Did not see advert   70%           5%
Did see advert       15%           10%
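The same calculation takes only a few lines of code; the probabilities are those from the example above.

    # The advertising example of Table 5.3, worked out with Bayes' theorem.
    p_purchase_given_ad = 0.40   # 40% of contacted customers buy
    p_ad = 0.25                  # 25% of the customer base is contacted
    p_purchase = 0.15            # 15% of the customer base buys the product

    # Eq. 5.9: P(seen ad | purchase)
    p_ad_given_purchase = p_purchase_given_ad * p_ad / p_purchase
    print(f"{p_ad_given_purchase:.1%}")  # prints 66.7%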


The basic principle of Bayes' theorem can be used in the context of demand forecasting to update our beliefs about the likely sales of a product. This technique is called Bayesian inference. More than a simple technique, it is the cornerstone of an entire statistical methodology, which differs from the traditional (frequentist) statistics that are widely taught in high school and university curricula. The next sections will illustrate how this technique can be used to update forecasts when limited data becomes available. The underlying theory will be discussed very sparingly. The reader who wants to know more is referred to the excellent introductory work by Ben Lambert [9].

Bayesian inference is a method for updating beliefs in light of new information. The initial beliefs are made explicit in the form of a prior distribution. Upon making new observations, the likelihood of observing the data that has been collected is determined, assuming that the prior distribution is correct. (Likelihood is a concept somewhat different from traditional probability, but the specific reasons why are outside the scope of this text.) For continuous distributions, this is the chance that this or a more extreme sample would be obtained. The likelihood and the prior are then combined into a new and updated belief called the posterior.

The principle behind this process is identical to the Bayes rule illustrated earlier. This rule can be rewritten to fit into this new narrative, as shown in Eq. 5.10. (The attentive reader might notice that the notation switches from Pr to P; this represents a change from discrete to continuous probability, but the mechanics remain the same.) In this equation the initial belief or prior is represented as P(belief). The likelihood of observing the data that was uncovered is represented as P(data|belief). The denominator P(data) usually only serves to normalize the outcome and ensure that the posterior distribution is a valid probability distribution, i.e., one whose total probability sums to 1. The result of this equation is the posterior, P(belief|data); this represents the newly held beliefs, accounting for both the initial beliefs and the observed data:

P(belief|data) = P(data|belief) · P(belief) / P(data)    (5.10)

The result of Bayesian inference is heavily influenced by the prior distribution. If there exists a very strong belief that a product will be a top selling item, a first few observations indicating underperformance will have a smaller effect. This often invokes the criticism that the outcomes are influenced by subjective judgment. While this is undeniably true, any kind of model or statistical approach suffers from this to an extent. The advantage of this approach is that these assumptions have to be made explicit, making them visible and easy to challenge when required.


Fig. 5.3 An initial estimate with a mean of 30 units and a standard deviation of 10 units. The posterior has been constructed after observing a single time period during which 32 units were sold

Without further ado, what does this look like when applied to the initial forecasts that were introduced in Sect. 5.5.1? Rather than exploring the mathematics, this will now be illustrated using some examples. The starting point is the gamma distribution that was initially estimated. Moreover, it will be assumed that this gamma distribution represents the sales per time period, rather than the sales for the complete time horizon. A first simple example shows what happens if the actual sales fall nicely within the expected range, as shown in Fig. 5.3. The sales during the first period were slightly above the expected value, but still nicely within the expected uncertainty interval. The result is that the posterior distribution shifts somewhat to the right and becomes narrower, representing the increased certainty of the estimate. The new distribution expects 31.46 units to be sold during the next period, slightly up from the 30 that were expected during the first period, and a little below the 32 units that were actually observed.

Theoretical footnote: the observed sales are assumed to be Poisson distributed. This is a discrete distribution that assumes that events occur independently of each other (i.e., one sale does not depend on another). The Poisson distribution is frequently used to model the number of events taking place in a given frame of time, which is very similar to modeling the number of sales in a given time period. Within the context of Bayesian inference, the combination of a gamma-distributed prior and a Poisson-distributed likelihood has the interesting property that the posterior distribution is again a gamma distribution.
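Because of this conjugacy, the update itself is two lines of arithmetic: the posterior is Gamma(α + Σx, β + n) after n observed periods with sales counts x. The sketch below reproduces the first and fourth examples of this section; the posterior means land close to the values reported in the text (small differences may stem from rounding).

    def gamma_params(mean, sd):
        """Eqs. 5.6-5.7: convert a mean/standard deviation into alpha, beta."""
        var = sd ** 2
        return mean ** 2 / var, mean / var

    def update(alpha, beta, observations):
        """Conjugate gamma-Poisson update after observing per-period sales."""
        return alpha + sum(observations), beta + len(observations)

    # First example: prior mean 30, sd 10; one period with 32 units sold.
    alpha, beta = gamma_params(30, 10)
    alpha, beta = update(alpha, beta, [32])
    print(alpha / beta)  # posterior mean ~31.5, close to the 31.46 in the text

    # Fourth example: the same prior updated with seven periods of observations.
    alpha, beta = gamma_params(30, 10)
    alpha, beta = update(alpha, beta, [32, 33, 34, 34, 35, 32, 33])
    print(alpha / beta)  # the prior gets diluted as evidence accumulates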


Fig. 5.4 An initial estimate with a mean of 30 units and a standard deviation of 5 units. The posterior has been constructed after observing a single time period during which 60 units were sold

A second example (Fig. 5.4) shows what happens if the observed sales fall outside of the expected range of the prior distribution. Where the preceding example showed a reduction in uncertainty, this is no longer significantly the case under these conditions. The posterior distribution does however shift toward the observed level of sales, moving toward a mean of 43.53 rather than the initial expectation of 30 units. This is however significantly below the observed value of 60 units sold. It is clear that this process does not throw away initial assumptions in light of limited data. The greater the divergence between what is observed and what was expected, the larger the burden of proof becomes. This illustrates that the posterior distribution is strongly influenced by the (possibly subjective) estimate of the prior. It is very important to challenge the shape of this initial distribution prior to using methods such as these. A prior that does not allow for sufficient variability will result in adjustments that are too slow. This dynamic is further illustrated using a third example, shown in Fig. 5.5. The only difference from the second example is that the prior that was specified was much more uncertain. The effect is that the posterior distribution immediately puts much more weight on the newly observed demand, the mean of the posterior now being equal to 58.83, as opposed to 43.53 in the former example.


Fig. 5.5 An initial estimate with a mean of 30 units and a standard deviation of 25 units. The posterior has been constructed after observing a single time period during which 60 units were sold

A fourth and final example (Fig. 5.6) shows what happens when time progresses and more than a single observation is made. This is simply a continuation of the first example (Fig. 5.3). It is clear that the variance of the distribution continues to decrease and that the mean shifts toward the mean of the observed values. The input of the prior becomes further and further diluted as more observations are made.

Fig. 5.6 An initial estimate with a mean of 30 units and a standard deviation of 10 units. The posterior has been constructed after observing multiple periods of demand, specifically the following values: 32, 33, 34, 34, 35, 32, and 33

An important remark here is that the process of Bayesian inference does not intend to create a realistic picture of the variability of demand, as was the case for the prediction intervals used earlier. The goal of this process is to estimate the true average value of demand, regardless of the noise that surrounds this average in practice. As more and more data points are collected, this can become a limitation, and a motivation to move toward methods specialized in forecasting time series. This style of forecasting model will be discussed next.

5.5.3 Improved Time Series Forecasting

As more observations are collected, the previous discussion becomes less relevant. The historical performance of a vaguely similar product or the a priori expectations of the product manager no longer carry the same weight. The most valuable source of information simply becomes the time series representing the sales of the individual product itself. Hence, at this point it is again time to switch tactics and to move toward techniques specialized in time series forecasting. These techniques will only be discussed briefly, as the methods used here are widely known. A good starting point for readers who want to know more is the excellent introduction by Nielsen [2]. By and large, time series forecasting models can be grouped into the following categories:



• Autoregressive models: Simple models that use historical observations to predict the next values. A typical example is the ARIMA (autoregressive integrated moving average) model, which can handle time series that contain a certain trend.
• Decomposition models: Decomposition models are similar, but include methods for dealing with seasonal patterns. The latter is often relevant in a retail context.
• Machine learning models: Recently there has been more widespread adoption of more complex machine learning tools for time series forecasting. These can bring higher performance, but often offer less transparency than more traditional methods. A top candidate from this category is the so-called long short-term memory (LSTM) model. These are artificial neural networks with a specialized structure that makes them especially well suited for time series forecasting (and a number of other tasks). This sounds highly promising, but often these models only perform well in situations where very high volumes of data are available (i.e., a significant number of sales for a product for every single day in every channel). A context with smaller data quantities will often be at risk of overfitting when using these models.
• Off-the-shelf solutions: Some software packages come with forecasting capabilities included. While some of these perform well, the key advice here is to make sure that you understand what happens under the hood: how is the system making its decisions? There are also a number of open-source alternatives that are freely available and generally provide excellent performance. The most popular among these is currently the Prophet algorithm introduced by Facebook [3].
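As a minimal illustration of the first category, the sketch below fits an ARIMA model with statsmodels; the order (1, 1, 1) and the synthetic series are illustrative choices, not recommendations.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    # Synthetic trending demand series, long enough for a time series model.
    y = pd.Series(50 + 0.5 * np.arange(80) + rng.normal(0, 4, 80))

    model = ARIMA(y, order=(1, 1, 1)).fit()  # AR(1), first difference, MA(1)
    print(model.forecast(steps=4))           # demand for the next four periods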

5.6 Evaluating Forecast Accuracy

When is a forecast good? Or good enough? An answer to this question is just as important as the preceding discussion of how a forecast can be constructed. Many people will remember R² as a measure of how good a regression model is. Far fewer people will be able to explain what this measure means, and even if they can, this explanation might make statistical sense without quantifying the business impact of the result. This section provides some points of interest when quantifying the performance of a prediction model.

5.6.1 Basic Forecast Performance Measures

Depending on how the forecast is going to be used, it may be important that the forecast is evaluated as a time series, or as a single aggregated estimate (point estimate). To put it differently, for the purposes of replenishment, it might be important to have a good estimate of sales on a week-by-week basis, especially if there are important seasonal patterns. For other applications such as purchasing, it may be more important to have an accurate estimate of the aggregate, even if the week-by-week approximation is not very good. Other things being equal, evaluating a point estimate is typically easier than evaluating the performance of a complete time series. Hence, these two situations will now be treated separately. The perspective taken is that of a typical retailer, who needs to evaluate the accuracy of a prediction for a complete collection of products.

5.6.1.1 Use of Unseen Data
A basic principle in data science which bears repeating is that the performance of a model should never be judged based on data the model has already seen. This relates to the concept of overfitting: if a model performs well on the training data, but badly on unseen data, it is not able to generalize, and is therefore a bad model. A simple way of thinking about this is that it takes no real intelligence to create a model that can remember everything from the past data. As long as you can keep adding variables and complexity to the model, it will eventually be able to fit the data perfectly. But nothing of value would have been created. The best way to perform the task of remembering all past values would be to store them in a database, a system needing no form of advanced intelligence. Ideally the unseen data comes from a different time period than the data used for training. Not doing so may result in contamination: the training data may contain information from the testing data. This results in an overestimation of the performance of the model when presented with the supposedly unseen information. An example is a setup where the training and testing data come from the same sales season, but products have been split into a training and a testing group. In this situation it may, for instance, be that white T-shirts have proven to be exceptionally popular in this season. Inadvertently, this information can be picked


up by the model and applied to the testing set. This may not seem like a problem, but the information that white T-shirts are popular is actually coming "from the future." When using the model in a real-life setting, there is not yet information about which products are going to be in high demand in the coming weeks.

5.6.1.2 Evaluating a Point Estimate
Colloquially the performance of any model is often dubbed "accuracy"; in this context this is a misnomer, since accuracy only applies to categorization models. Such models are used to place observations in a specific category, and these placements can be either correct or incorrect. The forecasting problem is different in nature because the purpose is to approximate a value rather than to predict a category. The result is that mistakes can be graded: there can be a very slight deviation from the actual value, or there can be a large deviation. This is more forgiving than the concept of accuracy, where there exists only correct or incorrect. Behind the scenes most forecasting models will use some kind of explained variance score when searching for the optimal prediction, the most well-known measure being the R² value. The interpretation of this measure can be explained as "the fraction of variation in the data that can be explained by the model." An R² of 1 would mean that every point is predicted exactly. Generally, this is a useful metric for tuning models, albeit one that is sensitive to outliers. In spite of their adequacy behind the scenes, measures of explained variance are a bad tool to communicate the business value of a model. This is better done by measures that are intuitively easy to understand. Ideally, these measures can also be used as a basis for calculating the financial value generated by the models. In the context of demand forecasting, this often comes down to being better at preventing stock-outs as well as reducing superfluous stock. A good starting point is the residuals of the predictions, and the values that can be derived from residuals to make them easier to interpret. While residuals may sound scary and statistical, the concept is extremely simple. The residual is nothing more than the difference between the true value (the observed demand) and the value that was forecasted (Eq. 5.11):

Residual = True value − Forecasted value    (5.11)

This simple measure runs into problems when trying to aggregate it into a perspective of the performance for the complete product portfolio, one problem being that positive and negative mistakes cancel each other out when averaging. Given that customers are not just in the market for any product, shortages in one product cannot be made up for by overages in another product. A simple visual approach that is intuitively easy to grasp is shown in Fig. 5.7. This is simply a scatter plot comparing the actuals to the predicted values. Ideally, the result produced in such a plot looks like the example on the left: there will always be variation, but in general the forecast is dispersed around the diagonal representing a perfect prediction.

Fig. 5.7 A quick and simple way to investigate the quality of point estimates. The figures show the results from two different models on the left and right, respectively. The observed sales are compared to the predicted sales. Ideally the observations should be as close as possible to the diagonal representing a perfect forecast

The panel on the right, however, shows that there is a problem: this model tends to underestimate the top selling products. This is a systematic error that should not be present, and it implies that the forecasting model should be tuned to reduce this error. If the perfect model could be created, this chart would show all the points exactly on the diagonal. This is unrealistic, and some level of variability around the diagonal is always to be expected. There are other ways of dealing with this problem and creating a single summary metric that represents the performance of the system. The most widespread of these metrics is the mean absolute percentage error, commonly referred to as MAPE (Eq. 5.12). This metric solves the problem of compensating errors by taking the absolute value of the residual:

MAPE = (1/n) · Σᵢ | (True valueᵢ − Forecasted valueᵢ) / True valueᵢ |    (5.12)

where the sum runs over all n products.
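Computing the MAPE, and its signed variant the MPE discussed below, is straightforward; the demand numbers here are made up for illustration.

    import numpy as np

    actual = np.array([120.0, 80.0, 45.0, 200.0, 95.0])
    forecast = np.array([110.0, 90.0, 60.0, 180.0, 100.0])

    errors = (actual - forecast) / actual
    mape = np.mean(np.abs(errors))  # size of the errors, compensations removed
    mpe = np.mean(errors)           # signed: reveals systematic over/underestimates

    print(f"MAPE: {mape:.1%}, MPE: {mpe:.1%}")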

There is no single summary metric that is capable of capturing all the nuance present in the underlying data. To address this there are a large number of variations of the MAPE, which are easily found with a quick web search. Rather than listing countless metrics, the reasons for selecting a specific metric will be discussed. One important piece of information to obtain about a forecast is whether there is a systematic under- or overestimate. This can be measured by not taking the absolute value, i.e., by calculating the mean percentage error (MPE). If the mean percentage error is zero, this indicates that there is no definitive skew toward over- or underestimates of demand. This metric cannot on its own be considered sufficient to guarantee the quality of a model.


Combining it with the MAPE allows the decision maker to validate that there is no significant skew in the errors and that the size of the errors is reasonable. It may also be worth investigating the single biggest positive and negative errors. Especially in cases where there is a large product portfolio, these outliers can easily disappear after averaging. Large outliers are often easily explained as special cases, but they can also carry a crucial piece of information that leads to rethinking the models. These metrics are already a big step forward from the explained variance measures, but they are still imperfect. While relative performance is quite intuitive, there are situations where the value can be misleading. One cause may be the presence of many very low-volume products. Mistakes in this case might seem very large, while the absolute size of the mistake is limited. Another issue is caused by the fact that overestimates are at risk of being overweighted when compared to underestimates: an overestimate can result in an error of more than 100%, while an underestimate can never be larger than 100%. Hence, it may be valuable to investigate the absolute size of mistakes, especially for low-volume products. Another big step forward can be made by accounting for the financial implications of making a mistake. An overestimate means that capital is employed in a sub-optimal manner, and may lead to depreciation. Underestimates in turn lead to a loss of gross margin. Both these quantities depend on the price and purchase cost of items. It is not difficult to create an evaluation metric that approximates these costs. This gives a much more realistic picture of the economic value of a model: the lower the cost, the better the model. This is a clear segue to the need for a benchmark. Calculating the cost of mistakes paints a rather negative picture of forecasting, as it only accounts for the costs associated with mistakes. The positive impact of a forecast can only be measured by comparing against a sensible benchmark. This benchmark can be the historical way of doing things, or a very simple rule-of-thumb algorithm. Without a benchmark there is no clear way to pass judgment on a forecast.

An all-too-common pitfall when evaluating forecasting algorithms is retroactive cherry picking. This happens when analysts ask the question: "What algorithm would have been best for each individual product?" It is inevitable that very naive approaches will win out some of the time when just looking at the results. This raises the question: "Why should we invest in this complex system if it is actually worse for a part of the products?" On the face of it, this may seem to be a reasonable statement. The analyst in this case is however making a grave logical error. Even a random roll of the dice will be better than the best forecasting algorithm some of the time. No forecasting algorithm can be expected to win all of the time.


Picking the best algorithm once the results are in also falsely gives the appearance that it would have been possible to select this better performing simple algorithm beforehand. This is not the case. While it may be true that a simple roll of the dice will sometimes win out, it will be impossible to know beforehand for which products this will be the case. The best model is the model that wins out on average.

5.6.1.3 Evaluating a Time Series Forecast
Most of what has been stated about evaluating a point estimate remains useful for evaluating time series. It remains valuable to use metrics that translate performance into units of demand, rather than rather abstract statistical measures of quality. However, there are a number of additional things to be aware of when intending to use a forecast for multiple periods. When using MAPE-like metrics, there is an additional aggregation on the level of the product. For each time period t, the error is recorded and then aggregated to obtain a performance measure on the level of the product. This poses an additional risk of missing certain patterns that are present on the level of an individual product, but are averaged out when looking at aggregations on the level of the complete product collection. Hence, it is even more important to take time to investigate the residuals in depth. While it may not be possible or desirable to do this for every product, it can be valuable to take a random sample. Figure 5.8 shows a simple and intuitive way to observe the residuals for a time series. Rather than using a scatter plot, the time dimension is placed on the horizontal axis. This makes it possible to see if the quality of the forecast changes over time.

Fig. 5.8 A simple residual plot to evaluate forecasting performance of a time series

In the example it can be observed that the forecasts appear to become less accurate over time. How this should be interpreted depends on the offset: how long beforehand a forecast is created. This depends on how the forecast will be used, i.e., when does the forecast need to be available to be able to make a decision? If the complete 100 periods are forecasted at once, it is reasonable to expect larger deviations toward the end of the forecasting range. If instead a sliding forecast window of five periods is used, such deviations are reason for concern. It may be the case that there is a systemic change causing algorithms to deteriorate after a certain period of time.
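A simple way to quantify what Fig. 5.8 shows visually is a rolling window over the absolute residuals, as sketched below on synthetic stand-in data.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    periods = np.arange(100)
    actual = 100 + 10 * np.sin(periods / 8) + rng.normal(0, 3, size=100)
    # A forecast whose error grows with the horizon, mimicking the example.
    forecast = actual + rng.normal(0, 0.05 * periods + 0.5)

    residuals = pd.Series(actual - forecast, index=periods)

    # A rolling mean of the absolute residuals makes a gradual deterioration
    # of forecast quality visible as an upward drift.
    rolling_error = residuals.abs().rolling(window=10).mean()
    print(rolling_error.iloc[[10, 50, 99]])  # error level early, midway, and late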

5.7 Optimizing Allocation

As shown in Table 5.1, the forecast is only the first part of the solution to the replenishment problem of the retailer. The second piece of the puzzle is the allocation itself. In essence this is an optimization problem, searching for the optimal way to make use of the available inventory.


This optimization includes considerations on both the product/channel level and the portfolio level. (In this section the term channel is used to signify a source of customer demand. Different physical stores can be considered separate channels, as can different geographical regions serviced by a website.) For every product, there is a theoretically optimal amount to be allocated to every store or channel. This optimal position can be found where the expected cost of having overstocked is equal to the expected cost of having understocked a product. The challenge does not stop there, since a combination of all product-level optimizations is unlikely to be feasible or desirable. It may be the case that a specific location can only stock a limited number of products, which is below what would be optimal on the product level. There is also likely to be a limit to the available amount of products, either due to limited working capital or due to a limit on what suppliers can provide at a certain moment in time. This balancing act is the topic of this section, which is again divided into three parts. These three parts mirror what has been discussed on the topic of demand forecasting. Initial distribution (Sect. 5.7.1) works with the a priori demand distribution estimates. Next, redistribution (Sect. 5.7.2) of inventory uses the updated forecasts resulting from Bayesian inference. Finally, the most traditional form of


inventory management is discussed in the form of continuous replenishment of products (Sect. 5.7.3).

5.7.1 Initial Distribution of Inventory

The allocation of new product inventory across stores has to be decided with very limited available information. Under such conditions the old adage "it is better to be approximately right than exactly wrong" rings true. As any decision maker who performs this task based on experience and intuition will testify, the initial distribution is a balancing act. On the one hand, you do not want to allocate products to sales channels where they have a low likelihood of being sold. On the other hand, products should be present in sufficient numbers to be spotted by consumers. Not allocating products to places where you think they will sell less is effectively a self-fulfilling prophecy. Moreover, creating an attractive collection must also be considered when allocating products. Specifically for brick-and-mortar stores, the amount and type of products present have a big impact on footfall and purchasing intentions. To a lesser extent, this also affects attractiveness in an online environment, where longer lead times may put off potential buyers. The importance of the initial distribution of products is not the same for all retailers. Retailers selling products with long life cycles via channels with uniform demand face little to no challenge in deciding the initial distribution of products. On the opposite end of the spectrum are retailers selling products with short life cycles in channels with heterogeneous demand. An excellent example of the latter are fashion retailers, who must anticipate fashion trends and differences in taste across customer groups. Because this type of retailer faces the most challenging version of the problem, the initial distribution will be approached from the angle of such a fashion retailer. On the level of an individual product and a single channel, this problem shows strong similarities to the "newsvendor problem" [10]. The problem statement is simple: at the start of each day, a newsvendor needs to decide how many newspapers to purchase for the given day. She does not know exactly how much demand to expect on any given day, but it is possible to model the demand using a probability distribution. Any newspapers that are left over lose all their value; nobody wants to purchase yesterday's news. What results is a balancing act where the possible loss of revenue from purchasing too few newspapers is weighed against the cost associated with purchasing too many. The profits of the newsvendor at the end of the day can be calculated as shown in Eq. 5.13, where q is the quantity purchased, p is the selling price for a newspaper, c is the unit cost, and D is the observed demand, which follows a known probability distribution D ∼ F. Knowing that the price of a newspaper will always be greater than the cost, it can be presumed that a newsvendor will always want to purchase


more than the average demand. The real question is: exactly how much more?

$$\text{profit} = p \cdot \min(D, q) - c \cdot q \qquad (5.13)$$

The solution to this problem is quite straightforward, and requires nothing more than the inverse cumulative distribution function of the demand, F⁻¹. The optimal quantity balances the cost of a lost sale (p − c) against the cost of an unsold newspaper (c). At this point the expected cost of being overstocked is exactly equal to the expected cost of being understocked. The resulting ratio is commonly referred to as the critical fractile (Eq. 5.14):

$$q = F^{-1}\!\left(\frac{p - c}{p}\right) \qquad (5.14)$$
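The critical fractile calculation translates directly into a few lines of Python; a minimal sketch, assuming normally distributed demand (the numbers anticipate the worked example that follows):

```python
# Minimal newsvendor sketch: optimal order quantity via the critical
# fractile (Eq. 5.14), assuming normally distributed demand.
from scipy.stats import norm

def newsvendor_quantity(price, cost, demand_mean, demand_sd):
    # Balance the cost of a lost sale (price - cost) against the cost
    # of an unsold unit (cost): order at the critical fractile.
    critical_fractile = (price - cost) / price
    return norm.ppf(critical_fractile, loc=demand_mean, scale=demand_sd)

print(round(newsvendor_quantity(3.00, 0.75, 1000, 100)))  # about 1067
print(round(newsvendor_quantity(3.00, 0.75, 1000, 200)))  # about 1135
```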

For a newsvendor who can purchase newspapers at a cost of €0.75, and sell them at a price of €3.00, the critical fractile is equal to 0.75. Assuming that demand is normally distributed with a mean of 1,000 and a standard deviation of 100 units, the newsvendor makes the optimal choice when ordering 1,067 newspapers. If demand were more volatile, with a standard deviation of 200 units, the optimal number of newspapers increases to 1,135.

All this is similar to the problem faced by the retailer on the level of an individual product and channel. The demand in the given channel is unknown, but a probabilistic estimate of the demand can be made. Operational constraints often mean that a commitment of a certain number of products cannot easily be adjusted for some period of time. Hence, there is a risk of losing sales when allocating too few products to a given channel. There are however some nuances that make the retailer's problem more complex. The first is that the purchasing decision has already been made at the point when the distribution problem is being solved. This implies that the total cost of products can no longer be changed (a sunk cost), and should therefore not be considered. The cost that needs to be considered is the opportunity cost: the likely return that would be earned when sending the product to another location. This ties into the second key difference: it is necessary to consider all candidate locations jointly when making an allocation decision. The reason for this is that the opportunity cost depends on the number of products that have already been sent to a specific location. This implies that the allocation to alternative locations has to be determined at the same time.

The calculation of the expected return when allocating a product to a location is however highly similar to the newsvendor problem. Equation 5.16 shows how the expected marginal return from adding the X-th product can be calculated. This is to be interpreted as the additional profit that can be expected from assigning an X-th product to a channel that already has X − 1 products allocated to it. The first term represents the probability of selling the X-th product, and the second term represents the value of making a sale. F is used to represent the cumulative distribution function.


When using the previously discussed approach (Sect. 5.5.1) to estimate demand per channel, F is the cumulative distribution function of the fitted gamma distribution:

$$E[\text{Marginal return}(X)] = P(X \text{ sold}) \cdot (p - \text{transaction cost}) \qquad (5.15)$$
$$E[\text{Marginal return}(X)] = (1 - F(X)) \cdot (p - \text{transaction cost}) \qquad (5.16)$$

The value of the transaction is determined by the price and the marginal transaction cost. These terms are included specifically for situations where different prices and transaction costs are used across channels. A product that has an identical chance of selling in two locations will then be allocated to the location that results in the greatest financial benefit when a sale is made.

A simplified example can help in making this less abstract. Imagine a fashion retailer who is selling a single product. This product is sold in two different channels: a physical store and a webshop. Inventory management for the webshop is delegated to a third party because the store cannot handle shipping and returns of products. Hence, inventory has to be allocated to either the webshop or the physical store. The retailer has estimated two gamma distributions representing the expected demand for this product in the online and offline channel, respectively. Moreover, because there is more price competition online, the retailer has been required to charge lower prices for online sales than for offline sales. The latter are less price sensitive, because customers who are already present in the physical store have the convenience of walking out with the product immediately. The forecast shows that the average online demand is 500 units with a standard deviation of 50, and offline demand is expected to be 250 units with a standard deviation of 15. Moreover, online the price needs to be set at €70 to be competitive with other offers. Offline the price is higher at €95. Additionally, the average transaction cost online is €10 versus €2.50 offline.

This information can be used to parameterize a gamma distribution for both product and channel combinations. This gamma distribution can in turn be employed to calculate the marginal return as shown in Eq. 5.16. The result of this calculation for both channels is shown in Fig. 5.9. For both channels the initial products being allocated can be expected to be sold with certainty. The price that can be charged online is however lower than offline. Hence, it makes sense that if there are fewer than 200 units to allocate, all of these go to the offline channel. The chart also clearly shows that there is relatively more certainty about how many products can be sold offline than online. This can be seen from the steeper decline of the marginal return for the offline products.

A simple way of optimizing the product allocation in this situation is to allocate the products one by one to the channel that has the highest yield. This can be imagined as "riding down" both curves simultaneously. In a situation where there are no other operational considerations to be accounted for, this leads to the optimal solution for the initial allocation problem of the retailer. There are more elegant ways of calculating this than iterating over all products; a simple improvement would be the use of some type of memory in the heuristic (a sketch of the basic greedy procedure follows Fig. 5.9).

Fig. 5.9 Comparison of the marginal return for selling a product via two different channels: online and offline
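The greedy "riding down the curves" procedure can be sketched in a few lines; a minimal sketch using the illustrative numbers of the two-channel example (real implementations would add the operational constraints discussed below):

```python
# Greedy allocation of units to the channel with the highest expected
# marginal return (Eq. 5.16), demand per channel modelled as a gamma.
from scipy.stats import gamma

def marginal_return(mean, sd, price, transaction_cost):
    # Gamma distribution parameterized from a mean/sd forecast.
    dist = gamma(a=(mean / sd) ** 2, scale=sd ** 2 / mean)
    margin = price - transaction_cost
    return lambda x: (1 - dist.cdf(x)) * margin  # value of the x-th unit

channels = {"online": marginal_return(500, 50, 70, 10.0),
            "offline": marginal_return(250, 15, 95, 2.5)}
allocation = {name: 0 for name in channels}

for _ in range(700):  # allocate 700 available units one by one
    # "Ride down" both curves: the unit goes where the next unit still
    # has the highest expected marginal return.
    best = max(channels, key=lambda c: channels[c](allocation[c] + 1))
    allocation[best] += 1

print(allocation)
```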

The result of such a heuristic is plotted in Fig. 5.10, where, given a certain number of available units, the amount to send to each channel can be read from the vertical axis.

A realistic situation is more complex than this single-product example. Several aspects contribute to this increased complexity, the most straightforward being that multiple products have to be allocated simultaneously. Other frequently encountered constraints are the following:

• Channel capacity: There is often a limit to the total number of products that can be presented at a channel. This can relate to the number of unique products due to limited shelf space, but also to the total number of units due to limited storage space. Depending on the type of products being sold, it can be necessary to estimate the amount of storage space occupied by a single product.
• Product-level minima and maxima: It may be required to deliver at least a minimal quantity of a product to a channel before the product can be put on offer. For example, clothing must be available in a number of different sizes.
• Channel-level minima: Especially in the context of physical stores, it can be required to have at least a certain number of products available to prevent the store from appearing too empty.
• Delivery capacity: There may also be a time dimension impacting the initial delivery, where choices have to be made as to which channels are to be stocked first. A part of this decision may be to stock top-selling channels fully before starting on the lower-performing channels.


Fig. 5.10 Solution to the allocation problem for one product that can possibly be allocated to two channels

• Centralized buffer: Some retailers also make use of their centralized warehousing capacity as a buffer. By keeping some products in a centralized location, there can be a quicker response to stock-outs than when products have to be sourced from other stores. The disadvantage is that keeping this buffer increases the overall cost of getting products to channel locations, as this will happen in multiple iterations. Also, products that are stored at a central warehouse cannot be sold—whereas products allocated to channels have a nonzero chance of being sold.
• Shared inventory: It can be the case that inventory is pooled between multiple channels. Especially in the case of online sales, there are retailers who have part of the online orders shipped from physical store locations. The choice of where the stock is picked is usually based on the inventory position at the moment of receiving the order. This complicates the nature of demand and the associated response in terms of allocating inventory.

This is not an exhaustive list, and many more constraints can pop up in practice. The problem faced by the retailer then becomes a combinatorial allocation problem. At the basis of this problem, however, is still the same logic as shown with the single-product example. The precise solution approach for the more complex combinatorial problem is outside the scope of this book. The interested reader is referred to Introduction to Operations Research by Hillier and Lieberman [11], where the basis for such optimization approaches is explained at length.

5.7.2 Redistribution of Inventory

No plan survives contact with the enemy,12 and in spite of all the effort spent on the initial allocation, corrections will likely be required. At some locations demand will be higher than was initially expected, and vice versa. This new problem is framed here as the redistribution problem. The redistribution problem is distinct from replenishment in that it deals with products that have a short lifespan. Leaving the products at their current location would lead to inventory being left unsold, or at least inventory being sold at significantly reduced prices. Simply waiting is therefore not an option. The aim of redistribution is to reduce this loss of value as much as possible.

Moving inventory does however come at a cost. Products need to be picked from their current location and shipped. During shipping there is always the risk that products get broken or lost. Products then have to be restocked at new locations, again requiring effort. There is also a cost associated with the time spent in transit, during which products cannot be sold to customers. This again results in a trade-off problem, where the goal is to weigh the cost of moving against the possibly higher expected margin generated at another location.

A two-stage solution approach to this problem will now be presented. This is certainly not the only way of solving this problem, and many more complex approaches are possible. This approach does however strike a balance between performance and implementation difficulty. The first stage of the procedure identifies candidate sources and destinations. The second stage then takes care of the matchmaking between sources and destinations.

12 This quote is often attributed to Eisenhower, but is likely a much older adage.

5.7.2.1 Identification of Sources and Destinations

It is only sensible to move a product between locations when a profit can be expected. This condition can be refined by stating that the expected profit of the product at the new location must exceed the expected profit at the old location by more than the cost of moving the product. Preferably the expected profit at the new location exceeds this threshold by a substantial amount. Another way of stating this is that the gain has to be positive, gain being defined as shown in Eq. 5.17, which expresses the gain of moving a product from the current location B to a new location A. In this equation the notation E[Loc X] is used for the expected value of this product at location X, and C_transport represents the marginal cost of transporting the product:

$$\text{Gain}_{B \to A} = E[\text{Loc } A] - (C_{\text{transport}} + E[\text{Loc } B]) \qquad (5.17)$$

A good estimate of the cost of moving products is therefore essential. This cost should be defined more broadly than the marginal shipping cost. The time spent in transit during which no sales can be made, as well as the risk of lost or damaged products,


must be accounted for. It is preferable to err on the side of caution and use a slight overestimation of this cost, rather than risk triggering too many moves by using an underestimation.

In principle this concept should be sufficient to reallocate inventory, but in practice it falls short. The main cause is the limit to the overall amount of inventory that can be moved in a given time frame. Redistributing large swathes of the collection in a single time period is rarely technically feasible. Even in situations where it is, collections that are too volatile often encounter protest from employees, who feel that the picking and restocking efforts are excessive. A second reason to go beyond this simple calculation is to make the problem easier to solve. Calculating the gain of every single product from and to every single location requires an enormous number of calculations, especially when considering that the expected gain depends on the number of products that have already been moved. The first product that is moved from location B to location A will have a higher chance of being sold, and therefore also a higher expected value, than the second product. Clearly, this is a very complex problem to solve.13

The identification of senders and receivers allows the problem to be greatly reduced in complexity. This takes place on the level of a unique product (stock keeping unit), where the surplus or deficit per location is identified. This immediately removes a lot of possible moves from consideration. Consider, for example, a situation where there are three locations A, B, and C. For these three locations, the expected value of selling a product is equal to €50, €40, and €30, respectively. Without clear identification of senders and receivers, it may seem reasonable to send products from location B to location A, whereas it would be more appropriate to send products from location C to location A, and possibly also from location C to location B.

The identification of a channel as a sender or receiver for a specific product is based on the adjusted demand estimate. This is a gamma distribution representing both the prior convictions and what was learned from observing sales (see Sect. 5.5.2). Using this estimate of the demand for a product at a given channel, the expected value can be calculated as a function of the number of products, as shown in Fig. 5.11. As more products are present in the inventory of a channel, the marginal value of adding more products decreases. Inversely, taking products from a certain location becomes less expensive the more inventory there is.

Placing a channel/product combination in a category as either a sender or receiver requires setting some cutoff points. These are shown as the upper and lower bounds in Fig. 5.11. If the marginal expected value is below the lower bound, there is too much inventory and the location is identified as a sender. Inversely, if the marginal

13 Complex does not mean impossible, and problems such as these may be tackled using reinforcement learning techniques, the question being whether the added value of such complex techniques weighs up against the required investment. As always, the advice is to start simple and see for how long additional complexity produces a significant return on investment.

Fig. 5.11 Using upper and lower bounds to assign a channel's status as sender or receiver for a specific product. For this product at this channel, 150 sales are expected on average, with a standard deviation of 35 units. The price of the product is €100, with a transaction cost of €5

expected value of the current inventory position is above the upper bound, the location is marked as a receiver. For example, if there were 250 units of inventory present at the channel shown in Fig. 5.11, this channel would be marked as a sender. The marginal expected value of the 250th unit approaches zero, and is well below the lower bound for being marked as a sending location. In other words, inventory is too high and at least part of the inventory is possibly better used elsewhere. The opposite would be true if there were only 50 units of inventory present at this channel. The marginal value of the next product added (the 51st) would then be well above the upper bound of €80. Hence, the location would be marked as a receiver.

This leads to a follow-up question: how many products should be sent from a sending location, and how many products should be requested by a receiving location? This is simply the number of products separating the current inventory position from the lower or upper bound. Continuing the previous numerical example: if there are 250 units, the lower bound has been passed; this bound intersects the marginal expected value curve at an inventory level of 177 units. Hence, the location is marked as a sending location for 250 − 177 = 73 units. If only 50 units are present, the upper bound is passed. This upper bound intersects the marginal expected value curve at an inventory level of 115 units. Hence, the location would be marked as a receiving location for 115 − 50 = 65 units.

At this point it is worth highlighting that this way of working prevents underperforming stores from being unduly deprived of products. An approach that does not


take into account the expected value may simply seek to equalize the rhythm of sales across all channels. While this seems reasonable, it is likely to move products that would have sold without being moved. In practice, such moves often result in worse-performing channels being deprived of fast-selling products, creating a negative spiral that further decreases the performance of these lagging channels.

This leaves one question unanswered: how should the lower and upper bounds be determined? The answer to this question determines which channels are marked as senders or receivers, as well as the number of products that are eligible to be shipped between locations. Setting these values should account for the following aspects:

• Logistical capacity: The bounds should take into account operational capacity limits for moving products. Ideally the bounds result in a number of product moves that is equal to or below the available capacity. If the number of moves is greater, this implies that some of the moves will not be feasible in the given time frame. This is not desirable, since it may result in an incorrect prioritization of product movements. The reason for this is that not all moves create the same uplift in expected value. The set of moves created by setting lower and upper bounds will therefore be heterogeneous in the value they deliver. This is not a problem as long as it can be assured that this set contains all the most valuable moves and that all these moves will be executed. However, if the set is too large to be executed, sub-optimal moves may be given priority. This should be prevented by calibrating the level of the upper and lower bounds.
• Logistical costs: The bounds should also be set in a way that ensures that all moves have a positive gain (Eq. 5.17). This means that the increase in expected value at least covers the marginal cost of moving the product.
• Desired return: Depending on the context, it may be desirable to require a minimal margin on top of the cost, representing the expected return on the investment made to move the product.
• Balanced sending and receiving: Bounds should be set in a way that balances the number of products being sent with the number of products that can be received. An imbalance between these two quantities inevitably implies that there will either be products without a destination or unsatisfied demand for products. This again can be problematic because it can result in suboptimal moves receiving priority, as was stated earlier when discussing logistical capacity.14

This will now be illustrated using an example. Suppose the upper and lower bounds have to be determined for a product that is distributed in 20 different

14 One exception to this can be the situation where the limited capacity for sending or receiving inventory at a single location is taken into account. This is a useful extension of the problem, but for the sake of simplicity, it is assumed here that this aspect does not have to be taken into account.


channels. For the sake of simplicity, it is assumed that all channels have identical expected demand, equal to the previously used example (see Fig. 5.11). These channels have different stock positions, ranging between 25 and 300 units.15

A visual representation of the decision can be created by varying the value of the upper and lower bounds. This is done in the relevant range for this product example: between €1 and €95. For each of these upper and lower bounds, the number of locations that is marked as a sender or receiver is tracked. Likewise, the number of products that would be sent or received is also noted. This results in the charts shown in Fig. 5.12. A location is marked as a sender if the marginal expected value of the last unit of the current stock level is lower than the lower bound. Inversely, a location is marked as a receiver if the marginal expected value of the next unit is higher than the upper bound. This logically results in the pattern shown in the top panel of Fig. 5.12: as the lower bound is increased, the number of sending locations increases; as the upper bound is increased, the number of receiving locations decreases.

The bottom panel translates this into the number of products that is eligible to be sent, or the number of products that is requested, at a certain bound. This is calculated by solving the equation for the expected marginal value (Eq. 5.16) for X, which represents the inventory level that corresponds with a certain expected marginal value. The result of this simple transformation is shown in Eq. 5.18, where the notation F⁻¹ represents the inverse cumulative distribution function of the gamma distribution:

$$I = f(\text{bound}) = F^{-1}\!\left(1 - \frac{\text{bound}}{p - \text{transaction cost}}\right) \qquad (5.18)$$
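Equation 5.18 translates directly into code; a minimal sketch using the product from Fig. 5.11 (the €80 value is the upper bound used in the earlier example):

```python
# Inventory level at which the marginal expected value drops to a given
# bound (Eq. 5.18), demand modelled as a gamma distribution.
from scipy.stats import gamma

def inventory_at_bound(bound, price, transaction_cost, mean, sd):
    dist = gamma(a=(mean / sd) ** 2, scale=sd ** 2 / mean)
    return dist.ppf(1 - bound / (price - transaction_cost))

# Channels stocked below this level are receivers for this product.
print(inventory_at_bound(80, 100, 5, 150, 35))  # roughly 115 units
```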

The bottom panel of Fig. 5.12 provides a simple visual illustration of setting the bounds. It is clear that it never makes sense to increase the lower bound above the point where it intersects with the upper bound. Inversely, the upper bound will never be lower than the point where it intersects with the lower bound. This is a minimal prerequisite to be in line with the desire to match the number of sending and receiving products. Hence, there is a natural cap on the number of product moves at the point where the lines for the upper and lower bounds cross. For this example this cap is equal to a total of 746 products being moved, corresponding to a value for the upper or lower bound of €48. The lower bound will therefore always be smaller than €48, and the upper bound will always be greater than €48. Naturally, a move must also cover the cost of moving inventory, as well as a possible margin. Hence, the lower bound will have to be somewhat below this threshold, and the upper bound will have to be somewhat above it. Assuming that a minimum gain of €15 is required, this means that the

15 Drawn from a uniform distribution.

Fig. 5.12 Visual analysis used to determine the upper and lower bounds for identifying a channel as a sender or receiver of a certain product

horizontal distance between the two curves in the bottom panel should be equal to €15. To visualize this, imagine sliding a horizontal ruler down the chart, starting from the point where the curves intersect. When the distance between the two curves is exactly equal to €15, the quantity to transfer as well as the upper and lower bounds can be read from the chart. Specifically, in this case, this takes place when setting


the lower bound to €41 and the upper bound to €56. This results in a total of 680 products being moved. Projecting these points upward to the top panel shows that 9 out of the 20 locations are identified as sending locations, and 10 as receiving locations. One location is neither sending nor receiving, meaning that the marginal expected value of the inventory at that location lies between the upper and lower bounds.

It may be the case that it is logistically impossible to transfer 680 products in a single time period. The upper limit may, for example, be a transfer of 400 units in total per time period. Meeting this requirement uses the same methodology: the imaginary ruler slides further down the chart in the bottom panel until the horizontal axis reads 400 units. This results in an upper bound equal to €83 and a lower bound equal to €12.

This approach becomes more complex when multiple products are vying for the same limited transfer capacity. Under these conditions, the principles remain similar. By increasing the minimally required return of a single transfer, the number of eligible products steadily decreases. This is equivalent to increasing the absolute distance required between the upper and lower bounds. This global condition is passed to each of the products as a subproblem, where, as described earlier, the horizontal ruler slides down until the right balance is struck. A consequence of this is that products with a greater difference in absolute earning potential between locations will be preferred. In most cases, this is desirable behavior. There may however be many inexpensive products that would never qualify for redistribution this way. Specifically when the cost of transferring these products is lower, the condition can be changed into a minimal relative return on the transfer cost, rather than a minimal absolute return.

The final result of this exercise is a list of locations, each marked as a sender, receiver, or neither. The senders get a positive quantity associated with them, representing excess inventory; the receivers get negative quantities. The sum of these amounts per product should equal zero (bar any rounding errors). The ruler search itself is sketched below.
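A minimal sketch of the sliding-ruler search, assuming 20 channels with the demand profile of Fig. 5.11 and hypothetical stock levels (the resulting bounds depend on the drawn stock positions):

```python
# Search for lower/upper bounds that are 15 euro apart and balance the
# units offered by senders against the units requested by receivers.
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(42)
stock = rng.uniform(25, 300, size=20)            # hypothetical stock levels
dist = gamma(a=(150 / 35) ** 2, scale=35 ** 2 / 150)
margin = 100 - 5                                 # price minus transaction cost

def level(bound):                                # Eq. 5.18
    return dist.ppf(1 - bound / margin)

def units_to_send(lower):                        # surplus above the lower bound
    return np.maximum(stock - level(lower), 0).sum()

def units_to_receive(upper):                     # deficit below the upper bound
    return np.maximum(level(upper) - stock, 0).sum()

# Slide the ruler down: raise the lower bound until the sendable quantity
# covers what the matching upper bound (15 euro higher) requests.
for lower in np.arange(1.0, margin - 15, 0.5):
    if units_to_send(lower) >= units_to_receive(lower + 15):
        print(f"lower bound ~ {lower:.1f}, upper bound ~ {lower + 15:.1f}")
        break
```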

5.7.2.2 Solving the Allocation Problem

The heavy lifting to solve the redistribution problem is now completed. By identifying the sending and receiving locations, together with the quantities that can be redistributed, the problem has been reduced to a well-known existing problem: the transportation problem, first described in 1941 [12]. Detailed solution techniques for these problems are widely described in the literature [13, 14], so the discussion will focus on what is specific to the situation of the retailer, rather than on the solution techniques.

Figure 5.13 shows a schematic representation of the problem that has to be solved. The nodes (circles) represent the different locations. Nodes on the left are the senders; nodes on the right are the receivers. Each of these nodes has a quantity associated with it. This quantity is positive if the location is a sender, and negative


Fig. 5.13 The reallocation problem of the retailer, formatted as a traditional transportation problem. The goal is to satisfy all the demand of the nodes on the right using the supply of the nodes on the left, in a way that minimizes the overall costs noted on the arcs connecting the nodes. In the depicted instance, senders S1 and S2 offer 20 and 10 units; receivers D1 and D2 request 25 and 5 units; all arc costs are 1, except the arc from S2 to D1, which costs 2

if a location is a receiver. The arcs (arrows) between the locations represent the direction in which products can be moved. Each arc also has a cost associated with it, and these costs may differ. The goal is then to search for the product transfers that minimize the total cost of satisfying the demand. For the simple example shown in Fig. 5.13, there is only one arc that is more expensive than the others, so the optimal solution is simply to use that arc as little as possible. This means that only five products will be moved from S2 to D1, the remainder being delivered from S1.

The costs on the arcs typically represent differences in shipping costs between nodes. It may be the case that certain channels are geographically further removed, or that certain shipments have to pass through warehouses while others can take place directly. The consequence is that there are likely to be cheaper and more expensive ways of moving products. If these differences are very big, it may be desirable to account for this in the selection of senders and receivers (see previous section).

Additional constraints and costs are also often accounted for. A typical example is a fixed start-up cost associated with a delivery between two locations (i.e., a truck has to drive between the locations, regardless of how many products are taken aboard). This is often added as a cost incurred as soon as one arc is used between two locations. Similar constraints may also be imposed limiting the maximum number of products that can be shipped between two locations during a single time period. Depending on the nature of the products, this may also require specifying the amount of space a single product occupies.

Adding such constraints may cause the full solution to no longer be feasible, meaning that not all demand that was specified is ultimately satisfied. Under such conditions it may be desirable to re-iterate the selection of senders and receivers,


imposing bounds that are less tight. Alternatively, a solution that does not use the full capacity can also simply be accepted.

The solution method employed should also account for the fact that a good solution needs to be provided for all products simultaneously. The capacity constraints and additional costs specified previously can only be correctly calculated when accounting for all products. This can increase the size of the problem to a scale that is hard to manage as a single solution. Under these conditions an iterative solution procedure may be more appropriate. This no longer guarantees optimal solutions, but will provide a solution within an acceptable runtime.
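The unconstrained core of the problem, the toy instance of Fig. 5.13, can be solved with any minimum-cost flow routine; a minimal sketch using networkx, one of several suitable libraries (in its convention, supply is a negative demand):

```python
# The Fig. 5.13 instance as a min-cost flow problem.
import networkx as nx

G = nx.DiGraph()
G.add_node("S1", demand=-20)   # sender offering 20 units
G.add_node("S2", demand=-10)   # sender offering 10 units
G.add_node("D1", demand=25)    # receiver requesting 25 units
G.add_node("D2", demand=5)     # receiver requesting 5 units
G.add_edge("S1", "D1", weight=1)
G.add_edge("S1", "D2", weight=1)
G.add_edge("S2", "D1", weight=2)  # the one expensive arc
G.add_edge("S2", "D2", weight=1)

print(nx.min_cost_flow(G))
# S1 ships 20 units to D1; S2 ships 5 units to D1 and 5 to D2.
```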

5.7.2.3 Possible Extensions

As with most of the models and problems described in this book, there are numerous possible extensions. While these extensions can be essential to guarantee a good-quality solution, they can also cause unneeded complexity. As always, make sure to quantify the improvement potential of an extension before investing in it.

Some products in a retailer's collection are substitutes for each other. This fact may have implications for the value created by moving products. If close substitutes are still available at a location where a product's inventory is depleted, this decreases the value of moving products. Customers in this situation do not always balk, but have a nonzero chance of purchasing the substitute product. The challenge here is to quantify how large this chance is. Often this cannot be objectively measured, and relies on subjective similarity scores.16

Another consideration is the prevention of products moving back and forth multiple times. Because of changing forecasts, it may be the case that a product that is supplied to a location is removed again in the next iteration. This is often a symptom of thresholds that have been set too low: the added value of a move should be greater before it is executed. Even when this value is carefully calibrated, such undesirable moves can slip through. To prevent this, a cool-down period can be attached to a specific product: if a product has been shipped to a new location, it must not be removed again in the next X time periods, and the location from which it was shipped should not receive a replacement for Y periods.

16 See Sect. 4.6.4 for more on product substitution in the context of cross-price elasticities.

5.7.3 Continuous Replenishment

Finally, there is also the continuous replenishment problem. As was stated earlier, this is the oldest variation of the retailer's distribution problem. Much has already been written on the topic, and there is little point in repeating what other authors have already voiced so clearly. Hence, the treatment here is limited to the key dimensions to account for when searching for a solution, as well as a number of references to key works on replenishment optimization.



Algorithms that take care of replenishment will largely be based on EOQ-like17 methods [1]. The basic EOQ method is however a big simplification of the realities faced by a retailer. The following aspects are worth considering when selecting an extension or variation of the EOQ approach (the basic formula itself is sketched after the references below):

• Nonconstant demand: The standard version of the EOQ model assumes that demand is constant over time. In reality demand is volatile, which implies that at times more inventory must be carried to anticipate peaks in demand.
• (Variable) delivery lead times: There may be a period between the ordering of inventory and the delivery of the ordered products. In some situations this lead time may be sufficiently variable to warrant modeling it with a probability distribution.
• Quantity discounts: Permanent or temporary discounts offered by suppliers can be worth accounting for.
• Combined orders: Often a single supplier delivers more than one product. Under these conditions the orders of multiple products may have to be synchronized in order to be optimal.

A general introduction to the principles behind the EOQ model can be found in the tenth chapter of Supply Chain Management by Chopra and Meindl [15]. Readers who want a more extensive view of possible variations are referred to Inventory Optimization by Vandeput [16].

17 Economic order quantity.
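For reference, the basic EOQ formula fits in a few lines; a minimal sketch with illustrative numbers:

```python
# Classic EOQ: the order quantity that balances fixed ordering costs
# against inventory holding costs.
from math import sqrt

def eoq(annual_demand, cost_per_order, holding_cost_per_unit_year):
    return sqrt(2 * annual_demand * cost_per_order / holding_cost_per_unit_year)

print(eoq(annual_demand=10_000, cost_per_order=50,
          holding_cost_per_unit_year=2.0))  # about 707 units per order
```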

5.8 Inventory Management When Selling on Third-Party Platforms

When selling products on a third-party platform, it may be possible to have this third party handle shipping. This has the added advantage that rapid next-day delivery is guaranteed by the platform. Moreover, these products often get priority treatment on the platforms themselves. Both aspects can cause sales to be higher than when just offering the product on the platform and handling logistics yourself. The downside is that inventory is often stuck in the warehouses of these third parties, which means that possible overstock is hard to retrieve and use elsewhere. As such there is a trade-off to be made: does the added revenue outweigh the possible cost of having dead inventory at the location of this vendor? And does the answer differ depending on the type of product? This again resembles the newsvendor problem: a decision has to be made in the face of uncertainty—and there is only a single shot.


5.9 Conclusion

This chapter has presented a framework and methods to deal with the inventory management problem of a modern retailer. The focus was placed on techniques that perform well in the most challenging conditions, specifically in situations where there is a lot of uncertainty and few historical data points are available. Such conditions are becoming increasingly common as product life cycles tend to shorten.

References

1. Harris, F. W. (1913). How many parts to make at once. The Magazine of Management.
2. Nielsen, A. (2019). Practical time series analysis: Prediction with statistics and machine learning. O'Reilly Media.
3. Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37–45.
4. Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9(1), 247–274.
5. Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical statistics for data scientists: 50+ essential concepts using R and Python. O'Reilly Media.
6. Barber, R. F., Candes, E. J., Ramdas, A., & Tibshirani, R. J. (2021). Predictive inference with the jackknife+. The Annals of Statistics, 49(1), 486–507.
7. Linstone, H. A., & Turoff, M. (1975). The Delphi method. Boston, MA, USA: Addison-Wesley.
8. Hecht, M., & Zitzmann, S. (2021). Sample size recommendations for continuous-time models: Compensating shorter time series with larger numbers of persons and vice versa. Structural Equation Modeling: A Multidisciplinary Journal, 28(2), 229–236.
9. Lambert, B. (2018). A student's guide to Bayesian statistics. Sage.
10. Arrow, K. J., Harris, T., & Marschak, J. (1951). Optimal inventory policy. Econometrica: Journal of the Econometric Society, 250–272.
11. Hillier, F. S., & Lieberman, G. J. (2019). Introduction to operations research (11th ed.). New York, NY, USA: McGraw-Hill.
12. Hitchcock, F. L. (1941). The distribution of a product from several sources to numerous localities. Journal of Mathematics and Physics, 20(1–4), 224–230.
13. Schrijver, A. (2003). Combinatorial optimization: Polyhedra and efficiency (Vol. 24). Springer.
14. Galichon, A. (2016). Optimal transport methods in economics. Princeton University Press.
15. Chopra, S., & Meindl, P. (2007). Supply chain management: Strategy, planning & operation. In Das summa summarum des management (pp. 265–275). Springer.
16. Vandeput, N. (2020). Inventory optimization: Models and simulations. Walter de Gruyter GmbH & Co KG.

6 Managing Product Returns

6.1 The Challenges Created by Returns

Free returns have been one of the key incentives used to convince people to shop online. It has long been preached that this practice increases loyalty and increases overall spend (free shipping starting at €29, anyone?). This is conventional wisdom with surprisingly little proof to back it up. While these returns are free to the customer, they come at a massive cost to the retailer—who not only has to pay for shipping but also has to have processes in place to restock the products that have been returned. Surprisingly often, products simply end up being thrown away because the cost of processing them is greater than their actual worth [1]. Some estimates go as high as 50% of returned products never getting back on the shelves [2]. Besides the economic cost, there is also a massive environmental impact to consider [3].

The purpose of this chapter is to present a way of thinking about this problem for retailers, showing how the impact of returns can be quantified and how actions can be taken to mitigate the problem in a sensible way. The true cost of returns is often ignored, and many retailers are wholly unable to estimate it. Retailers are streamlining the process up until the buying decision—forgetting that in the current landscape, a purchase can be overturned easily. As such, the value of a purchase to a retailer should include the possibility that the product is returned, and account for the costs associated with those returns.

Moving beyond this issue, many retailers do not have a clear view of the drivers behind product returns. It is obvious that some products are returned more often and that some customers are more prone to returns than others. This information is crucial to finding the right strategies for dealing with product returns.

When analyzing individual returns, it is clear that the legitimacy of a return can be placed on a spectrum. If your new phone arrives with a broken screen, the product


should be replaced. At the other extreme, buying a pair of formal shoes, wearing them to a wedding, and then returning them for a cash refund is bordering on theft. Between these extremes lies a wide gray area. The issue of product returns is one of striking the right balance. Not all returns can be prevented, nor should that be the objective for a retailer. The challenge for a retailer is reducing illegitimate returns while not inconveniencing shoppers with valid reasons to return products. To this end this chapter aims to answer the following three questions:

1. How to quantify the (financial) impact of returns?
2. How to identify different return types?
3. How to intervene and improve return policies?

As was done in other chapters, the theory will be illustrated with real data examples. For this chapter the web sales of a small-scale fashion player were used to produce the data. The proposed methods should add value for both small and large retailers.

6.2 How to Measure the Impact of Returns

An old adage states that it is better to be approximately right than to be exactly wrong. This is also the case for the cost of returns. The cost of a return should not be calculated to the cent for every order. Rather, it is important to have a good feel for the magnitude of both the associated cost and the probability of a product being returned.

The first costs that spring to mind are those generated by inbound logistics—since this is an "out of pocket" expense. This usually comes down to paying a third-party logistics firm to transport the product back to a central warehouse. This is the most visible and easily quantified cost of returns. However, it is important to also consider the cost of outbound logistics—getting the product to the customer in the first place. This shipping and packaging cost is often also paid for by the retailer, and is obviously lost when a product is returned.

The costs of restocking the product are often harder to quantify. Once a retailer decides to ship products to customers, there needs to be an investment in a logistical process to manage returns. Since this infrastructure is in place, the marginal cost of processing a single return is often close to zero. However, there will be discrete points where infrastructure has to be expanded or additional manpower has to be added. Because of this, it is still advisable to allocate a processing cost to every return. The easiest way to calculate such a cost is to determine how many returns the current infrastructure can manage, as well as the total cost of that infrastructure. For example, assume that you have invested in a shipping sorter that represents an annual write-off of €200k. 20% of the time on this sorter is used to process returns, an approximate cost of €40k annually. Moreover, €25k of warehouse rent can


be allocated to returns annually. Also, a full-time equivalent works in the warehouse to process these returns at a cost of €60k. Hence, a total of €125k in costs can be allocated to the processing of returns. Under the assumption that this setup can manage a total of 250 returns every day—or 60k returns annually—the cost of a single return is €125k / 60k ≈ €2.08 (this calculation is sketched in code below).

The appropriateness of this calculation method can be debated. Since it is impossible for a retailer not to have an infrastructure for dealing with returns, it could be argued that this should just be considered a cost of doing business and viewed as an overhead expense. Inversely, one could argue that since this infrastructure is not used to full capacity, a larger fraction of these costs should be allocated to returns. In the same vein, calculating in this fashion will cause the cost per return to increase if overall return rates are reduced. These counter-arguments are fundamentally flawed. Just considering this to be overhead would be an underestimation of the actual cost of managing returns—especially in the case where the current infrastructure is nearing capacity and new investments have to be made to process increasing amounts of returns. And allocating a larger fraction of the cost is flawed because it would lead to perverse effects: the marginal cost of an additional return would effectively be negative, because the infrastructure cost is spread out across a larger base of products.

Next, the lost value of opened and damaged products should be considered. The main complicating factor is that the prevalence and magnitude of damage often differ depending on the type and value of a product. As such, it is often easiest to quantify this cost as a fraction of the value of the product. Notably for retailers who sell different product categories, this percentage can differ per category.
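This allocation is simple enough to write down directly; a minimal sketch reproducing the figures above:

```python
# Cost allocated to processing a single return (numbers from the text).
sorter_share = 0.20 * 200_000    # 20% of the sorter's annual write-off
warehouse_rent = 25_000          # rent allocated to returns
staff = 60_000                   # one full-time equivalent
annual_returns = 60_000          # 250 returns per day

cost_per_return = (sorter_share + warehouse_rent + staff) / annual_returns
print(f"{cost_per_return:.2f} euro per return")  # 2.08
```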

In summary, a retailer should estimate the following costs to get a reasonably accurate picture of the costs created by product returns:

• The cost of packaging and shipping in outbound logistics
• The cost of inbound logistics
• The restocking cost of products
• The cost of damaged products which can no longer be resold—or must be sold at a discount

One factor that has purposefully been omitted is the possible participation of suppliers in the costs of excessive returns. In some retail sectors, it is commonplace for suppliers to carry some of the risk of product performance, often taking the form of a discount on the cost of the product if performance is inadequate. Due to the ad hoc nature of these kinds of deals, this is one component of the cost equation that we advise omitting.

6.3 Investigating Patterns in Return Behavior

The expected value will be a key tool used to determine the impact of actions. Simply put, the expected value of something is the chance that something will happen multiplied by the (positive or negative) value of said event. A simple analogy is playing the lottery. Given that the odds of winning Euromillions are 1 in 139,838,160 and that the jackpot amounts to 190 million euro, this would place the value of a single ticket at 190,000,000 / 139,838,160 ≈ €1.36. Given that the minimal price of a ticket is €2.50, playing the lottery is—obviously—a losing bet: the expected value of this action is negative (−€1.14 to be precise).

An identical line of reasoning can be used to evaluate business decisions. In the context of product returns, the likelihood of a return is the random event that has to be taken into account. Very complex models can be built to estimate the return chance of a specific product. However, much can be achieved with simple models. More complex layers can always be added on top if there is reason to expect that this investment will be worthwhile.

The simplest estimate is the global return rate (Eq. 6.1). Without taking into account product groups, customers, sales channels, and so on: what fraction of products sold is eventually returned? Benchmarks for this simple statistic vary widely, with quoted global return rates ranging from a couple percent up to half of all products sold. Retailers selling products via their own channels typically observe return rates around 20%—while returns on sales via other sales channels (Amazon, Zalando, …) tend to be substantially higher. Much also depends on the product being sold. Clothing and apparel have high return rates, whereas medication has virtually zero chance of being returned:

$$\text{return rate} = \frac{\text{nb products returned}}{\text{nb products sold}} \qquad (6.1)$$

Naturally, tracking the evolution of the return rate over time is also a good idea—giving you a feel for how customer behavior is evolving. Across the board, return rates have been on the rise in recent years. This is one of the main reasons why addressing the return problem has moved up the list of priorities for many retailers. It is also good to know how this metric evolves in promo seasons, since discounts are often set in a manner that ensures products are not sold at a loss. If the returns in this period exceed expectations, it might still be the case that transactions lead to negative margins due to the costs associated with returns.

The calculation of the return rate is quite simple and just takes the number of products into account. There are other important things to be mindful of. The first is that if there is significant seasonality, this type of calculation can be misleading when total sales and returns are simply totaled per time period. Assume that you sell a lot of product in January, but February is typically a slow month. If you simply total the return rate per month, it might be that a lot of sales from January are returned in February, leading you to


believe that there is an issue with returns when you are actually just looking at seasonal patterns (a simple guard against this pitfall is sketched at the end of this section). A second thing to be wary of is that this measure does not account for the value of products. If the products that are returned have a greater-than-average value, the impact on the bottom line may be greater than you would expect from this percentage, especially if some of the products cannot be resold due to damage or simply the nature of the product. Inversely, it might also be that cheap products with low margins are returned more often. For these products, the absolute cost of a return might be quite significant and lead to a negative gross margin. Again, this would not show up when simply looking at this percentage.

Despite these and other limitations, the return rate is a good place to start, especially for companies operating on a small scale with a limited product assortment. For the majority of cases, however, more detail is likely to be desirable. An intuitive way of adding this detail is to calculate the return rate for groups of products and customers to get a more granular feel for the return dynamics. The next subsections illustrate how such a more in-depth analysis could be conducted.
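Guarding against the seasonality pitfall amounts to attributing each return to the period of the original sale; a minimal pandas sketch, assuming a hypothetical order-line table with illustrative column names:

```python
# Return rate per month of sale, not per month of return arrival.
import pandas as pd

lines = pd.DataFrame({
    "order_date": pd.to_datetime(["2022-01-05", "2022-01-20", "2022-02-10"]),
    "returned": [True, False, False],
})

by_sale_month = (lines
                 .groupby(lines["order_date"].dt.to_period("M"))["returned"]
                 .mean())
print(by_sale_month)  # e.g., 2022-01: 0.5, 2022-02: 0.0
```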

6.3.1 Estimating Return Likelihood Based on Product Properties

Descriptive properties of products such as category, brand, price, color, and so on can influence the likelihood that a product is returned. Essentially, all available structured information can be tested for correlations with the return likelihood. A simple example of this is shown in Fig. 6.1, where the return rates per category are shown. For this specific fashion retailer, the main return problems were found in the dress category, with returns that were significantly higher than the company average of 24.4%.

This line of reasoning can easily be extended by considering other product properties, as well as combinations of product properties. There is also no reason to remain limited to the simple descriptive analysis shown here, and predictive models can be used to investigate interrelationships between different product properties. All this adds to the understanding of what the root causes behind returns could be. Much of this of course remains in the realm of correlation rather than causation.

6.3.2 Estimating Return Likelihood Based on Product Performance

How well a product has sold, and especially how frequently it has been returned in the past, can provide a refined estimate of the return likelihood. Keeping track of the relative return figures and comparing these to the global return rate can be a quick and easy tool to detect problematic products earlier. Naturally, it is essential that a product has been sold a significant number of times before identifying it as a problem. The first sale being returned would mean a return rate of 100%—but should not be a reason for pulling the product.


Fig. 6.1 Return rate per category for a fashion retailer. The average return rate for this retailer is equal to 24.4%

A real-world example is the case of a retailer who sold out of a popular item in a matter of weeks, their only regret being that they sold each and every one of those products twice. While the product was great, it appeared to be sized rather large, resulting in all customers returning the product for a smaller size. Sadly, this was only noticed after the majority of products had already been sold.

Figure 6.2 shows an example of how the performance of a product could be evaluated in practice. Here the evolution of the return rate of a product is tracked as the number of sales of the product increases. An interval is also plotted to represent the reasonable range wherein the return rate of a product can be expected to fall. These ranges are best constructed by taking a representative sample of products; an example could be to link this to the average return rate within a specific product category. This example shows the evolution of two products, where both seem to perform rather erratically at the start. However, we must account for the fact that the number of sales is still relatively low at that point before jumping to conclusions. Moving toward the right in the chart, it is clear that product A does not have an abnormal return rate, whereas product B does seem to be returned significantly more than average. The conclusion here is that it might be worth investigating the cause of returns for product B, while no action is required for product A.

Naturally, this kind of chart cannot be carefully analyzed for every product in the collection at every time interval. It is however simple to create automated warnings when products appear to be performing outside of the normal range. This allows for taking quick action on problematic products.
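One way of constructing such an expected range is a binomial control chart around a baseline rate (for example the category average); the exact construction used for Fig. 6.2 is not specified, so this is a sketch under that assumption:

```python
# Control limits for an observed return rate after n_sold sales,
# assuming returns are binomial around a baseline rate.
import numpy as np

def control_limits(baseline_rate, n_sold, z=2.0):
    sd = np.sqrt(baseline_rate * (1 - baseline_rate) / np.maximum(n_sold, 1))
    lower = np.clip(baseline_rate - z * sd, 0, 1)
    upper = np.clip(baseline_rate + z * sd, 0, 1)
    return lower, upper

# Limits tighten as more sales are observed.
print(control_limits(0.244, np.array([10, 40, 70, 100])))
```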

Fig. 6.2 Control chart for monitoring abnormal return behavior. Product A shows no real abnormalities when compared to the control limits. Product B shows a return probability that is above the expected range

6.3.3 Estimating Return Likelihood Based on Customer Behavior

Descriptive customer properties such as age and gender could in theory be used to predict return behavior. In practice these kinds of variables often turn out to have very limited predictive power. Moreover, differential treatment of customers based on certain descriptive properties can raise moral objections. Historical customer behavior is often more usable, and has a much greater predictive power than any descriptive customer property. Most retailers have a group of customers who are responsible for a disproportionate fraction of returns. These can be identified based on their historical behavior, which is likely to persist in the future. This can be the basis for a fair type of discrimination between certain types of customers—who are then no longer offered the same level of free return services.

Figure 6.3 shows what percentage of customers is responsible for a certain percentage of returns. For the small fashion retailer used for this example, 70% of customers have never returned a single product. Approximately 20% of the customer base is responsible for 88% of the product returns. This number may be skewed by a large group of customers who have made very few purchases. Looking at the subset of customers with more than two (so at least three) purchases shows that there is still a significant skew. In this group approximately 50% of customers never return a product, and about 20% of customers are responsible for 80% of returns.

Still, it may be that this is randomness at work. To rule this out, the effect that placing a return has on the chance of future returns can be investigated. To this end, the distributions of the return rates of two groups of customers are compared in Fig. 6.4. The top panel shows the distribution of the return rate for customers who have made at least two purchases, showing that 59% of them never return a product. A similar

168

6 Managing Product Returns

Fig. 6.3 The Pareto rule exemplified in customer return behavior

Fraction of total returns

1

all customers customers > 2 purchases

0.8 0.6 0.4 0.2 0 0

0.2 0.4 0.6 0.8 Fraction of customers

1

A similar histogram is shown in the bottom panel for customers who have returned their first product after purchasing. Comparing the two histograms leads to the conclusion that customers who have returned in the past are much more likely to do so in the future. For the example dataset used here, the average return rate of the first group is 17.4%, versus 38.7% for the second group.

Simply branding customers who have placed a return as "bad" would, however, be too easy. A further look at the two groups shows that the group that placed the earlier return purchases many more products on average: 17 products, versus only 7 for the group that made at least two purchases. This means that even after accounting for their higher average return rate, the "return customers" keep 10.4 products on average, compared to 5.8 products for the other group.

Of course, the true discussion is not so much about averages. Most retailers will be able to identify a group of customers whose return behavior can be considered excessive. This raises two questions: (i) where can the border be drawn that separates customers who return too much from the rest? (ii) At what point can these customers be identified with sufficient certainty? Answering the first question requires making some expected value calculations. The answer to the second question relies on a predictive model which uses historical purchase and return information to predict a confidence interval for the return likelihood of a customer in the future. While the results discussed here are of course anecdotal, similar results have been found for other retailers. This section does not intend to state that a certain group of customers can automatically be considered undesirable. Rather, this style of analysis is valuable to repeat for the specific context wherein a retailer operates.
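For readers who want to reproduce a curve like the one in Fig. 6.3, a minimal sketch follows. It assumes a hypothetical transaction table with one row per item sold and a returned flag; the column names are illustrative:

    import numpy as np
    import pandas as pd

    # Hypothetical data: one row per item sold, with a returned flag.
    tx = pd.DataFrame({
        "customer_id": [1, 1, 2, 3, 3, 3, 4],
        "returned":    [0, 1, 0, 1, 1, 0, 0],
    })

    returns_per_cust = (tx.groupby("customer_id")["returned"]
                          .sum()
                          .sort_values(ascending=False))

    # Cumulative share of returns vs. cumulative share of customers,
    # i.e. the two axes of the curve in Fig. 6.3.
    cum_return_share = returns_per_cust.cumsum() / returns_per_cust.sum()
    cum_customer_share = np.arange(1, len(returns_per_cust) + 1) / len(returns_per_cust)

Sorting customers by their number of returns before accumulating is what produces the characteristic Pareto shape.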


Fig. 6.4 Histograms investigating whether making a return increases the odds of returning more products in the future

6.3.4 Estimating Return Likelihood Based on Order Properties

Another approach to identifying drivers behind returns is the analysis of customer baskets. Certain combinations of products have a substantially higher likelihood of being returned than others. This is the case when several substitutes are purchased jointly, or when different variations or sizes of a product are purchased jointly—with the intention of testing the options and keeping a subset. Where a lot could be achieved using descriptive statistics for individual product and customer performance, analyzing an order's composition calls for a somewhat more intricate approach. Useful models do not need to be overly complex: a traditional linear regression that estimates return probability can already perform admirably. Key to this performance is the creation of the right input variables. These can be simple variables, such as a flag that indicates multiple sizes of the same product or multiple products from the same subcategory. It is also possible to use other models that calculate product similarity scores based on multiple properties (see Sect. 4.6.4). The output of this model, as well as the confidence interval surrounding this estimate (see Sect. 5.5.1 for a brief discussion of bootstrapping methods that can be used for this purpose), can be used to trigger action. If the expected probability of a return is high, and this is sufficiently certain, customers can be nudged toward adjusting their purchase behavior. The next section goes into more detail on how this kind of information, combined with the cost estimates, can be used to make changes to your policies—the end result being increased margins, more fairness toward customers, and a positive ecological impact.
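A minimal sketch of such a model follows. The feature flags are illustrative assumptions, and while the text mentions a linear regression, the sketch uses the closely related logistic regression, which is the standard choice when the target is a probability:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Hypothetical order-level features; the flags are illustrative.
    orders = pd.DataFrame({
        "same_product_multiple_sizes": [1, 0, 1, 0, 0, 1],
        "items_same_subcategory":      [2, 1, 3, 1, 2, 2],
        "basket_value":                [120, 40, 210, 35, 80, 150],
        "was_returned":                [1, 0, 1, 0, 0, 1],
    })

    X = orders.drop(columns="was_returned")
    y = orders["was_returned"]

    model = LogisticRegression().fit(X, y)
    p_return = model.predict_proba(X)[:, 1]  # estimated P(return) per order

A confidence interval around p_return can then be obtained by refitting the model on bootstrap resamples of the order history, in the spirit of Sect. 5.5.1.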

6.4 Taking Action to Prevent or Reduce Returns

Understanding return behavior is interesting, but does not solve anything in and of itself. Reducing returns requires making changes, either to the products or to the processes. It must be noted that some of these changes are not risk-free. Nor are predictive models perfect: an order may have a high probability of triggering a return, but this will never be certain. The result is a balancing act, and the right balance to strike will depend on how a retailer wishes to profile itself.

An important tool to calibrate this balancing act is the expected value calculation. This is nothing more than translating probabilities into their financial value. Doing so prevents taking actions that have a negative overall impact. Beyond requiring a positive expected value, one can even require that an action be significantly positive. The main quantity of concern is the expected contribution—i.e., how much a retailer stands to earn or lose in a given transaction. The change to the expected contribution resulting from a change in assortment or policy is the main factor that should drive decision making. As shown in Eq. 6.2, the expected contribution is the chance that the product is not returned multiplied by the gross margin, minus the likelihood that a return takes place multiplied by the cost of a return. These quantities have been elaborated at length in Sects. 6.1 and 6.3:

E(contrib) = P(no return) · gross margin − P(return) · cost_return    (6.2)

The value of an action is the change to the expected contribution it brings about. As shown in Eq. 6.3, the expected contribution when certain preventive actions are taken should exceed the expected contribution without taking action:

value = E(contrib_with action) − E(contrib_without action)    (6.3)
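Equations 6.2 and 6.3 translate almost directly into code. The sketch below uses purely illustrative numbers for the margins, costs, and probabilities:

    def expected_contribution(p_return, gross_margin, cost_return):
        # Eq. 6.2: margin earned when the item is kept, minus the
        # return cost weighted by the return probability.
        return (1 - p_return) * gross_margin - p_return * cost_return

    # Eq. 6.3 with illustrative numbers: a nudge that lowers the
    # return probability from 45% to 30% at the cost of a 5-euro
    # incentive (which reduces the gross margin from 30 to 25).
    without_action = expected_contribution(0.45, gross_margin=30.0, cost_return=12.0)
    with_action = expected_contribution(0.30, gross_margin=25.0, cost_return=12.0)
    value = with_action - without_action  # positive -> the nudge pays off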

As for many of the other topics covered in this book, it pays to experiment. This is of crucial importance especially when taking actions that are aimed at changing customer behavior. The estimates of return probabilities may be quite exact, but the question of how a customer will respond when given a nudge to reduce returns is highly uncertain. Appendix A provides a great deal more detail on how to set up experiments the right way, yielding usable conclusions.

Possible actions that can be taken by a retailer can be grouped into three categories, in line with the three ways in which return probabilities can be estimated. A first option is to take action on a product level by making changes to a product, or by adjusting the product portfolio (Sect. 6.4.1). A second avenue is to take action when a specific order shows a high likelihood of triggering a return (Sect. 6.4.2). Finally, a third course of action is to address specific customers who show pathological return behavior (Sect. 6.4.3).

6.4.1 Product-Based Actions

The simplest and lowest-risk actions that can be taken to prevent returns take place on the level of the individual product. Actions can either be an adjustment to the product (Sect. 6.4.1.1) or the nuclear option of removing a product from the assortment (Sect. 6.4.1.2).

6.4.1.1 Addressing Product-Specific Causes for Returns

A first kind of action that can be taken in the light of customer returns is to focus on returns that should theoretically not happen. This type of return can be prevented by providing the right information to the client at the time of ordering. An important factor in making this possible is analyzing what the reasons for returns are. If you are not yet tracking this, make sure to start doing so. This takes little more than a questionnaire that customers have to fill out when shipping orders for returns. If at all possible, create this in a digital format to maximize the velocity at which this data is collected. Even if this information is not readily available, things might be inferred from data points that are available. Naturally, some of this might be conjecture—but something is better than nothing in this case. The most frequently observed reasons for returns are outlined in the paragraphs below, together with some suggestions as to how each type of return could be reduced.

Performance/Quality Problems A frequent return reason is unsatisfactory quality of a product. The action to be taken in this case is evidently to fix the quality problems, pull the product from the assortment, or clearly state what quality or performance can be expected from a product. An example of this is mentioning that a home cinema system has a blue LED that might be annoying depending on the location where the product is used, e.g., a small studio apartment.

Size Problems Another frequently encountered problem is that products are exchanged for identical or related products in a different size. This can be addressed on a global level by providing a more elaborate size guide, informing customers on what to expect. On a product level, it may still be the case that a product tends to run smaller or larger on average. Accompanying a product with advice in this regard may also reduce returns—as well as increase customer satisfaction.

Inaccurate Descriptions The way in which a product is described or depicted on a website is also a major contributor to returns. This can be as simple as the dimensions of the product, or colors that look different in reality than on-screen. This specific return reason can often be prevented by capturing the right feedback at the moment a product is returned—and adjusting the product information presented on the website.

There will also be return reasons that are specific to the kind of product a retailer sells. The list of possibilities is too large to enumerate here, yet it is always important to consider what the causes may be and how they can be addressed.

6.4.1.2 Adjusting the Product Assortment

A second course of action is the "nuclear option": no longer offering a problematic product. This may be limited to specific channels, for example, third-party platforms where it may not be as easy to provide sufficient product information to customers. Basic reporting tools and control charts (see Sect. 6.3.2) can be used to identify problematic products. Marking a product as a candidate for removal from the assortment should account for the expected value (Eq. 6.2). A simple transformation of this equation can be used to determine the loss threshold for a specific product (Eq. 6.4): the return rate beyond which the gross margin on a product sale becomes negative. Hence, if the return rate of a product is above this threshold, it contributes negatively to the overall gross margin and should be eliminated:

Loss Threshold = Gross Margin / cost_return    (6.4)

This implies that higher return rates can be tolerated when the margins on a given product are higher, the inverse being true for products with low margins. This ratio is the simplest way to combine margins, costs, and return probabilities. Figure 6.5 shows the distribution of the loss threshold for all products of our exemplar fashion retailer. A vertical line has been added to indicate the typical average return rate for this retailer (24.4%). Based on this simple analysis, it was determined that 6.6% of products sold online contributed negatively to overall margin due to the costs of their returns. However, this number must be nuanced: some of these products were sold at a very low or negative margin (clearance sales), and adjusting for this, the fraction of problematic products was still 4.3%.

Fig. 6.5 Loss threshold for the complete product assortment, vertical line at the level of the global average return rate

When a product is seen to exceed this threshold, it can be desirable to charge return and shipping costs. A difficulty here can be that the current infrastructure does not allow for this kind of differentiation on the level of products, in which case it might be desirable to no longer sell a product in a specific channel. A possible compromise is to offer a product with in-store pickup rather than at-home delivery. An important caveat in this regard is that the return rate has to be estimated with sufficient certainty. As was shown earlier, the return rate can vary widely if the product has only been purchased a limited number of times. Hence, it can be worthwhile to calibrate exactly how much certainty is needed before eliminating a product.
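A minimal sketch of Eq. 6.4 applied to a hypothetical product table, flagging products whose return rate exceeds their loss threshold (all figures are illustrative):

    import pandas as pd

    # Hypothetical product table; all figures are illustrative.
    products = pd.DataFrame({
        "sku":          ["A", "B", "C"],
        "gross_margin": [12.0, 6.0, 20.0],
        "cost_return":  [30.0, 30.0, 30.0],
        "return_rate":  [0.20, 0.35, 0.30],
    })

    # Eq. 6.4: the tolerable return rate grows with the margin/cost ratio.
    products["loss_threshold"] = products["gross_margin"] / products["cost_return"]

    # Candidates for action: charging for returns, restricting the
    # channel, or removal from the assortment.
    candidates = products[products["return_rate"] > products["loss_threshold"]]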

6.4.2 Transaction-Based Actions

A second course of action is to intervene when a certain order is identified as having a high propensity for return behavior. At this point it is possible to use hard or soft nudges to encourage customers to see the error in their ways. A soft nudge would be a simple message informing customers that their combination of products has been flagged as likely to contain one or more products that will be returned. A hard nudge, on the other hand, would be to charge shipping and return costs for the order based on the products it contains. The latter of course runs the risk of customers not finalizing the transaction. A middle ground is also possible, where the expected value of a complete order has to be negative before a hard nudge is triggered.

An example could be a customer trying out two different types of GoPro camera before deciding on the one that suits their needs best and returning the other one. Given that packaging will be opened, this is a very real and significant cost to a retailer. Deciding to combat this type of return is often a strategic question: what degree of service do you consider to be normal for your target audience? Involving customer support is also a possible course of action for such cases, allowing customers to discuss any concerns they have about products. Such concerns are likely at the root of ordering products with the intention of comparing them in the flesh.

A complicating factor is that customers want transparency as to why their order does (or does not) qualify for free shipping and returns. This in turn implies that understandable (white-box) models need to be used to make predictions of return probabilities. This information is then passed along to a customer, who now has the option to either adjust their order or live with the consequences of not doing so.

6.4.3 Customer-Based Actions

The third and highest-risk approach to reducing returns is to address individual customers based on their return behavior. As illustrated in Sect. 6.3.3, there are customers who consistently return more products than others. In the majority of cases, such behavior will be premeditated: customers purposefully order multiple products to compare them, intending to keep at most one. Such acts of window shopping are perfectly harmless in brick-and-mortar stores, but in an online context this behavior has a significant impact. To what extent this needs to be addressed depends on the company strategy. Some companies have even built a business model around encouraging this kind of behavior. One example is Stitch Fix, which specifically advertises that you should select the items you like and send the rest back. Of course, such companies factor in the high return rates and invest in the infrastructure needed to process high volumes of product returns. Even so, these business models are hard pressed to produce positive financial results.


A solution to this problem could be found in a variation on differential pricing (see Sect. 3.5 for more on this topic), where not all customers pay the same amount for a product. Models that predict when a customer is likely to no longer be profitable, because the expected returns are too high, can be used as input to deny perks like free shipping and returns. Depending on the context, it may even be possible to inflate product prices themselves depending on the customer. Such customers can also be actively excluded from direct marketing actions, no longer encouraging them to shop, as their transactions are likely to result in a loss. Again, it is desirable to create a certain level of transparency here: there needs to be a reasonable explanation why some customers are required to pay more than others. Yet again, this implies that simple and understandable white-box models are likely to be preferred over complex but higher-performing black-box models.

6.5 Conclusion

It is no secret that return behavior can be problematic and costly. Addressing this problem is not overly complex, but it does require a willingness to take calculated risks. The effect that certain new ways of working will have on customer behavior cannot be guaranteed. In spite of this, environmental and economic considerations are sure to win out in the long run. It might be nice to be a first mover.


Part III Marketing

7 The Case for Algorithmic Marketing

7.1 What Is Algorithmic Marketing?

Marketing automation is something most retailers are doing, but few retailers are excelling at. This chapter is about what it takes to excel. First, the main reasons why marketing automation projects fail are discussed. Next, key schools of thought are introduced that can be used as a backbone to create your own algorithmic marketing strategy.

The term marketing automation is used to refer to software systems that aid marketeers in streamlining their activities. While originally focused on email, these systems can now be used to interact with leads and customers on nearly every platform. The reality is that the mere adoption of marketing automation software is not a silver bullet. (This software goes by many names and acronyms that are hard to keep track of. Two popular ones are CRM (customer relationship management) and CDP (customer data platform)—only two of the many acronyms employed in an overly confusing landscape of similar tools. In essence these tools form a place to centralize customer information and act upon it by sending (automated) messages through various channels. Some of these systems also offer analytics capabilities to analyze results, but few truly do that well.) This is why the term algorithmic marketing is preferred to marketing automation. It emphasizes that the solution lies in the logic that is used, rather than in the mere adoption of software. The goal is not to automate but to improve. Improvement does not imply complexity and black-box algorithms. Some of the most value-adding algorithms are almost trivial in nature.

The reality is that there is no silver bullet. Real life is never that simple. In spite of this, some things work better than others. This part of the book is about providing information on what might work better for you. Specifically, this chapter discusses a required change in philosophy—before we delve into the details of specific applications in the next few chapters.

7.2 Why Algorithmic Marketing Systems Fail to Take Off

Most retailers have already invested in a marketing automation system of some sort. Some of these are quite rudimentary; others have all the bells and whistles—and come with an army of consultants in matching outfits. However, in spite of these sizeable investments, systems often sit underused. The two biggest causes of this are a lack of a clear strategy when it comes to content marketing, and a lack of understanding of the quantitative tools that support such a strategy. It is no coincidence that these two aspects are the underlying topic for the remaining chapters of this part of the book.

When no time is taken to formulate a clear marketing strategy, it is hard to judge whether an action will move the needle in the right direction. A good strategy makes it clear what the objectives are. The danger of combining the words marketing and strategy is that you end up with something that is innately vague. An excellent guideline for formulating usable strategies can be found in the book Good Strategy/Bad Strategy by Richard Rumelt [1]. In this book the author defines the key elements of a good strategy as a diagnosis, a guiding policy, and a coherent set of actions. Do not fall into the traditional traps of dreaming up vague missions, visions, personas, etc. A strategy is grounded in the reality of today, and it results in a clear set of actions.

Cooking up a good strategy implies diagnosing your current situation. Who are the customers for whom you can deliver superior value? Are these also the customers who are the most lucrative for you? Focus your thinking on customer behavior rather than descriptive properties. Too often retailers describe their core clientele as middle-aged people who live in cities—something that can hardly be called actionable. (A memorable wisecrack in this context is that the average customer has but a single testicle; the message being that the average customer as such does not exist, and that one should be very wary of just working with averages across large groups of customers.) Rather, knowing that your top clients buy for their complete families, are gift buyers, or are interested in vegetable gardening gives you things you can influence, improving the manner in which customers interact with you and your products.

Next, a guiding policy should be created. For the majority of retailers, this will mean thinking up ways to maximize the leverage you have over your main competitors. These competitors will typically include the new online apex predators like Amazon and other big platforms. You will need to conceive what you have to offer your customers other than the widest assortment and the lowest prices. Cornerstones of this strategy will often be related to the physical proximity you have to specific customers, as well as the extensive product knowledge that you possess. Customers who visit you do not have to spend hours researching what the best baby mattress is; you can provide them that information and guarantee them a good night's sleep.

Finally, the coherent set of actions comes in. This is where the rubber meets the road and the software tools you have can be put to use. What kinds of behavior are aligned with the strategy that you have just defined? How can you encourage people who do not yet exhibit this behavior to do so? How can you keep convincing people not to break with these virtuous cycles? What other behaviors could you try to trigger? Getting to answers will require experimentation, but fortunately that is what this kind of tooling was made for.

Even if you have an actionable strategy, there are reasons why your algorithmic marketing efforts may fail to get off the ground—causing systems to sit underused. These causes are often linked to the way things are done in an organization, and can at times be very hard to fix. The issues most frequently encountered in practice are the following:

• Data integrity problems: Customers expect communication to be correct, and the more personalized communication becomes, the easier it is to make mistakes. Duplicate customer entries, unregistered transactions, incorrectly reported customer credits, etc. can greatly reduce the usability of a system. Whereas the goal should always be to eliminate all these errors, it is unrealistic to expect any system to be completely free of them. Taking random samples and testing the correct flow of data are two of the best ways of validating whether information flows are working correctly—and you do not need any technical skills to do this.
• Old habits die hard: Another key risk is that such a system is viewed as something on top of everything else that is already commonplace in terms of marketing communication. It is often expected that the same team will be able to complete all of its old tasks together with the use of new tools. The reality is that either some old (and probably less effective) forms of communication have to be discontinued or the team will have to increase in size. Moreover, failing to eliminate less effective communication can contribute to a perceived communication overload, reducing the effectiveness of your communications across the board.
• Lack of analytical or creative skills: A good use of marketing automation tools requires both analytical and creative skills. If your team only has one of those things, make sure to acquire the other. Oftentimes people with great product knowledge, who truly understand the customer, are stumped the first time they have to construct a database query or conceive an experiment. On the other hand, there are mathematical minds who have no issue with the latter but lack even the most basic affinity for products or clients. Automations set up with one but not the other are doomed to fail.
• No experimentation: Marketing automation software makes it easy to construct experiments, ranging from simple AB testing to much more complex setups. In spite of this, the threshold for setting up experiments is often still too substantial, and this functionality goes unused.
• No performance tracking: In spite of having tools available to track performance, this is something that often does not happen. Key things to track over time are open rates, spam reports, and conversions to sales. The manner in which conversions are counted must also be scrutinized. Tools are often all too happy to be very optimistic in this regard, counting everyone who has received a message and then made a purchase in the next 2 weeks as a conversion. Combine this with email messages that are sent out to everyone every week and you run the risk of fooling yourself.

7.3 Precision Bombing, Not Carpet Bombing

The key to successful algorithmic marketing is a switch in strategy: moving from carpet bombing to precision bombing. Carpet bombing targets a large area and aims to hit everything indiscriminately through large volumes of unguided bombs. Carpet bombing communication relies on the sheer volume of messages to make an impact, hitting every conceivable target. Communication has become dirt cheap—the cost of reaching out to one extra person is negligible. Because of this, carpet bombing can be extremely tempting. However, more and more retailers are becoming aware of the collateral damage caused by this kind of tactic. Customers have a limited span of attention, and every ineffective message they receive squanders this scarce resource. This causes customers to opt out or effectively become deaf to all of your future messages, however relevant they might be. Just like carpet bombing is now rightfully considered a heinous war crime, employing such indiscriminate tactics for marketing has also rightfully come under greater scrutiny. Customers themselves have come to expect personalized communication, to the point where they will actively complain if they receive messages that are irrelevant to them. A typical example is advertising for products they have just purchased. European governments in particular have approved more stringent legislation that protects consumers from receiving unsolicited communication.

Precision bombing, on the other hand, means getting up close and personal, keeping your eyes on the target to provide the greatest possible accuracy for your payload. This is the shift excellent retailers have made when it comes to their communications—to great effect. But just like for the dive-bomber pilots of yesteryear, nerves of steel are a welcome asset.

Continuing the analogy: the ability to hit high-value targets requires the right intelligence to be collected. If you have no idea where the target might be, all you can do is fire in the general area and hope something sticks; having some good-quality spy planes flying overhead is invaluable. This intelligence is often what is lacking in plain vanilla marketing automation systems. You have a system that allows you to target an individual, but now you need to know who to target, with what message, and at what time.


In practice, the main barrier to successful adoption of algorithmic marketing flows is the short-term greed to expand the targeted segments to increase the impact of individual campaigns. After all, if you have a decent message ready, sending it to twice as many people can only increase the return from that message. However, this fails to take the long-term perspective into account. By sending less-than-relevant communication, you are taking away from what more effective communications could have achieved in the future.

The reality is that setting up truly good algorithmic marketing takes time. Individual campaigns will have a lower impact than your big carpet bombing campaigns. However, as your portfolio of automated and relevant communications grows, the returns will greatly exceed those you can hope to obtain from more traditional techniques. Do not underestimate that this will take considerable self-control. Keep your eye on the prize.

The cornerstone of better automated actions is gathering actionable intelligence about your customers. Algorithms are not magical devices that generate profits through the power of mathematics. It is important to think hard about what the outcomes of these algorithms mean and how they can be used in the way your business operates. One way of thinking about algorithms is to view them as a way to make the mom-and-pop store experience scalable. These small-scale stores typically have a loyal group of patrons, and the store owners have personal relationships with most of those clients. This personal relationship allows them to provide a better service, recommending the right products at the right time, and building a relationship between the customer and the retailer. While an algorithm will never fully replicate this experience, it is a useful frame of reference.

This does not go as far as to say that all mass media is useless. That is not the argument being made here. Campaigns that foster general brand awareness in a broad audience certainly have a time and place. However, in cases where messages can be tailored, the argument is that they should be.

7.4 Should You Focus on High-Value Customers?

Going the route of algorithmic marketing requires answering a question that splits the crowd. Do you use the information you have to focus on the most valuable customers, reducing churn and trying to increase their spend? Or do you go for a strategy that intends to expand the total population of customers?

On one side you will find conventional marketing wisdom, which states that keeping a customer is much less expensive than acquiring a new one. Also in this corner is the well-known Pareto principle, which states that most of your profit comes from a small percentage of your customers (typically quoted as 20%, but your mileage may vary). In the opposite corner you will find the argument that a customer who is already spending a lot of money does not necessarily have a lot more to spend. Moreover, there are a lot of customers who are not part of your top customers. Following the old Soviet adage "quantity has a quality all of its own," you may find that there is much to be gained from targeting larger masses of customers.

If we were to name two prize fighters for these perspectives, we might place Peter Fader [2] in one corner and Byron Sharp [3] in the opposite corner, the former with his theory on customer centricity, the latter with a philosophy that challenges much of the common wisdom of the marketing crowd. The question now is: who is right? To get close to an answer, let us first take a look at the key arguments put forth by our two opponents.

At the heart of the customer-centricity philosophy lies the idea that retailers should move away from a product-centric point of view and aim to service a specific kind of customer. Fader's own definition of customer centricity is the following: "Customer centricity is a strategy that aligns a company's development and delivery of its products and services with the current and future needs of a select set of customers in order to maximize their long-term financial value to the firm." While it is a good definition, it is wordy. Basically, it comes down to: focus on attracting and nurturing the right clients. That is all there is to it. It is simple and axiomatic.

Why then would this be better? The assumption here is that the return on investing in the right customers is much greater than the return from trying to invest in all customers at once. This is applied to customer acquisition, development, and retention. In terms of customer acquisition, the idea is to go hunting for the most high-potential leads. Rather than using broad messages and discounts, focus on getting customers who have the right characteristics and interests to become top customers. The same is true for development and retention. The idea is to avoid spending disproportionate amounts on customers who have no real future, and instead invest more in customers who do show the potential for future value. Note that the idea is not to stop servicing customers who do not satisfy these criteria, but rather not to overspend on them. Each transaction of a non-priority customer should generate a positive margin; it is no use investing in a customer who is unlikely to develop into a top customer down the line.

The opposing view states that the possible gain from customer centricity is exaggerated. Growth by means of conquering more market share, often supported by more traditional forms of marketing, is suggested as an alternative. A well-known argument is the double jeopardy principle. This principle states that companies with a greater market share tend to have a larger fraction of loyal customers, and vice versa. This puts smaller companies at a double disadvantage: not only do they have a smaller market share (with obvious disadvantages), but their customers are also less loyal than the customers of larger players in the market. This also implies that companies with a larger overall market share will overestimate the degree to which their customers are loyal, and hence overestimate the value of a customer.

To illustrate this, assume that you are selling a product that customers purchase five times per year on average. You decide that a loyal customer is someone who has purchased at least three times in the past year. You ask your analyst to take a look at customer loyalty in the past year, and she draws up the chart you see in Fig. 7.1. You see that about 25% of your customers satisfy this criterion and are loyal customers. These results are likely to leave you feeling quite confident; those are decent KPIs!

Fig. 7.1 Customer loyalty as observed by a retailer with a 30% market share, plotting the fraction of customers with at least x purchases against the number of purchases in 1 year

Unfortunately, this chart was created by nothing more than a fully random process that assumes your company has a market share of approximately 30%. The simulated customers have no concept of loyalty whatsoever; they simply buy from your fictitious company 30% of the time (the data was generated with a very simple simulation: 10,000 random customers each make a number of purchases drawn from a Poisson distribution with a mean of 5, and each purchase has an independent 30% probability of being assigned to our fictitious company). Customer loyalty percentages must therefore be evaluated against what would be expected from a random process. As shown in Fig. 7.2, this expected percentage increases as a company's market share increases (the curve was created by running the same simulation while varying the market share in 5% increments). If you find yourself on a point above this line, you can state that your customers are loyal; under this curve the opposite is true, and customers are less loyal than you would expect from a random process. Naturally, this curve will differ depending on the nature of your business. The average number of purchases might be very different, and how you define a loyal customer might also differ from the arbitrary three purchases per year used here. Getting to these numbers might seem insurmountable, but fortunately it is often sufficient to get the orders of magnitude correct for the analysis to be usable.
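The simulation described above is simple enough to reproduce. The sketch below follows that description (10,000 customers, Poisson-distributed purchases with a mean of 5, each purchase assigned to the company with a probability equal to its market share); the function name and random seed are arbitrary:

    import numpy as np

    rng = np.random.default_rng(42)

    def loyal_fraction(market_share, n_customers=10_000,
                       mean_purchases=5, loyalty_cutoff=3):
        # Purchases per customer follow a Poisson distribution; each
        # purchase independently lands at our company with probability
        # equal to the market share. No loyalty is simulated at all.
        total = rng.poisson(mean_purchases, size=n_customers)
        with_us = rng.binomial(total, market_share)
        return float(np.mean(with_us >= loyalty_cutoff))

    print(loyal_fraction(0.30))  # roughly 0.2, cf. Fig. 7.1
    # The benchmark curve of Fig. 7.2, varying market share in 5% steps:
    curve = {round(s, 2): loyal_fraction(s) for s in np.arange(0.05, 0.85, 0.05)}

Comparing your actual loyal fraction against this purely random benchmark, rather than against zero, is the whole point of the exercise.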

Fig. 7.2 Customer loyalty to be expected for a given market share, plotting the fraction of loyal customers against company market share

This means that you should be able to estimate how big your market share is with an error margin of 5%. Likewise, a rough estimate of the average number of products that a customer is likely to purchase in a given time period should be doable.

Another argument against the concept of customer centricity is the so-called heavy buyer fallacy. It is often suggested that encouraging good customers to buy more is easier than getting new or occasional buyers to buy more—and yields a greater ROI on the marketing spend. The fallacy comes from the fact that heavy buyers might already be at the ceiling of their purchasing behavior. This is especially true in scenarios where the product does not have unlimited purchasing potential. Someone is not suddenly going to purchase twice as much shampoo because he likes your store; there is a limit to the amount of shampoo someone needs. Moreover, this also depends on the size of the market and the level of competition in it. If the market is large and there are relatively few competitors, it might be quite easy to grow by acquiring new customers—and quite hard to create loyal customers.

These arguments have one unifying factor: they find their origins in common human fallacies. Humans have evolved to detect patterns, even when these patterns are not there. Stare long enough at a random series of dots and you will start to see clusters and patterns. Never think yourself to be above this. A second common fallacy is thinking in stories. Stories stick in our minds and feel right. This is what happens if we start to envision customer segments that are easily personified. Mary is a young mother of 3, who works as a civil servant. She is active in the community and looks for products that are environmentally conscious.


You can put a face to this person, but the number of customers in your database who actually match Mary's description is likely to be frighteningly close to zero. But people like to think in stories, so personas such as this one tend to stick. And the story of the loyal champion of our company is one everyone wants to believe.

Suffice it to say that both perspectives make some good arguments. But that might leave you more confused than ever when trying to make sense of this situation. The best advice is to combine industry knowledge with the scientific method. The former will bring the right hypotheses about products, customers, and the market at large. The latter will provide the tools to confirm whether these hypotheses hold water in reality. Reality is rarely fully black or white, so what holds true for your company is likely to fall somewhere in between these two perspectives.

To gain a hold on the situation, it helps to envision the extreme ends of the spectrum, as illustrated in Fig. 7.3. At the customer-centric end of the spectrum sit boutique and luxury retailers. These retailers pride themselves on having intimate knowledge of their customers, and on tailoring their products to suit the exact needs of a client. At the other end are FMCG brands, which often lack a direct channel to their customers and are typically sold in a supermarket setting where they experience heavy competition from similar brands. Another example is flash-sale websites, which typically aim to optimize the ease of a single transaction, trying to lure a wide audience with an attractive deal. When innovating with data, both sides of this continuum will place different accents. Organizations that choose a customer-centric style of thinking will focus on maximizing customer lifetime value, optimizing loyalty programs, and focusing their efforts on retaining customers, or selectively acquiring customers who fit the profile of future top customers.

Fig. 7.3 Focus on customers or transactions (the figure places "You" somewhere on the axis between the two poles):

Customer centricity                    Transaction focus
Boutique stores, luxury retailers      FMCG products, flash-sale websites
Customer lifetime value                Assortment optimization
Loyalty program and gamification       Replenishment and availability
Customer retention and repurchasing    Broad-spectrum acquisition


In spite of the arguments raised against customer centricity, most retailers will find that the concept still holds relevance for them, mainly because the majority of criticisms are formulated on the level of a product or brand, rather than on the level of a specific retailer. The reason why customers visit a store or webshop is not solely the products themselves—since they can often find the same or similar products elsewhere. The reasons why they choose a specific retailer can be very different. This might be simple convenience, choosing to visit the closest grocery store. Low prices are another typical example—attracting a certain type of client. Trust in product expertise and good warranty conditions might be another reason. Operational excellence, good return policies, and next-day delivery are other good reasons why customers prefer some stores over others.

This multiplicity of reasons is why customer centricity can make good sense. As a retailer you should strive to know what brings your customers to your stores. Moreover, you should know what reasons your good customers have to visit your store. Those things are what you should act on, both on an aggregated level—ensuring that the big choices you make reinforce these reasons—and on the level of an individual customer. If I buy at your webshop because I know you deliver within 24 hours, you should not be targeting me with random promotions; I am likely to be quite price-insensitive as a customer. Rather, you might want to think about offering me even faster shipping—at a small premium.

On the opposite side, you will have companies that focus on the transaction, getting maximal conversions from a very broad audience. This also applies to the types of algorithms they employ in a marketing setting. Here we will typically find optimizations of the assortment, creating attractive bundles or complementary products. Optimizing replenishment and maximizing product availability are other clear strategies that can make products or stores more attractive to a general audience. Here we will also find more experiments on a behavioral level that investigate whether certain types of framing, wording, or checkout processes can improve the number of purchases.

Organizations where customer centricity is less applicable include:

• Companies that primarily rely on the strength of their brand to sell products. Think of the usual giants like Coca-Cola and Pepsi—but also the big names in luxury such as Hermes, Louis Vuitton, Cartier, etc.
• Companies that thrive due to the superiority of their products. Think of companies like Tesla, whose main attraction is the performance in both speed and range, as well as the attractive styling of their products. Preserving and increasing the lead they have on their competitors should be prioritized.
• Companies that have no direct line of communication to the final consumer. Examples include companies whose products are resold by broad groups of retailers—Siemens Consumer Products, for instance. Note that some of these companies are trying hard to circumvent this and create channels directly to consumers. Examples here include Dyson—who actively highlight the advantages of ordering directly (better warranty, shipping, more choices, spread payments, etc.). Nike is also taking much more active control of the way in which their products are sold, ensuring that they capture as much information on their consumers as possible [4]. An added benefit for these companies is that they can capture a greater fraction of the overall profits of selling the product. Of course, this comes with a big risk of alienating the retailers who are currently distributing your products—since you are competing with them directly. As such, this kind of ambition is only viable if your brand is strong enough to ensure that there is a natural demand for your products in a wide customer base. Only when this is true will retailers be willing to continue carrying them.

The term used in Fig. 7.3 is purposefully chosen as transaction focus, and not product focus—the term more typically used. This is because a focus on product is something most retailers already have and should retain—regardless of where they find themselves on this spectrum. When the ambition is to leverage data and customer centricity is not a viable strategy, the focus will lie on maximizing the likelihood of a successful and profitable individual transaction.

So should you focus on high-value customers? The only answer here is: "it depends." However, the argumentation here should allow you to formulate some clear hypotheses that are likely to be true for your company. The most important takeaway is that these hypotheses have to be validated using experiments, and that the results of these experiments have to be compared to relevant benchmarks. Always make sure that the patterns you see are not just the patterns that you want to see.

7.5 Do Not Try to Beat Big Marketplaces at Their Own Game

No book on retail today can go without mentioning the impact of big marketplaces. At times their rise is described as impending doom for all other retailers. The most well-known player here is Amazon, but several regional alternatives often have a strong presence as well. These include marketplaces such as bol.com in Belgium and the Netherlands and Otto in Germany. Other marketplaces focus on a specific type of product, such as Zalando for fashion items. Trying to beat these players at their own game—offering a broader range of options at lower prices—can only end in tears. The reality is that there are only two options for the majority of players. Either you design a clear strategy to thrive on the platform, or you decide to go another route and deliver greater value than a one-size-fits-all platform can provide.

Should you decide to sell your products on these platforms, be very aware of what you are giving away. Some platforms have been accused of analyzing the data on their platform to see what the most profitable products are, after which they start selling those products themselves [5, 6]. This is especially precarious if you are reselling branded products that these platforms can easily buy from the same manufacturers. Alternatively, if the product you are selling has a long life cycle (as opposed to fashion products, for instance) and the idea is hard to patent, you may also be at risk of counterfeit products being launched. The reality of the situation is that individual sellers are bankrolling the market exploration for these giants. Things that are successful get picked up and have their prices undercut, leaving the resellers on the platform with a much less valuable portfolio of products. Naturally, this might not be a risk for your specific organization. You may have a very clear brand value, or an exclusivity agreement with the brands that you are selling in your region. In such situations these platforms might indeed be a very low-cost way of increasing your market penetration and reaching a much greater audience.

Creating value in areas where these giants find it hard to compete is often the more viable strategy. This can be done in a number of ways, but often it comes down to leveraging the proximity you have to clients (think about DIY stores, for instance), displaying superior product knowledge, or being able to deliver a curated assortment with accompanying advice. Since much has already been written on this subject, the remainder of this book will assume that you agree with this premise and that you are primarily looking for tools to help you in this quest for a careful coexistence with these big marketplaces. Should you be looking for more ideas on how to position yourself, Doug Stephens' book Resurrecting Retail might be a good starting place [7].

7.6 The Low-Hanging Fruit: Get Started Without the Need for Complex Algorithms

The term algorithm sounds complicated and mathematical. This does not need to be the case. Intrinsically, an algorithm is nothing more than a sequence of rules and steps. These rules and steps can be simple and understandable, reflecting nothing more than common sense and rules of thumb. Within the context of algorithmic marketing, some of the simplest rules and automations can already score big rewards. There are many value-adding cases that can be implemented straight out of the box if you have access to a halfway decent marketing automation tool. Often-cited examples are:

• Basket and browse abandonment: One of the simplest and most rewarding in terms of ROI. It is as simple as knowing that a customer has left the funnel at some point in time. A key risk here is that you should have a good system of customer de-duplication in place, since you do not want to contact customers who have actually completed a transaction but were not registered in the system for some reason. Conduct experiments here to see what the right level of nudge/discount is (if you even want to give a discount—a reminder might be sufficient), as well as what the right timing is for this type of messaging.
• Birthday campaigns: Another staple, which is often one of the more effective campaigns running for a lot of retailers. If not for the possibility of activating a customer around the time of their birthday, the mere fact that they will receive something on their birthday might be a sufficient incentive for customers to give away more information than they were otherwise planning to.
• Reactivation campaigns: Later in this book, advanced techniques will be discussed to optimize the timing aspect of customer inactivity/churn. In spite of the usefulness of these models, there is absolutely no reason to wait for their deployment to do something. Most tools will allow you to write common-sense rules that might already capture a big part of the upward potential of these campaigns. Even just running a campaign targeting customers you have not seen for more than 3 months might already have a positive effect (see the sketch after this list).
• "New in" campaigns: Based on people's purchase history, you could notify them of products newly entering the assortment that may be of interest to them. Use your own common sense to evaluate what may or may not be relevant to a customer before feeling the need to go for more advanced analytics.
• Value-adding nudge: Set up campaigns that encourage customers to shop your most profitable product lines. Oftentimes communication is heavily indexed on discounts and promotional periods, and value-focused campaigns can help provide a counterbalance.
• Lookalike acquisition: Most tools allow you to target customers who resemble segments of your current customer base. Even before delving into the next chapter, simple customer turnover measures can be used to isolate a segment of valuable existing customers. Porting these to Google's or Facebook's algorithms is often dead easy, so this should be at the top of your shortlist.
• Welcoming journey: Discovering a new customer or lead implies that someone has taken the trouble to share information with you. This makes it a good time to communicate and explain who you are and what you stand for. A familiarization/welcoming drip track of emails is very useful in that regard.

Each of these can be implemented for almost any retailer, without complex assumptions and with little work. However, the opportunities for simple algorithms do not stop there. The most rewarding opportunities in this area are related to knowledge you already have about your customers and your products. Core to this concept is to take something that is specific to your organization, and which is relevant to a specific customer at a specific point in time. The most important thing for this type of campaign is to just do it. Too many times organizations get caught up in the details and take years to get even the basics up and running. This is not acceptable.
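To make the common-sense reactivation rule from the list above concrete, here is a minimal sketch. It assumes a hypothetical transaction log (the file name and column names are illustrative):

    import pandas as pd

    # Hypothetical transaction log with one row per order.
    tx = pd.read_csv("transactions.csv", parse_dates=["order_date"])

    last_seen = tx.groupby("customer_id")["order_date"].max()
    cutoff = pd.Timestamp.today() - pd.DateOffset(months=3)

    # Customers with no purchase in the last three months become the
    # audience for the reactivation campaign.
    to_reactivate = last_seen[last_seen < cutoff].index.tolist()

The resulting list can simply be uploaded to the marketing automation tool as the campaign audience; no predictive model is needed to get started.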

7.7 Measuring and Experimenting

Once a retailer chooses to take the road of algorithmic marketing, maintaining a clear view of what is happening can be difficult. The reality is that a greater number of things will be happening in parallel. Moreover, because of the automated nature of this process, it is easy to miss a structural mistake when it is being made.


Fortunately, the KPIs that should be tracked are rather simple. In general it would be advisable to measure: • Reach of various campaigns and actions: This constitutes the overall reach of your messages. This includes the direct messages you have sent and the impressions your communication on social platforms. • Conversion rates: Are your efforts paying off? Ideally this is the true conversion, meaning an actual sale that can be linked to a campaign. In practice this can however be quite cumbersome due to the fact that multiple simultaneous or consecutive messages might have an impact. Hence, it might also be sensible to choose a reasonable proxy such as visiting the website or looking at a product detail page. • Personalized fraction: Assuming that creating more relevant and personalized communication is core to the strategy you defined (as it will be for the majority of retailers) if makes sense to track how personalized your communication is. Tagging various campaigns to indicate if they are either mass communication, segmented communication, or personalized communication can be a great way of seeing how things are progressing. Having a way to measure these quantities, and to keep track of these quantities over time, is already sufficient to maintain a clear grip on the situation. Naturally, there are many other things to keep an eye on in an operational setting (spam reports, open rates, page views, etc.). However, on a tactical level, these more aggregated measures should be more than sufficient. It might be nice to automate the reporting of these measures, avoiding that measuring is something that ends up being forgotten. This does not mean that this is a necessity, and often these numbers can be conjured up with just a few clicks and a simple spreadsheet. Automating things is nice, but in this case the process is more important than the automation. The real risk is not looking at any numbers at all, and assuming that everything runs smoothly until customers complain or revenue decreases. Do not forget to experiment. The logic that backs your campaigns will be something that is fueled by data analysis and the know-how of the organization. Both of these sources can be wrong, and even if they are correct, it may be that the nudge you are presenting is not triggering people sufficiently. The only way to make sure that you are making progress is to subscribe to the scientific method. This means formulating a measurable hypothesis, conducting an experiment, and looking at the results of this experiment. Experimentation should become a key element of the DNA of the organization. Nowhere is such DNA more visible than in the online giants such as Netflix and Google.5

5 If you want to know more, the book Experimentation Works by Stefan Thomke [8] provides an easily digestible introduction.


For a typical organization, this implies a monthly reporting cycle that discusses the three big KPIs, as well as the learnings from experiments that have been conducted during the period. It is a good idea to keep track of the conclusions from past experiments, as well as to keep a prioritized backlog of things to try next.

7.8 Conclusion

It is no secret that marketeers have a soft spot for new terms and technology. The basic principles underpinning good marketing are fortunately less volatile. This chapter has discussed how data-driven tools can be used to feed algorithms, which not only automate but also improve the performance of marketing and communication. The names of these tools are likely to change in the years and decades to come; their basic premise is unlikely to do so. All of this centers around a greater or lesser degree of customer centricity: the search for who the "right" customer is for a retailer, and beyond this, how this customer can best be catered to—or even created. The true search is for how the experience of a local mom-and-pop store can be recreated at scale, by a retailer who seems to know what you need before you know it yourself. All of this should yield marketing resting on stable foundations: data rather than conjecture.

References

1. Rumelt, R. P. (2012). Good strategy/bad strategy: The difference and why it matters. Strategic Direction, 28(8).
2. Fader, P. (2020). Customer centricity: Focus on the right customers for strategic advantage. Wharton Digital Press.
3. Sharp, B. (2016). How brands grow. Oxford University Press.
4. Joseph, S. Why Dyson takes a hybrid approach to sell on Amazon. https://digiday.com/?p=358367. Accessed 23 Feb 2022.
5. Van Dorpe, S. Amazon knew seller data was used to boost company sales. https://www.politico.eu/article/amazon-seller-data-company-sales/. Accessed 23 Feb 2022.
6. Zink, D. Amazon competes with its resellers. https://eu.heraldtribune.com/story/business/columns/2020/07/27/amazon-competes-with-its-resellers/41888497/. Accessed 23 Feb 2022.
7. Kolassa, S. (2021). Resurrecting Retail: The future of business in a post-pandemic world by Doug Stephens. Foresight: The International Journal of Applied Forecasting, (62), 4–7.
8. Thomke, S. H. (2020). Experimentation works: The surprising power of business experiments. Harvard Business Press.

8 Better Customer Segmentation

8.1 The Purpose of Segmentation

A segment of one. That would be the dream scenario: each customer being catered to as an individual. However, the reality is that this idealized scenario is often unattainable. The primary reason for this is that creating tailored communication takes time and creativity.1 At a certain level of detail, the size of a group of customers no longer warrants the effort that is being expended in tailoring to it.

A second motivation for creating segments is the inability to fully control what every customer sees. In the real world, customers cannot be fully fenced off from each other. If you put up a billboard somewhere, you will target a segment rather than an individual. The same can hold true for things like sales promotions, where excluding customers might result in a severe backlash.

Thirdly, segments are also a useful building block for algorithmic marketing (see the previous chapter). Customers are not unchanging, and many will switch segments in their lifetime. Good segmentation is a crucial tool for understanding how customers are changing over time, as well as what your most attractive customers look like.

Unfortunately, segmentation is often done badly. At best, a bad segmentation sits unused in a dusty drawer. At worst, a nonsensical segmentation is used to make decisions, steering a company in the wrong direction. This chapter starts by discussing the shortcomings of the most frequently employed customer segmentation methods. Next, we highlight what to look for in a good segmentation. Finally, practical techniques for obtaining a good segmentation are discussed. To make this tangible, real data examples are used.

1 A possible exception being automated product recommendations.


8.2 The Problem with Traditional Segmentation

Segments based on descriptive properties and RFM-style segmentation are the two most commonly used techniques. Both have significant shortcomings, but the former is by far the more problematic. Each will now be discussed in depth.

8.2.1 Segments Based on Descriptive Properties

Traditionally, segments are often based on descriptive properties. Attributes such as gender, birth date, place of residence, etc. are collected at the moment of the first transaction and are then used to pigeonhole customers into a specific segment. These segments are then usually associated with an easy-to-imagine persona, and content is created while keeping this imaginary persona in the back of one's mind.

On the face of it, this might not seem problematic. However, the lack of behavioral aspects often limits the usefulness of this type of segmentation. While descriptive properties can at times provide useful information about a customer, the way in which a customer interacts with you and your products is often much more telling. This problem is aggravated by the algorithms most commonly used to create these segments: clustering algorithms that aim to find structure in unlabelled data. This means that we expect the algorithm2 to come up with a number of groups that can be used to organize customers. The core of the problem is that descriptive properties are at best considered to all be equally important,3 something which is unlikely to be true. Irrelevant input properties yield irrelevant customer segments.

As an example, imagine the business of a DIY retailer who is selling all sorts of home improvement products. Using the traditional descriptive properties age and postal code, a simple segmentation can be created, as shown in Fig. 8.1. A clustering algorithm would identify two clusters, which can then be interpreted by marketing as young urban people and old rural-living people. Looking familiar? It is deceptively easy to start to imagine whole personalities for both segments. However, the reality is that customers within these segments are likely to be highly heterogeneous. Moreover, the differences between customers in the same cluster are often more relevant than the similarities for which they have been grouped together.

2 Technical footnote: Typical example algorithms are k-means clustering and DBSCAN. While these algorithms are certainly useful, they cannot simply be applied to descriptive customer properties in order to create customer segments, the main reason being that this is not truly an unsupervised learning problem. While we may not yet be sure about what segments should exist, the decision maker can often voice an opinion on what should make segments different from each other.

3 Even bigger mistakes can be made if dimensions are not standardized: different orders of magnitude might cause some dimensions to be considered much more important than others—especially for algorithms where Euclidean distance is used.

Fig. 8.1 An example of the type of segmentation that is created based on descriptive properties (customer age versus city size in thousands, with two clusters labeled Segment I and Segment II). Whereas the clusters may seem logical, they are often quite hard to act upon

The risk here is that companies will market to nonexistent stereotypes. It might be decided that the young urban population should be targeted with new-homeowner offers and that the old rural population should receive gardening offers. Decisions such as these are often made based on nothing more than gut feeling inspired by these stereotypes. Would it not be much more relevant to know who the new homeowners among your customers are—regardless of age or location? Or to know who owns a garden? Who is interested in woodworking? These are of course rhetorical questions—naturally this information is likely to be of much greater use than anything you can derive from demographics.

8.2.2 RFM Segmentation

RFM segmentation creates customer groups based on their observed value. This value is determined using simple calculations on the recency (how long since the last purchase?), frequency (how many purchases per time period?), and monetary value (what has the total spend been?) of customer transactions. This allows various customer typologies to be created. There might be a group of customers who buy infrequently but always spend a lot. Other customers might buy very regularly, but not have a high spend per transaction. The main benefit of these models is the incorporation of behavioral information, something that was lacking from the clustering models discussed earlier.


Segments based on customer value enable useful differentiating nudges. It is reasonable to invest more in a customer who is more valuable, and vice versa. However, while RFM models do represent a move in the right direction, they still have some significant shortcomings.

A key limitation of this technique is that past results are not guaranteed to be repeated in the future. A simple analogy is that of picking funds to invest in. While it may seem tempting to invest in the winners of last year, there are no guarantees that these will also come out on top next year.4 In a similar fashion, there are no guarantees that customers are going to repeat their behavior indefinitely. In extreme cases this can become a form of circular reasoning that can be paraphrased as "These are the high-value clients, because they are the high-value clients."

A second limitation of RFM segmentation is the lack of product-specific information. While this technique measures how much money a customer spends, it does not look in detail at what a customer is buying. Again using the DIY retail example, you could have two customers with identical RFM values when one is buying solely in the garden department, and the other is buying electrical components. These differences would of course be highly interesting input for (automated) marketing campaigns aimed at these individuals.

Thirdly, RFM models often equate value with turnover. While this can be true on average, there can be large discrepancies on the level of an individual. A customer who buys a lot, but only at heavy discounts, could arguably be worth less than a customer who buys less—but always at full price. Along the same vein, product returns are often not accounted for. A heavy buyer who returns 80% of purchased products might even be a net detractor and does not deserve a spot among your "gold" customers.5

In spite of these shortcomings, RFM segmentation can be useful, provided the user is aware of the areas where it falls short. Its primary advantage is that the calculations for RFM are extremely simple. As such, RFM models are a sensible and low-effort benchmark for other models. Such models are likely to add complexity and be more demanding to put into practice, so it must always be kept in mind that this additional complexity and effort must translate into segments of higher quality. As H. G. Wells once put it, "One cannot always be magnificent, but simplicity is always a possible alternative." If you lack the time, means, or capabilities to implement more complex models, much can be achieved with traditional RFM techniques. This will likely result in a failure to capture certain nuances in customer behavior, but when the alternative is doing nothing, this is decidedly better.

4 There might even be a reason to expect that the opposite will happen. The statistical concept of regression to the mean implies that if one observation of a variable was extreme, the next observation of the same variable is likely to be closer to the mean—i.e., less extreme.

5 See the chapter on product returns for a more in-depth look at the impact of product returns.

8.3 What Makes a Segment Actionable?

Picking fault is easy; doing better is hard. So what should a good segmentation look like? The main purpose of this book is not to create a list of the newest buzzwords and algorithms surrounding segmentation but to highlight the underlying principles that separate a good data-driven segmentation from a bad one:

• Combine business knowledge and data insights: The most useful segments are those that sprout from cross-pollination between data, marketing, and product departments. Customers cannot just be summarized by their accounting value, and products cannot be captured completely by a list of properties. A good interplay between these perspectives allows retailers to confirm that certain patterns exist, and helps to get closer to answering why certain customers behave the way they do. Relying only on data-driven solutions can cause nonsensical actions to be taken, such as advertising grass seeds in midwinter or selling women's shoes to men.

• Strive to predict/explain future behavior: Rather than just grouping customers into bins with others who are similar, the goal should be to predict or even explain certain behavior—for example, identifying a group of customers who are likely to buy a certain type of product at a certain point in time, or a group of customers who are likely to be at risk.

• Accurately measure future value: It should be possible to approximate the value of a customer in a given segment. This implies that value is measured as correctly as possible—i.e., not only approximated by turnover. This also means that during the creation of segments, one of the objectives should be to reduce the difference in customer value between customers who are placed in the same segment. Having a good handle on this makes it easier to make investment decisions when it comes to making certain offers, promotions, or providing services to customers in specific segments.

• Link to product (categories): One of the aspects that makes segments actionable is a link to specific products. The majority of interactions that retailers have with their customers center around a specific product. Knowing what products are relevant to a customer is one of the cornerstones of actionable segments.

This answers what the end result should look like, but does not yet give information as to how this result can be achieved. The easiest approach uses three sequential steps. The first step is to make customer value measurable. This implies that a model is created to estimate the customer lifetime value for all customers. In practice, these models are usually only marginally more complex than the RFM models that have been discussed earlier. They do however have the key advantage that the decision maker is not forced to define arbitrarily placed cutoff points and definitions of customer stages.


The second step in the process is to look for ways in which customer value can be explained. What makes a certain customer worth more than others? Typically this focuses on the types of products a customer purchases, when these products are purchased, the channels a customer uses to purchase them, the services that a customer uses, etc. Essentially, the goal is to explain customer value by using the interactions a company has with the customer.

The third and arguably most important step is to experiment with these segments. The relationships that are uncovered in the second step will often take the shape of correlations. And while traditional statistics often overemphasizes the distinction between correlation and causation, it pays to investigate whether encouraging a specific action or purchase truly inflates the value of a customer.

The next sections go into more detail as to how these specific steps work, and show the kind of visuals and analysis tools that are useful in the process.

8.4 Customer Value Done Right

Lifetime value is one of those terms that gets thrown around a lot. Often this is accompanied by the mantra that it is much more cost-effective to increase the value of existing customers than it is to attract new customers. It is easy to find support for this truism when observing that customers who buy more often will typically buy in different categories, buy more expensive categories, etc. These will typically also be the customers who have a high lifetime value. This is obviously true, but adds no new information to act upon.

As mentioned earlier, RFM models can provide a good initial benchmark for customer value determination. What then should a good lifetime value model add on top of this basic model? You should be looking for models that are:

• Forward-looking: Lifetime value is often misinterpreted as the value of a customer over her complete lifetime—including all past purchases. In a similar fashion to the sunk cost fallacy, it is important not to be misled by what a customer has purchased in the past but to focus on what a customer is likely to be worth in the future. A notable exception is of course the attraction of new customers that have similar properties to customers who have proven to be valuable.6

• Individual: Ideally you have an estimate of the lifetime value on the level of the individual customer, not just averages over large groups of customers. This allows you to analyze any set of customers you want—and it allows you to define thresholds for individual customers.

• Accounting for rhythm: Many models only look at aggregate amounts; a customer who has been coming in every month and purchased for 1,000 EUR in total will often be considered identical to a customer who has purchased the same overall amount in 12 transactions in December (depending on the mode of calculation). It is immediately apparent that the second customer shows more potential than the former.

6 Depending on the type of product you sell, this might even be your most important exercise—for example, when selling expensive products that have a long lifespan.


Table 8.1 A small and simplified subset of order lines from a transaction table. This kind of database forms the basis for the RFM- and CLV-style calculations

Saleid  Artid  Custid  Turnover  Cost   Date        Category    Subcategory
31766   78fef  735ab3  178.37    83.72  2021-07-15  Lumber      Hardwood
3e68d   43258  c85a4e  10.02     3.91   2020-02-17  Hand tools  Screwdrivers
77a35   1f7f1  c27bf1  20.07     10.05  2021-08-01  Hand tools  Pliers
ff8fa   2a306  ddfe4e  2.66      1.23   2020-12-06  Hand tools  Screwdrivers
bc56f   7d267  cc7c54  44.63     27.91  2020-05-30  Powertools  Chain saws
a2942   f8c9f  272d6a  133.92    70.86  2021-09-22  Hand tools  Pliers
ac30e   e5c1b  a14688  39.07     20.69  2021-02-27  Paint       Base coat
4540f   7982e  f7562f  4.80      2.46   2020-06-04  Lumber      Fittings
4fc91   ada35  99ad58  2.90      1.79   2019-05-29  Lumber      Softwood
fda65   af0c6  3f1d56  22.30     10.49  2020-10-17  Hand tools  Saws

• Can be split into components: Specifically, allowing you to differentiate between the expected number of transactions and the value of those transactions. Models that only report aggregated measures still leave you guessing why a customer has not reached the forecasted potential. Understanding whether the issue arose from purchasing cheaper products or from visiting less often is essential in taking the right action for a given customer.

A class of models that fits this description are the so-called buy until you die models. The input for these models remains very simple: no more than what is used for RFM-style analysis. Rather than taking these parameters at face value, this information is used to predict the future value of a customer. In spite of this simplicity, these models succeed in giving a much more accurate read on the value of an individual customer than traditional segmentation and lifetime models.

The added value of these more advanced models becomes clear when investigating a real example. The basis for this example is real data that has been obfuscated, but still shows the underlying patterns. Specifically, we are looking at transaction data from a DIY retailer that sells products in a seasonally influenced market. Table 8.1 shows what would be contained in a minimal dataset for such an analysis. This data is built on the orderlines table that is at the heart of a transaction database. This table contains identifiers for customer, transaction, and product, as well as the amount paid and the purchasing cost.7 Also contained are the category and subcategory of the product.8

7 In real situations there is often also the question of an accurate valuation of inventory; for now we are going to avoid such complexities—but be aware that they exist.


Table 8.2 RFM values calculated per customer

Custid        Recency  Frequency  Monval
586e14342e14  122.0    49         524.81
88ebda88c6eb  1056.0   29         561.75
3ba6df3fb86f  321.0    23         371.79
bc1cb4dcdef2  100.0    420        4914.05
3ac86907d770  22.0     41         566.98

For the sake of simplicity, only four big categories are used: hand tools, lumber, powertools, and paint. The dataset used here contains around 100k unique customers and 300k transactions. This is by no means huge, but should be sufficiently large to be realistic. By using real masked data rather than fictitious data, the analysis shown in this chapter is much closer to reality than typical didactic examples. The objective is to illustrate that the methodology shown here can be applied to real-world problems.

8.4.1 The Traditional RFM Approach

Performing an RFM calculation on a dataset such as this is no more complex than creating a pivot table. As shown in Table 8.2, this should result in a list of customers, each of whom gets a value for recency (the number of days since the last purchase), frequency (the total number of transactions), and monetary value (the total turnover).

It is important to note that the time span of the data on which this calculation is based can severely skew the results. A time period that is too short will undervalue your long-term buyers. The opposite situation may cause an overvaluation of some old customers—as well as risk including behavioral patterns that are no longer relevant. Ideally, the choice of time span is based on information about the typical customer lifespan.9 Additionally, management must be consulted on specific changes in strategy or external disruptions that might have caused historical data to be less relevant or corrupted.

A second pitfall is taking turnover as the input for the monetary value, rather than gross margin. While these two quantities are often significantly correlated, there might be systematic patterns that cause high-turnover customers to be less attractive than they seem to be. Depending on the specifics of your business and the types of automatic actions you want to link to customer value, be aware that there might be a discrepancy here.

8 A real dataset is likely to include more customer and product properties, as well as more depth on specific discounts that might have been applied. While important in practice, we chose to ignore these here for the sake of clarity.

9 More information on how to calculate this can be found in later chapters.
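Concretely, such an RFM table can be produced with a few lines of standard data tooling. The sketch below is a minimal illustration using pandas, assuming an order-line dataframe shaped like Table 8.1; the file name and column names are placeholders:

```python
import pandas as pd

# Assumed: an order-line table like Table 8.1, one row per sale.
orders = pd.read_csv("orderlines.csv", parse_dates=["date"])
today = orders["date"].max()

rfm = orders.groupby("custid").agg(
    recency=("date", lambda d: (today - d.max()).days),  # days since last purchase
    frequency=("saleid", "nunique"),                      # number of transactions
    monval=("turnover", "sum"),                           # total turnover
)
print(rfm.sort_values("monval", ascending=False).head())
```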

Fig. 8.2 Scatter plot comparing recency (days since last purchase) and monetary value (EUR) as a result of a typical RFM analysis. Because relationships between three variables are hard to visualize, one is typically dropped. The relationship shown here confirms the intuitive notion that customers who have generated more total turnover are also more likely to have purchased recently

Then comes the challenge of analyzing the RFM figures. RFM models tend to complicate this analysis because there are three dimensions that are considered jointly. The motivation for this is to enable the decision maker to distinguish between customers who have different behavioral patterns. However, in practice, uncovering these patterns is often fuzzy and heavily based on conjecture.

One commonly used way of dealing with these three dimensions is to simply disregard one of them. Typically, either the frequency or the monetary value is dropped. There is logic behind this, as these two measures are more closely related to each other than to recency. When calculating the correlations for the sample dataset used in this chapter, we see that frequency and monetary value are positively correlated with a value of 0.83, whereas recency correlates negatively10 with both frequency and monetary value (−0.34 and −0.36, respectively).

Even after eliminating one dimension, looking at the interrelation between the remaining two (Fig. 8.2) provides no clear-cut answer to the segmentation question. The relationship shown confirms the intuitive expectation that customers who have visited more recently are likely to represent a larger monetary value. In other words, people who have spent more in total have on average been to the store more recently. In spite of this, the analysis does not result in clear segments or actions.
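These correlations are trivial to reproduce. A one-line check, assuming the rfm dataframe from the earlier sketch:

```python
# Pairwise correlations between the three RFM dimensions. On the dataset
# used in this chapter this gives roughly +0.83 for frequency vs. monetary
# value, and about -0.34/-0.36 for recency against the other two.
print(rfm[["recency", "frequency", "monval"]].corr().round(2))
```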

10 This makes sense: the lower the recency value, the more recently a customer has purchased. Such a customer is likely to be a better customer, and therefore to have a higher overall frequency and monetary value.

Fig. 8.3 Histograms showing the distribution of recency, frequency, and monetary value (vertical axis: number of customers)

Another way of going about this is to create a weighted formula to combine the RFM scores into a single value.11 The process is simple: standardize12 all values for the RFM measures to a range between 0 and 1—and then assign weights to each of the dimensions. This does however not solve the problem of the large degree of arbitrariness in the decision making: there is no truly objective way to determine the right coefficients.

This is made clear when looking at the histograms of these three measures for our example dataset, as shown in Fig. 8.3. The only one of the three that shows something resembling a natural cutoff point is the recency chart. However, even this is deceptive, as we are looking at a business where seasonal patterns are present. The peaks shown on this chart are likely to represent customers who visited during specific peak moments. Because of this, it might be dangerous to set a cutoff at a set distance from today. A chart that shows a clear peak just before 600 days is likely to see that peak shift backward as time progresses—rather than having a consistent peak a set number of days in the past.

Both frequency and monetary value show a clearly right-skewed distribution, as would be expected. This implies a large number of low-value customers and a small fraction of high-value individuals. In spite of this clear pattern, it provides no true handholds for setting cutoff values for these customer properties. The user of an RFM analysis typically resorts to grouping customers into arbitrarily chosen segments. A quick web search yields frameworks that suggest anywhere between 3 [2] and more than 25 segments [3]. And while some of these can make intuitive sense, there is still no formal rule as to why the lines are drawn where they are.

11 RFM models are inherently pragmatic rather than academic, and not a lot has been written on the subject in academic journals. One review that touches on the topic is that of Wei et al. [1]. Other places where you will find this advice include the documentation of IBM's SPSS software.

12 If you do this, beware of outliers—they are highly likely to be present in real data. A typical example might be a single customer id that is used to register purchases by employees.
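For completeness, such a weighted single score could be computed as follows—a minimal sketch in which the weights are chosen purely for illustration, with a quantile clip to soften the outlier problem flagged in footnote 12:

```python
# Min-max scale each RFM dimension to [0, 1]; invert recency so that
# "recently active" scores high. The weights are arbitrary by construction.
clipped = rfm.clip(lower=rfm.quantile(0.01), upper=rfm.quantile(0.99), axis=1)
scaled = (clipped - clipped.min()) / (clipped.max() - clipped.min())
scaled["recency"] = 1 - scaled["recency"]

weights = {"recency": 0.3, "frequency": 0.3, "monval": 0.4}  # illustrative
rfm["score"] = sum(w * scaled[col] for col, w in weights.items())
```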


This is not to say that RFM is completely useless. Looking at these figures can give a decision maker a feeling for how much (or little) their most valuable customers are worth. These statistics can also give a ballpark figure for the annual number of visits that a customer makes, which can be used for simple rule-of-thumb reactivation campaigns. Say you know that customers visit your store six times per year, which translates to once every 2 months. Creating a reactivation campaign with a promotional nudge after 2.5 months of inactivity is likely to have a positive overall impact. A more refined approach is likely to be better, but common sense goes a long way.
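Such a rule of thumb can be derived directly from the data. A minimal sketch, again assuming the orders dataframe from earlier, estimates the typical interpurchase gap and places the nudge at 1.25 times that gap (mirroring the 2-month/2.5-month example above):

```python
# Days between consecutive purchases, computed within each customer.
gaps = (
    orders.sort_values("date")
          .groupby("custid")["date"]
          .diff()          # time since that customer's previous purchase
          .dt.days
          .dropna()
)
typical_gap = gaps.median()
print(f"typical gap: {typical_gap:.0f} days, "
      f"nudge after: {1.25 * typical_gap:.0f} days of inactivity")
```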

8.4.2 CLV-Based Customer Segmentation

This is the point where it pays to step back and gain some perspective on what the objective of this exercise is. The goal is to define customer segments. Customers in the same segment should be as similar to each other as possible. Moreover, the segments themselves should be as different as possible—different in this case implying that customers are likely to behave differently, and respond differently to nudges. Especially when trying to fold RFM back into a single number, it is clear that the underlying goal is to estimate the value of a customer. As mentioned at the start of this section, there are several variations of customer lifetime value models that do a better job at this than the RFM model presented earlier.

A key advantage is that these models forecast a number that can easily be interpreted. Even when summarizing an RFM value into a single number, it is not perfectly clear what that number means. For a good CLV model, the meaning is simple: the monetary amount that a customer is expected to be worth in the future.

There exist a great many variations of these models; most differ in the statistical distributions they use to model the behavior of customers. What is being modeled remains the same: the likelihood of a transaction taking place, the mirror image of which is the likelihood that a customer becomes inactive (dies). The specific mathematics behind these models is outside the scope of this book13—the focus here is on how the results of these models can be used in practice.

The output of these algorithms is simply a single value associated with every customer. While this might seem like a downgrade from the nuanced view that RFM provides, it actually makes the subsequent analysis simpler. Moreover, this single value has the advantage of being forward-looking—it does not simply assume that past behavior will continue into the future.

13 If you want to know more—the founding work on this topic was authored by Schmittlein and Morrison [4]. The latest and greatest extensions on this topic by Platzer and Reutterer [5] are also well worth a read. Some of the most commonly used implementations of these algorithms can be found in the Lifetimes package for Python, or in the BTYDplus package for R.
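To make this concrete, the sketch below shows what fitting such a model might look like with the Lifetimes package mentioned in the footnote. It pairs a BG/NBD model (for the transaction process) with a Gamma-Gamma model (for spend per transaction); the dataframe and column names are the same assumed ones as before:

```python
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

# Condense raw order lines into frequency / recency / age / monetary value.
summary = summary_data_from_transaction_data(
    orders, customer_id_col="custid", datetime_col="date",
    monetary_value_col="turnover", freq="D",
)

# BG/NBD: models repeat-purchase counts and the chance a customer is still "alive".
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

# Gamma-Gamma: models spend per transaction (repeat buyers only).
repeat = summary[summary["frequency"] > 0]
ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(repeat["frequency"], repeat["monetary_value"])

# Expected monetary value of each customer over the next 12 months.
clv = ggf.customer_lifetime_value(
    bgf, repeat["frequency"], repeat["recency"], repeat["T"],
    repeat["monetary_value"], time=12, discount_rate=0.01,
)
```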


Fig. 8.4 Histogram showing the distribution of CLV for the example dataset

Figure 8.4 shows how this value is distributed across the customer base for the example dataset. A typical pattern emerges, with a lot of customers with low expected values and an ever smaller fraction of customers with a high CLV. For most retailers this paints a picture somewhat similar to the RFM analysis, in that no clear clusters emerge from the histogram. Instead we see a gradual pattern without natural cutoff points.

The simplest possible type of segmentation using this data would be to use CLV percentiles. This is nothing more than moving from a continuous variable to a categorization (e.g., platinum, gold, silver, and bronze customers). However, the CLV value itself might be even more useful without creating bins when making decisions about how much to invest in a customer—the basic logic being not to invest more in a customer than she is expected to be worth in the future (unless of course you expect this investment to greatly inflate the CLV of that customer).

Depending on the technique, it might not be possible to estimate an accurate CLV for customers who have yet to make a second or third transaction. Here there might be a need for additional analysis that determines the likelihood of first-time buyers making a second transaction. Naturally, this is influenced by the amount and type of products that a customer has purchased. It is nevertheless important to be able to estimate a CLV value for first-time buyers. If these customers are completely disregarded, future analysis may be skewed too much toward good customers, and especially for retailers where this group is relatively small, this may lead to incorrect conclusions when analyzing purchasing behavior.

An important limitation of this type of analysis is that you can only use the data generated by identifiable customers. Especially when handling a lot of offline transactions, or when doing business through a mediating platform, there are often many transactions that cannot be linked to a specific customer—as this information is often shielded by the platform.


While you can include these transactions in processes that make decisions on an aggregated level (e.g., which products are increasing in popularity over time?), they are not useful from a marketing automation perspective. Be careful not to make the mistake of treating all these transactions as being made by onetime customers, since this will severely skew your results and cause you to underestimate the value of a customer. An unidentified customer is not the same as a onetime customer.

Where then is the value in this? Why not stick with the simpler RFM analysis? The advantage of the customer lifetime value measure is that it is perfectly aligned with the objectives of a retailer. Focusing on improving CLV—if correctly calculated—is guaranteed to improve the profitability of an organization. The next section shows how the CLV can be used to create useful customer segments. The methodology is illustrated using the same realistic dataset that has been used for the previous calculations.

8.5 From Lifetime Value to Customer Segments

Knowing the potential of a client adds a crucial piece of information needed to create useful segments. However, this piece of information alone does not automatically yield ready-to-use segments. A decision maker who just selects arbitrary cutoff points is making the same oversimplification as is typically the case for RFM models. The end result of such an exercise is only marginally better, and unlikely to be worth the additional effort.

Key to a successful segmentation is linking this new piece of information to behaviors that can be influenced. An example of this is the purchasing of specific products or categories. Knowing that making a certain purchase makes a customer more likely to have a high CLV is an interesting piece of information. While not automatically providing causal proof, it is a good candidate for an experiment to see what happens if customers receive a nudge in this direction.

The nature of this nudge can be substantially more nuanced than the frequently used broad-spectrum discounts. An example is to give customers a discount if they make purchases from a category that is different from the usual categories in which they are already active. Alternatively, more general campaigns can focus on product bundles that are indicative of superior value in the current customer population. A fashion retailer, for example, could observe that the highest-CLV customers tend to purchase both blue jeans and sweaters. Using this information in a combined nudge is likely to have a greater impact on long-term customer value than a less directed cross-selling campaign.

8.5.1 A Simple Approximation Using Customer Groups

The simplest analysis to start with is to use simple customer groups based on CLV and see how the relative spend differs across product categories.


Fig. 8.5 Relative customer spend per main product category (hand tools, lumber, paint, powertools) for the different CLV categorizations (platinum, gold, silver, bronze)

To illustrate how this might work, simple percentiles for customer lifetime value have been defined for the sample dataset used in this chapter. Arbitrary cutoff points have been set at the 90th (platinum), 80th (gold), and 50th (silver) percentiles. The remaining customers receive a bronze status. The analysis can then start on the highest level of product categorization, before going into more detail and looking at subcategories insofar as these are large enough to warrant analysis. Figure 8.5 shows what the results of such a simple analysis might look like.

This result shows that higher-value customers have a disproportionate spend in the powertools category. Similarly, low-value customers tend to buy more lumber. Looking at the relative rather than the absolute spend avoids the analysis being polluted by the fact that high-value customers are likely to buy more from every category in absolute terms.

A simple analysis such as this already yields some answers as to how a company should position itself strategically. For most retailers, trying to be everything for everyone is not a viable strategy. A high-level analysis of top-level product categories might already give an indication of where the focus should be, especially if the CLV is well calculated and based on the contribution rather than the turnover of a customer. If this is the case, this can identify where the true profit centers of a company are likely to be. This goes beyond a simple accounting analysis of which products are currently most profitable, because the customer dimension is also accounted for. In doing so, the combination of product categories that needs to be the focus in order to attract and retain profitable customers becomes apparent.
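A sketch of this tiering and the relative-spend breakdown, assuming the clv series produced by the earlier Lifetimes sketch and the same orders dataframe:

```python
import numpy as np
import pandas as pd

# CLV tiers cut at the 50th/80th/90th percentiles, as in the text.
cuts = clv.quantile([0.5, 0.8, 0.9])
tiers = pd.cut(
    clv,
    bins=[-np.inf, cuts[0.5], cuts[0.8], cuts[0.9], np.inf],
    labels=["bronze", "silver", "gold", "platinum"],
)

# Relative spend per category within each tier (rows sum to 100%).
spend = orders.merge(tiers.rename("tier"), left_on="custid", right_index=True)
relative = pd.crosstab(spend["tier"], spend["category"],
                       values=spend["turnover"], aggfunc="sum",
                       normalize="index")
print((100 * relative).round(1))
```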


It might even be the case that a nonprofitable category is an important part of the purchasing pattern of your high-value customers.

8.5.2 Causal Model for Variable Selection

While useful, the preceding analysis does not make full use of the CLV calculation, the reason being that we are still looking at categories with arbitrary cutoff points. While this can be sufficient to spot big patterns, it becomes a problem when looking in greater detail at smaller categories or individual products. The main advantage of the preceding analysis is its extreme simplicity; depending on the size of the dataset, it can often be completed in simple spreadsheet software.

By using simple regression models, the need for arbitrary cutoff points can be eliminated. The dependent (y) variable of the regression is the CLV. The independent variables (x) are linked to consumer spend in specific product categories. The goal is to see if specific spending patterns tend to produce more valuable customers. Before jumping in, it is important to consider three pitfalls for this type of analysis:

• Skew toward bigger categories: If a model is built that explains the CLV based on the turnover in different categories, the results will skew toward bigger categories. Suppose that one category is responsible for approximately 40% of turnover, as is true for the example used in this chapter. Also assume that customer spend is completely random—higher-value customers do not buy more or less from specific categories. Under these conditions, a regression model will still indicate that the spend in the big category contributes more to CLV than the spend in smaller categories. This might lead to erroneous conclusions in favor of the categories that are responsible for the biggest turnover. A simple way to mitigate this is to correct for the overall turnover of the customer. This is no more complex than adding the total turnover as an independent variable in the regression equation:

CLV = f(total turnover, category turnover)    (8.1)

• Significance and overfitting: Be aware of the limitations of the size of the dataset you are working with. Product groups that have a small number of sales outliers can skew the results. The extreme example would be when the analysis is performed on the level of an individual product, and this product has only been purchased a single time by a very valuable customer. While it could be true that the sale of this product has greatly increased the value of this individual customer, it is more likely that this is just randomness at work. Even more importantly, categories that are too small might not be workable for most of the automated marketing campaigns that you want to launch. Even if the value increase for an individual customer would be very large, there should still be a sufficiently large set of customers that can be convinced in order to have a positive return on investment for a given campaign.


The product segment must not be too niche. Make sure to note the p-values and the confidence intervals to ensure not falling into this trap.

• Multicollinearity: [Warning: this gets a bit more technical.] If an independent variable can be perfectly predicted from one or more other independent variables, regression coefficients go haywire. This scenario naturally occurs if we work with the relative spend in all categories, or if the spend in each individual category is added together with the total turnover of a customer (see earlier why this is needed). If we make this mistake, the coefficients of the regression cannot be interpreted correctly. The solution to this problem is to run multiple smaller regressions that contain only one (or a small number of) categories as independent variables. While there still might be correlations between these variables, there should be no structural problem causing perfect multicollinearity. Statisticians will immediately see another risk popping up in this case: as the number of regressions that you conduct increases, the risk of seeing a pattern that is not actually present also increases.14 Assuming that a typical p-value threshold of 5% is used, it can be expected that 5 out of 100 observed effects will be spurious. To mitigate this, the total number of regressions should be limited to a level that the dataset can support. Looking at the confidence interval around the estimated coefficients often provides an intuitive feel for how reliable the estimates are.

These risks might sound quite technical, but their implications are simply that (i) the total spend of a customer should also be included as an independent (x) variable, (ii) only categories and products that can be linked to a sufficiently large pool of customers and transactions should be used, and (iii) not all categories can be added in a single regression; multiple regressions should be used instead.

The end result is that regression equations like Eq. 8.2 will be constructed. This equation includes an intercept, as well as coefficients for the total turnover of a customer T and the turnover within a specific category of interest C:

CLV = β0 + β1 T + β2 C    (8.2)
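Fitting Eq. 8.2 is a standard ordinary least squares exercise. A minimal sketch with statsmodels, assuming a hypothetical customer-level dataframe cust with one row per customer holding their estimated CLV, total spend, and powertools spend:

```python
import statsmodels.api as sm

# Eq. 8.2: CLV explained by total spend plus spend in one category.
X = sm.add_constant(cust[["totalspend", "powertools"]])
model = sm.OLS(cust["clv"], X).fit()
print(model.summary())  # coefficients, p-values, and confidence intervals
```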

When applying this equation to the example dataset, the results shown in Table 8.3 are obtained. The p-values and the confidence intervals show that the effect is statistically significant. The constant can be interpreted as the average value of a customer (approximately 955 EUR). Next, we can see that as the total spend of a customer increases, the lifetime value increases by a little over 5 EUR per additional EUR that the customer has spent.

14 One of the best explanations of this problem can be found in [6]. Another excellent entry-level book that can help to improve statistical intuition is [7].


Table 8.3 Partial results table from the linear regression

            Coef      Std err  t        P>|t|  [0.025   0.975]
Const       954.5589  10.762   88.695   0.000  933.465  975.653
Powertools  1.5225    0.056    27.417   0.000  1.414    1.631
Totalspend  5.3727    0.019    278.726  0.000  5.335    5.410

Finally, the net effect of buying something in the powertools category can also be quantified: an increase of approximately 1.5 EUR per euro spent in this category.

This is a powerful result, as it allows the long-term effect of actions to be quantified. If an AB test shows that the test group has increased turnover by 5 EUR per customer on average, it can now be calculated that this corresponds to a total value of 5 · 5.37 = 26.85 EUR per customer. Moreover, if it is observed that this additional turnover is spent in the powertools category, the total value increases to 5 · (5.37 + 1.52) = 34.45 EUR.

The same type of regression can be repeated for the other categories. This results in similar coefficients for the intercept and total spend of a customer, but the category coefficient differs significantly: +0.26 for hand tools, −1.85 for paint, and −1.32 for lumber (all significant and with reasonably narrow confidence intervals). The negative values are especially interesting. They imply that if we succeed in increasing the spend of a consumer in the paint or lumber category, this will have less value than implied by the average increase in CLV.

The value of this analysis can often be increased by increasing the level of detail. An easy way to do this is by performing the same style of analysis on the level of subcategories (or sub-subcategories, etc.). When doing this, it is always important to consider two caveats: (i) the subsegment should still be sizeable enough to have global relevance, and (ii) the effect must continue to be statistically significant.

The results of such an analysis are shown in Fig. 8.6. Here the size of the subsegment is represented on the horizontal axis, and the vertical axis measures the beneficial or detrimental impact of encouraging a sale in a given subcategory. This allows for a quick visual inspection of where the most interesting opportunities currently lie. The more a subsegment is positioned toward the top and the right, the more interesting the segment is as a candidate for an upsell or cross-sell campaign, as it represents a significant uplift in CLV and is likely to be relevant for a large group of customers. Note that the impact can also be negative, and all categories with values below zero are of limited interest as focal points for segment-specific campaigns. Of greatest interest are the segments on the efficient frontier going from left to right: C14-C6-C5-C3-C2.

At this point it is also important to consider the business implications of various subcategories. Some of these will have greater relevance in certain periods of the year; others will not be as easy to market. Examples are certain types of medication or live animals, for which advertising and promotions are often simply not allowed.


Fig. 8.6 Analysis of subcategory attractiveness based on the CLV impact of purchases in different subcategories. Circles indicate P < 0.05; squares indicate P ≥ 0.05

When the size of the product groups decreases, the likelihood of observing spurious correlations increases. Because of this, it is important to keep track of the p-values (or confidence intervals). In Fig. 8.6, subcategories that have problematic p-values are indicated using squares. Specifically, C2 and C19 are not statistically significant.

Investigating the impact of combined categories can also be relevant in situations where there is no detailed subcategorization. Practically, this implies running a polynomial regression on the high-level categories. Equation 8.3 shows what such a regression model and its parameters look like. Here again, it is advisable to run multiple smaller regressions rather than a single big regression including all the polynomials at once. It is also useful to consult business stakeholders in order to identify likely interaction effects that would be worth investigating:

CLV = β0 + β1 T + β2 C1 + β3 C2 + β4 C1² + β5 C2² + β6 C1 · C2    (8.3)
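With the statsmodels formula interface, Eq. 8.3 translates almost literally. A sketch, with hypothetical column names c1 and c2 for the two category spends in the assumed cust dataframe:

```python
import statsmodels.formula.api as smf

# Eq. 8.3: squared terms and an interaction between two category spends.
model = smf.ols(
    "clv ~ totalspend + c1 + c2 + I(c1**2) + I(c2**2) + c1:c2",
    data=cust,
).fit()
print(model.params)
```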

Conclusions might indicate that customers who buy specific combinations of categories are substantially more valuable. Alternatively, it might be the case that customer value increases faster than linearly as the expenditure in a certain category increases.


The caveat here is that this style of analysis is very prone to overfitting. As such, it should only be attempted if there is a sufficient amount of data to support it. For example, the dataset used in this chapter of 100k unique customers and 300k transactions is too small to support such an analysis: when applied to this dataset, the coefficients are either insignificant (P > 0.05) or merely statistically significant—meaning that the estimated effect is too small to matter in practice.

It is important to note that customer behaviors other than category spend can be analyzed in the same manner by using them as predictors for CLV. Category spend is one of the most widely applicable types of analysis, but there is no reason why the analysis should be limited to it. Good examples of attributes that are also widely applicable are listed below; a sketch of how such variables might be constructed is given after the list.

• Discount propensity: Some customers can be true bargain hunters, whereas others can rarely be bothered to scout for discounts. This is often important, as this kind of behavior is very actionable when tailoring communications. Discount propensity can be quantified quite easily as the total (relative) discount a customer has received in the past. Of course, if a retailer has a specific saving/couponing dynamic, it is worth looking in depth at how this may be translated into something representative of discount propensity.

• Purchase timing: When a customer makes purchases can be very telling. This can take many shapes and should be discussed with the business teams before constructing variables to represent it. Typical examples include the time of the week, where you might have weekend buyers versus people who make purchases during the workweek. In food retail specifically, this factor can be quite informative when trying to identify different customer types. On a larger timescale, there might be people who buy from you in certain seasons but not in others. An example could be contact lenses, which are only purchased during summer for outdoor activities and swimming. Especially in fashion, there can also be an important distinction between customers who purchase early or late in a season. The true fashionista is likely to want the latest and greatest as soon as it is available, and may not be as price sensitive. Again, it is best to discuss the construction of these variables with business and product teams. It may be important to understand key milestones throughout the year on which to base variables.

• Channels used: Specifically, the distinction between online and offline consumption can be significant. Whereas the idea of the omnichannel retailer has been quite pervasive in recent years, the reality often shows that customers typically have a significant preference for one or the other. Even when there is a population that is truly omnichannel, they are often duplicated in the database system, causing this behavior to fly under the radar. Regardless, where a customer makes purchases is intuitively a good candidate for customer segmentation.

• Business-specific factors: There are a great many other factors that can be of relevance in specific retail businesses. Stores selling children's clothing might be very interested in the number of children, as well as their ages and genders.


Similarly, when selling pet food, the species and ages of a customer's companions are highly relevant; the make, model, and age of your car would be highly interesting to a store that sells classic car parts. While this makes intuitive sense, many retailers fail to direct attention toward collecting this type of information from their customers. While it might require some new tools and data infrastructure, the added benefit of obtaining this information can be very substantial.

Naturally, these variables should undergo the same treatment as the category spend in the preceding examples. Many of these variables make intuitive sense—but intuition can be misleading. The human propensity to see patterns even where there are none can easily introduce mistakes. The nature of the analysis is highly similar: in essence, a search for causal explanations for differences in customer value.

At this point the argument can be made that there is more to a customer than the customer's CLV. Customers with an identical CLV can be extremely different in their likes and dislikes and might prefer a different treatment. While this is certainly true, the customer segmentation exercise is inherently one of broad strokes. The goal is to identify the most important patterns in customer types and behaviors. More detailed analyses will be discussed in a later chapter on propensity modeling.

The preceding search for causal factors that explain CLV already yields a lot of useful information, but true customer segments have not yet been created. Knowing which product groups underpin your most valuable customers is crucial in setting a good direction for the company. A good segmentation ties into this strategic component, as it goes beyond identifying ad hoc customer groups that would respond well to a certain nudge. The basic segmentation used at a company is often used for long periods of time; effectively, it is more likely that a customer segmentation will be challenged after too much time rather than too little.

The true purpose of the causal analysis of CLV is to identify relevant variables for customer segmentation. This prevents the most common problem with segmentation—the use of irrelevant dimensions to decide on segments. Variables can be qualified as having statistical significance when explaining the CLV, but just as importantly they can be preselected on real-world significance. Having real-world significance means that the effect is not so small as to be negligible in practice. It might consistently be the case that a customer who buys sweets at the register has a 0.5% higher CLV than average, but this type of marginal increase might not be worth chasing if there are bigger fish to fry.
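As promised above, a sketch of how such behavioral variables might be derived from the order lines; the discount and channel columns are assumptions, as they do not appear in the minimal Table 8.1:

```python
import pandas as pd

# Hypothetical per-customer behavioral features built from order lines.
features = orders.groupby("custid").apply(
    lambda g: pd.Series({
        # share of turnover that was given away as discount
        "discount_propensity": g["discount"].sum() / g["turnover"].sum(),
        # fraction of purchases made on Saturday or Sunday
        "weekend_share": g["date"].dt.dayofweek.ge(5).mean(),
        # fraction of purchases made in the online channel
        "online_share": g["channel"].eq("online").mean(),
    })
)
```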

8.5.3 From Variables to Segments

The final step toward segmentation is to use these pre-qualified variables to create customer segments. There are two possible approaches. The first is to use traditional clustering techniques on the variables that were deemed significant during the preceding analysis; these are then typically assumed to carry a roughly equal weight in the clustering algorithm.


Alternatively, there is the option of using explainable prediction models. The dependent variable in this case is still the CLV, but rather than performing an analysis one variable at a time, the most relevant variables are included in a single model. Clusters can then be identified from the structure of the trained model.

8.5.3.1 Creating Segments Using Unsupervised Clustering Techniques

The advantage of traditional clustering techniques is that it is possible to add other dimensions of behavioral relevance to the clustering—specifically, variables that matter strongly for how customers are communicated with, but yield no big change in CLV. Examples of this are the discount propensity (how much does a customer make use of discounts?) and the channels used by a customer (online or offline). It is possible that such variables do not have a significant relation to the value of a customer, but they may nevertheless be interesting to include in a segmentation.

A realistic example of the results of such a clustering algorithm will now be presented based on the same sample data. Here the customer's expenditure in the paint, lumber, and powertools categories has been used as input variables, as well as the average discount on transactions completed by the customer. This last variable was not a strong predictor of CLV, but can still be included in this analysis to identify discount buyers. Knowing who the discount buyers are can be interesting when designing certain pieces of communication. The optimal number of clusters for this dataset and the selected variables was found to be four.15

Segments that are created using clustering algorithms do not come with ready-made interpretations. Hence, scatter plots such as those shown in Fig. 8.7 are typically used to understand how segments can be interpreted. Two-dimensional scatter plots are often superior to three-dimensional ones in this regard, since the perspective in three-dimensional scatter plots can lead to mistakes in interpretation. Based on these scatter plots, the following interpretation could be attached to the four segments:

• C0—8.6k: Paint buyers.
• C1—3.6k: Promo buyers.
• C2—2.7k: Low-value customers.
• C3—10.4k: Powertool buyers.

When contrasted with the segments shown in Fig. 8.1, it is clear that these segments are easier to link to actions. Changing someone's age is impossible; convincing them to move is only slightly more feasible.

15 In this example the silhouette score [8] was used, which measures how similar a customer is to the other customers in her cluster. For customer segmentation purposes, this measure is often more insightful than the more traditional elbow method.
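A minimal sketch of this selection procedure with scikit-learn, assuming a hypothetical per-customer dataframe cust holding the four input variables described above:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Assumed: per-customer spend in three categories plus average discount.
X = StandardScaler().fit_transform(
    cust[["paint", "lumber", "powertools", "avg_discount"]]
)

# Choose k by silhouette score rather than the elbow method.
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```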


Fig. 8.7 Segmentation result based on a clustering model that uses preselected relevant variables as input

The segments presented here provide both an action that can be linked to them and a clear direction in which customers can be moved to improve their value. For example, trying to move customers from the C2 segment into C0 or C3 is attractive in terms of average CLV increase, and the type of nudge is also clear: moving to C0 requires convincing people to buy paint; likewise, moving customers to C3 implies convincing them to buy powertools.

A real segmentation would of course go into more detail than this. This is done by feeding more variables to the clustering algorithm, which will inflate the number of segments. A key risk is venturing too far down the rabbit hole. The segmentation needs to have enough detail to be actionable, but there is absolutely no need to answer all questions before moving on to actions. Even the rough-cut segmentation presented here can already be useful in an organization where no active nudges in these directions exist. Again, a useful segmentation will have to be challenged by both product and business experts before being adopted. Once high-level segments are in active use, more detail can be added based on the preliminary results of experiments. It may be that powertool buyers are more valuable, but it may also be true that it is very hard to convince people to become powertool buyers. At that point, this variable could be dropped in favor of variables that are easier to influence.

8.5.3.2 Creating Segments Using Supervised Prediction Models

As an alternative to unsupervised clustering models, supervised models can also be valuable tools to create actionable segments. Rather than just searching for patterns in the variables provided, the goal of these models is to explain as much variation in the CLV as possible. This is similar to the regression-style analysis that was used to identify the most important variables—but now includes possible interrelations between variables.
