Frontiers of Dynamic Games: Game Theory and Management, St. Petersburg, 2018 [1st ed. 2019] 978-3-030-23698-4, 978-3-030-23699-1

This book is devoted to game theory and its applications to environmental problems, economics, and management.


English. Pages XII, 336 [345]. Year 2019.


Table of contents :
Front Matter ....Pages i-xii
An Example of Reflexive Analysis of a Game in Normal Form (Denis Fedyanin)....Pages 1-11
Differential-Game-Based Driver Assistance System for Fuel-Optimal Driving (Michael Flad)....Pages 13-36
On the Selection of the Nash Equilibria in a Linear-Quadratic Differential Game of Pollution Control (Ekaterina Gromova, Yulia Lakhina)....Pages 37-48
Endogenous Formation of Cooperation Structure in TU Games (Anna Khmelnitskaya, Elena Parilina, Artem Sedakov)....Pages 49-64
Analysis of Competitive and Cooperative Solutions in Dynamic Auction Games (Nikolay A. Krasovskii, Alexander M. Tarasyev)....Pages 65-84
A-Subgame Concept and the Solutions Properties for Multistage Games with Vector Payoffs (Denis Kuzyutin, Yaroslavna Pankratova, Roman Svetlov)....Pages 85-102
A Dynamic Model of Bertrand Competition for an Oligopolistic Market (Zeng Lian, Jie Zheng)....Pages 103-130
Pure and Mixed Stationary Nash Equilibria for Average Stochastic Positional Games (Dmitrii Lozovanu)....Pages 131-155
Variational Inequalities, Nash Equilibrium Problems and Applications: Unification Dynamics in Networks (Vladimir Matveenko, Maria Garmash, Alexei Korolev)....Pages 157-174
About the Looking Forward Approach in Cooperative Differential Games with Transferable Utility (Ovanes Petrosian, Ildus Kuchkarov)....Pages 175-208
Dynamically Consistent Bi-level Cooperation of a Dynamic Game with Coalitional Blocs (Leon A. Petrosyan, David W. K. Yeung)....Pages 209-230
Optimal Incentive Strategy in a Markov Game with Multiple Followers (Dmitry B. Rokhlin, Gennady A. Ougolnitsky)....Pages 231-243
How Oligopolies May Improve Consumers’ Welfare? R&D Is No Longer Required! (Alexander Sidorov)....Pages 245-266
Guaranteed Deterministic Approach to Superhedging: Lipschitz Properties of Solutions of the Bellman–Isaacs Equations (Sergey N. Smirnov)....Pages 267-288
Evaluation of Portfolio Decision Improvements by Markov Modulated Diffusion Processes: A Shapley Value Approach (Benjamin Vallejo-Jimenez, Mario A. Garcia-Meza)....Pages 289-302
Conditionally Coordinating Contracts in Supply Chains (Nikolay A. Zenkevich, Irina Berezinets, Natalia Nikolchenko, Alina Rucheva)....Pages 303-336


Static & Dynamic Game Theory: Foundations & Applications

Leon A. Petrosyan Vladimir V. Mazalov Nikolay A. Zenkevich Editors

Frontiers of Dynamic Games Game Theory and Management, St. Petersburg, 2018

Static & Dynamic Game Theory: Foundations & Applications Series Editor Tamer Ba¸sar, University of Illinois, Urbana-Champaign, IL, USA Editorial Advisory Board Daron Acemoglu, MIT, Cambridge, MA, USA Pierre Bernhard, INRIA, Sophia-Antipolis, France Maurizio Falcone, Università degli Studi di Roma “La Sapienza,” Italy Alexander Kurzhanski, University of California, Berkeley, CA, USA Ariel Rubinstein, Tel Aviv University, Ramat Aviv, Israel; New York University, NY, USA William H. Sandholm, University of Wisconsin, Madison, WI, USA Yoav Shoham, Stanford University, CA, USA Georges Zaccour, GERAD, HEC Montréal, Canada

More information about this series at http://www.springer.com/series/10200

Leon A. Petrosyan • Vladimir V. Mazalov • Nikolay A. Zenkevich Editors

Frontiers of Dynamic Games Game Theory and Management, St. Petersburg, 2018

Editors Leon A. Petrosyan St. Petersburg State University St. Petersburg, Russia

Vladimir V. Mazalov Institute of Applied Mathematical Research Karelia Research Center of RAS Petrozavodsk, Russia

Nikolay A. Zenkevich Graduate School of Management St. Petersburg State University St. Petersburg, Russia

ISSN 2363-8516 ISSN 2363-8524 (electronic) Static & Dynamic Game Theory: Foundations & Applications ISBN 978-3-030-23698-4 ISBN 978-3-030-23699-1 (eBook) https://doi.org/10.1007/978-3-030-23699-1 Mathematics Subject Classification (2010): 90B, 91A, 91B © Springer Nature Switzerland AG 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This book is published under the imprint Birkhäuser, www.birkhauser-science.com, by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The content of this volume is mainly based on selected talks that were given at the 12th international conference “Game Theory and Management 2018,” held at Saint Petersburg State University in Saint Petersburg, Russia, from June 27 to 29, 2018. Every year since 2007, the international conference “Game Theory and Management” (GTM) has taken place at Saint Petersburg State University. Among the plenary speakers of this conference series were the Nobel Prize winners Robert Aumann, John Nash, Reinhard Selten, Roger Myerson, Finn Kydland, Eric Maskin, and many other famous game theorists. The underlying theme of the conferences is the promotion of advanced methods for modeling the behavior that each agent (also called player) has to adopt in order to maximize his or her reward when the reward depends not only on the individual choices of a player (or a group of players) but also on the decisions of all agents involved in the conflict (game). In particular, the emphasis of the GTM 2018 conference was on the following topics:

• dynamic and differential games;
• cooperative solutions;
• dynamic game modeling in management and environmental issues;
• energy and resource allocation;
• games in finance and marketing;
• stochastic and sequential games.

In this volume, two sorts of contributions prevail: chapters that are mainly concerned with the application of game theoretic methods and chapters where the theoretical background is developed. In the chapter by Michael Flad, the interaction between a driver and a driving assistance system is described for the first time by means of a differential game. The system calculates the optimal control output by solving the game. It is interesting to note that this driving assistance system was implemented on a real-time system, integrated in a driving simulator and validated in a driving study.


In the chapter by Ekaterina Gromova and Yulia Lakhina on a special linear-quadratic differential game model for pollution control, an attempt is made to specify one solution from the set of solutions of the corresponding Hamilton–Jacobi–Bellman equations with the help of an economic criterion as well as a classical approach.

The chapter by Alexander Sidorov studies how the concentration of industries affects social welfare, measured as the consumer’s indirect utility. Based on the presented model, the author argues that a lower concentration is not always harmful for consumers.

In the chapter by Benjamin Vallejo-Jimenez and Mario A. Garcia-Meza, it is demonstrated how the introduction of a time-inhomogeneous Markov-modulated diffusion process into asset portfolio decision problems yields higher returns for the rational decision maker.

The chapter by Nikolay A. Zenkevich, Irina Berezinets, Natalia Nikolchenko, and Alina Rucheva investigates target sales rebate and buyback contracts in supplier-retailer supply chain games. The Stackelberg model is used for the supply chain, under the condition of a fixed retail price and stochastic demand.

The chapter by Nikolay A. Krasovskii and Alexander M. Tarasyev is devoted to the analysis of competition and cooperation in dynamic auction models. A market equilibrium is defined in the Pareto set, and it is then asked how to shift the system from a competitive Nash equilibrium to the market equilibrium in the Pareto set. A shifting algorithm is proposed, and its results are demonstrated for a model of fast-growing economies.

The chapter by Zeng Lian and Jie Zheng is concerned with the infinitely repeated Bertrand competition game among firms with stochastic entry and stochastic demand. The symmetric subgame-perfect Nash equilibrium of the game is characterized for the case when a firm’s strategy consists of two components, namely a positioning strategy and a pricing strategy.
The chapter by Leon A. Petrosyan and David W. K. Yeung deals with two-level cooperation: cooperation among members within a coalition bloc and cooperation between coalition blocs. The gains of the coalition blocs are defined as the components of the Shapley value. For the definition of the gains within a coalition, the proportional solution is used. An IDP is constructed to ensure the time consistency of the two-level solution.

In their chapter, Vladimir Matveenko, Maria Garmash, and Alexei Korolev investigate the equilibrium in a game-theoretic model of production and externalities for a network with two types of agents having different productivities. Each player invests a part of her endowment in the first stage, and her consumption in the second stage depends on her investment and productivity as well as on the investments of her neighbors in the network.

The chapter by Anna Khmelnitskaya, Elena Parilina, and Artem Sedakov provides a comparative analysis of several procedures for the endogenous dynamic formation of the cooperation structure in TU games. The authors propose two approaches to endogenous graph formation, based on sequential link announcement and revision.


The chapter by Denis Kuzyutin, Yaroslavna Pankratova, and Roman Svetlov considers multistage multicriteria games in extensive form. The authors employ the so-called A-subgame concept for examining the dynamic properties of some noncooperative and cooperative solutions.

The chapter by Ovanes Petrosian and Ildus Kuchkarov includes a complete description of the looking-forward approach for cooperative TU differential games. This approach can be used when the information about the game is updated: at each moment, the players receive updated information about the motion equations and payoffs. An example of a resource extraction game is also presented.

The chapter by Dmitrii Lozovanu investigates the problem of the existence of a Nash equilibrium in the class of stationary strategies for so-called average stochastic games.

In the chapter by Sergey N. Smirnov, a game-theoretic approach is used for modeling the cheapest coverage of the contingent claim on an American option under all admissible scenarios.

The chapter by Dmitry B. Rokhlin and Gennady A. Ougolnitsky covers the dynamic incentive problem in the case of several followers playing a Markov game. The leader’s strategy is determined by solving a stochastic control problem.

Denis Fedyanin’s chapter is concerned with normal form games with an unknown parameter. The author investigates solutions based on the beliefs concerning this parameter.

The GTM 2018 program committee thanks all the authors for their active cooperation and participation during the preparation of this volume. The organizers of the conference also gratefully acknowledge the financial support given by Saint Petersburg State University. Last but not least, we thank the reviewers for their outstanding contribution and the science editor Tobias Schwaibold.

St. Petersburg, Russia
Petrozavodsk, Russia
St. Petersburg, Russia

Leon A. Petrosyan Vladimir V. Mazalov Nikolay A. Zenkevich

Contents

1 An Example of Reflexive Analysis of a Game in Normal Form (Denis Fedyanin), 1
2 Differential-Game-Based Driver Assistance System for Fuel-Optimal Driving (Michael Flad), 13
3 On the Selection of the Nash Equilibria in a Linear-Quadratic Differential Game of Pollution Control (Ekaterina Gromova and Yulia Lakhina), 37
4 Endogenous Formation of Cooperation Structure in TU Games (Anna Khmelnitskaya, Elena Parilina, and Artem Sedakov), 49
5 Analysis of Competitive and Cooperative Solutions in Dynamic Auction Games (Nikolay A. Krasovskii and Alexander M. Tarasyev), 65
6 A-Subgame Concept and the Solutions Properties for Multistage Games with Vector Payoffs (Denis Kuzyutin, Yaroslavna Pankratova, and Roman Svetlov), 85
7 A Dynamic Model of Bertrand Competition for an Oligopolistic Market (Zeng Lian and Jie Zheng), 103
8 Pure and Mixed Stationary Nash Equilibria for Average Stochastic Positional Games (Dmitrii Lozovanu), 131
9 Variational Inequalities, Nash Equilibrium Problems and Applications: Unification Dynamics in Networks (Vladimir Matveenko, Maria Garmash, and Alexei Korolev), 157
10 About the Looking Forward Approach in Cooperative Differential Games with Transferable Utility (Ovanes Petrosian and Ildus Kuchkarov), 175
11 Dynamically Consistent Bi-level Cooperation of a Dynamic Game with Coalitional Blocs (Leon A. Petrosyan and David W. K. Yeung), 209
12 Optimal Incentive Strategy in a Markov Game with Multiple Followers (Dmitry B. Rokhlin and Gennady A. Ougolnitsky), 231
13 How Oligopolies May Improve Consumers’ Welfare? R&D Is No Longer Required! (Alexander Sidorov), 245
14 Guaranteed Deterministic Approach to Superhedging: Lipschitz Properties of Solutions of the Bellman–Isaacs Equations (Sergey N. Smirnov), 267
15 Evaluation of Portfolio Decision Improvements by Markov Modulated Diffusion Processes: A Shapley Value Approach (Benjamin Vallejo-Jimenez and Mario A. Garcia-Meza), 289
16 Conditionally Coordinating Contracts in Supply Chains (Nikolay A. Zenkevich, Irina Berezinets, Natalia Nikolchenko, and Alina Rucheva), 303

Contributors

Irina Berezinets St. Petersburg State University, St. Petersburg, Russia Denis Fedyanin V.A. Trapeznikov Institute of Control Sciences, Moscow, Russia National Research University Higher School of Economics, Moscow, Russia IEEE, Moscow, Russia Michael Flad Institute of Control Systems, Karlsruhe Institute of Technology, Karlsruhe, Germany Mario A. Garcia-Meza Facultad de Economía, Contaduría y Administración, Universidad Juarez del Estado de Durango, Durango, Mexico Maria Garmash National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia Ekaterina Gromova St. Petersburg State University, St. Petersburg, Russia Institute of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia Anna Khmelnitskaya St. Petersburg State University, St. Petersburg, Russia V.A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia Alexei Korolev National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia Nikolay A. Krasovskii Krasovskii Institute of Mathematics and Mechanics UrB RAS, Yekaterinburg, Russia Ildus Kuchkarov St. Petersburg State University, St. Petersburg, Russia Denis Kuzyutin St. Petersburg State University, St. Petersburg, Russia National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia xi


Yulia Lakhina St. Petersburg State University, St. Petersburg, Russia Zeng Lian International Business School, Beijing Foreign Studies University, Beijing, China Dmitrii Lozovanu Institute of Mathematics and Computer Science of Moldova Academy of Sciences, Chisinau, Moldova Vladimir Matveenko National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia Natalia Nikolchenko St. Petersburg State University, St. Petersburg, Russia Gennady A. Ougolnitsky I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences of Southern Federal University, Rostov-on-Don, Russia Yaroslavna Pankratova St. Petersburg State University, St. Petersburg, Russia Elena Parilina St. Petersburg State University, St. Petersburg, Russia Ovanes Petrosian St. Petersburg State University, St. Petersburg, Russia National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia Leon A. Petrosyan St. Petersburg State University, St. Petersburg, Russia Dmitry B. Rokhlin I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences of Southern Federal University, Rostov-on-Don, Russia Alina Ruchyova St. Petersburg State University, St. Petersburg, Russia Artem Sedakov St. Petersburg State University, St. Petersburg, Russia Alexander Sidorov Novosibirsk State University, Novosibirsk, Russia Sobolev Institute of Mathematics, Novosibirsk, Russia Sergey N. Smirnov Lomonosov Moscow State University, Moscow, Russia Roman Svetlov The Herzen State Pedagogical University of Russia, St. Petersburg, Russia Alexander M. Tarasyev Krasovskii Institute of Mathematics and Mechanics UrB RAS, Yekaterinburg, Russia Ural Federal University, Yekaterinburg, Russia Benjamin Vallejo-Jimenez Universidad de Colima, Colima, Mexico David W. K. Yeung Shue Yan University, Braemar Hill, Hong Kong Nikolay A. Zenkevich St. Petersburg State University, St. 
Petersburg, Russia Jie Zheng Department of Economics, School of Economics and Management, Tsinghua University, Beijing, China

Chapter 1

An Example of Reflexive Analysis of a Game in Normal Form Denis Fedyanin

Abstract In this paper we consider a normal form game with a parameter A and suppose that this parameter is uncertain for the agents, so that they have to make some assumption about it. We take a normal form game and weaken the assumption of common knowledge. We introduce two fundamental alternatives based on dynamic epistemic logic and find equilibria for this modified game. We use a special property of these alternatives which lets us calculate the equilibria by solving several normal form games with perfect information. We find explicit expressions for the equilibria of the unmodified game and of the modified games with alternative assumptions on beliefs.

Keywords Social networks · Epistemic models · Control · Uncertainty · Information control · Game theory · Collective actions · de Groot’s model · Logic models

1.1 Introduction

Let there be a set of agents N = {1, . . . , n}, a set of real nonnegative strategy sets X = {X_1, . . . , X_n}, and a set of utility functions F(A) = {f_1(A), . . . , f_n(A)} with a parameter A. One can consider a game G = < N, X, F(A) >.

D. Fedyanin () V.A. Trapeznikov Institute of Control Sciences, Moscow, Russia National Research University Higher School of Economics, Moscow, Russia IEEE, Moscow, Russia © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_1


We investigate the case when the agents do not have a consensus on the value of A. One can simply let A_0 denote the actual value of the parameter A, denote the beliefs of the agents by A = A_1, . . . , A_n, write F(A) = {f_1(A_1), . . . , f_n(A_n)}, and consider a game G = < N, X, F(A) >. But it is sometimes not clear what this structure means. We suggest a more accurate way [1]. First of all, we introduce a formalism of epistemic models for common knowledge [2] and doxastic models for the beliefs of the agents [3]. We could proceed with our analysis without this formalism, but it makes our investigation clearer for logicians and formal philosophers [4–7]. We will use the formal grammar

φ = ⊥ | ⊤ | p | ¬φ | φ ∧ ψ | φ ∨ ψ | K_i φ | B_i φ | C_K φ | C_B φ.

Here ⊤ is Truth and ⊥ is False. Elementary propositions are elements of the set P = {(A = x) | x ∈ R}. {K_i φ} is the set of knowledge operators of the agents that describe their knowledge of a proposition φ. {B_i φ} is the set of belief operators of the agents that describe their beliefs about φ. C_K φ means that φ is common knowledge among the agents in N. C_B φ means that φ is a common belief among the agents in N. The precise definitions of common knowledge and common belief in these terms are

G_K^0 φ = ∧_j K_j φ;  G_K^{k+1} φ = ∧_j K_j G_K^k φ;  G_B^0 φ = ∧_j B_j φ;  G_B^{k+1} φ = ∧_j B_j G_B^k φ;

C_K φ = G_K^0 φ ∧ G_K^1 φ ∧ . . . ∧ G_K^m φ ∧ . . . ;  C_B φ = G_B^0 φ ∧ G_B^1 φ ∧ . . . ∧ G_B^m φ ∧ . . .
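The fixed-point character of these definitions can be illustrated on a toy possible-worlds model; the worlds and information partitions below are purely hypothetical and serve only to show how the iterated "everyone knows" operator (the G_K^k above) stabilizes.

```python
# Toy possible-worlds model: common knowledge as the fixed point of the
# iterated "everyone knows" operator. Worlds and partitions are illustrative.
worlds = {0, 1, 2}
partitions = {
    1: [{0, 1}, {2}],   # agent 1 cannot distinguish worlds 0 and 1
    2: [{0}, {1, 2}],   # agent 2 cannot distinguish worlds 1 and 2
}

def knows(agent, event):
    # K_a(event): worlds whose whole partition cell lies inside the event
    return set().union(*(cell for cell in partitions[agent] if cell <= event))

def everyone_knows(event):
    result = set(worlds)
    for a in partitions:
        result &= knows(a, event)
    return result

def common_knowledge(event):
    prev, cur = None, set(event)
    while cur != prev:            # iterate G^k until it stabilizes
        prev, cur = cur, everyone_knows(cur)
    return cur

# In world 0 both agents know the event {0, 1}, yet it is not common
# knowledge there: the iteration shrinks {0, 1} -> {0} -> {}.
assert everyone_knows({0, 1}) == {0}
assert common_knowledge({0, 1}) == set()
assert common_knowledge({0, 1, 2}) == {0, 1, 2}
```

The last assertion shows the contrast with a publicly announced event: only an event true in every world an agent considers possible survives every iteration.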

Please note that the set of propositions here is not countable or enumerable. We will write G_I = < N, X, F(A), I > for a game G = < N, X, F(A) > with given logical assumptions, or axioms, of the informational structure I. The ordinary case is G = < N, X, F(A), C_K(A = A_0) >, where A_0 is the actual value of the parameter A. It is just a game G = < N, X, F(A_0) > in normal form. Note that the Nash equilibria for G_CK = < N, X, F(A), C_K(A = A_0) > and G_CB = < N, X, F(A), C_B(A = A_0) > coincide, though the resulting values of the utility functions could differ, since C_K(A = A_0) → (A = A_0) but there is no such theorem for C_B(A = A_0). A belief could be false even if it is a common belief. There is a well-known way to investigate this game using Nash equilibria. Each Nash equilibrium is a vector y = (y_1, . . . , y_n) such that

∀x_i ∈ X_i:  f_i(y_1, . . . , y_{i−1}, y_i, y_{i+1}, . . . , y_n) ≥ f_i(y_1, . . . , y_{i−1}, x_i, y_{i+1}, . . . , y_n).

We will denote G_φ = < N, X, F(A), φ >, e.g. G_{∀i B_i(A=A_i)} = < N, X, F(A), ∀i B_i(A = A_i) >. One can consider these options: G_{B_i(A=A_i)}, G_{C_B B_i(A=A_i)}, G_{C_K B_i(A=A_i)}, G_{C_B C_B(A=A_i)}, G_{C_K C_K(A=A_i)}, G_{B_i C_B(A=A_i)}, G_{B_i C_K(A=A_i)}, G_{B_i B_j(A=A_i)}, G_{B_i B_j(A=A_j)}.


1.2 An Example of a Game

We will continue the investigation of a game of collective actions [8]. There are a set of agents N = {1, . . . , n}, a set of real nonnegative strategies, and a set of utility functions

f_i(x_1, . . . , x_n, r_1, . . . , r_n, A_1, . . . , A_n) = x_i (∑_{j∈N} x_j − A) − x_i²/r_i, ∀i ∈ N,

where 0 < r_i < 1. The corresponding practical interpretation is that the agents apply their strategies, and the joint action is successful (provides a positive contribution to the utility functions of the agents) when the total effort exceeds a specific threshold; the latter is set equal to 1. The strategy being successful, the agent’s gain (the first term in the utility function) increases with the growing effort of the agent in question. On the other hand, the agent’s effort itself results in a negative contribution to the utility function (see the second term), which depends on the type r_i. The larger the type variable, the “easier” the agent applies the strategy (for instance, in a psychological sense it could be explained by the agent’s greater loyalty or liking for the joint action) [8]. The Cournot oligopoly model [9] looks similar, but it is not the same because of the different utility functions

f_i = x_i (A − ∑_{j∈N} x_j) − x_i²/r_i.

The corresponding practical interpretation of the Cournot oligopoly is the following: the strategies are the amounts of sold products, and the utility functions are the amount of products multiplied by a price, which decreases when the total amount of sold products increases, minus the costs. There are some important differences that make the game of collective actions look like a combination of the Cournot oligopoly and a game-theoretic modification of Granovetter’s threshold model [10], not just the Cournot oligopoly. The Breer threshold model [11] is the one where the utility functions are

f_i = x_i (A − ∑_{j∈N} x_j)

and the set of strategies is restricted to binary values: each strategy is equal to either 0 or 1. Anyway, we can apply all the ideas below to the Cournot oligopoly as well, but we have not done so yet.


In this paper we propose to consider A as an uncertain parameter for agents and they have to make some suggestion about it.

1.3 Results

1.3.1 Players with Common Knowledge

We can model this situation by a game G_CK = < N, X, F(A), C_K(A = A_0) >. There is a well-known way to find a Nash equilibrium: compose and solve a system of equations in which the strategy of each player equals his or her best response

x_i = BR_i(x_{−i}) = (r_i/(2(1 − r_i))) (∑_{j≠i} x_j − A) + δ_i, ∀i ∈ N,

where

δ_i = 0, if (r_i/(2(1 − r_i))) (∑_{j≠i} x_j − A) ≥ 0,
δ_i = −(r_i/(2(1 − r_i))) (∑_{j≠i} x_j − A), if (r_i/(2(1 − r_i))) (∑_{j≠i} x_j − A) < 0.
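As a quick numerical illustration, the clamped best response can be sketched in a few lines; the values of r_i, A, and the starting point below are hypothetical, chosen only to exercise both the zero equilibrium and the nonzero fixed point.

```python
# Sketch of the clamped best response for G_CK: the correction term
# forces the strategy to zero when the unconstrained response is negative.
# r, A and the starting points are illustrative values, not from the text.
r = [0.8, 0.9, 0.7]
A = 2.0
c = [ri / (2 * (1 - ri)) for ri in r]

def best_response(i, x):
    raw = c[i] * (sum(x) - x[i] - A)
    return max(raw, 0.0)          # the clamp cancels negative responses

# below the threshold the dynamics collapse to the zero equilibrium
x = [0.1, 0.1, 0.1]
for _ in range(5):
    x = [best_response(i, x) for i in range(3)]
assert x == [0.0, 0.0, 0.0]

# the nonzero equilibrium of Sect. 1.3.1.2 is a fixed point of the map
k = [ri / (2 - ri) for ri in r]
K = sum(k)
x_star = [ki * A / (K - 1) for ki in k]
for i in range(3):
    assert abs(best_response(i, x_star) - x_star[i]) < 1e-9
```

Starting below the threshold, one round of best responses already reaches the all-zero profile, while the interior profile reproduces itself.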

1.3.1.1 Zero Nash Equilibrium for G_CK

There is always a solution in the game G_CK: the actions of the agents are x_i = 0, ∀i ∈ N, and the values of the agents’ utilities are f_i = 0, ∀i ∈ N.

1.3.1.2 Nonzero Nash Equilibrium for G_CK

If

∑_{j∈N} r_j/(2 − r_j) > 1,

then there is one more solution.


Strategies of the agents:

x_i = (r_i/(2 − r_i)) A / (∑_{j∈N} r_j/(2 − r_j) − 1), ∀i ∈ N.

Values of the agents’ utilities:

f_i = A² (1 − r_i) r_i / ((2 − r_i)² (∑_{j∈N} r_j/(2 − r_j) − 1)²), ∀i ∈ N.
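A short numerical check of these closed forms, under illustrative values of r_i and A (not taken from the chapter):

```python
# Numerical check of the nonzero equilibrium of G_CK and its payoffs.
# r and A below are hypothetical values chosen to satisfy the condition.
r = [0.8, 0.9, 0.7]
A = 2.0
k = [ri / (2 - ri) for ri in r]
K = sum(k)
assert K > 1                      # existence condition: sum_j r_j/(2-r_j) > 1

x = [ki * A / (K - 1) for ki in k]
S = sum(x)

def payoff(i):
    # f_i = x_i (sum_j x_j - A) - x_i^2 / r_i, evaluated at the equilibrium
    return x[i] * (S - A) - x[i] ** 2 / r[i]

# the direct payoff matches A^2 (1-r_i) r_i / ((2-r_i)^2 (K-1)^2)
for i in range(3):
    closed = A**2 * (1 - r[i]) * r[i] / ((2 - r[i]) ** 2 * (K - 1) ** 2)
    assert abs(payoff(i) - closed) < 1e-9
```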

1.3.2 Players with Communication and Consensus

We can model this case by a game G_CB = < N, X, F(A), C_B(A = A_0) >. There could be communication between the agents [4, 12], and they can communicate according to the de Groot model [13]. There is no difference whether the existence of such communication is common knowledge among all agents or not. Let their influences be w_j; then one should compose and solve the system

x_i = BR_i(x_{−i}) = (r_i/(2(1 − r_i))) (∑_{j≠i} x_j − ∑_{j∈N} w_j A_j)

for each i.

1.3.2.1 Zero Nash Equilibrium for G_CB

There is always a solution in G_CB: the strategies of the agents are x_i = 0, ∀i ∈ N, and the values of the agents’ utilities are f_i = 0, ∀i ∈ N.

1.3.2.2 Nonzero Nash Equilibrium for G_CB

If

∑_{j∈N} r_j/(2 − r_j) > 1,

then there is one more solution:

x_i = (r_i/(2 − r_i)) (∑_{j∈N} w_j A_j) / (∑_{j∈N} r_j/(2 − r_j) − 1), ∀i ∈ N.
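The consensus value ∑_j w_j A_j can be sketched with a de Groot iteration; the trust matrix and initial beliefs below are hypothetical, and the influence weights w are recovered as the normalized left Perron eigenvector of the trust matrix.

```python
# Sketch of de Groot belief dynamics reaching consensus; the trust
# matrix M and initial beliefs are illustrative, not from the chapter.
import numpy as np

M = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])           # row-stochastic: M[i,j] = trust of i in j
A_beliefs = np.array([1.0, 2.0, 4.0])     # initial beliefs A_1, A_2, A_3

b = A_beliefs.copy()
for _ in range(200):                      # de Groot update: b <- M b
    b = M @ b
consensus = float(b[0])
assert np.allclose(b, consensus)          # all agents now share one belief

# the consensus equals sum_j w_j A_j, where w is the normalized left
# eigenvector of M for eigenvalue 1 (the agents' influence weights)
eigvals, eigvecs = np.linalg.eig(M.T)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w = w / w.sum()
assert abs(consensus - float(w @ A_beliefs)) < 1e-9
```

The equilibrium of G_CB is then the equilibrium of Sect. 1.3.1.2 with A replaced by this consensus value.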

1.3.3 Players Without Communication

We can model this case by the games

G_{∀i B_i C_B(A=A_i)} = < N, X, F(A), ∀i B_i C_B(A = A_i) >,
G_{∀i B_i C_K(A=A_i)} = < N, X, F(A), ∀i B_i C_K(A = A_i) >.

We formulated axioms to make the informational system complete:

G_{∀i (B_i C_B(A=A_i) ∧ B_i(A=A_i))} = < N, X, F(A), ∀i (B_i C_B(A = A_i) ∧ B_i(A = A_i)) >,
G_{∀i (B_i C_K(A=A_i) ∧ B_i(A=A_i))} = < N, X, F(A), ∀i (B_i C_K(A = A_i) ∧ B_i(A = A_i)) >.

Player i could believe that all utility functions are

f_i = x_i (∑_{j∈N} x_j − A_i) − x_i²/r_i.

This coincides with the Nash equilibrium for the certain parameter value A = A_i if it is common knowledge that A = A_i. The strategy of each player equals her best response

x_i = BR_i(x_{−i}) = (r_i/(2(1 − r_i))) (∑_{j≠i} x_j − A_i).

Agent i computes best responses for all other agents according to her beliefs. Thus, from the i-th player’s point of view, it looks like she should compose and solve the system of best responses

x_j = BR_j(x_{−j}) = (r_j/(2(1 − r_j))) (∑_{k≠j} x_k − A_i)

for each j.

1.3.3.1 Zero Nash Equilibrium for G_{∀i (B_i C_B(A=A_i) ∧ B_i(A=A_i))}

There is always a solution: the strategies of the agents are x_i = 0, ∀i ∈ N, and the values of the agents’ utilities are f_i = 0, ∀i ∈ N.

1.3.3.2 Nonzero Nash Equilibrium for G_{∀i (B_i C_B(A=A_i) ∧ B_i(A=A_i))}

If

∑_{j∈N} r_j/(2 − r_j) > 1,

then there is one more solution:

x_i = (r_i/(2 − r_i)) A_i / (∑_{j∈N} r_j/(2 − r_j) − 1), ∀i ∈ N.
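This case is easy to simulate: each agent solves the whole game under her own belief A_i and plays her own component. A minimal sketch with illustrative values of r_i and the beliefs:

```python
# Sketch of Sect. 1.3.3: without communication, agent i solves the game
# as if A = A_i were common knowledge and plays her own component.
# r and the beliefs are hypothetical values, not from the chapter.
r = [0.8, 0.9, 0.7]
beliefs = [1.5, 2.0, 2.5]         # A_1, A_2, A_3
k = [ri / (2 - ri) for ri in r]
K = sum(k)
assert K > 1

def equilibrium(A):
    # nonzero equilibrium of the game with commonly known parameter A
    return [ki * A / (K - 1) for ki in k]

# each agent plays component i of the equilibrium computed under A_i,
# reproducing x_i = (r_i/(2 - r_i)) A_i / (sum_j r_j/(2 - r_j) - 1)
x = [equilibrium(beliefs[i])[i] for i in range(3)]
for i in range(3):
    assert abs(x[i] - k[i] * beliefs[i] / (K - 1)) < 1e-12
```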

1.3.4 Stubborn Players with Communication Without Consensus

We can model this case by the games

G_{B_i C_B(A=A_i)} = < N, X, F(A), ∀i B_i C_B(A = A_i) >,
G_{B_i C_K(A=A_i)} = < N, X, F(A), ∀i B_i C_K(A = A_i) >.


We formulated axioms to make the informational system complete:

G_{B_i C_B(A=A_i) ∧ B_i(A=A_i)} = < N, X, F(A), ∀i (B_i C_B(A = A_i) ∧ B_i(A = A_i)) >,
G_{B_i C_K(A=A_i) ∧ B_i(A=A_i)} = < N, X, F(A), ∀i (B_i C_K(A = A_i) ∧ B_i(A = A_i)) >.

If there is communication with no trust at all, then all agents become stubborn, and the opinions of others do not change their own. There is no difference whether the existence of such communication is common knowledge among all agents or not. The important information is that each A_i is common knowledge and that all agents are stubborn in our sense. Thus, from the i-th player’s point of view, he should compose and solve the system of best responses

x_i = BR_i(x_{−i}) = (r_i/(2(1 − r_i))) (∑_{j≠i} x_j − A_i)

for each i.

1.3.4.1 Zero Nash Equilibrium for G_{∀i (B_i C_K(A=A_i) ∧ B_i(A=A_i))}

There is always a solution for G_{B_i C_K(A=A_i) ∧ B_i(A=A_i)} and G_{B_i C_B(A=A_i) ∧ B_i(A=A_i)}: the strategies of the agents are x_i = 0, ∀i ∈ N, and the values of the agents’ utilities are f_i = 0, ∀i ∈ N.

1.3.4.2 Nonzero Nash Equilibrium for G_{∀i (B_i C_K(A=A_i) ∧ B_i(A=A_i))}

If

∑_{j∈N} r_j/(2 − r_j) > 1,

then there is one more solution.

Strategies of the agents:

x_i = (r_i/(2 − r_i)) (A_i + ∑_{j∈N} (r_j/(2 − r_j))(A_j − A_i)) / (∑_{j∈N} r_j/(2 − r_j) − 1), ∀i ∈ N.

Values of the agents’ utilities: substituting these strategies into the utility functions and simplifying gives

f_i(x_1, . . . , x_n, r_1, . . . , r_n, A_1, . . . , A_n)
= ((r_i/(2 − r_i)) B_i / (∑_{j∈N} r_j/(2 − r_j) − 1)²) · (∑_{j∈N} (r_j/(2 − r_j)) A_j − A (∑_{j∈N} r_j/(2 − r_j) − 1) − B_i/(2 − r_i)), ∀i ∈ N,

where B_i = A_i + ∑_{j∈N} (r_j/(2 − r_j))(A_j − A_i) and A is the actual value of the parameter.
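The closed forms of this subsection can be cross-checked numerically by solving the best-response system directly; r_i, the beliefs A_i, and the actual parameter value below are illustrative choices, not taken from the chapter.

```python
# Cross-check of the stubborn-player equilibrium and its payoffs: solve
# the linear best-response system and compare with the closed forms.
import numpy as np

r = np.array([0.8, 0.9, 0.7])
beliefs = np.array([1.5, 2.0, 2.5])     # agents' beliefs A_i
A_true = 2.0                            # actual parameter value
c = r / (2 * (1 - r))
n = len(r)

# x_i = c_i (sum_{j != i} x_j - A_i)  <=>  (I - c (J - I)) x = -c * beliefs
M = np.eye(n) - c[:, None] * (np.ones((n, n)) - np.eye(n))
x = np.linalg.solve(M, -c * beliefs)

k = r / (2 - r)
K = k.sum()
# B_i = A_i + sum_j k_j (A_j - A_i)
B = beliefs + (k * (beliefs[None, :] - beliefs[:, None])).sum(axis=1)
assert np.allclose(x, k * B / (K - 1))          # closed-form strategies

# realized payoffs use the actual A, not the beliefs
S = x.sum()
f = x * (S - A_true) - x**2 / r
f_closed = k * B / (K - 1) ** 2 * (k @ beliefs - A_true * (K - 1) - B / (2 - r))
assert np.allclose(f, f_closed)                 # closed-form payoffs
```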


1.4 Conclusion

In this paper we considered a normal form game with a parameter A and supposed that this parameter is uncertain for the agents, so that they have to make some assumption about it. We used the formal grammar

φ = ⊥ | ⊤ | p | ¬φ | φ ∧ ψ | φ ∨ ψ | K_i φ | B_i φ | C_K φ | C_B φ.

We applied this method to the game. In the game, the agents apply their strategies, and the joint action is successful (provides a positive contribution to the utility functions of the agents) when the total effort exceeds a specific threshold; the latter is set equal to 1. The strategy being successful, the agent’s gain (the first term in the utility function) increases with the growing effort of the agent in question. On the other hand, the agent’s effort itself results in a negative contribution to the utility function (see the second term), which depends on the type r_i. The larger the type variable, the “easier” the agent applies the strategy (for instance, in a psychological sense it could be explained by the agent’s greater loyalty or liking for the joint action) [8]. We used a special property of the considered alternatives which lets us calculate the equilibria by solving several normal form games with perfect information. We have found equilibria for the games G_{B_i(A=A_i)}, G_{C_B B_i(A=A_i)}, G_{C_K B_i(A=A_i)}, G_{C_B C_B(A=A_i)}, G_{C_K C_K(A=A_i)}, G_{B_i C_B(A=A_i)}, G_{B_i C_K(A=A_i)}, G_{B_i B_j(A=A_i)}, G_{B_i B_j(A=A_j)}. Another alternative is to consider the other games G_{B_i B_j B_k(A=A_i)}, G_{B_i B_j B_k(A=A_j)}, G_{B_i B_j B_k(A=A_k)}, G_{B_i B_j C(A=A_i)}, G_{B_i C B_j(A=A_i)}, G_{C B_i B_j(A=A_i)}, G_{B_i B_j C(A=A_j)}, G_{B_i C B_j(A=A_j)}, G_{C B_i B_j(A=A_j)}, G_{C C B_i(A=A_i)}, G_{C B_i C(A=A_i)}, G_{B_i C C(A=A_i)}, where C could be C_K or C_B.

Acknowledgements The article was prepared within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project “5-100”.

References

1. Aumann, R.J.: Interactive epistemology I: Knowledge. Int. J. Game Theory 28(3), 263–300 (1999)
2. Novikov, D., Chkhartishvili, A.: Reflexion and Control: Mathematical Models. Communications in Cybernetics, Systems Science and Engineering (Book 5). CRC Press, Boca Raton (2014)
3. Fedyanin, D.: Threshold and network generalizations of muddy faces puzzle. In: Proceedings of the 11th IEEE International Conference on Application of Information and Communication Technologies (AICT 2017, Moscow), vol. 1, pp. 256–260 (2017)
4. Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press, New York (2008)
5. Harsanyi, J.C.: Games with incomplete information played by Bayesian players, part I. Manag. Sci. 14(3), 159–183 (1967)
6. Harsanyi, J.C.: Games with incomplete information played by Bayesian players, part II. Manag. Sci. 14(5), 320–334 (1967)


7. Harsanyi, J.C.: Games with incomplete information played by Bayesian players, part III. Manag. Sci. 14(7), 486–502 (1968)
8. Fedyanin, D.N., Chkhartishvili, A.G.: On a model of informational control in social networks. Autom. Remote Control 72, 2181–2187 (2011)
9. Cournot, A.: Recherches sur les Principes Mathématiques de la Théorie des Richesses. Hachette, Paris. Translated as: Researches into the Mathematical Principles of the Theory of Wealth. Kelley, New York (1960)
10. Granovetter, M.: Threshold models of collective behavior. Am. J. Sociol. 83, 489–515 (1978)
11. Breer, V.V., Novikov, D.A., Rogatkin, A.D.: Mob Control: Models of Threshold Collective Behavior. Studies in Systems, Decision and Control. Springer, Heidelberg (2017)
12. Sarwate, A.D., Javidi, T.: Distributed learning from social sampling. In: 46th Annual Conference on Information Sciences and Systems (CISS), Princeton, 21–23 March 2012, pp. 1–6. IEEE, Piscataway (2012)
13. DeGroot, M.H.: Reaching a consensus. J. Am. Stat. Assoc. 69, 118–121 (1974)

Chapter 2

Differential-Game-Based Driver Assistance System for Fuel-Optimal Driving Michael Flad

Abstract Increasing fuel-efficiency is a current and essential question for all major car manufacturers. Supporting these efforts, this paper presents a shared control driver assistance system that helps the driver to apply a fuel-efficient driving strategy. In the proposed system, both the driver and the assistance system can apply forces to the accelerator pedal, enabling a close cooperation between the two partners. The interaction between driver and this kind of assistance system can be described by means of a differential game. By solving this differential game, the assistance system calculates optimal control outputs. For the realization, the assistance system has to solve several game-theoretic problems, which are presented in this paper. The assistance system was implemented on a real-time system, integrated in a driving simulator and validated in a driving study. The results indicate that the proposed system can save on average about 10% fuel in a highway scenario.

Keywords Advanced driver assistance system · Differential game · Cooperative and haptic shared control · Increasing fuel efficiency · Optimal control in human-machine cooperation

2.1 Introduction

Increasing the fuel-efficiency of transportation systems is currently a major research question in many countries. In the past, the conventional approach to achieve this was to develop more efficient drive trains and especially engines. However, at the current technical level of combustion engines, even slight improvements of the efficiency become increasingly costly. Furthermore, alternative drive train concepts like hybrid cars or fully electric vehicles are still significantly more expensive than

M. Flad () Institute of Control Systems, Karlsruhe Institute of Technology, Karlsruhe, Germany e-mail: [email protected] © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_2


conventional drive train concepts. In addition, the driving style also affects the fuel-efficiency, which thus largely depends on the individual skills of the drivers. Therefore, from an engineering point of view it seems natural that the imperfect driver should be replaced by an ideal, or at least superior, technical automation system. However, truly fully automated vehicles, i.e. SAE level 5, will not be available in the next 10 years [2]. All currently available "autonomous" vehicles in serial production achieve, if at all, SAE level 3. This means that there has to be a driver in some kind of backup position in order to take over control from the automation in critical situations which the automation can no longer handle on its own. However, a human is not well suited to this supervisory task, since he loses awareness of the driving situation when he is no longer involved in the driving task. Without proper situation awareness, the driver cannot adequately take over control, especially in critical situations. This problem is known as the 'out-of-the-loop' problem and has been demonstrated in several driving studies [3, 13]. Instead of forcing the driver into a supervisory task he is not suited for, the idea here is to support him with an advanced driver assistance system (ADAS). The ADAS is designed to apply a force to the accelerator pedal. In this way, driver and ADAS can both influence the vehicle dynamics simultaneously and haptically interact with each other. This shared control structure between driver and ADAS is depicted in Fig. 2.1. Several shared control ADAS focusing on force feedback via the gas pedal have been proposed in the literature, e.g. [1, 17, 18, 25]. It has also been shown that these systems can affect several aspects of the driving performance like speeding and distance control. However, a significant disadvantage of all these ADAS concepts is that their control parameters are determined heuristically, i.e. the "optimal" parameters are tuned in driving experiments. First, this approach allows no formal evaluation of the performance of the shared control loop. Second, an even more essential drawback is that these concepts are not suited to the design of individualized ADAS, as performing extensive customization experiments with each individual user is not practicable.

Fig. 2.1 Structure of the vehicle control. Driver and ADAS can both apply a torque to the gas pedal (θ: accelerator pedal angle, M_D: torque applied by the driver, M_A: torque applied by the ADAS, x: system states of the vehicle)


To overcome these limitations, a systematic (control) engineering approach should be preferred, in which the parameters of the ADAS controllers are determined using a formal (mathematical) model of the control loop. Yet, the shared control loop (see Fig. 2.1) is not a conventional control loop, as there are two controllers, the ADAS and the driver, which influence the same system and interact with each other. To determine the optimal torque that should be applied to the gas pedal, the control algorithm of the ADAS has to consider the action of the driver. In addition, the ADAS also needs to anticipate the change of the driver's pedal torque that would result if it adapts its own torque. For the proposed ADAS, it is essential to model this interaction. Mathematically, this is captured by a differential game. Game theory has rarely been applied to describe the interaction between driver and automation. In [35], a game-theoretic model is introduced which describes a vehicle stabilization control loop. In this loop, a driver operates a conventional steering wheel and an ADAS influences the steering using a superimposed steering system. Further work related to this lateral steering task is presented in [27, 28]. In addition, the author of this paper has shown in previous work how game theory can be used in general to design cooperative shared control systems [10, 11]. A first concept for a longitudinal ADAS was presented by the author in [24]. However, that ADAS is limited to the simple task of giving the driver a feedback torque to prevent speeding and tailgating. In addition, the concept does not take individual driving behavior into consideration. The aim of this paper is to present a shared control ADAS that supports the driver in the task of applying a fuel-efficient driving strategy. A particular challenge resulting from this idea is to solve the optimization problem, i.e. the game-theoretic problem, using justifiable hardware resources. A concept is proposed to resolve this problem, along with an algorithm able to calculate the optimal ADAS parameters for each driver individually. The paper is structured as follows: In Sect. 2.2 it is described how the shared control problem of Fig. 2.1 can be modeled as a differential game. The necessary models of the players and the vehicle dynamics are also presented in this section. In Sect. 2.3 it is shown how the differential game model of the previous section can be used to design the proposed ADAS, and how the resulting game-theoretic problem can be solved in a way that allows a real-time capable implementation. To evaluate the concept, a driving study with 10 participants is presented in Sect. 2.4.

2.2 Differential Game Model

The general idea of modeling shared control interaction as a differential game was presented in [10]. Here, the main ideas are briefly recapitulated; in contrast to [10], the concept is introduced directly for the specific problem of a longitudinal shared control ADAS as depicted in Fig. 2.1.


Fig. 2.2 Differential game representation of the longitudinal shared control structure from Fig. 2.1

The basic assumption is to model the driver as an agent that chooses its available control variables, here the pedal torque, with respect to some personal objectives. For this purpose, the driver is assumed to utilize an internal model of the vehicle, which he uses to predict the influence of his inputs. There is biological evidence for these assumptions [6, 16, 30, 33]. They have also been successfully used to describe human plant operators [19] and are an essential aspect of several driver models [22]. Mathematically, this behavior can be described as an optimizer, respectively, in engineering terms, as a model predictive controller (MPC). Analogous to the driver, this model is also used as the basic framework for the ADAS. The vehicle dynamics can simply be modeled by a differential equation. With two optimizers influencing a common system represented by a differential equation, the result is a differential game (see Fig. 2.2). For the given application, the driver and the ADAS communicate via the pedal torques. This allows no binding agreements between the players. Instead, it is modeled that there is an implicit agreement between the two players, which can be formalized by the well-known Nash equilibrium [29]. Furthermore, several experiments have shown that the haptic coupling of a human with a second agent results in behavior consistent with a Nash equilibrium [5]. Based on this, the problem of Fig. 2.2 can be stated as

∀i ∈ {A, D}:  M_i* = arg min_{M_i} J_i(M_i, M_¬i*, x, t)   (2.1a)

with respect to

ẋ = f(M_A, M_D, x, t)   (2.1b)
c ≤ g(x)   (2.1c)
x(0) = x_0.   (2.1d)


The objective functions J_i(M_i, M_¬i, x, t): ℝ × ℝ × ℝⁿ × ℝ → ℝ model the goals of the driver D and the ADAS A, and f(M_A, M_D, x, t): ℝ × ℝ × ℝⁿ × ℝ → ℝⁿ describes the system dynamics, i.e. the longitudinal vehicle dynamics. Equilibrium solutions are marked with a star. The inequality constraints (2.1c) model the physical limitations of the pedal angle, the motor torque and the speed. They also capture the practical restriction that the ADAS is designed such that it can only apply a force against the human foot. Otherwise, the driver would no longer be capable of controlling the vehicle, since he could not counteract an accelerating ADAS.

2.2.1 Models

In this subsection, the vehicle model and the driver's objective function are described in more detail.

2.2.1.1 Vehicle Model

The vehicle model describes the longitudinal dynamics of the vehicle, the pedal system, the dynamics of the human neuromuscular system and the ADAS actuator. The state vector of the model is

x = (M_D  M_A  θ  θ̇  M_e  ω)ᵀ.

Here, θ is the angle of the pedal and ω the engine speed. M_e denotes the torque of the engine. The state entries M_D and M_A are the effective pedal torques; they are used to describe that neither the human neuromuscular system nor the ADAS actuator can change the torque instantaneously. To model this, the effective torques are the outputs of first-order lag elements with the desired pedal torques of the driver and of the ADAS as inputs. Note that the pedal torques can easily be converted to pedal forces. With these system states, the system model has the affine form

ẋ(t) = A(i_G(t)) x(t) + B u(t) + B_z(v(t), i_G(t)),  u(t) = (M_D(t)  M_A(t))ᵀ.   (2.2)

The system matrix A(i_G(t)) contains the first-order lag dynamics of the effective pedal torques (time constants τ_D and τ_A), the second-order dynamics of the pedal (natural frequency ω_0,P, damping d_P, gain k_P), the first-order engine torque dynamics (gain k_M, time constant τ_M) and the friction coefficient k_r scaled by the gear-dependent total inertia Θ_G(i_G(t)). The disturbance vector B_z collects the static torques M_S and M_R as well as the driving resistances converted to the engine shaft, (F_s(t) + F_L(v(t)) + F_R(v(t))) r_R/(i_G(t) i_D), again scaled by Θ_G(i_G(t)).

Slip is not modeled, as its influence is negligible for usual longitudinal maneuvers. Thus, the relation to the vehicle speed v(t) is given by

v(t) = (0  0  0  0  0  r_R/(i_G(t) i_D)) x(t)   (2.3)

with i_G(t) the shiftable transmission ratio of the main gear and i_D the fixed transmission ratio of the differential gear. For the simulation, the gear shifting strategy of the vehicle simulation software is used, which upshifts and downshifts at gear-dependent engine rotation speed thresholds. However, the gear shifting strategy is not important for the ADAS algorithm as long as it is known. F_s is the climbing resistance, F_L the air resistance and F_R the rolling resistance. According to [34], these resistive forces depend on the vehicle position or velocity. Θ_G is the total inertia of the vehicle calculated to be effective at the engine shaft. All other parameters of the model, including their values fitted for the reference vehicle used in the study in Sect. 2.4, are given in Table 2.3 in the appendix.
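Relation (2.3) reduces to a one-line conversion between engine speed and vehicle speed. A minimal sketch; the default values for i_D and r_R are hypothetical placeholders, not the fitted parameters of the reference vehicle:

```python
def vehicle_speed(omega, i_G, i_D=3.2, r_R=0.31):
    """Relation (2.3): vehicle speed (m/s) from engine speed omega (rad/s),
    gear ratio i_G, differential ratio i_D and wheel radius r_R (m).
    The default i_D and r_R are hypothetical, not the fitted values."""
    return r_R / (i_G * i_D) * omega
```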

2.2.1.2 Driver Objective Function

The driver's objectives are modeled using the quadratic objective function

J_D(·) = ∫ (x − x_r,D)ᵀ Q_D (x − x_r,D) + M_D² dt.   (2.4)

Here, the matrix Q_D and the reference trajectory x_r,D are driver-individual parameters. Previous studies, both for lateral and longitudinal vehicle control, have shown that this objective function (2.4) is suited to describe driving behavior [9, 15, 24].


2.3 Shared Control Driver Assistance System

The goal of the ADAS proposed in this paper is to minimize the fuel consumption; thus the ADAS should apply the optimal pedal torque with respect to fuel saving. Hence, the ADAS objective function should be

J_A,G = ∫ αM_A² + h_ξ(x, t) dt.   (2.5)

Here, the function h_ξ determines the specific fuel consumption of the vehicle (technically, this function is modeled using characteristic maps that can be determined on an engine test bed), and the design parameter α scales the weighting of the squared ADAS torque. For α = 0, the ADAS aims for perfectly fuel-optimal driving. However, in this case the ADAS would also overrule the driver's intentions completely, which is generally disliked by drivers. For a higher α, the ADAS acts less dominantly, but this also reduces the ADAS' effect on fuel-efficiency. Since driver and ADAS control the vehicle together (see Fig. 2.2), the ADAS should also consider the driver's input when determining its control inputs. From a game-theoretic point of view, this can be interpreted as the calculation of a Stackelberg equilibrium. However, the motivation for this problem statement is not a predetermined sequence of ADAS and driver decisions. Instead, the aim is to design an ADAS that anticipates the driver behavior and adapts as well as possible. The problem formulation for the ADAS is

M̂_A* = arg min_{M_A} ∫ αM_A² + h_ξ(x, t) dt   (2.6a)

with respect to

M_D* = arg min_{M_D} J_D(M_D, M_A, x, t)   (2.6b)
ẋ = f(M_A, M_D, x, t)   (2.6c)
c ≤ g(x)   (2.6d)
x(0) = x_0.   (2.6e)

The detailed equations for the driver and the vehicle model are given in Sect. 2.2. Note that the optimal solution with respect to (2.5) is marked with a hat. The solution of (2.6) is in general not a Nash equilibrium. Theoretically, the ADAS torque could be calculated by simply solving problem (2.6). However, no algorithm is known that can do this in real time on an embedded platform, and a substantial goal is that the ADAS can be implemented on state-of-the-art vehicle hardware without significant additional costs. Thus, problem (2.6) is not solved directly. The idea is to determine a substitute


objective function J_A(·) for the original ADAS objective function (2.5), such that the Nash equilibrium between J_A(·) and J_D(·) (more precisely, the element M_A* of this tuple) is identical to the solution of (2.6). The essential benefit of this approach is that it simplifies the original problem (2.6) to the problem of finding a Nash equilibrium, which is a common game-theoretic problem and, for the given setting, comparatively easy to compute. This approach has three steps, and in every step a game, respectively an optimization problem, needs to be solved:

Step 1: Determine the objective function of the driver, J_D(·).
Step 2: Find a substitute for the objective function (2.5) that reduces the problem to the calculation of a Nash equilibrium.
Step 3: Calculate this Nash equilibrium in real time.

These three steps are described in the following subsections.

2.3.1 Step 1: Identification of the Driver's Objective Function

For all further steps it is essential to know the driver's objective function J_D(·). A quadratic objective function structure is used here to model the driver (see Sect. 2.2.1.2 and (2.4)). This reduces the problem to determining the driver-specific weighting matrix Q_D and the reference trajectory x_r,D. Although it would theoretically be possible to determine a model that can describe a driver's individual reference trajectories, for example with the approach described in [15], this would require collecting excessive data from each driver. As this expenditure of time cannot be handled in a study on a driving simulator, it was decided to define a generic reference for all drivers instead. The reference for the speed is set to the legal speed limit and, if it is lower, the possible cornering speed (at a maximum lateral acceleration of 4 m/s²). The reference for all other states is set to 0. The matrix Q_D is identified individually for each driver. In order to do so, the driver performs several driving scenarios without ADAS support. The state trajectories x̄_D and input trajectories M̄_D of these runs are saved. Based on this data, the driver objective function is identified using the bi-level inverse optimal control approach proposed in [23]:

arg min_{Q_D} ∫ (x − x̄_D)ᵀ (x − x̄_D) + (M_D − M̄_D)² dt   (2.7a)

with respect to

M_D = arg min_{M_D} ∫ (x − x_r,D)ᵀ Q_D (x − x_r,D) + M_D² dt   (2.7b)
ẋ = f(M_D, x, t)   (2.7c)
c ≤ g(x)   (2.7d)
x(0) = x_0,   (2.7e)

where Q_D is a positive semi-definite matrix. Both the high-level parameter optimization problem (2.7a) and the low-level dynamic optimization problem (2.7b) can easily be solved using state-of-the-art algorithms. A significant disadvantage of the bi-level approach is the required calculation time. In general, there exist mathematically more elegant and numerically more efficient algorithms for solving inverse dynamic optimization problems (e.g. see [20]); however, no other algorithm is known that can practically handle problem (2.7) due to the constraints (2.7d) and the nonlinear vehicle dynamics (2.7c).
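The nesting of (2.7) can be sketched numerically: an outer parameter search over the diagonal of Q_D wrapped around an inner optimal control solve. The sketch below substitutes a hypothetical double-integrator for the vehicle dynamics and a finite-horizon discrete LQR for the low-level problem (2.7b); it illustrates the bi-level structure only, not the paper's constrained nonlinear implementation:

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical double-integrator stand-in for the vehicle dynamics (2.7c)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])

def lqr_trajectory(q_diag, x0, horizon):
    """Low-level problem (2.7b): finite-horizon discrete LQR (input weight 1)
    solved by a backward Riccati recursion, followed by a forward rollout."""
    Q = np.diag(q_diag)
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(B.T @ P @ B + 1.0, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()
    x, xs, us = x0, [x0], []
    for K in gains:
        u = -K @ x
        x = A @ x + B @ u
        us.append(u)
        xs.append(x)
    return np.array(xs), np.array(us)

def identify_driver_weights(x_obs, u_obs, x0, horizon):
    """High-level problem (2.7a): fit the diagonal of Q_D so the predicted
    optimal behavior reproduces the recorded driver trajectories."""
    def fit_error(log_q):
        xs, us = lqr_trajectory(np.exp(log_q), x0, horizon)
        return np.sum((xs - x_obs) ** 2) + np.sum((us - u_obs) ** 2)
    res = minimize(fit_error, np.zeros(2), method="Nelder-Mead")
    return np.exp(res.x), res.fun
```

In the paper, the inner problem is a constrained nonlinear dynamic optimization, which is why generic state-of-the-art solvers are required instead of the Riccati recursion used here.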

2.3.2 Step 2: Calculation of an Optimal ADAS Objective Function

As stated in the introduction, the idea is to substitute (2.6) by the problem of calculating a Nash equilibrium. To do this, it is required to calculate an ADAS objective function J_A(·) for which the game (2.1) yields a Nash equilibrium equal to the solution of the original ADAS problem (2.6). Formally,

M̂_A* =! M_A*.

Just as for the driver model, a quadratic structure is used for the ADAS objective function:

J_A(·) = ∫ (x − x_r,A)ᵀ Q_A* (x − x_r,A) + M_A² dt.   (2.8)

The reference for the ADAS, x_r,A, is set to the fuel-optimal velocity trajectory. This trajectory is calculated by solving problem (2.6) with α = 0 and M_D = 0, but with the limit of a physically feasible cornering speed. The optimal weighting matrix Q_A* is fitted individually to each driver, respectively to each J_D(·). It is the solution of the following optimization problem:

Q_A* = arg min_{Q_A} ∫ αM_A² + h_ξ(x, t) dt   (2.9a)

with respect to

∀i ∈ {A, D}:  M_i* = arg min_{M_i} ∫ (x − x_r,i)ᵀ Q_i* (x − x_r,i) + R_i* M_i² dt   (2.9b)

ẋ = f(M_A, M_D, x, t)   (2.9c)
c ≤ g(x)   (2.9d)
x(0) = x_0.   (2.9e)

In analogy to the identification problem, this calculation is performed for a training scenario. Just like the bi-level problem of the previous section, this is a nested optimization problem. The high-level optimization problem (2.9a) is an ordinary parameter optimization problem, where Q_A is a positive semi-definite, symmetric matrix. For the sake of clarity, these constraints are not formalized in (2.9). Neglecting (2.9a), the problem (2.9b)–(2.9e) is a differential game, referred to here as the low-level problem. To solve (2.9), a Nash equilibrium of the differential game (2.9b)–(2.9e) has to be calculated in every iteration of the parameter optimization problem (2.9a). This differs from the classical bi-level approach, whose low-level problem is in general an ordinary (one-player) dynamic optimization problem. The parameter optimization problem can be solved numerically with ordinary parameter optimization algorithms. In this paper, first a genetic search algorithm [14] and then, in a second step, sequential quadratic programming [31, Chapter 18] is used. The calculation of the Nash equilibrium (the low-level problem) is done with the algorithm shown in the next subsection.
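The two-stage solution strategy for the high-level problem (2.9a), a genetic search followed by sequential quadratic programming, can be sketched with off-the-shelf optimizers. The objective passed in below is a hypothetical stand-in; in the paper, every evaluation requires solving the low-level differential game (2.9b)–(2.9e):

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize

def tune_weights(fuel_objective, n_params, bounds=(0.01, 10.0)):
    """Two-stage search for the diagonal of Q_A as in (2.9a): a global
    evolutionary search followed by a local SQP-type refinement.
    `fuel_objective` maps a candidate diagonal to the fuel cost evaluated
    at the resulting Nash equilibrium (here: any callable); the bounds
    are hypothetical."""
    coarse = differential_evolution(fuel_objective, [bounds] * n_params,
                                    seed=1, maxiter=20, tol=1e-6)
    fine = minimize(fuel_objective, coarse.x, method="SLSQP",
                    bounds=[bounds] * n_params)
    return fine.x
```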

2.3.3 Step 3: Calculation of the Nash Equilibrium

In the last step, the Nash equilibrium is calculated between the driver (J_D(·)) and the ADAS, whose objective function J_A(·) was determined in the previous step. The calculations of steps 1 and 2 can be done offline and thus without any restriction on the computational resources. However, the following calculation needs to be performed on the quite limited vehicle on-board ADAS controller under real-time conditions.

2.3.3.1 Nash Equilibrium Problem Definition

The problem is:

∀i ∈ {A, D}:  M_i* = arg min_{M_i} ∫₀ᵀ (x − x_r,i)ᵀ Q_i* (x − x_r,i) + R_i* M_i² dt   (2.10a)

with respect to

ẋ = A(i_G) x + B (M_D  M_A)ᵀ + B_z(v, i_G)   (2.10b)
c ≤ g(x)   (2.10c)
x(0) = x_0.   (2.10d)

In contrast to steps 1 and 2, where the calculations are computed for an entire identification or training scenario, this problem is solved on a moving horizon as in model predictive control (MPC). In contrast to a model predictive controller, however, the dynamic optimization problem is a differential game. For the longitudinal steering problem, the prediction horizon T is set to 10 s and the problem is recalculated every 50 ms. The approach to solve (2.10) is to linearize the system dynamics (2.10b) and to consider the constraints (2.10c) using barrier functions.
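The moving-horizon scheme can be sketched as a plain loop over measurement, game solve and actuation; all three callables are hypothetical interfaces to the simulator or vehicle hardware:

```python
def receding_horizon_loop(solve_game, measure_state, apply_torque, steps):
    """Moving-horizon scheme of Sect. 2.3.3 (hypothetical interfaces): every
    cycle the game (2.10) is re-solved for the current state and only the
    first element of the resulting ADAS torque trajectory is applied."""
    for _ in range(steps):
        x0 = measure_state()                 # current vehicle state
        M_A_traj, M_D_traj = solve_game(x0)  # Nash equilibrium over horizon T
        apply_torque(M_A_traj[0])            # apply first sample, then repeat
```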

2.3.3.2 Solving the Unconstrained Problem

For the first step, it is assumed that the vehicle velocity v(t), and thereby also the transmission ratio i_G(t), is known over the horizon T. With this information, it is possible to linearize the system dynamics (2.10b) and perform a time discretization. This yields the affine discrete-time state space model

x_(k) = Φ_(k) x_(k−1) + H_z,(k) + H_A,(k) M_A,(k) + H_D,(k) M_D,(k).   (2.11)
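The discretization in (2.11) can be obtained, for instance, by zero-order-hold discretization of the linearized dynamics on a non-uniform sampling grid; a generic sketch using the standard matrix-exponential construction (not tied to the vehicle model):

```python
import numpy as np
from scipy.linalg import expm

def discretize(Ac, Bc, dts):
    """Zero-order-hold discretization of xdot = Ac x + Bc u for a list of
    (possibly non-uniform) sampling intervals dts, returning the pairs
    (Phi_(k), H_(k)) used in (2.11)."""
    n, p = Ac.shape[0], Bc.shape[1]
    aug = np.block([[Ac, Bc], [np.zeros((p, n + p))]])
    pairs = []
    for dt in dts:
        M = expm(aug * dt)
        pairs.append((M[:n, :n], M[:n, n:]))
    return pairs
```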

By using the batch approach [4], the system equation for the entire horizon can be stated as

x = Φ x_(0) + H_z z + H_A M_A + H_D M_D   (2.12)

with

      ⎡ H_i,(1)                  0                     ···  0       ⎤
H_i = ⎢ Φ_(2) H_i,(1)            H_i,(2)                    ⋮       ⎥   (2.13)
      ⎢ ⋮                        ⋮                     ⋱   0       ⎥
      ⎣ Φ_(m)···Φ_(2) H_i,(1)    Φ_(m)···Φ_(3) H_i,(2)  ···  H_i,(m) ⎦

      ⎡ Φ_(1)               ⎤
Φ =   ⎢ Φ_(2) Φ_(1)         ⎥   (2.14)
      ⎢ ⋮                   ⎥
      ⎣ Φ_(m)···Φ_(2) Φ_(1) ⎦
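Following the recursion (2.11), the stacked batch matrices can be assembled iteratively; a generic sketch with arbitrary dimensions, not the vehicle model:

```python
import numpy as np

def batch_matrices(Phis, Hs):
    """Stack the time-varying discrete dynamics x_(k) = Phi_(k) x_(k-1)
    + H_(k) u_(k) over the whole horizon, as in (2.12)-(2.14)."""
    m, n, p = len(Phis), Phis[0].shape[0], Hs[0].shape[1]
    Phi = np.zeros((m * n, n))
    H = np.zeros((m * n, m * p))
    state_trans = np.eye(n)
    for k in range(m):
        state_trans = Phis[k] @ state_trans          # Phi_(k) ... Phi_(1)
        Phi[k*n:(k+1)*n, :] = state_trans
        if k > 0:
            # propagate the effect of earlier inputs one step forward ...
            H[k*n:(k+1)*n, :k*p] = Phis[k] @ H[(k-1)*n:k*n, :k*p]
        # ... and add the direct effect of the current input
        H[k*n:(k+1)*n, k*p:(k+1)*p] = Hs[k]
    return Phi, H
```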


Table 2.1 Sampling time intervals used to discretize the system dynamics

ΔT_1–ΔT_5: 10 ms | ΔT_6–ΔT_10: 50 ms | ΔT_11, ΔT_12: 100 ms | ΔT_13–ΔT_16: 750 ms | ΔT_17–ΔT_20: 1 s | ΔT_21, ΔT_22: 1.25 s

and the vectors

x = (x_(1)ᵀ  x_(2)ᵀ  ···  x_(m)ᵀ)ᵀ   (2.15)
z = (1  ···  1)ᵀ   (2.16)
M_i = (M_i,(1)  M_i,(2)  ···  M_i,(m))ᵀ   (2.17)

and m the discrete length of the horizon T. However, in contrast to the common procedure, the time resolution is not equally spaced. The sampling times ΔT_k used are given in Table 2.1. This uncommon approach was used to optimize the trade-off between a sufficiently long prediction horizon T to include future influences on the speed, a sufficiently continuous pedal torque M_A (discontinuities would be disliked by the drivers) and the calculation time. With (2.12) and in discrete time, both objective functions can be transformed to include the system dynamics. Hence, they depend only on the ADAS and driver pedal torques, J_i(M_A, M_D). This expression can be used to state the necessary condition for a Nash equilibrium for each objective function, respectively each player:

∂J_i(M_A, M_D)/∂M_i = C_ii M_i + c_i + C_i¬i M_¬i =! 0.   (2.18)

If (2.18) is calculated for both players and then combined in a matrix equation, the Nash equilibrium (and thus the desired M̂_A*) is given by

⎡ M̂_A* ⎤       ⎡ C_AA  C_AD ⎤⁻¹ ⎡ c_A ⎤
⎢      ⎥ = −  ⎢            ⎥   ⎢     ⎥   (2.19)
⎣ M_D* ⎦       ⎣ C_DA  C_DD ⎦   ⎣ c_D ⎦

The matrices C_AA, C_AD, C_DD, C_DA and the vectors c_A, c_D can be calculated straightforwardly from (2.12) and the objective functions. The resulting expressions are quite cumbersome and hence not included in this article; however, the expressions calculated for a general problem setup (going beyond the longitudinal steering problem of this paper) can be found in [8]. It is also possible to explicitly calculate the Hessian matrix of this problem, which is always positive definite under the assumptions of steps 1 and 2 (the proof is given in [8, p. 90f]). Therefore, (2.19) is a necessary and sufficient condition for a Nash equilibrium. Note: As an alternative to (2.19), it is possible to use the Hamiltonian method and solve a set of coupled Riccati equations [7, 21]. Especially for long prediction horizons T, the Hamiltonian method should be preferred, as its computational effort


scales only linearly with T, whereas for the concept used in this article the computational effort scales with the third power of T, or more precisely with the third power of m. However, as the handling of the constraints is easier to implement and the overall implementation proved more accessible to other domains, the concept described above is chosen.
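The one-shot solve (2.19) can be illustrated on a small synthetic quadratic game; all matrices below are random stand-ins for the batch quantities of (2.12), not the identified vehicle problem:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4  # stacked input length per player (toy horizon)

# hypothetical stand-ins for the batch matrices of (2.12)
H_A = rng.standard_normal((6, m))
H_D = rng.standard_normal((6, m))
b = rng.standard_normal(6)            # free response Phi x_(0) + H_z z
Q_A, Q_D = np.eye(6), 2 * np.eye(6)
R_A, R_D = 0.5 * np.eye(m), np.eye(m)

def cost(Q, R, M_own, H_own, M_other, H_other):
    """Quadratic cost of one player for stacked torque trajectories."""
    x = H_own @ M_own + H_other @ M_other + b
    return x @ Q @ x + M_own @ R @ M_own

# gradient blocks of (2.18): dJ_i/dM_i = 2 (C_ii M_i + C_i!i M_!i + c_i)
C_AA = H_A.T @ Q_A @ H_A + R_A
C_AD = H_A.T @ Q_A @ H_D
c_A = H_A.T @ Q_A @ b
C_DD = H_D.T @ Q_D @ H_D + R_D
C_DA = H_D.T @ Q_D @ H_A
c_D = H_D.T @ Q_D @ b

# stacked first-order conditions solved in one shot, cf. (2.19)
C = np.block([[C_AA, C_AD], [C_DA, C_DD]])
M = -np.linalg.solve(C, np.concatenate([c_A, c_D]))
M_A_star, M_D_star = M[:m], M[m:]
```

The checks below are exactly the defining properties of the equilibrium: the stacked gradients vanish, and a unilateral deviation cannot reduce a player's cost.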

2.3.3.3 Including the Constraints

Unfortunately, (2.19) does not consider the constraints (2.10c). Therefore, the constraints (2.10c) are included using the well-known barrier function approach [31]. If for a time step k one of the constraints is violated, a barrier function term c_(k) is added to both objective functions:

c_(k) = (x_(k) − x_δ(k))ᵀ Q_δ(k) (x_(k) − x_δ(k)).   (2.20)

Here, x_δ(k) denotes the upper (respectively lower) boundary of the constraint that is violated. Q_δ(k) is a zero matrix except for the entries on the main diagonal representing violated constraints. If a constraint already included in (2.20) is still violated by more than a tolerance of 0.5%, the corresponding weighting factor in the matrix Q_δ(k) is increased. If a constraint is no longer violated in a later iteration, it is excluded from (2.20). This extension of the objective function (2.20) can be directly included in the calculation (2.19). As this is an outer point method, the constraints handled in this way are, strictly speaking, never fulfilled exactly; however, this is negligible for this application. Note: The mathematically more elegant interior point method cannot be combined efficiently with (2.19) and therefore cannot be used here.

2.3.3.4 Overall Algorithm

The overall algorithm to calculate the Nash equilibrium is depicted in Fig. 2.3. It combines the aforementioned steps: it linearizes the dynamics, calculates the equilibrium of the substitute problem, handles the constraints and calculates a new velocity trajectory. Simulations showed that 10 iterations are sufficient for the algorithm to converge; thus, to be on the safe side, the stopping criterion is set to 15 iterations. On the hardware platform described in the next section, the entire algorithm, and thus solving problem (2.10), requires about 6 ms (without any code optimization). The proposed ADAS concept and the algorithm of this section have been implemented using common hardware. For steps 1 and 2, an ordinary PC was used, and for step 3 a real-time hardware-in-the-loop system.


Fig. 2.3 Algorithm for calculating the Nash equilibrium in step 3 (flowchart: initialization, followed by iterations of adapting the barrier functions, calculating the (gradient) matrices, calculating the vehicle velocity and linearizing the system dynamics, and computing the Nash equilibrium using (2.19), until the stopping criterion is fulfilled)

2.4 Driving Study

To evaluate the proposed cooperative ADAS concept, a small driving study with 10 participants was performed. The primary question of the study is whether the ADAS can support the driver towards a more fuel-efficient driving strategy. Formally, the stated hypothesis is: "the fuel consumption with ADAS support is lower than without support".


2.4.1 Hardware Setup

For this study, the static driving simulator of the IRS is used. For the vehicle dynamics and the visualization of the driving scene, the commercial vehicle simulation framework CarMaker from IPG Automotive is used, with a validated nonlinear model of a VW Golf GTD. This simulation is executed on a dSPACE DS1006 real-time system, which also runs the ADAS algorithms presented in the previous sections. The system is connected to a SENSO-Wheel SD-LC used here as an active steering wheel. For the interaction with the driver, the system is connected to an active accelerator pedal which realizes the ADAS pedal force M_A and measures the driver pedal force M_D. The pedal was developed in-house and introduced in [12]. The mock-up of the driving simulator is depicted in Fig. 2.4. A detailed description of the hardware and software can be found in [8].

2.4.2 Study-Setup

The driving study consists of two parts: first, a training and identification phase, and second, the actual evaluation phase. The training phase is a free-driving scenario without any ADAS support in which the drivers can accustom themselves to the driving simulator and the simulated car. At the end of this scenario, data was

Fig. 2.4 Static driving simulator which was used for the study

Fig. 2.5 Real-world highway scenario south of Karlsruhe in Germany that was used in the driving study (map through Waldbronn, Schoellbronn, Marxzell, Schielberg and Bad Herrenalb; x- and y-axes in m)

recorded to identify the parameters of the driver's objective function JD(·) for each driver, as described in Sect. 2.3.1. For the evaluation, a 25 km long highway road scenario that includes four sections through towns is used. This scenario is a real German road a few kilometers south of Karlsruhe, depicted in Fig. 2.5. Figure 2.6 shows the legal speed limits for the scenario. To illustrate the physical speed limit caused by the curvature of the road, the figure also depicts the maximum speed that could be driven with a maximum lateral acceleration of aq = 5 m/s², wherever this limit is below the legal limit. In addition, the figure shows the fuel-optimal speed trajectory. This trajectory is calculated under the constraint of a maximum lateral acceleration of aq = 5 m/s² and the constraint that overspeeding is limited to 15% of the speed limit. For this trajectory, the fuel consumption of the car over the total scenario is 2.3 l. To obtain comparable results, the other traffic in the simulation was scripted not to interfere with the ego-vehicle, i.e. there is never a leading vehicle in front of the ego-car that the driver has to consider. All participants perform this scenario three times with different configurations of the ADAS system:

1. No ADAS support (reference)
2. Soft ADAS
3. Dominant ADAS
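The speed envelope shown in Fig. 2.6 combines the legal limit (plus the 15% overspeeding tolerance) with the curvature-induced physical limit v²·κ ≤ aq,max. A minimal sketch of that combination; the function name and the sample values are assumptions, not taken from the chapter:

```python
from math import sqrt, inf

A_Q_MAX = 5.0     # maximum lateral acceleration in m/s^2 (as in Fig. 2.6)
OVERSPEED = 1.15  # overspeeding tolerated up to 15% of the legal limit

def speed_envelope(v_legal_kmh, curvature):
    """Upper speed bound in km/h from the legal limit (plus tolerance)
    and the road curvature in 1/m (v^2 * kappa <= a_q,max)."""
    v_curve = sqrt(A_Q_MAX / abs(curvature)) if curvature else inf  # m/s
    return min(OVERSPEED * v_legal_kmh / 3.6, v_curve) * 3.6

# e.g. a 100 km/h limit on a curve with radius 100 m (curvature 0.01 1/m):
# the physical limit sqrt(5 * 100) ~= 22.4 m/s ~= 80.5 km/h binds, since
# it is below 1.15 * 100 km/h
```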


Fig. 2.6 Legal speed limit, maximum possible velocity with a lateral acceleration of up to 5 m/s² (demonstrating the curvature of the scenario) and energy-optimal driving trajectory (under consideration of the physical limitations of the vehicle and road and the legal speed limit) for the scenario depicted in Fig. 2.5; velocity in km/h over distance in km

The two ADAS configurations differ only in the parameter α that was used in (2.9a) to determine the driver's individual objective within the ADAS algorithm. For the soft ADAS configuration, the parameter is four times that of the dominant ADAS configuration (αsoft = 4αdom). This means the soft ADAS configuration trades smaller pedal forces for a lower fuel saving, and vice versa. The sequence in which a participant drives the three configurations was randomized in order to eliminate possible learning effects. The drivers were told neither the order of the configurations nor the details or purpose of the ADAS. In addition to the objective measures, each participant was asked to complete a questionnaire to get a first impression of the human acceptance of such a system.

2.4.3 Results

The average fuel consumption is 2.80 l for the reference configuration without support, 2.62 l (−6.6%) for the soft ADAS and 2.51 l (−10.3%) for the dominant ADAS. The primary hypothesis that the ADAS increases the fuel economy is therefore confirmed. The p-values of the paired t-test for the one-sided null hypothesis that the mean fuel consumption without support is equal to or smaller than that of the ADAS configurations are 0.007 and 0.004, respectively. Table 2.2 shows the detailed results. It also shows the RMS values of the pedal force for both the human driver and the ADAS. In Fig. 2.7, the corresponding trajectories of two participants are depicted for a 2.5 km long section of the scenario. The subjective evaluation of all participants is shown in the box plots of Figs. 2.8, 2.9, and 2.10. In Fig. 2.8, the drivers have been


Table 2.2 Fuel consumption ξ, RMS of driver pedal forces MD and RMS of ADAS pedal forces MA for all 10 participants (P1–P10) and all three configurations (driver without any supporting system as reference and the two ADAS configurations)

         No ADAS            Soft ADAS                     Dominant ADAS
      ξ (l)  MD (Nm)     ξ (l)  MD (Nm)  MA (Nm)     ξ (l)  MD (Nm)  MA (Nm)
P1    2.90   4.39        2.69   5.02     2.82        2.42   9.82     7.75
P2    3.27   4.76        3.07   5.93     2.39        2.54   10.96    10.67
P3    2.62   3.70        2.64   6.25     2.41        2.49   6.17     4.79
P4    2.88   5.67        2.35   6.12     2.01        2.52   8.95     7.55
P5    2.48   5.73        2.38   6.35     3.55        2.39   9.15     10.48
P6    3.00   3.25        2.57   5.26     2.59        2.62   6.35     4.95
P7    2.92   4.66        2.65   6.28     2.78        2.54   8.22     5.97
P8    2.48   3.63        2.51   4.97     2.69        2.74   10.88    10.59
P9    2.59   4.99        2.36   7.11     1.07        2.27   5.29     1.61
P10   2.88   6.78        2.94   7.65     2.19        2.63   7.53     3.39
ø     2.80   4.76        2.62   6.09     2.45        2.51   8.33     6.78

asked whether they rate the system as helpful (10) or rather as distracting (0). The question related to Fig. 2.9 asked for the driver's perception of which partner (the driver or the ADAS) is in control. Finally, the drivers were asked to rate the ADAS using marks from excellent (1) to inadequate (6); the results are shown in Fig. 2.10.
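The reported p-values can be reproduced from the per-participant fuel consumptions in Table 2.2 with a one-sided paired t-test. A sketch using only the Python standard library (the computed t statistics are compared against the tabulated critical value, since the standard library has no t-distribution CDF):

```python
from statistics import mean, stdev

# Fuel consumption xi in liters per participant (P1..P10), from Table 2.2
no_adas = [2.90, 3.27, 2.62, 2.88, 2.48, 3.00, 2.92, 2.48, 2.59, 2.88]
soft    = [2.69, 3.07, 2.64, 2.35, 2.38, 2.57, 2.65, 2.51, 2.36, 2.94]
dom     = [2.42, 2.54, 2.49, 2.52, 2.39, 2.62, 2.54, 2.74, 2.27, 2.63]

def paired_t(a, b):
    """t statistic of the paired test for H1: mean(a) > mean(b)."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / len(d) ** 0.5)  # stdev uses n-1

t_soft = paired_t(no_adas, soft)   # ~ 3.01
t_dom  = paired_t(no_adas, dom)    # ~ 3.44
# both exceed the one-sided critical value t(0.99, df=9) ~ 2.821,
# i.e. p < 0.01, consistent with the reported p-values 0.007 and 0.004
```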

2.4.4 Discussion

Overall, the results justify the conclusion that the ADAS fulfils its objectives: even for the comparatively small number of participants, the hypothesis that the system is able to save fuel is confirmed. Saving up to 10.3% of fuel is a remarkable result in the context of automotive development, as other technical concepts that achieve similar savings (like hybrid electric cars) cost many times more than the proposed ADAS. The total amount of fuel a driver can save with the assistance system depends on his own performance. If a driver performs poorly without support, he benefits greatly (like P2). If a participant already drives very well, the system can only slightly increase the performance (see P5). For P8, the fuel consumption with the ADAS even increases slightly. The driver's pedal force MD increases in the soft ADAS configuration and even more in the dominant ADAS configuration, as the driver needs to counteract the ADAS force MA if he is not willing to follow the ADAS' driving style entirely. For example, if a driver does not want to coast before a significantly lower speed limit, he needs to overrule the ADAS by applying a strong enough pedal torque (e.g. see Fig. 2.7, P7 at 3.25 km). The result of the subjective evaluation of the system is particularly interesting. While the soft ADAS configuration is rated significantly better than the reference

Fig. 2.7 Trajectories of two exemplary participants (P7 and P9) for the scenario section between 2.5 and 5 km: speed in km/h, driver torque in Nm, ADAS torque in Nm and accumulated fuel consumption in l over distance. The reference configuration without ADAS support is depicted in black, the blue trajectories represent the soft ADAS configuration and the red ones the dominant ADAS configuration. The green dotted line represents the speed limits. The two bottom plots show the accumulated fuel consumption, where the green line shows the theoretically optimal solution

Fig. 2.8 Box plot of subjective ratings of the usefulness of the ADAS input, from 10 (very helpful) to 1 (disruptive), for the three configurations (no ADAS, soft ADAS, dominant ADAS)

Fig. 2.9 Box plot of subjective ratings of the perceived control authority, from 10 (the driver is in full control) to 1 (the ADAS is in full control of the car's longitudinal velocity)

Fig. 2.10 Box plot of subjective ratings of the different configurations using marks from 1 (excellent) to 6 (inadequate)


configuration without the ADAS, the dominant ADAS configuration is rated significantly worse than the reference. This leads to the conclusion that drivers would even prefer a vehicle without a support system over a car with the dominant ADAS configuration. The drivers rate the input of this configuration as too strong and even disruptive, and they perceive that they are no longer in control of the vehicle (this perception is wrong, as the system can be overruled with forces that a driver can permanently apply). Thus, there is a conflict between objective performance (fuel saving) and subjective performance (ratings of acceptance): if the ADAS forces are too high (which would yield good objective performance), humans dislike the system. This conflict also exists in other shared control ADAS that are not based on a game-theoretic framework, like [26, 32]. However, a big advantage of this game-theoretic approach is that the trade-off between objective control performance and subjective acceptance by the users can be set via a single parameter. This parameter is α, which balances the weighting of fuel economization and ADAS torque in the optimization problem that determines the ADAS parameters (see Sect. 2.3.2 for details). Other ADAS concepts would require adapting multiple, less intuitive controller parameters or characteristic maps. Furthermore, it would even be possible to let a driver change α himself to adapt the ADAS to his personal requirements.

2.5 Conclusion and Future Perspectives

The article shows how cooperation between a driver and the automation in a shared vehicle control scenario can be modeled using a differential game. This model is a suitable foundation for designing an ADAS. To implement the concept, it is necessary to solve an inverse dynamic problem, a parameter optimization problem with a differential game as a constraint, and the calculation of a Nash equilibrium in real time. The driving study demonstrates that the ADAS concept also works in a practical application and can save up to 10% of fuel. As with other shared control ADAS, the study reveals a conflict between user acceptance and objective performance. However, in this design the trade-off can be adjusted with a single parameter. From an engineering point of view, the next step is to implement the system in a real car. The required computational power is comparatively low (the RT hardware used is more than 10 years old, and its resources are primarily used to calculate the nonlinear vehicle model the driving simulator is based on). In addition, the safeguarding requirements are significantly lower than for a highly automated vehicle, as the ADAS only applies a counter force on the acceleration pedal, which can be overruled by the driver if needed. Furthermore, the system only requires a global navigation satellite system receiver as a sensor. From a scientific point of view, a next step should be to investigate the inverse optimal control problem. In this work, the driver's objective function is calculated using trajectories recorded while driving without an assistance system. However, the driver's objective may slightly change when he is supported by an assistance system.


In this scenario, it would be necessary to solve an inverse dynamic game instead of an inverse dynamic optimization problem. Ideally, algorithms should be designed such that they can solve the inverse problem in real time, as this would allow the design of systems that adapt online to the current driver and his actual condition. As a long-term perspective, it would be beneficial to research algorithms that can calculate "solutions" like the Nash equilibrium for (ideally general and constrained) dynamic game problems in real time. This would make it possible to apply the game-theoretic concept to a wider range of cooperative human-machine scenarios and would greatly reduce the design effort of game-theory-based systems, as there would no longer be the need to design specific algorithms like the one in this paper.

Appendix

See Table 2.3.

Table 2.3 Fitted parameters for the reference vehicle

A        Reference surface                       2.15 m²
cw       Drag coefficient                        0.3
ρ        Air density                             1.205 kg/m³
rR       Dyn. wheel radius                       0.297 m
mg       Vehicle mass                            1634 kg
ΘM       Inertia of engine                       0.168 kg m²
ΘK       Inertia of clutch                       0.2 kg m²
ΘG,ein   Inertia gear box (engine)               0.001 kg m²
ΘG,aus   Inertia gear box (output)               0.037 kg m²
ΘD,ein   Inertia differential gear (input)       0.001 kg m²
ΘD,aus   Inertia differential gear (output)      0.001 kg m²
ΘR,v     Inertia wheel (input)                   1.661 kg m²
ΘR,h     Inertia wheel (output)                  1.609 kg m²
iD       Transmission ratio differential         0.599
iG       Transmission ratios of the six gears    3.769, 2.086, 1.323, 0.918, 0.714, 0.597
kM       Friction motor (factor)                 420.8 Nm
kr       Friction drive train (factor)           0 Nm s/rad
MS       Drag torque                             −72.46 Nm
MR       Friction drive train (offset)           0 Nm
τM       Time constant motor                     100 ms
τA       Time constant ADAS (actor)              2.5 ms
τD       Time constant driver (NMS)              100 ms
kp       Spring constant of pedal                6.65·10⁻³ 1/Nm
dp       Damping constant of pedal               0.49 Nm s/rad
ω0,P     Characteristic frequency pedal          5.34 rad/s


References

1. Abbink, D.A.: Neuromuscular analysis of haptic gas pedal feedback during car following. Doctoral thesis, Delft University of Technology (2006)
2. Automation: from driver assistance systems to automated driving. Tech. rep., Verband der Automobilindustrie (2015)
3. Bainbridge, L.: Ironies of automation. Automatica 19, 775–779 (1983)
4. Borrelli, F., Bemporad, A., Morari, M.: Predictive control for linear and hybrid systems (2014). http://control.ee.ethz.ch/stdavid/BBMbookCambridgenewstyle.pdf
5. Braun, D., Ortega, P., Wolpert, D.: Nash equilibria in multi-agent motor interactions. PLoS Comput. Biol. 5(8) (2009). https://doi.org/10.1371/journal.pcbi.1000468
6. Chow, C., Jacobson, D.: Studies of human locomotion via optimal programming. Math. Biosci. 10(3–4), 239–306 (1971)
7. Engwerda, J.: LQ Dynamic Optimization and Differential Games. Wiley, Chichester (2005)
8. Flad, M.: Kooperative Regelungskonzepte auf Basis der Spieltheorie und deren Anwendung auf Fahrerassistenzsysteme. Ph.D. thesis, Karlsruher Institut für Technologie (KIT) (2016). https://doi.org/10.5445/IR/1000062759
9. Flad, M., Trautmann, C., Diehm, G., Hohmann, S.: Experimental validation of a driver steering model based on switching of driver specific primitives. In: 2013 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 214–220 (2013). https://doi.org/10.1109/SMC.2013.43
10. Flad, M., Otten, J., Schwab, S., Hohmann, S.: Necessary and sufficient conditions for the design of cooperative shared control. In: IEEE International Conference on Systems, Man, and Cybernetics (2014)
11. Flad, M., Otten, J., Schwab, S., Hohmann, S.: Steering driver assistance system: a systematic cooperative shared control design approach. In: IEEE International Conference on Systems, Man, and Cybernetics, pp. 3585–3592 (2014)
12. Flad, M., Rothfuss, S., Diehm, G., Hohmann, S.: Active brake pedal feedback simulator based on electric drive. SAE Int. J. Passeng. Cars Electron. Electr. Syst. 7(1), 189–200 (2014)
13. Flemisch, F., Kelsch, J., Löper, C., Schieben, A., Schindler, J., Heesen, M.: Cooperative control and active interfaces for vehicle assistance and automation. In: FISITA World Automotive Congress, Munich (2008)
14. Goldberg, D.E.: Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Boston (1989)
15. Gote, C., Flad, M., Hohmann, S.: Driver characterization and driver specific trajectory planning: an inverse optimal control approach. In: 2014 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp. 3014–3021 (2014). https://doi.org/10.1109/SMC.2014.6974389
16. Hatze, H.: The complete optimization of a human motion. Math. Biosci. 28(1–2), 99–135 (1976)
17. Hayashi, Y.: Study on acceleration and deceleration maneuver guidance for driver by gas pedal reaction force control. In: 13th International IEEE Annual Conference on Intelligent Transportation Systems (2010)
18. Hjaelmdahl, M., Almqvist, S., Varhelyi, A.: Speed regulation by in-car active accelerator pedal: effects on speed and speed distribution. IATSS Res. 26(2), 60–66 (2002)
19. Jagacinski, R., Flach, J.: Control Theory for Humans: Quantitative Approaches to Modeling Performance. Erlbaum, Mahwah (2009)
20. Johnson, M., Aghasadeghi, N., Bretl, T.: Inverse optimal control for deterministic continuous-time nonlinear systems. In: 52nd IEEE Conference on Decision and Control, pp. 2906–2913 (2013). https://doi.org/10.1109/CDC.2013.6760325
21. Ludwig, J., Gote, C., Flad, M., Hohmann, S.: Cooperative dynamic vehicle control allocation using time-variant differential games. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 117–122 (2017). https://doi.org/10.1109/SMC.2017.8122588
22. MacAdam, C.: Understanding and modeling the human driver. Veh. Syst. Dyn. 40(1–3), 101–134 (2003)
23. Mombaur, K., Truong, A., Laumond, J.P.: From human to humanoid locomotion: an inverse optimal control approach. Auton. Robot. 28(3), 369–383 (2010). https://doi.org/10.1007/s10514-009-9170-7
24. Mosbach, S., Flad, M., Hohmann, S.: Cooperative longitudinal driver assistance system based on shared control. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1776–1781 (2017). https://doi.org/10.1109/SMC.2017.8122873
25. Mulder, M.: Haptic Gas Pedal Feedback for Active Car-Following Support. Doctoral thesis, Delft University of Technology (2007)
26. Mulder, M., Abbink, D., van Paassen, M., Mulder, M.: Design of a haptic gas pedal for active car-following support. IEEE Trans. Intell. Transp. Syst. 12(1), 268–279 (2011)
27. Na, X., Cole, D.J.: Linear quadratic game and non-cooperative predictive methods for potential application to modelling driver's interactive steering control. Veh. Syst. Dyn. 51(2), 165–198 (2013)
28. Na, X., Cole, D.J.: Game theoretic modelling of a human driver's steering interaction with vehicle active steering collision avoidance system. IEEE Trans. Hum.-Mach. Syst. 45(1), 25–38 (2015)
29. Nash, J.: Non-cooperative games. Ann. Math. 54(2), 286–295 (1951)
30. Nelson, W.: Physical principles for economies of skilled movements. Biol. Cybern. 46(2), 135–147 (1983)
31. Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, New York (2006)
32. Petermeijer, S., Abbink, D., Mulder, M., de Winter, J.: The effect of haptic support systems on driver performance: a literature survey. IEEE Trans. Haptics 8(4), 467–479 (2015)
33. Rasmussen, J.: Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Trans. Syst. Man Cybern. 13(3), 257–266 (1983)
34. Society of Automotive Engineers (SAE): Stepwise coastdown methodology for measuring tire rolling resistance. SAE J2452 (2008)
35. Tamaddoni, S., Ahmadian, M., Taheri, S.: Optimal vehicle stability control design based on preview game theory concept. In: American Control Conference (ACC), pp. 5249–5254 (2011)

Chapter 3

On the Selection of the Nash Equilibria in a Linear-Quadratic Differential Game of Pollution Control Ekaterina Gromova and Yulia Lakhina

Abstract The work is devoted to the problem of the selection of a Nash equilibrium in non-cooperative differential games with an n-dimensional state variable. We consider the problem of the control of harmful emissions. When solving the problem in the class of closed-loop strategies, it turns out that the Hamilton–Jacobi–Bellman equation may have multiple solutions. For the considered model, we show the application of an economic criterion and of a classical method from the theory of linear-quadratic regulators (LQR) to the selection of admissible solutions from the set of obtained solutions.

Keywords Differential games · Linear-quadratic games · Selection of Nash equilibrium · Feedback strategies · Multiple solutions · LQR

3.1 Introduction

One of the most important problems of contemporary ecology is environmental pollution. Until recently, the cost of the damage done to nature was not taken into account; only in the recent past have the adverse effects of production activities on the state of nature been taken into account. The optimal control models of resource management have proven useful in addressing ecological problems. In this paper we formulate the problem within the well-studied class of linear-quadratic differential games, which have numerous applications [4, 7, 8, 10, 12]. But even in this case there could be some issues related

E. Gromova (✉) St. Petersburg State University, St. Petersburg, Russia; Institute of Mathematics and Mechanics, Ural Branch of the Russian Academy of Sciences, Ekaterinburg, Russia. e-mail: [email protected]

Y. Lakhina St. Petersburg State University, St. Petersburg, Russia

© Springer Nature Switzerland AG 2019
L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_3


E. Gromova and Y. Lakhina

to the existence and uniqueness of the Nash equilibrium [2, 6, 9, 11, 17, 19–21]. In a more general setting, the problem of the existence and uniqueness of the solution was considered, for example, in [3, 15, 16]. In order to determine the optimal strategies of the players, we use the game-theoretic approach [7, 14, 18]. The paper [13] suggested a differential game of advertising competition among three symmetric firms with investments into advertising as control variables; these firms compete for the sales of a homogeneous product. In [11], the authors consider a differential game of managing the investments in an advertising campaign for the case of n symmetric players. The solution of the linear-quadratic problem [22] was sought in the class of closed-loop strategies [2]. The authors considered the non-cooperative statement [7, 14], in which the Nash equilibrium is used as a solution. In the survey [5], an economic method for selecting the admissible solutions from the set of obtained solutions of an optimal control problem was described. Another selection method, a classical method used in the theory of linear-quadratic regulators (LQR), was considered in [1]. In this paper we suggest a new non-cooperative model of pollution control with n players. For simplicity of calculation we consider the case of symmetric players. The problem is solved by the dynamic programming method [2, 11] for n = 2 players.

This paper is organized as follows: Sect. 3.2 introduces the optimal control model of harmful emissions in the production of interchangeable goods for n players in the absence of absorption. Section 3.3 presents the Nash equilibrium; the two-player differential game is considered in a non-cooperative statement, and the problem is solved in the class of closed-loop strategies. In Sect. 3.4 we apply an economic criterion and a classical method from the theory of linear-quadratic regulators (LQR) to the selection of the admissible solutions from the set of obtained solutions; these criteria indicate the non-existence of feedback Nash equilibria in the linear-quadratic model. Section 3.5 concludes the paper.

3.2 The Model of Pollution Control

Let us formulate the problem of pollution control by several firms as a non-cooperative differential game with n players. We suppose that the evolution of the stock of pollution xi of firm i is defined by the system of linear ordinary differential equations:

ẋi = ai,   i = 1, …, n,   xi(0) = xi0 ≥ 0,   (3.1)

where ai ≥ 0 is the amount of pollutants emitted by player i per unit of time (the control variable) and xi is the stock of pollution, i.e. the state variable, whose initial value for player i equals xi0.

3 On the Selection of the Nash Equilibria


Let each player i, i = 1, …, n, produce a stream of goods yi = mi ai, mi ≥ 0. The payoff function of player i has the following form:

Ji(ai) = ∫₀^∞ e^(−ρt) [ (π(k − Σ_{h=1}^{n} yh) + β) yi − di xi ] dt,   (3.2)

where π ≥ 0 is a proportionality coefficient, k ≥ 0 is the required (maximum) stream of goods, π(k − Σ_{h=1}^{n} yh) + β is the cost of the goods, β ≥ 0 is the baseline cost of goods in conditions of full market saturation, and di xi is the abatement cost (di ≥ 0). The problem is considered with an exponential discount function with discounting factor ρ ≥ 0.

For simplicity, let us further assume that mi = 1, di = 1 and β = 0. Then the payoff function (3.2) of player i is given by

Ji(ai) = ∫₀^∞ e^(−ρt) [ π(k − Σ_{h=1}^{n} ah) ai − xi ] dt.   (3.3)
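As a quick sanity check of the payoff functional (3.3): for constant controls ai the state is xi(t) = xi0 + ai·t, and using ∫₀^∞ e^(−ρt) dt = 1/ρ and ∫₀^∞ t e^(−ρt) dt = 1/ρ² the integral evaluates in closed form to Ji = (π(k − Σ ah) ai − xi0)/ρ − ai/ρ². The sketch below compares numerical quadrature with this closed form; all parameter values are arbitrary test values, not from the chapter.

```python
from math import exp

# Arbitrary test parameters for the two-player case
pi_, k, rho = 1.0, 10.0, 0.5
a = [1.0, 2.0]        # constant emission rates of the two players
x0 = [0.5, 0.5]       # initial pollution stocks

def J_closed(i):
    """Closed form of (3.3) for constant controls."""
    return (pi_ * (k - sum(a)) * a[i] - x0[i]) / rho - a[i] / rho**2

def J_numeric(i, T=200.0, n=200_000):
    """Left-endpoint Riemann sum of (3.3), Euler-integrating x_i' = a_i."""
    dt = T / n
    total, x = 0.0, x0[i]
    for step in range(n):
        t = step * dt
        total += exp(-rho * t) * (pi_ * (k - sum(a)) * a[i] - x) * dt
        x += a[i] * dt
    return total

# J_numeric(i) agrees with J_closed(i) up to the discretization error
```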

3.3 Nash Equilibria

We will use the following definition of the Nash equilibrium.

Definition 3.1 The n-tuple aNE = {a1NE, …, anNE} is a Nash equilibrium if

Ji(aNE) ≥ Ji(aNE ‖ ai),   ai ∈ Ui,   i ∈ N,

where aNE ‖ ai = {a1NE, …, a(i−1)NE, ai, a(i+1)NE, …, anNE} and Ui is the set of admissible controls of player i.

Now let us consider the differential game with n = 2 players and find the Nash equilibrium using the dynamic programming method [2]. Let Vi(x) be a continuously differentiable (Bellman) function which satisfies the Hamilton–Jacobi–Bellman (HJB) equation [7]:

ρVi(x) = max_{ai ≥ 0, ajNE} { π(k − Σ_{h=1}^{2} ah) ai − xi + Σ_{i=1}^{2} (∂Vi/∂xi) ai },   i, j = 1, 2,   j ≠ i.   (3.4)

Choose the Bellman functions for both players in the following form [8, 12]:

Vi(x) = xᵀ [ qi1, qi2/2 ; qi2/2, qi3 ] x + [wi1, wi2] x + zi,   i = 1, 2,   (3.5)

that is, a sum of a quadratic term, a linear term and a constant. The matrix [ qi1, qi2/2 ; qi2/2, qi3 ] is chosen to be symmetric. Substituting Vi(x) and their partial derivatives into the respective HJB equations (3.4), we get

ρ(q11 x1² + q12 x1x2 + q13 x2² + x1 w11 + x2 w12 + z1)
    = max_{a1 ≥ 0, a2NE} { π(k − a1 − a2)a1 − x1 + a1(2q11 x1 + q12 x2 + w11) + a2(q22 x1 + 2q23 x2 + w22) };

ρ(q21 x1² + q22 x1x2 + q23 x2² + x1 w21 + x2 w22 + z2)
    = max_{a2 ≥ 0, a1NE} { π(k − a1 − a2)a2 − x2 + a1(2q11 x1 + q12 x2 + w11) + a2(q22 x1 + 2q23 x2 + w22) }.   (3.6)

The optimal control variables, which maximize the right-hand sides of the HJB equations (3.6), are

a1* = k/3 + [ (4q11 − q22)x1 + 2(q12 − q23)x2 + 2w11 − w22 ] / (3π);   (3.7)

a2* = k/3 + [ 2(q22 − q11)x1 + (4q23 − q12)x2 + 2w22 − w11 ] / (3π).   (3.8)
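Formulas (3.7) and (3.8) follow from solving the two first-order conditions π(2a1 + a2) = πk + 2q11 x1 + q12 x2 + w11 and π(a1 + 2a2) = πk + q22 x1 + 2q23 x2 + w22 (derived in the Appendix) as a 2×2 linear system. A numeric spot-check; all coefficient values are arbitrary:

```python
# Check that solving the first-order conditions of (3.6) reproduces the
# feedback controls (3.7)-(3.8). All parameter values are arbitrary.

pi_, k = 2.0, 5.0
q11, q12, q22, q23 = 0.3, -0.1, 0.4, 0.2
w11, w22 = 1.0, -0.5
x1, x2 = 0.7, 1.3

# right-hand sides of the FOCs: pi*(2a1 + a2) = r1, pi*(a1 + 2a2) = r2
r1 = pi_ * k + 2 * q11 * x1 + q12 * x2 + w11
r2 = pi_ * k + q22 * x1 + 2 * q23 * x2 + w22

a1_foc = (2 * r1 - r2) / (3 * pi_)   # solution of the 2x2 linear system
a2_foc = (2 * r2 - r1) / (3 * pi_)

# formulas (3.7) and (3.8)
a1 = k / 3 + ((4*q11 - q22)*x1 + 2*(q12 - q23)*x2 + 2*w11 - w22) / (3 * pi_)
a2 = k / 3 + (2*(q22 - q11)*x1 + (4*q23 - q12)*x2 + 2*w22 - w11) / (3 * pi_)

assert abs(a1 - a1_foc) < 1e-12 and abs(a2 - a2_foc) < 1e-12
```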

Substituting (3.7) and (3.8) into the HJB equations (3.6), we get the following results. From the first equation in (3.6) we have:

x1²:   16q11² − q11(14q22 + 9πρ) + 7q22² = 0;
x1x2:  4q11(4q12 − 7q23) + 7q22(4q23 − q12) = 9πq12ρ;
x1:    πk(8q11 + q22) − 9π = w11(9πρ − 16q11 + 7q22) + 14w22(q11 − q22);   (3.9)
x2²:   4q12² − 14q12q23 + 28q23² = 9πq13ρ;
x2:    9πρw12 − 2πk(2q12 + q23) = 2w11(4q12 − 7q23) + 7w22(4q23 − q12);   (3.10)
1:     πk(4w11 + w22) − 7w11w22 + 4w11² + 7w22² = 9πρz1 − π²k².           (3.11)

From the second equation in (3.6) we get:

x1²:   28q11² − 14q11q22 + 4q22² = 9πq21ρ;
x1x2:  28q11(q23 − q12) = q22(16q23 − 7q12 − 9πρ);
x1:    9πρw21 − 2πk(q11 + 2q22) = 7w11(4q11 − q22) + 2w22(4q22 − 7q11);   (3.12)
x2²:   16q23² − q23(14q12 + 9πρ) + 7q12² = 0;
x2:    πk(q12 + 8q23) − 9π = 14w11(q23 − q12) + w22(7q12 − 16q23 + 9πρ);  (3.13)
1:     πk(w11 + 4w22) − 7w11w22 + 7w11² + 4w22² = 9πρz2 − π²k².           (3.14)

Solving these systems for all parameters of the Bellman functions Vi(x), we get six solutions. In each of them q11 = q23, q12 = q22, q13 = q21, w11 = w22, w12 = w21 and z1 = z2, with the following values:

1.  q11 = 0;         q12 = 0;        q13 = 0;
    w11 = −1/ρ;      w12 = 0;
    z1 = (π²k²ρ² − 5πkρ + 4)/(9πρ³);

2.  q11 = 9πρ/16;    q12 = 0;        q13 = 63πρ/64;
    w11 = (4πkρ − 8)/(7ρ);     w12 = (5πkρ − 8)/(8ρ);
    z1 = (253π²k²ρ² − 536πkρ + 256)/(441πρ³);

3.  q11 = 7πρ/16;    q12 = πρ;       q13 = 23πρ/64;
    w11 = (4πkρ − 8)/ρ;        w12 = (89πkρ − 152)/(24ρ);
    z1 = (85π²k²ρ² − 296πkρ + 256)/(9πρ³);

4.  q11 = πρ/8;      q12 = −πρ/4;    q13 = πρ/8;
    w11 = (πkρ − 12)/(14ρ);    w12 = −(πkρ + 2)/(14ρ);
    z1 = (15π²k²ρ² − 52πkρ + 32)/(98πρ³);

5.  q11 = 7πρ/32;    q12 = −πρ/4;    q13 = 67πρ/256;
    w11 = (8πkρ − 48)/(55ρ);   w12 = −(3πkρ + 48)/(176ρ);
    z1 = (609π²k²ρ² − 1808πkρ + 1024)/(3025πρ³);

6.  q11 = 63πρ/64;   q12 = 9πρ/8;    q13 = 1899πρ/1024;
    w11 = (32 − 32πkρ)/(3ρ);   w12 = (1696 − 1627πkρ)/(96ρ);
    z1 = (2454193π²k²ρ² − 5315264πkρ + 2876416)/(20736πρ³).   (3.15)

Thus we get multiple results for the Nash equilibrium.
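The q-components of the six candidate solutions can be spot-checked numerically against the x1²-row and the x2²-row of the first system (equations (3.17) and (3.19) in the Appendix). A sketch for π = ρ = 1; note that only these two equations are verified here:

```python
# Spot-check of the candidate solutions: for pi = rho = 1, the pairs
# (q11, q22) must satisfy 16q11^2 - q11(14q22 + 9) + 7q22^2 = 0 and the
# triples (q12, q23, q13) must satisfy 4q12^2 - 14q12q23 + 28q23^2 = 9q13.

families = [            # (q11 = q23, q12 = q22, q13 = q21), pi = rho = 1
    (0.0,      0.0,   0.0),
    (9/16,     0.0,   63/64),
    (7/16,     1.0,   23/64),
    (1/8,     -1/4,   1/8),
    (7/32,    -1/4,   67/256),
    (63/64,    9/8,   1899/1024),
]

for q11, q12, q13 in families:
    q22, q23 = q12, q11                                   # symmetry of the solutions
    eq17 = 16*q11**2 - q11*(14*q22 + 9) + 7*q22**2        # x1^2-row, should be 0
    eq19 = 4*q12**2 - 14*q12*q23 + 28*q23**2 - 9*q13      # x2^2-row, should be 0
    assert abs(eq17) < 1e-12 and abs(eq19) < 1e-12
```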

3.4 Application of Different Criteria to the Selection of the Admissible Solutions

In this section we consider two different criteria for solution selection in the case of multiple solutions: an economic criterion and a classical method used in the theory of linear-quadratic regulators (LQR).

3.4.1 Economic Criterion

In order to select the admissible solutions from the set of obtained solutions, we use the following economic criterion [5]: if the marginal revenue (revenue per unit of goods sold) equals zero, then the total revenue should also be zero. In the notation of our model, this criterion reads as follows.

Criterion 1 ([5]) Let the marginal revenue π = 0. Then, necessarily, the total revenue Vi(x) = 0, where Vi(x), i = 1, …, n, is the Bellman function corresponding to the desired maximum in (3.3).


Let π = 0 in (3.15). We get

q11 = q23 = 0,   q12 = q22 = 0,   q13 = q21 = 0   (in all six solutions),

w11 = w22 ∈ { −1/ρ, −8/(7ρ), −8/ρ, −6/(7ρ), −48/(55ρ), 32/(3ρ) },

w12 = w21 ∈ { 0, −1/ρ, −19/(3ρ), −1/(7ρ), −3/(11ρ), 53/(3ρ) },   (3.16)

where the entries are listed in the same order as the solutions in (3.15).

As we can see, none of the obtained solutions of the problem satisfies this criterion.

3.4.2 Classical Method Used in the Theory of Linear-Quadratic Regulators

Recall that the Bellman function Vi(x), i = 1, …, n, in (3.5) is the sum of a quadratic term, a linear term and a constant. Let us consider the matrix of the quadratic form

M = [ qi1, qi2/2 ; qi2/2, qi3 ],

which has two eigenvalues

λ1 = (1/2) ( qi1 + qi3 + √( (qi1 − qi3)² + qi2² ) ),

λ2 = (1/2) ( qi1 + qi3 − √( (qi1 − qi3)² + qi2² ) ).

Depending on the signs of λ1 and λ2, the quadratic form in the Bellman function Vi(x) is either positive or negative definite. This agrees with the common observation that a linear-quadratic optimization problem has two solutions: a feasible (stable) and an unfeasible (unstable) one [1]. In contrast to the classical LQR problem, a maximization problem is solved in this formulation; therefore, the quadratic form in the Bellman function must be negative semi-definite. Since λ1 is strictly positive (except for the first solution, with λ1 = λ2 = 0), none of the obtained solutions satisfies this criterion either.
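The sign pattern of λ1 can be checked numerically for the six candidate solutions of (3.15), evaluated at π = ρ = 1:

```python
from math import sqrt

# Largest eigenvalue of the quadratic-form matrix M = [[q1, q2/2], [q2/2, q3]]
# for each candidate solution of (3.15) at pi = rho = 1.

def lambda_max(q1, q2, q3):
    return 0.5 * (q1 + q3 + sqrt((q1 - q3)**2 + q2**2))

families = [            # (q11, q12, q13) per solution, pi = rho = 1
    (0.0,      0.0,   0.0),
    (9/16,     0.0,   63/64),
    (7/16,     1.0,   23/64),
    (1/8,     -1/4,   1/8),
    (7/32,    -1/4,   67/256),
    (63/64,    9/8,   1899/1024),
]

eigs = [lambda_max(*f) for f in families]
# eigs[0] is 0 (the trivial solution); all other entries are strictly
# positive, so solutions 2-6 cannot yield a negative semi-definite form
```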

3.5 Conclusion

We considered a new non-cooperative differential game model of pollution control. The differential game was solved for two symmetric players. We applied two different methods for the selection of admissible solutions, and it turned out that the problem of determining the Nash equilibrium has no solution.

Acknowledgement This work was supported by grant 17-11-01093 of the Russian Science Foundation.

Appendix

This appendix gives some intermediate calculations leading to the solution. In Section A we find the optimal controls; in Section B we determine the parameters of the Bellman functions Vi(x), i = 1, 2, given in (3.15).


A

The Bellman functions:

$$V_1(x) = q_{11}x_1^2 + q_{12}x_1x_2 + q_{13}x_2^2 + w_{11}x_1 + w_{12}x_2 + z_1,$$
$$V_2(x) = q_{21}x_1^2 + q_{22}x_1x_2 + q_{23}x_2^2 + w_{21}x_1 + w_{22}x_2 + z_2.$$

The partial derivatives:

$$\frac{\partial V_1(x)}{\partial x_1} = 2q_{11}x_1 + q_{12}x_2 + w_{11}; \qquad \frac{\partial V_2(x)}{\partial x_2} = q_{22}x_1 + 2q_{23}x_2 + w_{22}.$$

Consider the following functions, which are maximized on the right-hand sides of the HJB equations (3.6):

$$F(a_1, a_2^{NE}) = \pi[k - a_1 - a_2]a_1 - x_1 + a_1(2q_{11}x_1 + q_{12}x_2 + w_{11}) + a_2(q_{22}x_1 + 2q_{23}x_2 + w_{22});$$
$$F(a_2, a_1^{NE}) = \pi[k - a_1 - a_2]a_2 - x_2 + a_1(2q_{11}x_1 + q_{12}x_2 + w_{11}) + a_2(q_{22}x_1 + 2q_{23}x_2 + w_{22}).$$

The first-order optimality conditions are

$$\frac{\partial F(a_1, a_2^{NE})}{\partial a_1} = 0 \;\Longleftrightarrow\; \pi(2a_1 + a_2) = \pi k + 2q_{11}x_1 + q_{12}x_2 + w_{11},$$
$$\frac{\partial F(a_2, a_1^{NE})}{\partial a_2} = 0 \;\Longleftrightarrow\; \pi(a_1 + 2a_2) = \pi k + q_{22}x_1 + 2q_{23}x_2 + w_{22}.$$

We have obtained a linear system whose solution is given by (3.7) and (3.8).
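The two first-order conditions form a linear system in $(a_1, a_2)$ that can be solved symbolically; a minimal sketch with sympy, where $G_1$ and $G_2$ are our shorthand for $2q_{11}x_1 + q_{12}x_2 + w_{11}$ and $q_{22}x_1 + 2q_{23}x_2 + w_{22}$:

```python
import sympy as sp

a1, a2, k, pi_, G1, G2 = sp.symbols('a1 a2 k pi G1 G2')
# G1, G2 abbreviate the gradient terms of the Bellman functions
eqs = [sp.Eq(pi_ * (2 * a1 + a2), pi_ * k + G1),
       sp.Eq(pi_ * (a1 + 2 * a2), pi_ * k + G2)]
sol = sp.solve(eqs, [a1, a2])
print(sp.simplify(sol[a1]))  # equals k/3 + (2*G1 - G2)/(3*pi)
print(sp.simplify(sol[a2]))  # equals k/3 + (2*G2 - G1)/(3*pi)
```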

B

Consider a system of six equations that depend only on $q_{11}, q_{12}, q_{13}, q_{21}, q_{22}, q_{23}$:

$$16q_{11}^2 - q_{11}(14q_{22} + 9\pi\rho) + 7q_{22}^2 = 0; \qquad (3.17)$$
$$4q_{11}(4q_{12} - 7q_{23}) + 7q_{22}(4q_{23} - q_{12}) = 9\pi q_{12}\rho; \qquad (3.18)$$
$$4q_{12}^2 - 14q_{12}q_{23} + 28q_{23}^2 = 9\pi q_{13}\rho; \qquad (3.19)$$
$$28q_{11}^2 - 14q_{11}q_{22} + 4q_{22}^2 = 9\pi q_{21}\rho; \qquad (3.20)$$
$$28q_{11}(q_{23} - q_{12}) = q_{22}(16q_{23} - 7q_{12} - 9\pi\rho); \qquad (3.21)$$
$$16q_{23}^2 - q_{23}(14q_{12} + 9\pi\rho) + 7q_{12}^2 = 0. \qquad (3.22)$$

From (3.17) and (3.22) we express

$$q_{11} = \frac{1}{32}\Big(14q_{22} + 9\pi\rho \pm \sqrt{-252q_{22}^2 + 252q_{22}\pi\rho + 81\pi^2\rho^2}\Big),$$
$$q_{23} = \frac{1}{32}\Big(14q_{12} + 9\pi\rho \pm \sqrt{-252q_{12}^2 + 252q_{12}\pi\rho + 81\pi^2\rho^2}\Big).$$

Let us denote

$$D_{22} = -252q_{22}^2 + 252q_{22}\pi\rho + 81\pi^2\rho^2, \qquad D_{12} = -252q_{12}^2 + 252q_{12}\pi\rho + 81\pi^2\rho^2.$$

Adding Eq. (3.18) to (3.21) gives $4q_{11}q_{12} - 4q_{22}q_{23} + 3\pi\rho(q_{12} - q_{22}) = 0$; substituting $q_{11}$ and $q_{23}$ into this equation yields

$$33\pi\rho(q_{12} - q_{22}) \pm 8q_{12}\sqrt{D_{22}} = \pm 8q_{22}\sqrt{D_{12}},$$

and squaring both sides of this equation gives

$$(q_{12} - q_{22})\Big(1089\pi^2\rho^2(q_{12} - q_{22}) \pm 528\pi\rho\, q_{12}\sqrt{D_{22}} + 16128\pi\rho\, q_{12}q_{22} + 5184\pi^2\rho^2(q_{12} + q_{22})\Big) = 0.$$

We get that either $q_{12} = q_{22}$ or

$$363\pi^2\rho^2(q_{12} - q_{22}) + 5376\pi\rho\, q_{12}q_{22} + 1728\pi^2\rho^2(q_{12} + q_{22}) = \mp 176\pi\rho\, q_{12}\sqrt{D_{22}}. \qquad (3.23)$$

For simplicity we consider the case $q_{12} = q_{22}$. Therefore $q_{11} = q_{23}$, and hence $q_{13} = q_{21}$. Under these equalities, Eqs. (3.17) and (3.22), (3.18) and (3.21), (3.19) and (3.20) become pairwise identical, so the system (3.17)–(3.22) of six equations reduces to a system of three equations:

$$16q_{11}^2 - q_{11}(14q_{22} + 9\pi\rho) + 7q_{22}^2 = 0;$$
$$44q_{11}q_{22} - 28q_{11}^2 - 7q_{22}^2 = 9\pi q_{22}\rho; \qquad (3.24)$$
$$4q_{22}^2 - 14q_{22}q_{11} + 28q_{11}^2 = 9\pi q_{13}\rho.$$


From the second equation of this system we express

$$q_{11} = \frac{1}{14}\Big(11q_{22} \pm 3\sqrt{8q_{22}^2 - 7q_{22}\pi\rho}\Big), \qquad (3.25)$$

and, substituting (3.25) into the first equation of the system (3.24), we find four values of $q_{22}$ in (3.15). Then, by substituting these values of $q_{22}$ into (3.25), we get that each value of $q_{22}$ corresponds to two values of $q_{11}$ in (3.15). Given the values of $q_{22}$ and $q_{11}$, from the last equation of the system (3.24) we find the values of $q_{13}$ in (3.15).

Given the equalities $q_{12} = q_{22}$, $q_{11} = q_{23}$ and Eqs. (3.9) and (3.13), we get that

$$w_{11} = w_{22} = \frac{k\pi(q_{22} + 8q_{11}) - 9\pi}{9\pi\rho - 7q_{22} - 2q_{11}};$$

then, substituting the values of $q_{22}$ and $q_{11}$, we find $w_{11}$ in (3.15). Given the equalities $q_{12} = q_{22}$, $q_{11} = q_{23}$, $w_{11} = w_{22}$ and Eqs. (3.10) and (3.12), we get that

$$w_{12} = w_{21} = \frac{2k\pi(2q_{22} + q_{11}) + w_{11}(14q_{11} + q_{22})}{9\pi\rho};$$

then, substituting the values of $q_{22}$, $q_{11}$ and $w_{11}$, we find $w_{12}$ in (3.15). Given the equality $w_{11} = w_{22}$ and Eqs. (3.11) and (3.14), we get that

$$z_1 = z_2 = \frac{k^2\pi^2 + 5k\pi w_{11} + 4w_{11}^2}{9\pi\rho};$$

then, substituting the values of $w_{11}$, we find $z_1$ in (3.15). Thus we have obtained the solutions (3.15) of the systems of Eqs. (3.9)–(3.14) and (3.17)–(3.22).

References

1. Alexandrov, A.G.: Optimal and Adaptive Systems, p. 263. High School, Moscow (1989)
2. Basar, T.: A counterexample in linear-quadratic games: existence of nonlinear Nash solutions. J. Optim. Theory Appl. 14(4), 425–430 (1974). https://doi.org/10.1007/BF00933308
3. Basar, T.: On the uniqueness of the Nash solution in linear-quadratic differential games. Int. J. Game Theory 5(2–3), 65–90 (1976). https://doi.org/10.1007/BF01753310
4. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn., p. 519. Academic, New York (1999)
5. Bass, F.M., Krishnamoorthy, A., Prasad, A., Sethi, S.P.: Generic and brand advertising strategies in a dynamic duopoly. Mark. Sci. 24(4), 556–568 (2005). https://doi.org/10.1287/mksc.1050.0119
6. Dockner, E.J., Sorger, G.: Existence and properties of equilibria for a dynamic game on productive assets. J. Econ. Theory 71(1), 209–227 (1996). https://doi.org/10.1006/jeth.1996.0115
7. Dockner, E.J., Jorgensen, S., Long, N.V., Sorger, G.: Differential Games in Economics and Management Science, p. 396. Cambridge University Press, Cambridge (2000)
8. Engwerda, J.: LQ Dynamic Optimization and Differential Games, p. 510. Wiley, Chichester (2005)
9. Frutos, J., Martín-Herrán, G.: Selection of a Markov perfect Nash equilibrium in a class of differential games. Dyn. Games Appl. 8(3), 620–636 (2018). https://doi.org/10.1007/s13235-018-0257-7


10. Garcia-Meza, M.A., Gromova, E.V., Lopez-Barrientos, J.D.: Stable marketing cooperation in a differential game for an oligopoly. Int. Game Theory Rev. 20(3), 1750028 (2018). https://doi.org/10.1142/S0219198917500281
11. Gromova, E.V., Gromov, D.V., Lakhina, Yu.E.: On the solution of a differential game of managing the investments in an advertising campaign. Trudy Inst. Mat. i Mekh. UrO RAN 24(2), 64–75 (2018). https://doi.org/10.21538/0134-4889-2018-24-2-64-75
12. Haurie, A., Krawczyk, J.B., Zaccour, G.: Games and Dynamic Games, p. 465. World Scientific Publishing, Singapore (2012)
13. Jorgensen, S., Gromova, E.: Sustaining cooperation in a differential game of advertising goodwill accumulation. Eur. J. Oper. Res. 254, 294–303 (2016). https://doi.org/10.1016/j.ejor.2016.03.029
14. Jorgensen, S., Zaccour, G.: Differential Games in Marketing, p. 159. Kluwer Academic Publishers, Boston (2004)
15. Kononenko, A.: The structure of the optimal strategy in controlled dynamic systems. USSR Comput. Math. Math. Phys. 20(5), 13–24 (1980). https://doi.org/10.1016/0041-5553(80)90085-3
16. Malafeev, O.: Stationary strategies in differential games. USSR Comput. Math. Math. Phys. 17(1), 37–46 (1977). https://doi.org/10.1016/0041-5553(77)90067-2
17. Papavassilopoulos, G., Olsder, G.J.: On the linear-quadratic, closed-loop, no-memory Nash game. J. Optim. Theory Appl. 42(4), 551–560 (1984). https://doi.org/10.1007/BF00934566
18. Petrosyan, L.A., Zenkevich, N.A.: Game Theory, p. 564. World Scientific Publishing, Singapore (2016)
19. Reddy, P.V., Zaccour, G.: Feedback Nash equilibria in linear-quadratic difference games with constraints. IEEE Trans. Autom. Control 62(2), 590–604 (2017). https://doi.org/10.1109/TAC.2016.2555879
20. Singh, R., Wiszniewska-Matyszkiel, A.: Linear-quadratic game of exploitation of common renewable resources with inherent constraints. Topol. Methods Nonlinear Anal. 51(1), 23–54 (2018). https://doi.org/10.12775/TMNA.2017.057
21. Singh, R., Wiszniewska-Matyszkiel, A.: Discontinuous Nash equilibria in a two stage linear-quadratic dynamic game with linear constraints. IEEE Trans. Autom. Control (2018). https://doi.org/10.1109/TAC.2018.2882160
22. Zhukovsky, V.I., Chikrii, A.A.: The Linear-Quadratic Differential Games, p. 320. Naukova Dumka, Kiev (1994)

Chapter 4

Endogenous Formation of Cooperation Structure in TU Games Anna Khmelnitskaya, Elena Parilina, and Artem Sedakov

Abstract Our main goal is to provide a comparative analysis of several procedures for the endogenous dynamic formation of the cooperation structure in TU games. In the paper we consider two approaches to endogenous graph formation based on sequential link announcement and revision. For the evaluation of the pros and cons when the addition of a new link is in question, along with the Myerson value we also consider the average tree solution and the recently introduced centrality rewarding Shapley and Myerson values.

Keywords TU game · Communication graph · Myerson value · Average tree solution · Centrality rewarding Shapley and Myerson values · Graph formation · Subgame perfect equilibrium

4.1 Introduction

In classical cooperative game theory it is assumed that any coalition of players may form. However, in practical situations when different agents with distinct interests participate in some joint activity, it happens quite often that individual players or groups of them start to negotiate, seeking other cooperation frameworks more advantageous to them. Such negotiations between players usually lead to the creation of coalition or cooperation (communication) structures, which in turn put restrictions on cooperation. Given a cooperative game, a question of which links

A. Khmelnitskaya St. Petersburg State University, St. Petersburg, Russia V.A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia e-mail: [email protected] E. Parilina · A. Sedakov () Saint Petersburg State University, Saint Petersburg, Russia e-mail: [email protected]; [email protected] © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_4


A. Khmelnitskaya et al.

may be expected to form between the players was first raised by Aumann and Myerson in [1], where a dynamic model of endogenous formation of the cooperation structure in TU games was introduced. For a given cooperative game with n players the authors construct an auxiliary linking game in which pairs of players are offered to form links, the offers being made one by one according to some chosen fixed order of feasible links. After the last link is formed, each of the n(n − 1)/2 pairs is given a last opportunity to form an additional link. To form a link, both potential partners must agree; once formed, a link cannot be destroyed, and the entire history of offers, acceptances, and rejections is known to all players. So, the linking game is a game of perfect information, and therefore, from Selten [11] it follows that it has subgame perfect equilibria in pure strategies, each of which is associated with a unique cooperation graph, namely the graph obtained at the end of the play. An alternative approach to the endogenous dynamic formation of the communication graph was introduced later by Petrosyan and Sedakov in [10]. In their paper the construction of links is determined not by a chosen order of pairs of players who negotiate over possible links, but by a chosen order of players according to which each player tries to establish links with the other players to which he or she wants to be connected. Both approaches, at each step of constructing a new link, use the Myerson or Shapley values to evaluate the pros and cons of adding a link to the cooperation structure already constructed at previous steps.

The main goal of this paper is to enrich both approaches to the endogenous dynamic formation of the cooperation structure, that of Aumann and Myerson and that of Petrosyan and Sedakov, in the way the pros and cons of adding a new link are evaluated. Along with the Myerson value this will also be done for other solution concepts for cooperative games with restricted cooperation represented by means of undirected communication graphs, such as the average tree solution introduced by Herings, van der Laan, Talman and Yang in [4], and the centrality rewarding Shapley and Myerson values recently introduced by Khmelnitskaya, van der Laan and Talman in [5], yet unpublished. The advantage of the average tree solution in comparison to the Myerson value is that the order of computational complexity of the average tree solution for games with a cycle-free communication graph is equal to the number of players n, while the order of computational complexity of the Myerson value is n!, the same as for the Shapley value. The advantage of the centrality rewarding Shapley and Myerson values is that they not only take care of the cooperation abilities of the players, in the sense that only connected players can cooperate, but, in contrast to the Myerson value, also respect the players' positional importance in the communication graph and reward the more central players higher. We provide a comparative analysis of the modified procedures for endogenous communication graph formation on the example of a game with a major player studied in Parilina and Sedakov [8, 9].

The structure of the paper is as follows. Section 4.2 contains basic definitions and notation. Two dynamic models of graph formation are discussed in Sect. 4.3. Section 4.4 provides a comparative analysis of the modified procedures within each of the two above-mentioned approaches to the endogenous dynamic formation of the communication graph on the example of a game with a major player.

4 Endogenous Formation of Cooperation Structure in TU Games


4.2 Preliminaries

A cooperative game with transferable utility (TU game) is a pair $(N, v)$, where $N \subset \mathbb{N}$, $N = \{1, \ldots, n\}$, is a finite set of $n \geq 2$ players and $v: 2^N \to \mathbb{R}$ is a characteristic function that assigns to every coalition (subset) of players $S \subseteq N$ its worth $v(S)$, with $v(\emptyset) = 0$. In this paper it is assumed that the player set $N$ is fixed, and the collection of all TU games on $N$ is denoted by $\mathcal{G}_N$. For simplicity of notation, and if no ambiguity appears, we write $v$ when we refer to a game $(N, v)$. A singleton solution on $\mathcal{G} \subseteq \mathcal{G}_N$, called a value on $\mathcal{G}$, is a function $\xi: \mathcal{G} \to \mathbb{R}^N$ that assigns to every $v \in \mathcal{G}$ a vector $\xi(v) \in \mathbb{R}^N$, where $\xi_i(v)$ is the payoff to player $i \in N$ in $v$. In the sequel we denote the cardinality of a given set $A$ by $|A|$, along with lower case letters like $n = |N|$.

Let $\Pi(N)$ be the set of all linear orderings $\pi: N \to N$ on $N$. For $\pi \in \Pi(N)$ and $i \in N$, $\pi(i)$ is the position of player $i$ in $\pi$, $P_\pi(i) = \{j \in N \mid \pi(j) < \pi(i)\}$ and $S_\pi(i) = \{j \in N \mid \pi(j) > \pi(i)\}$ are the sets of predecessors and successors of $i$ in $\pi$, $\bar P_\pi(i) = P_\pi(i) \cup \{i\}$ and $\bar S_\pi(i) = S_\pi(i) \cup \{i\}$. For $v \in \mathcal{G}_N$ and $\pi \in \Pi(N)$ the marginal contribution vector $m^\pi(v) \in \mathbb{R}^N$ in $v$ with respect to predecessors in $\pi$ is given by $m_i^\pi(v) = v(\bar P_\pi(i)) - v(P_\pi(i))$, $i \in N$, and the marginal contribution vector $\tilde m^\pi(v) \in \mathbb{R}^N$ in $v$ with respect to successors in $\pi$ is given by $\tilde m_i^\pi(v) = v(\bar S_\pi(i)) - v(S_\pi(i))$, $i \in N$. The Shapley value on $\mathcal{G}_N$ assigns to every $v \in \mathcal{G}_N$ the payoff vector $Sh(v) \in \mathbb{R}^N$ given by

$$Sh_i(v) = \frac{1}{n!} \sum_{\pi \in \Pi(N)} m_i^\pi(v), \quad \text{for all } i \in N.$$
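For small $n$ this formula can be evaluated directly by enumerating all $n!$ orderings; a minimal sketch (the three-player game below is our own illustration, not from the paper):

```python
from itertools import permutations

def shapley(n, v):
    """Shapley value via the marginal-contribution formula:
    Sh_i(v) = (1/n!) * sum over orderings of v(pred + i) - v(pred)."""
    sh = [0.0] * n
    perms = list(permutations(range(n)))
    for pi in perms:
        pred = set()
        for i in pi:
            sh[i] += v(frozenset(pred | {i})) - v(frozenset(pred))
            pred.add(i)
    return [x / len(perms) for x in sh]

# toy game: a coalition is productive iff it contains player 0 and somebody else
v = lambda S: 1.0 if 0 in S and len(S) >= 2 else 0.0
print(shapley(3, v))  # player 0 gets 2/3, players 1 and 2 get 1/6 each
```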

A communication structure on $N \subset \mathbb{N}$ is specified by a graph, undirected or directed, on $N$. A graph is a pair $(N, \Gamma)$, where $N \subset \mathbb{N}$ is the set of nodes (players) and $\Gamma \subseteq \{\{i, j\} \mid i, j \in N, i \neq j\}$, a collection of unordered pairs, is the set of links (edges) between two nodes in $N$ for an undirected graph, or $\Gamma \subseteq \{(i, j) \mid i, j \in N, i \neq j\}$, a collection of ordered pairs, is the set of directed links (arcs) from one node to another in $N$ for a directed graph (digraph). For ease of notation, and if no ambiguity appears, we write $\Gamma$ when we refer to a graph $(N, \Gamma)$. For a graph $\Gamma$ on $N$ and $S \subseteq N$, the subgraph of $\Gamma$ on $S$ is the undirected graph $\Gamma|_S = \{\{i, j\} \in \Gamma \mid i, j \in S\}$ on $S$, when graph $\Gamma$ is undirected, and the digraph $\Gamma|_S = \{(i, j) \in \Gamma \mid i, j \in S\}$ on $S$, when graph $\Gamma$ is directed. In a graph $\Gamma$ a sequence of different nodes $(i_1, \ldots, i_r)$, $r \geq 2$, is a path in $\Gamma$ between $i_1$ and $i_r$ if $\{i_h, i_{h+1}\} \in \Gamma$ for $h = 1, \ldots, r-1$, when graph $\Gamma$ is undirected, and if $\{(i_h, i_{h+1}), (i_{h+1}, i_h)\} \cap \Gamma \neq \emptyset$ for $h = 1, \ldots, r-1$, when graph $\Gamma$ is directed. In a digraph $\Gamma$ a sequence of different nodes $(i_1, \ldots, i_r)$, $r \geq 2$, is a directed path in $\Gamma$ from $i_1$ to $i_r$ if $(i_h, i_{h+1}) \in \Gamma$ for $h = 1, \ldots, r-1$. In an undirected graph $\Gamma$ a path $(i_1, \ldots, i_r)$ is a cycle if $r \geq 3$ and $\{i_r, i_1\} \in \Gamma$. For ease of notation, given a graph $\Gamma$ and a link $\{i, j\} \in \Gamma$ if $\Gamma$ is undirected, or $(i, j) \in \Gamma$ if $\Gamma$ is directed, the subgraph $\Gamma \setminus \{\{i, j\}\}$, correspondingly $\Gamma \setminus \{(i, j)\}$, is denoted by $\Gamma_{-ij}$.


Given a graph $\Gamma$ on $N$, nodes $i, j \in N$ are connected in $\Gamma$ if there exists a path in $\Gamma$ between $i$ and $j$. $\Gamma$ is connected if any $i, j \in N$, $i \neq j$, are connected in $\Gamma$. $S \subseteq N$ is connected in $\Gamma$ if $\Gamma|_S$ is connected. For $S \subseteq N$, $C^\Gamma(S)$ denotes the collection of subsets of $S$ connected in $\Gamma$, $S/\Gamma$ is the collection of maximal connected subsets, called components, of $S$ in $\Gamma$, and $(S/\Gamma)_i$ is the (unique) component of $S$ in $\Gamma$ containing $i \in S$. Given an undirected graph $\Gamma$ on $N$, nodes $i, j \in N$ are neighbors in $\Gamma$ if $\{i, j\} \in \Gamma$. Given a digraph $\Gamma$, if for $i, j \in N$ there exists a directed path in $\Gamma$ from $i$ to $j$, then $j$ is a successor of $i$ and $i$ is a predecessor of $j$ in $\Gamma$. If $(i, j) \in \Gamma$, then $j$ is an immediate successor of $i$ and $i$ is an immediate predecessor of $j$ in $\Gamma$. For $i \in N$, let $P^\Gamma(i)$ and $S^\Gamma(i)$ denote the sets of predecessors and successors of $i$ in $\Gamma$, $\hat P^\Gamma(i)$ and $\hat S^\Gamma(i)$ denote the sets of immediate predecessors and immediate successors of $i$ in $\Gamma$, $\bar P^\Gamma(i) = P^\Gamma(i) \cup \{i\}$, and $\bar S^\Gamma(i) = S^\Gamma(i) \cup \{i\}$. An undirected graph $\Gamma$ is cycle-free, or in other terms a tree, if it contains no cycles. Note that a connected cycle-free graph $\Gamma$ on $N$ has precisely $n - 1$ links. A connected cycle-free undirected graph is a star if it contains a node, called the hub, for which any other node, called a satellite, is a neighbor. A connected digraph $T$ on $N$ is a rooted tree if there is a unique node without predecessors, the root of the tree, denoted by $r(T)$, and for every other node in $N$ there is a unique directed path in $T$ from $r(T)$ to that node. A node in a tree without successors is a leaf. A rooted tree $T$ on $N$ is a spanning tree of a graph $\Gamma$ on $N$ if $T \subseteq \Gamma$ when $\Gamma$ is directed, and for every $(i, j) \in T$, $\{i, j\} \in \Gamma$ when $\Gamma$ is undirected. From now on, when we say 'graph' we mean an undirected graph, otherwise we say 'digraph', and the set of undirected graphs on $N$ we denote by $\Gamma_N$.
A pair $(v, \Gamma)$ of $v \in \mathcal{G}_N$ and $\Gamma \in \Gamma_N$ constitutes a game with graph communication structure, for brevity called a graph game, on $N$. The set of graph games on fixed $N$ we denote by $\mathcal{G}_N^\Gamma$. A singleton solution on a set $\mathcal{G} \subseteq \mathcal{G}_N^\Gamma$, called a graph game value, or simply a value if no ambiguity appears, on $\mathcal{G}_N^\Gamma$, is a function $\xi: \mathcal{G} \to \mathbb{R}^N$, which assigns to every $(v, \Gamma) \in \mathcal{G}$ a payoff vector $\xi(v, \Gamma) \in \mathbb{R}^N$. Following Myerson [7], we assume that for any $(v, \Gamma) \in \mathcal{G}_N^\Gamma$ cooperation is possible only among connected players, and along with a game $(v, \Gamma)$ we consider its (Myerson) restricted game $v^\Gamma \in \mathcal{G}_N$ defined as

$$v^\Gamma(S) = \sum_{C \in S/\Gamma} v(C), \quad \text{for all } S \subseteq N. \qquad (4.1)$$

The Myerson value for graph games introduced in [7] is defined as the Shapley value (cf. Shapley [12]) of the corresponding restricted game, i.e., for every $(v, \Gamma) \in \mathcal{G}_N^\Gamma$,

$$\mu_i(v, \Gamma) = Sh_i(v^\Gamma), \quad \text{for all } i \in N. \qquad (4.2)$$
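Definitions (4.1)–(4.2) translate directly into code: split each coalition into its components in $\Gamma$, sum their worths to get the restricted game, and take its Shapley value. A minimal sketch (the toy line graph and game are our own illustration):

```python
from itertools import permutations

def components(S, edges):
    """Maximal connected subsets (components) of S in an undirected graph
    given as a collection of 2-tuples of nodes."""
    S, comps = set(S), []
    while S:
        comp, stack = set(), [S.pop()]
        while stack:
            i = stack.pop()
            comp.add(i)
            nbrs = {b for a, b in edges if a == i} | {a for a, b in edges if b == i}
            for j in nbrs & S:
                S.discard(j)
                stack.append(j)
        comps.append(frozenset(comp))
    return comps

def myerson(n, v, edges):
    """Myerson value (4.2): Shapley value of the restricted game (4.1)."""
    vG = lambda S: sum(v(C) for C in components(S, edges))
    sh = [0.0] * n
    perms = list(permutations(range(n)))
    for pi in perms:
        pred = set()
        for i in pi:
            sh[i] += vG(pred | {i}) - vG(pred)
            pred.add(i)
    return [x / len(perms) for x in sh]

# toy line graph 0-1-2; any coalition of two or more players is worth 1
edges = {(0, 1), (1, 2)}
v = lambda C: 1.0 if len(C) >= 2 else 0.0
print(myerson(3, v, edges))  # middle player 1 gets 2/3, the others 1/6 each
```

Note how the unconnected coalition $\{0, 2\}$ is worth $0$ in the restricted game, which shifts payoff toward the connecting player 1.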

The average tree solution for graph games introduced in Herings et al. [4] in every connected graph game assigns to each player the average of the player’s


marginal contributions to the successors in all admissible spanning trees of the given communication graph, i.e., for every $(v, \Gamma) \in \mathcal{G}_N^\Gamma$,

$$AT_i(v, \Gamma) = \frac{1}{|\mathcal{T}_N^\Gamma|} \sum_{T \in \mathcal{T}_N^\Gamma} \tilde m_i^T(v), \quad \text{for all } i \in N, \qquad (4.3)$$

where $\mathcal{T}_N^\Gamma$ denotes the set of all admissible spanning trees of $\Gamma$ (a spanning tree $T$ of $\Gamma$ is admissible if $(i, j) \in T$ implies $\bar S^T(j) \in S^T(i)/\Gamma$), and the marginal contribution $\tilde m_i^T(v)$ of player $i \in N$ in game $v \in \mathcal{G}_N$ to the successors in a rooted tree $T$ on $N$ is given by

$$\tilde m_i^T(v) = v(\bar S^T(i)) - \sum_{j \in \hat S^T(i)} v(\bar S^T(j)) = v^\Gamma(\bar S^T(i)) - v^\Gamma(S^T(i)).^1$$

When graph $\Gamma$ is connected and cycle-free, every $j \in N$ determines the unique spanning tree of $\Gamma$ with $j$ as its root. Therefore, the order of computational complexity of the average tree solution for cycle-free graph games is $n$ (cf. [3]), while the order of computational complexity of the Myerson value, similar to the Shapley value, is equal to $n!$.

The centrality rewarding Shapley and Myerson values for graph games introduced in [5] not only take care of the cooperation abilities of the players, in the sense that only connected players can cooperate, but, in contrast to the Myerson value, also respect the players' positional importance in the communication graph. The centrality rewarding Shapley value for graph games in every connected graph game assigns to each player the average of the player's marginal contributions to the successors in all consistent linear orderings, i.e., for every $(v, \Gamma) \in \mathcal{G}_N^\Gamma$,

$$Sh^c_i(v, \Gamma) = \frac{1}{|\Pi^\Gamma(N)|} \sum_{\pi \in \Pi^\Gamma(N)} \tilde m_i^\pi(v), \quad \text{for all } i \in N, \qquad (4.4)$$

where $\Pi^\Gamma(N)$ denotes the set of all linear orderings consistent with $\Gamma$; a linear ordering $\pi \in \Pi(N)$ is consistent with $\Gamma$ if $\bar P_\pi(i) \in C^\Gamma(N)$ for all $i \in N$. The centrality rewarding Myerson value of a graph game is defined as the centrality rewarding Shapley value of its restricted game, i.e., for every $(v, \Gamma) \in \mathcal{G}_N^\Gamma$,

$$\mu^c_i(v, \Gamma) = Sh^c_i(v^\Gamma, \Gamma) = \frac{1}{|\Pi^\Gamma(N)|} \sum_{\pi \in \Pi^\Gamma(N)} \tilde m_i^\pi(v^\Gamma), \quad \text{for all } i \in N. \qquad (4.5)$$

¹ In the literature the marginal contribution vector with respect to a tree was first introduced by Demange in [2] under the name of the vector of hierarchical outcomes.
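The consistent-orderings definition (4.4) can likewise be enumerated directly for small $n$; a minimal sketch (the graph and game are our own illustration):

```python
from itertools import permutations

def connected(S, edges):
    """True iff S induces a connected subgraph (empty S counts as connected)."""
    S = set(S)
    if not S:
        return True
    seen, stack = set(), [next(iter(S))]
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        for a, b in edges:
            if a == i and b in S:
                stack.append(b)
            if b == i and a in S:
                stack.append(a)
    return seen == S

def centrality_shapley(n, v, edges):
    """Centrality rewarding Shapley value (4.4): average marginal contribution
    to successors over the orderings whose every prefix is connected."""
    sh, count = [0.0] * n, 0
    for pi in permutations(range(n)):
        if not all(connected(pi[:k], edges) for k in range(1, n + 1)):
            continue  # ordering not consistent with the graph
        count += 1
        for pos, i in enumerate(pi):
            succ = frozenset(pi[pos + 1:])
            sh[i] += v(succ | {i}) - v(succ)
    return [x / count for x in sh]

# line graph 0-1-2 with v(S) = |S|^2: the central player is rewarded more
edges = [(0, 1), (1, 2)]
v = lambda S: float(len(S)) ** 2
print(centrality_shapley(3, v, edges))  # [2.5, 4.0, 2.5]
```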


4.3 Dynamic Formation of a Communication Graph

Graph games described in Sect. 4.2 are static, or in other terms one-shot: all their three components—the player set, the characteristic function, and the communication graph—are a priori fixed. Now we discuss two models of endogenous dynamic formation of the communication graph for a given TU game, in which, for evaluating the pros and cons of adding a new link to the cooperation structure already constructed at previous steps, along with the Myerson value also the average tree solution and the centrality rewarding Shapley and Myerson values are used.

4.3.1 Dynamic Game: Model 1

Inspired by the idea of Aumann and Myerson [1], for a given TU game $v \in \mathcal{G}_N$ we consider a dynamic game of communication graph formation in which each pair of players negotiates about establishing the link between them under a given order. We suppose that the negotiation about the link $\{i, j\}$, $i \neq j$, consists of two steps: player $j$ chooses an action after player $i$ has chosen his. Without loss of generality, we assume there is no initial communication between the players, i.e., the initial communication graph $\Gamma_0$ is the empty graph.² For a negotiated link $\{i, j\}$, an action of player $i$ at an odd stage $t$ is her decision whether to propose the link $\{i, j\}$ to player $j$ in the present graph. So, the action set of player $i$ at this stage $t$ is $A_{it} = \{\{i, j\}, \emptyset\}$, where the empty set means that player $i$ makes no proposal. Thus, $|A_{it}| = 2$ for any odd $t$. Player $j$ chooses an action at the even stage $t + 1$, and her action is whether to accept the link $\{i, j\}$ in the current graph or not. The action set of player $j$ at this stage is $A_{j,t+1} = \{\{i, j\}, \emptyset\}$, $|A_{j,t+1}| = 2$. The link $\{i, j\}$ is formed only if both players $i$ and $j$ choose the action to have this link. There are $n(n - 1)$ stages in the game. The initial communication graph is empty, and after each even stage of the game the graph may change. Let us describe this dynamic process. Let the link formation start from the link $\{1, 2\}$. At stage 1 player 1 chooses an action $a_{12} \in A_{12}$. After this stage the communication graph is not changed: $\Gamma_1 = \Gamma_0$. At stage 2 player 2 chooses an action $a_{22} \in A_{22}$. After stage 2 the communication graph is given by

$$\Gamma_2 = \begin{cases} \Gamma_0 \cup \{\{1, 2\}\}, & \text{if } a_{12} = a_{22} = \{1, 2\}, \\ \Gamma_0, & \text{otherwise.} \end{cases}$$

² It is worth noting that the results below can be easily adapted to the case of an arbitrary linear ordering of links (not necessarily all potential links have to be negotiable) and a non-empty initial communication graph. In the case of a non-empty initial communication graph we may add the action of deleting the link between a pair of players to their action sets.
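The negotiation protocol above can be sketched as a simple loop over the ordered pairs; `propose` and `accept` stand for the players' decision rules (names are ours), which in the actual game would be derived by backward induction from the chosen graph game value:

```python
from itertools import combinations

def form_graph(n, propose, accept, initial=frozenset()):
    """Model 1 sketch: links are negotiated one by one in a fixed order;
    the link {i, j} is added only if i proposes it and j accepts it,
    both decisions depending on the graph formed so far."""
    G = set(initial)
    for i, j in combinations(range(n), 2):   # fixed negotiation order
        if propose(i, j, frozenset(G)) and accept(i, j, frozenset(G)):
            G.add(frozenset({i, j}))
    return G

# with unconditionally agreeable players the complete graph forms
always = lambda i, j, G: True
print(sorted(map(sorted, form_graph(3, always, always))))  # [[0, 1], [0, 2], [1, 2]]
```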


Suppose that at an arbitrary odd stage $t$ a link $\{i, j\}$ is negotiated and player $i$ chooses an action $a_{it} \in A_{it}$, to propose or not a link $\{i, j\}$ to player $j$. Then, at stage $t + 1$ player $j$ chooses an action $a_{j,t+1}$, whether to accept this link or not. The communication graph $\Gamma_{t+1}$ is of the form

$$\Gamma_{t+1} = \begin{cases} \Gamma_{t-1} \cup \{\{i, j\}\}, & \text{if } a_{it} = a_{j,t+1} = \{i, j\}, \\ \Gamma_{t-1}, & \text{otherwise.} \end{cases}$$

Let $X_\alpha = \{x \in X : \sum_{a \in A(x)} \alpha_{x,a} > 0\}$. Then $(\alpha, \beta)$ possesses the property that $\sum_{a \in A(x)} \beta_{x,a} > 0$ for $x \in X \setminus X_\alpha$, and a stationary strategy $s_{x,a}$ that corresponds to $(\alpha, \beta)$ is determined as

to (α, β) is determined as

$$s_{x,a} = \begin{cases} \dfrac{\alpha_{x,a}}{\sum_{a \in A(x)} \alpha_{x,a}}, & \text{if } x \in X_\alpha; \\[8pt] \dfrac{\beta_{x,a}}{\sum_{a \in A(x)} \beta_{x,a}}, & \text{if } x \in X \setminus X_\alpha, \end{cases} \qquad (8.16)$$

where $s_{x,a}$ expresses the probability of choosing the action $a \in A(x)$ in the state $x \in X$.
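Formula (8.16) is straightforward to implement; a minimal sketch where $\alpha$ and $\beta$ are given as dictionaries keyed by (state, action) pairs (the representation and sample numbers are ours):

```python
def stationary_strategy(alpha, beta):
    """Build the stationary strategy (8.16) from a feasible pair (alpha, beta):
    in states with positive alpha-mass, normalize alpha; elsewhere normalize beta."""
    states = {x for x, _ in alpha} | {x for x, _ in beta}
    s = {}
    for x in states:
        weights = alpha if sum(v for (y, _), v in alpha.items() if y == x) > 0 else beta
        total = sum(v for (y, _), v in weights.items() if y == x)
        for (y, a), v in weights.items():
            if y == x:
                s[(x, a)] = v / total
    return s

alpha = {(0, 'a'): 2.0, (0, 'b'): 6.0, (1, 'a'): 0.0, (1, 'b'): 0.0}
beta = {(1, 'a'): 1.0, (1, 'b'): 3.0}
print(stationary_strategy(alpha, beta))  # state 0 from alpha, state 1 from beta
```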


D. Lozovanu

8.3.3 An Average Markov Decision Problem in Terms of Stationary Strategies

Using the relationship between feasible solutions of problem (8.13), (8.14) and the corresponding stationary strategies (8.16), in [9] it is shown that an average Markov decision problem in terms of stationary strategies can be formulated as follows: Maximize

$$\psi_\theta(s, q, w) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x \qquad (8.17)$$

subject to

$$\begin{cases} q_y - \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\[8pt] q_y + w_y - \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X; \\[8pt] \displaystyle\sum_{a \in A(y)} s_{y,a} = 1, & \forall y \in X; \\[8pt] s_{x,a} \geq 0, \ \forall x \in X, \forall a \in A(x); \quad w_x \geq 0, \ \forall x \in X, \end{cases} \qquad (8.18)$$

where $\theta_y$ are the same values as in problem (8.13), (8.14), and $s_{x,a}, q_x, w_x$ for $x \in X$, $a \in A(x)$ represent the variables that must be found. It is easy to observe that for fixed $s_{x,a}$, $x \in X$, $a \in A(x)$, system (8.18) uniquely determines $q_x$ for $x \in X$ and determines $w_x$ for $x \in X$ up to an additive constant in each recurrent class of $P^s = (p^s_{x,y})$ (see [15]). Therefore the notation $\psi_\theta(s, q, w)$ in (8.17) can be changed to

$$\psi_\theta(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x. \qquad (8.19)$$

8.3.4 A Quasi-Monotonic Programming Model in Stationary Strategies for an Average Markov Decision Problem

Based on the results from the previous section, we now show that an average Markov decision problem in stationary strategies can be represented as a quasi-monotonic programming problem. We assume that an average Markov decision problem is determined by a tuple $(X, \{A(x)\}_{x \in X}, \{f(x, a)\}_{x \in X}, p, \{\theta_x\}_{x \in X})$.

8 Pure and Mixed Stationary Nash Equilibria for Average Stochastic Positional Games


Theorem 8.2 Let an average Markov decision problem be given and consider the function

$$\psi_\theta(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x, \qquad (8.20)$$

where $q_x$ for $x \in X$ satisfy the condition

$$\begin{cases} q_y - \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\[8pt] q_y + w_y - \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X. \end{cases} \qquad (8.21)$$

Then on the set $S$ of solutions of the system

$$\begin{cases} \displaystyle\sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\[8pt] s_{x,a} \geq 0, & \forall x \in X, a \in A(x), \end{cases} \qquad (8.22)$$

the function $\psi_\theta(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi_\theta(s)$ is quasi-monotonic on $S$ (i.e. $\psi_\theta(s)$ is quasi-convex and quasi-concave on $S$).

This theorem has been formulated and proved in [9]. Based on this theorem we can conclude that an average Markov decision problem in stationary strategies represents a quasi-monotonic programming problem in which it is necessary to maximize the quasi-monotonic function (8.20), (8.21) on the set of solutions of system (8.22). In the unichain case of the average Markov decision problem the function (8.20), (8.21) does not depend on $\theta_y$, $y \in X$, i.e. the problem is determined by $(X, \{A(x)\}_{x \in X}, \{f(x, a)\}_{x \in X}, p)$. In this case, from Theorem 8.2 we obtain as a corollary the following result.

Corollary 8.1 Let an average Markov decision problem be given and consider the function

$$\psi(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x, \qquad (8.23)$$

where $q_x$ for $x \in X$ satisfy the condition

$$\begin{cases} q_y - \displaystyle\sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\[8pt] \displaystyle\sum_{y \in X} q_y = 1. \end{cases} \qquad (8.24)$$


Then on the set $S$ of solutions of the system

$$\begin{cases} \displaystyle\sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\[8pt] s_{x,a} \geq 0, & \forall x \in X, a \in A(x), \end{cases} \qquad (8.25)$$

the function $\psi(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi(s)$ is quasi-monotonic on $S$ (i.e. $\psi(s)$ is quasi-convex and quasi-concave on $S$).

Remark 8.1 In Theorem 8.2, $\psi_\theta(s)$ expresses the average reward per transition in a Markov decision problem when the starting position is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$ and when a stationary strategy $s \in S$ is applied. Therefore $\psi_\theta(s) = \omega_\theta(s)$. In Corollary 8.1, $\psi(s)$ expresses the average reward per transition for a Markov decision problem with the unichain property. Therefore $\psi(s) = \omega(s)$, $\forall s \in S$, and $\omega(s)$ does not depend on the starting position.
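In the unichain case the value $\psi(s) = \omega(s)$ of a fixed stationary strategy can be computed directly from (8.23)–(8.24): form the induced transition matrix $P^s$, solve for the stationary distribution $q$, and average the rewards. A minimal sketch (the two-state example is our own illustration):

```python
import numpy as np

def average_reward(P, f, s):
    """psi(s) from (8.23)-(8.24) for a unichain MDP.
    P[a][x][y]: transition probabilities, f[x][a]: step rewards,
    s[x][a]: stationary strategy (rows sum to 1)."""
    n, m = f.shape
    Ps = sum(s[:, a][:, None] * P[a] for a in range(m))   # induced chain P_s
    # stationary distribution: q (I - P_s) = 0 together with sum(q) = 1
    A = np.vstack([(np.eye(n) - Ps).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    q, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(np.sum(f * s * q[:, None]))

# two states; action 0 stays put, action 1 switches state
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1
f = np.array([[1.0, 0.0], [0.0, 2.0]])
s = np.array([[0.0, 1.0], [0.0, 1.0]])    # always switch, so q = (1/2, 1/2)
print(average_reward(P, f, s))            # approximately 0*1/2 + 2*1/2 = 1.0
```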

8.4 Pure Stationary Equilibria for Two-Player Zero-Sum Average Stochastic Positional Games

In this section we prove the existence of Nash equilibria in pure stationary strategies for a two-player zero-sum average stochastic positional game and present conditions for determining the optimal pure stationary strategies of the players.

8.4.1 Formulation of a Two-Player Zero-Sum Average Stochastic Positional Game

A two-player zero-sum average stochastic game is determined by a tuple $(X = X_1 \cup X_2, \{A(x)\}_{x \in X}, \{f(x, a)\}_{x \in X, a \in A(x)}, p, x_0)$, where $X$ is the set of states of the game, $X_1$ is the set of positions of the first player, $X_2$ is the set of positions of the second player, $A(x)$ is the set of actions in a state $x \in X$, $f(x, a)$ is the step reward in $x \in X$ for a fixed $a \in A(x)$, $p: X \times \cup_{x \in X} A(x) \times X \to [0, 1]$ is a transition probability function that satisfies the condition $\sum_{y \in X} p^a_{x,y} = 1$, $\forall x \in X$, $a \in A(x)$, and $x_0$ is the starting state of the game.

The game starts at the given initial state $x_0$, where the player who is the owner of this position fixes an action $a_0 \in A(x_0)$. So, if $x_0$ belongs to the set of positions of the first player, then the action $a_0 \in A(x_0)$ in $x_0$ is chosen by the first player; otherwise the action $a_0 \in A(x_0)$ is chosen by the second one. After that the game passes randomly to a new position according to the probability distribution $\{p^{a_0}_{x_0,y}\}_{y \in X}$. At time moment $t = 1$ the players observe the position $x_1 \in X$. If $x_1$ belongs to the set of positions of the first player, then the action $a_1 \in A(x_1)$ is chosen by the first


player, otherwise the action is chosen by the second one, and so on, indefinitely. In this process the first player chooses actions in his position set in order to maximize the average reward per transition

$$\liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t} f(x_\tau, a_\tau),$$

while the second one chooses actions in his position set in order to minimize the average reward per transition

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t} f(x_\tau, a_\tau).$$

Assuming that the players choose actions in their state positions independently, we show that for this game there exists a value $\omega_{x_0}$ such that the first player has a strategy of choosing the actions in his position set that insures $\liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t} f(x_\tau, a_\tau) \geq \omega_{x_0}$, and the second player has a strategy of choosing the actions in his position set that insures $\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t} f(x_\tau, a_\tau) \leq \omega_{x_0}$.

Moreover, we show that the players can achieve the value $\omega_{x_0}$ by applying pure stationary strategies of selecting the actions in their position sets. The formulation of the game in pure stationary strategies is the following. Denote by

$$S^1 = \{s^1 \mid s^1: x \to a \in A(x) \text{ for } x \in X_1\}, \qquad S^2 = \{s^2 \mid s^2: x \to a \in A(x) \text{ for } x \in X_2\}$$

the corresponding sets of pure stationary strategies of the players. Then for arbitrary $s^1 \in S^1$, $s^2 \in S^2$ the profile $s = (s^1, s^2)$ determines a Markov process induced by the probability distributions $\{p^{s^i(x)}_{x,y}\}_{y \in X}$ in the states $x \in X_i$, $i = 1, 2$, and a given starting state $x_0$. For this Markov process with step rewards $f(x, s^i(x))$ in the states $x \in X_i$, $i = 1, 2$, we can determine the average reward per transition $\omega_{x_0}(s^1, s^2)$. The function $\omega_{x_0}(s^1, s^2)$ on $S = S^1 \times S^2$ defines an antagonistic game in normal form $\langle S^1, S^2, \omega_{x_0}(s^1, s^2) \rangle$ that in extended form is determined by the tuple $(X = X_1 \cup X_2, \{A(x)\}_{x \in X}, \{f(x, a)\}_{x \in X, a \in A(x)}, p, x_0)$. Taking into account that the strategy sets $S^1$ and $S^2$ are finite, we can regard $\langle S^1, S^2, \omega_{x_0}(s^1, s^2) \rangle$ as a matrix game, and therefore for this game there exist the min-max strategies $\underline{s}^1, \underline{s}^2$ of the players and the max-min strategies $\bar s^1, \bar s^2$ of the players, for which

$$\omega_{x_0}(\underline{s}^1, \underline{s}^2) = \min_{s^2 \in S^2} \max_{s^1 \in S^1} \omega_{x_0}(s^1, s^2); \qquad \omega_{x_0}(\bar s^1, \bar s^2) = \max_{s^1 \in S^1} \min_{s^2 \in S^2} \omega_{x_0}(s^1, s^2).$$
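Since the pure stationary strategy sets are finite, the min-max and max-min values of the resulting matrix game can be computed by direct enumeration; a minimal sketch (the sample matrices are our own illustration):

```python
def pure_maxmin_minmax(M):
    """Max-min and min-max values of a matrix game M[i][j] in pure strategies;
    the row player maximizes, the column player minimizes."""
    maxmin = max(min(row) for row in M)
    minmax = min(max(row[j] for row in M) for j in range(len(M[0])))
    return maxmin, minmax

print(pure_maxmin_minmax([[3, 1], [2, 1]]))  # saddle point in pure strategies: (1, 1)
print(pure_maxmin_minmax([[2, 1], [0, 3]]))  # no pure saddle point: (1, 2)
```

In general only $\max\min \leq \min\max$ holds; the point of this section is that for average stochastic positional games the two coincide and are attained in pure stationary strategies.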

In this section we show that for the considered two-player zero-sum average stochastic positional game there exists a pure stationary strategy $s^{1*} \in S^1$ of the first player and a pure stationary strategy $s^{2*} \in S^2$ of the second player such that

$$\omega_x(s^{1*}, s^{2*}) = \max_{s^1 \in S^1} \min_{s^2 \in S^2} \omega_x(s^1, s^2) = \min_{s^2 \in S^2} \max_{s^1 \in S^1} \omega_x(s^1, s^2), \quad \forall x \in X,$$

i.e. we show that (s 1 , s 2 ) is a pure stationary equilibrium of the game for an arbitrary starting position x ∈ X, in spite of the fact that the values of the games with different starting positions may be different.


In the following we consider the game in which the optimal stationary strategies of the players must be determined for an arbitrary starting state $x\in X$; we denote such a game by $(X=X_1\cup X_2,\{A(x)\}_{x\in X},\{f(x,a)\}_{x\in X,\,a\in A(x)},p)$.

8.4.2 Existence of Pure Stationary Equilibria for a Two-Player Zero-Sum Average Stochastic Positional Game

First we show that in a two-player zero-sum average stochastic positional game there exist a strategy $\bar s^1\in S^1$ of the first player and a strategy $\bar s^2\in S^2$ of the second player such that $(\bar s^1,\bar s^2)$ is a min-max strategy pair of the game for an arbitrary starting position $x\in X$, i.e.

$$\omega_x(\bar s^1,\bar s^2)=\min_{s^2\in S^2}\max_{s^1\in S^1}\omega_x(s^1,s^2),\qquad\forall x\in X.$$

To prove this we shall use the version of two-player zero-sum average stochastic positional games in which the starting state is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$. So, we consider the game in the case when play starts in state $x\in X$ with probability $\theta_x>0$, where $\sum_{x\in X}\theta_x=1$. We denote this game $(X=X_1\cup X_2,\{A(x)\}_{x\in X},\{f(x,a)\}_{x\in X,\,a\in A(x)},p,\{\theta_x\}_{x\in X})$. This game looks more general; however, it can easily be reduced to an auxiliary two-player zero-sum average stochastic positional game with a fixed starting position. Such an auxiliary game is determined by a new tuple obtained from $(X=X_1\cup X_2,\{A(x)\}_{x\in X},\{f(x,a)\}_{x\in X,\,a\in A(x)},p)$ by adding to the set of positions of the first player a new state position $z$ that has a unique action $a(z)$, for which the transition probabilities are $p^{a(z)}_{z,x}=\theta_x$, $\forall x\in X$, and the corresponding step reward is $f(z,a(z))=0$. It is evident that for arbitrary strategies of the players in this game the first player will select in position $z$ the unique action $a(z)$. If for the obtained game with the given starting position $z$ we consider the normal form game in pure stationary strategies $\langle\hat S^1,\hat S^2,\omega_z(s^1,s^2)\rangle$, then for this game we can determine the min-max strategies $\hat s^1,\hat s^2$ of the players, for which $\omega_z(\hat s^1,\hat s^2)=\sum_{x\in X}\theta_x\,\omega_x(\hat s^1,\hat s^2)$. This means that the following lemmas hold.

Lemma 8.1 For a two-player zero-sum average stochastic positional game determined by a tuple $(X=X_1\cup X_2,\{A(x)\}_{x\in X},\{f(x,a)\}_{x\in X,\,a\in A(x)},p)$ there exist a strategy $\bar s^2\in S^2$ of the second player and a strategy $\bar s^1\in S^1$ of the first player such that $(\bar s^1,\bar s^2)$ is a min-max strategy pair of the game for an arbitrary starting position $x\in X$, i.e.

$$\omega_x(\bar s^1,\bar s^2)=\min_{s^2\in S^2}\max_{s^1\in S^1}\omega_x(s^1,s^2),\qquad\forall x\in X.$$
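The reduction above acts directly on the transition data: a new start state $z$ is appended whose unique action moves to state $x$ with probability $\theta_x$ at zero step reward. A hypothetical helper on a fixed induced chain (numpy assumed; in the game itself the matrix depends on the chosen actions):

```python
import numpy as np

def add_random_start(P, f, theta):
    """Append an auxiliary start state z with a unique action:
    transition probabilities p_{z,x} = theta[x], step reward f(z) = 0."""
    n = len(P)
    P_aux = np.zeros((n + 1, n + 1))
    P_aux[:n, :n] = P              # original transitions are unchanged
    P_aux[n, :n] = theta           # z jumps to x with probability theta[x]
    f_aux = np.append(np.asarray(f, dtype=float), 0.0)
    return P_aux, f_aux
```

Any trajectory from $z$ spends one step there at zero reward and then behaves like the original chain started from the random state, so the one extra step does not change the average reward: the value from $z$ equals $\sum_{x\in X}\theta_x\,\omega_x$.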

Lemma 8.2 For a two-player zero-sum average stochastic positional game determined by a tuple $(X=X_1\cup X_2,\{A(x)\}_{x\in X},\{f(x,a)\}_{x\in X,\,a\in A(x)},p)$ there exist a strategy $\underline{s}^1\in S^1$ of the first player and a strategy $\underline{s}^2\in S^2$ of the second player such that $(\underline{s}^1,\underline{s}^2)$ is a max-min strategy pair of the game for an arbitrary starting position $x\in X$, i.e.

$$\omega_x(\underline{s}^1,\underline{s}^2)=\max_{s^1\in S^1}\min_{s^2\in S^2}\omega_x(s^1,s^2),\qquad\forall x\in X.$$

Using these lemmas we can prove the following theorem.

Theorem 8.3 Let a two-player zero-sum average stochastic positional game be determined by a tuple $(X=X_1\cup X_2,\{A(x)\}_{x\in X},\{f(x,a)\}_{x\in X,\,a\in A(x)},p)$. Then the system of equations

$$\begin{cases}\varepsilon_x+\omega_x=\max\limits_{a\in A(x)}\Big\{f(x,a)+\sum\limits_{y\in X}p^a_{x,y}\varepsilon_y\Big\}, & \forall x\in X_1;\\[4pt] \varepsilon_x+\omega_x=\min\limits_{a\in A(x)}\Big\{f(x,a)+\sum\limits_{y\in X}p^a_{x,y}\varepsilon_y\Big\}, & \forall x\in X_2\end{cases}\tag{8.26}$$

has a solution on the set of solutions of the system of equations

$$\begin{cases}\omega_x=\max\limits_{a\in A(x)}\Big\{\sum\limits_{y\in X}p^a_{x,y}\omega_y\Big\}, & \forall x\in X_1;\\[4pt] \omega_x=\min\limits_{a\in A(x)}\Big\{\sum\limits_{y\in X}p^a_{x,y}\omega_y\Big\}, & \forall x\in X_2,\end{cases}\tag{8.27}$$

i.e. the system of Eqs. (8.27) has a solution $\omega^*_x$, $x\in X$, for which there exists a solution $\varepsilon^*_x$, $x\in X$, of the system of equations

$$\begin{cases}\varepsilon_x+\omega^*_x=\max\limits_{a\in A(x)}\Big\{f(x,a)+\sum\limits_{y\in X}p^a_{x,y}\varepsilon_y\Big\}, & \forall x\in X_1;\\[4pt] \varepsilon_x+\omega^*_x=\min\limits_{a\in A(x)}\Big\{f(x,a)+\sum\limits_{y\in X}p^a_{x,y}\varepsilon_y\Big\}, & \forall x\in X_2.\end{cases}$$





The optimal pure stationary strategies $s^{1*},s^{2*}$ of the players can be found by fixing arbitrary maps $s^{1*}(x)\in A(x)$ for $x\in X_1$ and $s^{2*}(x)\in A(x)$ for $x\in X_2$ such that

$$s^{1*}(x)\in \operatorname*{Arg\,max}_{a\in A(x)}\Big\{\sum_{y\in X}p^a_{x,y}\omega^*_y\Big\}\ \cap\ \operatorname*{Arg\,max}_{a\in A(x)}\Big\{f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^*_y\Big\},\qquad x\in X_1,$$

$$s^{2*}(x)\in \operatorname*{Arg\,min}_{a\in A(x)}\Big\{\sum_{y\in X}p^a_{x,y}\omega^*_y\Big\}\ \cap\ \operatorname*{Arg\,min}_{a\in A(x)}\Big\{f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^*_y\Big\},\qquad x\in X_2,$$

and $\omega_x(s^{1*},s^{2*})=\omega^*_x$, $\forall x\in X$, i.e.

$$\omega_x(s^{1*},s^{2*})=\max_{s^1\in S^1}\min_{s^2\in S^2}\omega_x(s^1,s^2)=\min_{s^2\in S^2}\max_{s^1\in S^1}\omega_x(s^1,s^2),\qquad\forall x\in X.$$

Proof According to Lemma 8.1, for the players in the considered game there exist pure stationary strategies $\bar s^1\in S^1$, $\bar s^2\in S^2$ for which

$$\omega_x(\bar s^1,\bar s^2)=\min_{s^2\in S^2}\max_{s^1\in S^1}\omega_x(s^1,s^2),\qquad\forall x\in X.$$

We show that

$$\omega_x(\bar s^1,\bar s^2)=\max_{s^1\in S^1}\min_{s^2\in S^2}\omega_x(s^1,s^2),\qquad\forall x\in X,$$

i.e. we show that $\bar s^1=s^{1*}$, $\bar s^2=s^{2*}$. Indeed, if we consider the Markov process induced by the strategies $\bar s^1,\bar s^2$, then according to Theorem 8.1 for this process the system of linear equations

$$\begin{cases}\varepsilon_x+\omega_x=f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_1,\ a=\bar s^1(x);\\ \varepsilon_x+\omega_x=f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_2,\ a=\bar s^2(x);\\ \omega_x=\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_1,\ a=\bar s^1(x);\\ \omega_x=\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_2,\ a=\bar s^2(x)\end{cases}\tag{8.28}$$

has a basic solution $\varepsilon^*_x,\omega^*_x$ ($x\in X$). Now, if we assume that in the game only the second player fixes his strategy $\bar s^2\in S^2$, then we obtain a Markov decision problem with respect to the first player, and therefore, according to Theorem 8.1, for this decision problem the system

$$\begin{cases}\varepsilon_x+\omega_x\ge f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_1,\ a\in A(x);\\ \varepsilon_x+\omega_x=f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_2,\ a=\bar s^2(x);\\ \omega_x\ge\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_1,\ a\in A(x);\\ \omega_x=\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_2,\ a=\bar s^2(x)\end{cases}$$

has solutions. We can observe that $\varepsilon^*_x,\omega^*_x$ ($x\in X$) represents a solution of this system and $\omega_x(\bar s^1,\bar s^2)=\omega^*_x$, $\forall x\in X$.


Taking into account that $\omega_x(\bar s^1,\bar s^2)=\min_{s^2\in S^2}\omega_x(\bar s^1,s^2)$, for the fixed strategy $\bar s^1\in S^1$ the following system has solutions:

$$\begin{cases}\varepsilon_x+\omega_x=f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_1,\ a=\bar s^1(x);\\ \varepsilon_x+\omega_x\le f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_2,\ a\in A(x);\\ \omega_x=\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_1,\ a=\bar s^1(x);\\ \omega_x\le\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_2,\ a\in A(x),\end{cases}$$

and $\varepsilon_x=\varepsilon^*_x$, $\omega_x=\omega^*_x$ ($x\in X$) represents a solution of this system. This means that the following system

$$\begin{cases}\varepsilon_x+\omega_x\ge f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_1,\ a\in A(x);\\ \varepsilon_x+\omega_x\le f(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon_y, & \forall x\in X_2,\ a\in A(x);\\ \omega_x\ge\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_1,\ a\in A(x);\\ \omega_x\le\sum_{y\in X}p^a_{x,y}\omega_y, & \forall x\in X_2,\ a\in A(x)\end{cases}$$

has a solution which satisfies condition (8.28). Thus we obtain that $\bar s^1=s^{1*}$, $\bar s^2=s^{2*}$ and $\omega_x(s^{1*},s^{2*})=\omega^*_x$, $\forall x\in X$, i.e.

$$\omega_x(s^{1*},s^{2*})=\max_{s^1\in S^1}\min_{s^2\in S^2}\omega_x(s^1,s^2)=\min_{s^2\in S^2}\max_{s^1\in S^1}\omega_x(s^1,s^2),\qquad\forall x\in X.$$

So the theorem holds. □

The obtained saddle point conditions for zero-sum stochastic positional games generalize the saddle point conditions for deterministic average positional games from [4, 7]. Based on Theorem 8.3, we conclude that the optimal strategies of the players in the considered game can be found by determining a solution of Eqs. (8.26), (8.27). A solution of these equations can be determined using iterative algorithms similar to the algorithms for determining optimal solutions of an average Markov decision problem [15].
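One such iterative scheme is a relative value iteration on the optimality equations (8.26): player 1 maximizes in her positions, player 2 minimizes in his. The sketch below assumes the unichain, aperiodic case, in which the iteration converges to a single value $\omega$; the data layout and function name are illustrative, not from the chapter:

```python
def zero_sum_value_iteration(states, owner, actions, f, p, iters=500):
    """Relative value iteration for the equations
    eps_x + omega = max/min_a { f(x,a) + sum_y p[x][a][y] * eps_y }.
    owner[x] == 1: maximizing position; owner[x] == 2: minimizing."""
    eps = {x: 0.0 for x in states}
    ref = states[0]                      # reference state for normalization
    omega = 0.0
    for _ in range(iters):
        new = {}
        for x in states:
            vals = [f[x][a] + sum(p[x][a][y] * eps[y] for y in states)
                    for a in actions[x]]
            new[x] = max(vals) if owner[x] == 1 else min(vals)
        omega = new[ref]                 # eps[ref] was normalized to 0
        eps = {x: new[x] - new[ref] for x in states}
    return omega, eps
```

For example, with two states whose actions all lead to either state with probability 1/2, the transitions are strategy-independent and the value is simply the average of the chosen step rewards.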

8.5 Mixed Stationary Nash Equilibria for an Average Stochastic Positional Game

In this section we show that for an arbitrary m-player average stochastic positional game a Nash equilibrium in pure stationary strategies may not exist; however, a Nash equilibrium in mixed stationary strategies always exists. We show that such a game in normal form can be formulated as a game with quasi-monotonic and graph-continuous payoff functions of the players.

8.5.1 A Normal Form Game for an Average Stochastic Positional Game in Mixed Stationary Strategies

Based on Theorem 8.2, we show that an average stochastic positional game determined by a tuple $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p,\{\theta_y\}_{y\in X})$ can be formulated in terms of mixed stationary strategies as follows. Let $\mathbf S^i$, $i\in\{1,2,\dots,m\}$, be the set of solutions of the system

$$\begin{cases}\sum_{a\in A(x)}s^i_{x,a}=1, & \forall x\in X_i;\\ s^i_{x,a}\ge 0, & \forall x\in X_i,\ a\in A(x)\end{cases}\tag{8.29}$$

that determines the set of stationary strategies of player $i$. Each $\mathbf S^i$ is a convex compact set, and an arbitrary extreme point corresponds to a basic solution $s^i$ of system (8.29) with $s^i_{x,a}\in\{0,1\}$, $\forall x\in X_i$, $a\in A(x)$; i.e. each basic solution of this system corresponds to a pure stationary strategy of player $i$. On the set $\mathbf S=\mathbf S^1\times\mathbf S^2\times\dots\times\mathbf S^m$ we define $m$ payoff functions

$$\omega^i_\theta(s^1,s^2,\dots,s^m)=\sum_{k=1}^{m}\sum_{x\in X_k}\sum_{a\in A(x)}s^k_{x,a}f^i(x,a)q_x,\qquad i=1,2,\dots,m,\tag{8.30}$$

where $q_x$ for $x\in X$ are determined uniquely from the following system of linear equations

$$\begin{cases}q_y-\sum_{k=1}^{m}\sum_{x\in X_k}\sum_{a\in A(x)}s^k_{x,a}p^a_{x,y}q_x=0, & \forall y\in X;\\[4pt] q_y+w_y-\sum_{k=1}^{m}\sum_{x\in X_k}\sum_{a\in A(x)}s^k_{x,a}p^a_{x,y}w_x=\theta_y, & \forall y\in X\end{cases}\tag{8.31}$$

for an arbitrary fixed profile $s=(s^1,s^2,\dots,s^m)\in\mathbf S$. The functions $\omega^i_\theta(s^1,s^2,\dots,s^m)$, $i=1,2,\dots,m$, represent the payoff functions for the average stochastic game in normal form $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i_\theta(s)\}_{i=\overline{1,m}}\rangle$. Here $\theta_y$, $y\in X$, represent arbitrary fixed nonnegative values such that $\sum_{y\in X}\theta_y=1$. If $\theta_y=0$, $\forall y\in X\setminus\{x_0\}$, and $\theta_{x_0}=1$, then we obtain an average stochastic game in normal form $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i_{x_0}(s)\}_{i=\overline{1,m}}\rangle$ in which the starting state $x_0$ is fixed and $\omega^i_\theta(s^1,s^2,\dots,s^m)=\omega^i_{x_0}(s^1,s^2,\dots,s^m)$, $i=1,2,\dots,m$. In this case the game is determined by $(X,\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p,x_0)$.


If $\theta_y>0$, $\forall y\in X$, and $\sum_{y\in X}\theta_y=1$, then we obtain an average stochastic game in which play starts in state $y\in X$ with probability $\theta_y$. In this case the payoffs of the players in the game in normal form are

$$\omega^i_\theta(s^1,s^2,\dots,s^m)=\sum_{y\in X}\theta_y\,\omega^i_y(s^1,s^2,\dots,s^m),\qquad i=1,2,\dots,m.$$
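For a fixed mixed profile, these payoffs can also be evaluated numerically without solving (8.31): mix the transition rows and step rewards with the strategy weights, compute the per-start averages $\omega_y$ of the induced chain via a Cesàro average, and weight them by $\theta$. A sketch (Python with numpy; the data layout is an assumption for illustration):

```python
import numpy as np

def induced_chain(s, p, f_i):
    """Mix transitions and step rewards under a mixed stationary profile.
    s[x][a]: probability that the owner of state x plays action a;
    p[x][a]: transition row (list); f_i[x][a]: player i's step reward."""
    n = len(p)
    P = np.zeros((n, n))
    r = np.zeros(n)
    for x in range(n):
        for a, w in s[x].items():
            r[x] += w * f_i[x][a]
            P[x] += w * np.asarray(p[x][a], dtype=float)
    return P, r

def omega_theta(s, p, f_i, theta, n_terms=20000):
    """omega_theta = sum_y theta[y] * omega_y, computed from the Cesaro
    average of the induced chain."""
    P, r = induced_chain(s, p, f_i)
    acc, Pt = np.zeros_like(P), np.eye(len(P))
    for _ in range(n_terms):       # Cesaro average of P^t
        acc += Pt
        Pt = Pt @ P
    omega = (acc / n_terms) @ r    # omega_y for each starting state y
    return float(np.asarray(theta, dtype=float) @ omega)
```

This mirrors the mixing formulas used later in the chapter for the induced transition matrix and step rewards.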

8.5.2 The Existence of a Mixed Stationary Nash Equilibrium in an Average Stochastic Positional Game

As we noted in the introduction, average stochastic positional games generalize the deterministic positional games from [1, 4, 7, 19]. These deterministic positional games correspond to the case of average stochastic positional games in which the transition probabilities $\{p^a_{x,y}\}$ take only the values 0 and 1. The main results concerning the existence of pure stationary equilibria for two-player zero-sum deterministic positional games with mean payoffs have been obtained in [4, 7]. Additionally, in [7] an example of a non-zero-sum two-player cyclic game has been constructed for which a Nash equilibrium in pure stationary strategies does not exist. This means that, in the general case, a Nash equilibrium in pure stationary strategies for an average stochastic positional game may not exist. However, on the basis of the results from [9], we can see that for an arbitrary average stochastic positional game there exists a Nash equilibrium in mixed stationary strategies.

Let $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i_\theta(s)\}_{i=\overline{1,m}}\rangle$ be the non-cooperative game in normal form that corresponds to the average stochastic positional game in stationary strategies determined by $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p,\{\theta_y\}_{y\in X})$, where $\mathbf S^i$ and $\omega^i_\theta(s^1,s^2,\dots,s^m)$, $i=1,2,\dots,m$, are defined according to (8.29)–(8.31). The payoffs in this game may be discontinuous; however, each payoff function $\omega^i_\theta(s^1,s^2,\dots,s^m)$ of player $i\in\{1,2,\dots,m\}$ is quasi-monotonic (quasi-convex and quasi-concave) with respect to the strategy $s^i$ and graph-continuous in the sense of Dasgupta and Maskin [3]. Based on these properties the following theorem has been proven in [9].

Theorem 8.4 The game $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i_\theta(s)\}_{i=\overline{1,m}}\rangle$ possesses a Nash equilibrium $s^*=(s^{1*},s^{2*},\dots,s^{m*})\in\mathbf S$ which is a Nash equilibrium in mixed stationary strategies for the average stochastic positional game determined by $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p,\{\theta_y\}_{y\in X})$. If $\theta_y>0$, $\forall y\in X$, then $s^*=(s^{1*},s^{2*},\dots,s^{m*})$ is a Nash equilibrium in mixed stationary strategies for the average stochastic positional game $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i_y(s)\}_{i=\overline{1,m}}\rangle$ with an arbitrary starting state $y\in X$.

This means that for an arbitrary average stochastic positional game there exists a Nash equilibrium in mixed stationary strategies, and the optimal stationary strategies of the players can be found using the game $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i_\theta(s)\}_{i=\overline{1,m}}\rangle$.


8.6 Pure Stationary Equilibria for Average Stochastic Positional Games with Unichain Property

The existence of stationary Nash equilibria for an average stochastic positional game can be derived on the basis of the results from [9]. Here we study the problem of the existence of Nash equilibria in pure stationary strategies.

8.6.1 A Normal Form of an Average Stochastic Positional Game with Unichain Property

We consider an m-player average stochastic positional game with unichain property that is determined by a tuple $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p)$. The normal form of the game in stationary strategies in this case can be derived from the game model of the previous section by taking into account the unichain property of the game: the values of the payoffs of the players $\omega^i(s^1,s^2,\dots,s^m)$, $i=1,2,\dots,m$, do not depend on the starting position but only on the strategies of the players. The normal form of an average stochastic positional game in stationary strategies in the unichain case can be defined as follows. Let $\mathbf S^i$, $i\in\{1,2,\dots,m\}$, be the set of solutions of system (8.2). On the set $\mathbf S=\mathbf S^1\times\mathbf S^2\times\dots\times\mathbf S^m$ we consider $m$ payoff functions

$$\omega^i(s^1,s^2,\dots,s^m)=\sum_{k=1}^{m}\sum_{x\in X_k}\sum_{a\in A(x)}s^k_{x,a}f^i(x,a)q_x,\qquad i=1,2,\dots,m,\tag{8.32}$$

where $q_x$ for $x\in X$ are determined uniquely from the following system of linear equations

$$\begin{cases}q_y-\sum_{k=1}^{m}\sum_{x\in X_k}\sum_{a\in A(x)}s^k_{x,a}p^a_{x,y}q_x=0, & \forall y\in X;\\[4pt] \sum_{y\in X}q_y=1\end{cases}\tag{8.33}$$

for an arbitrary fixed profile $s=(s^1,s^2,\dots,s^m)\in\mathbf S$. The functions $\omega^i(s^1,s^2,\dots,s^m)$, $i=1,2,\dots,m$, on the set $\mathbf S=\mathbf S^1\times\mathbf S^2\times\dots\times\mathbf S^m$ determine a game in normal form $\langle\{\mathbf S^i\}_{i=\overline{1,m}},\{\omega^i(s)\}_{i=\overline{1,m}}\rangle$ that corresponds to an average stochastic positional game with unichain property when players use stationary strategies of choosing the actions in their position sets. This game represents the positional game variant of the average Markov decision problem and reflects the property of the average reward function from Corollary 8.1 of Theorem 8.2.


An average stochastic positional game with unichain property represents a special case of an average stochastic game with unichain property, for which Rogers [16] proved the existence of stationary Nash equilibria. At the same time, the existence of stationary equilibria for the considered game can be obtained on the basis of Corollary 8.1 and the results from [3, 5] concerning the existence of Nash equilibria for games with quasi-concave (quasi-convex) payoffs. The payoffs $\omega^i(s^1,s^2,\dots,s^m)$, $i=1,2,\dots,m$, are continuous on $\mathbf S^1\times\mathbf S^2\times\dots\times\mathbf S^m$ and quasi-monotonic with respect to the strategy of each player, and therefore in the unichain case of the game there exists a Nash equilibrium in stationary strategies.

8.6.2 Existence of Pure Stationary Equilibria for Average Stochastic Positional Games with Unichain Property

The existence of Nash equilibria in pure stationary strategies for an average stochastic positional game with unichain property can be obtained on the basis of the following result.

Theorem 8.5 Let an average stochastic positional game be given that is determined by the tuple $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p)$, and assume that for an arbitrary profile $s=(s^1,s^2,\dots,s^m)$ of the game the transition probability matrix $P^s=(p^s_{x,y})$ induces a Markov unichain. Then there exist functions

$$\varepsilon^i: X\to\mathbb{R},\qquad i=1,2,\dots,m,$$

and values $\omega^1,\omega^2,\dots,\omega^m$ that satisfy the following conditions:

1. $f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\le 0,\quad\forall x\in X_i,\ \forall a\in A(x),\ i=1,2,\dots,m$;
2. $\max_{a\in A(x)}\Big\{f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\Big\}=0,\quad\forall x\in X_i,\ i=1,2,\dots,m$;
3. on each position set $X_i$, $i\in\{1,2,\dots,m\}$, there exists a map $s^{i*}: X_i\to\cup_{x\in X_i}A(x)$ such that

$$s^{i*}(x)=a^*\in\operatorname*{Arg\,max}_{a\in A(x)}\Big\{f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\Big\}$$

and

$$f^j(x,a^*)+\sum_{y\in X}p^{a^*}_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j=0,\qquad\forall x\in X_i,\ j=1,2,\dots,m.$$

The maps $s^{1*},s^{2*},\dots,s^{m*}$ determine a Nash equilibrium $s^*=(s^{1*},s^{2*},\dots,s^{m*})$ for the stochastic positional game defined by $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p)$ and

$$\omega^i_x(s^{1*},s^{2*},\dots,s^{m*})=\omega^i,\qquad\forall x\in X,\ i=1,2,\dots,m.$$

Moreover, $s^*=(s^{1*},s^{2*},\dots,s^{m*})$ is a pure stationary Nash equilibrium of the average stochastic positional game for an arbitrary starting position $x\in X$.

Proof According to Theorem 8.4, for the average stochastic positional game determined by $(\{X_i\}_{i=\overline{1,m}},\{A(x)\}_{x\in X},\{f^i(x,a)\}_{i=\overline{1,m}},p)$ there exists a mixed stationary Nash equilibrium $s^*=(s^{1*},s^{2*},\dots,s^{m*})$. Taking into account that for this game the unichain property holds, we have $\omega^i=\omega^i_x(s^{1*},s^{2*},\dots,s^{m*})$, $\forall x\in X$, $i=1,2,\dots,m$.

If $s^{i*}$ is a mixed stationary strategy of player $i\in\{1,2,\dots,m\}$, then for a fixed $x\in X_i$ the strategy $s^{i*}(x)$ represents a convex combination of actions determined by the probability distribution $\{s^{i*}_{x,a}\}$ on $A^*(x)=\{a\in A(x)\mid s^{i*}_{x,a}>0\}$. Let us consider the Markov process induced by the profile of mixed stationary strategies $s^*=(s^{1*},s^{2*},\dots,s^{m*})$. Then, according to (8.3), the elements of the transition probability matrix $P^{s^*}=(p^{s^*}_{x,y})$ of this Markov process can be calculated as follows:

$$p^{s^*}_{x,y}=\sum_{a\in A(x)}s^{i*}_{x,a}\,p^a_{x,y}\qquad\text{for }x\in X_i,\ i=1,2,\dots,m.\tag{8.34}$$

This matrix is unichain, and the corresponding step rewards in the states induced by $s^*$ can be determined according to (8.5) as follows:

$$f^j(x,s^{i*})=\sum_{a\in A(x)}s^{i*}_{x,a}\,f^j(x,a)\qquad\text{for }x\in X_i,\ \forall i,j\in\{1,2,\dots,m\}.\tag{8.35}$$

Based on Theorem 8.1, for this Markov process we can write the following reward equations:

$$f^j(x,s^{i*})+\sum_{y\in X}p^{s^*}_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j=0\qquad\text{for }x\in X_i,\ \forall i,j\in\{1,2,\dots,m\}.\tag{8.36}$$

From these equations we determine $\omega^j$, $j=1,2,\dots,m$, uniquely, and we determine $\varepsilon^j_x$ for $x\in X$, $j=1,2,\dots,m$, up to a constant (see [15]). Additionally, these values satisfy the following condition:

$$f^j(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j\le 0\qquad\text{for }x\in X_i,\ a\in A(x),\ \forall i,j=1,2,\dots,m.$$


By introducing (8.34) and (8.35) in (8.36) we obtain

$$\sum_{a\in A(x)}s^{i*}_{x,a}f^j(x,a)+\sum_{y\in X}\sum_{a\in A(x)}s^{i*}_{x,a}p^a_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j=0,\qquad\forall x\in X_i,\ \forall i,j=1,2,\dots,m.$$

In these equations we can write $\varepsilon^j_x=\sum_{a\in A(x)}s^{i*}_{x,a}\varepsilon^j_x$ and $\omega^j=\sum_{a\in A(x)}s^{i*}_{x,a}\omega^j$, because $\sum_{a\in A(x)}s^{i*}_{x,a}=1$. After these substitutions and some elementary transformations of the equations we obtain

$$\sum_{a\in A(x)}s^{i*}_{x,a}\Big(f^j(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j\Big)=0,\qquad\forall x\in X_i,\ \forall i,j=1,2,\dots,m.$$

This means that for the Markov process induced by the profile of mixed stationary strategies $s^*=(s^{1*},s^{2*},\dots,s^{m*})$ there exist functions $\varepsilon^i: X\to\mathbb{R}$, $i=1,2,\dots,m$, and values $\omega^1,\omega^2,\dots,\omega^m$ that satisfy the following condition:

$$f^j(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j=0\qquad\text{for }x\in X_i,\ a\in A^*(x),\ \forall i,j=1,2,\dots,m.\tag{8.37}$$

Now let us fix the strategies $s^{1*},s^{2*},\dots,s^{(i-1)*},s^{(i+1)*},\dots,s^{m*}$ of the players $1,2,\dots,i-1,i+1,\dots,m$ and consider the problem of determining the maximal average reward per transition with respect to player $i\in\{1,2,\dots,m\}$. Obviously, if we solve this decision problem then we obtain the strategy $s^{i*}$; moreover, by solving this problem we also obtain a pure optimal strategy. If we write the average optimality reward conditions for the Markov decision problem with respect to player $i$, then we obtain:

1. $f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\le 0,\quad\forall x\in X_i,\ \forall a\in A(x)$;
2. $\max_{a\in A(x)}\Big\{f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\Big\}=0,\quad\forall x\in X_i$.

We can observe that $\omega^i$ and $\varepsilon^i_x$, $x\in X$, determined from (8.36), satisfy conditions (1), (2) above, and (8.37) holds. Taking into account that $\omega^i$, $i=1,2,\dots,m$, and $\varepsilon^i_x$, $x\in X$, $i=1,2,\dots,m$, represent a solution of system (8.36) for which conditions (1) and (2) take place for an arbitrary $i\in\{1,2,\dots,m\}$, we obtain that these values represent a solution of the following system:

$$f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\le 0,\qquad\forall x\in X_k,\ \forall a\in A(x),\ i,k=1,2,\dots,m.$$


Moreover, for such a solution it holds that

$$\max_{a\in A(x)}\Big\{f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\Big\}=0,\qquad\forall x\in X_i,\ i=1,2,\dots,m,$$

and

$$f^j(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j=0,\qquad\forall x\in X_i,\ \forall a\in A^*(x),\ j=1,2,\dots,m.$$

This means that if we find arbitrary maps $s^{1*},s^{2*},\dots,s^{m*}$ such that

$$s^{i*}(x)=a^*\in\operatorname*{Arg\,max}_{a\in A(x)}\Big\{f^i(x,a)+\sum_{y\in X}p^a_{x,y}\varepsilon^i_y-\varepsilon^i_x-\omega^i\Big\}\qquad\text{for }x\in X_i,$$

and

$$f^j(x,a^*)+\sum_{y\in X}p^{a^*}_{x,y}\varepsilon^j_y-\varepsilon^j_x-\omega^j=0,\qquad\forall x\in X_i,\ j=1,2,\dots,m,$$

then $s^*=(s^{1*},s^{2*},\dots,s^{m*})$ is a Nash equilibrium in pure stationary strategies. □

8.7 Conclusion

Average stochastic positional games represent an important class of average stochastic games that generalize the deterministic positional games with mean payoffs from [4, 7, 10]. In the general case, a stochastic game with average payoffs may have no Nash equilibrium in stationary strategies; however, for an arbitrary average stochastic positional game a Nash equilibrium in stationary strategies always exists. Moreover, for two-player zero-sum average stochastic positional games and for average stochastic positional games with unichain property there exist Nash equilibria in pure stationary strategies. The game models in pure and mixed stationary strategies presented in the paper, together with the obtained Nash equilibrium results for average stochastic positional games, make it possible to elaborate algorithms for determining the optimal stationary strategies of the players.

Acknowledgements The author is grateful to the referee for useful suggestions and remarks contributing to improve the presentation of the paper.


References

1. Alpern, S.: Cycles in extensive form perfect information games. J. Math. Anal. Appl. 159, 1–17 (1991)
2. Condon, A.: The complexity of stochastic games. Inf. Comput. 96(2), 203–224 (1992)
3. Dasgupta, P., Maskin, E.: The existence of equilibrium in discontinuous economic games. Rev. Econ. Stud. 53, 1–26 (1986)
4. Ehrenfeucht, A., Mycielski, J.: Positional strategies for mean payoff games. Int. J. Game Theory 8, 109–113 (1979)
5. Fan, K.: Applications of a theorem concerning sets with convex sections. Math. Ann. 163, 189–203 (1966)
6. Flesch, J., Thuijsman, F., Vrieze, K.: Cyclic Markov equilibria in stochastic games. Int. J. Game Theory 26, 303–314 (1997)
7. Gurvich, V., Karzanov, A., Khachiyan, L.: Cyclic games and an algorithm to find minimax mean cycles in directed graphs. USSR Comput. Math. Math. Phys. 28, 85–91 (1988)
8. Lozovanu, D.: The game theoretical approach to Markov decision problems and determining Nash equilibria for stochastic positional games. Int. J. Math. Model. Numer. Optim. 2(2), 162–174 (2011)
9. Lozovanu, D.: Stationary Nash equilibria for average stochastic positional games. In: Petrosyan, L.A., et al. (eds.) Frontiers of Dynamic Games. Static & Dynamic Game Theory: Foundations & Applications, pp. 139–163. Birkhäuser, Cham (2018)
10. Lozovanu, D., Pickl, S.: Nash equilibria conditions for cyclic games with p players. Electron. Notes Discrete Math. 25, 117–124 (2006)
11. Lozovanu, D., Pickl, S.: Optimization and Multiobjective Control of Time-Discrete Systems. Springer, Berlin (2009)
12. Lozovanu, D., Pickl, S.: Determining the optimal strategies for zero-sum average stochastic positional games. Electron. Notes Discrete Math. 55, 155–159 (2016)
13. Lozovanu, D., Pickl, S.: Nash equilibria in mixed stationary strategies for m-player mean payoff games on networks. Contrib. Game Theory Manag. 11, 103–112 (2018)
14. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951)
15. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (2005)
16. Rogers, P.: Nonzero-Sum Stochastic Games. Report ORC 69-8, PhD thesis, University of California, Berkeley (1969)
17. Shapley, L.: Stochastic games. Proc. Natl. Acad. Sci. U.S.A. 39, 1095–1100 (1953)
18. White, D.: Markov Decision Processes. Wiley, New York (1993)
19. Zwick, U., Paterson, M.: The complexity of mean payoff games on graphs. Theor. Comput. Sci. 158, 343–359 (1996)

Chapter 9
Variational Inequalities, Nash Equilibrium Problems and Applications: Unification Dynamics in Networks

Vladimir Matveenko, Maria Garmash, and Alexei Korolev

Abstract We study game equilibria in a model of production and externalities in a network with two types of agents who possess different productivities. Each agent may invest a part of her endowment (for instance, time or money) at the first stage; consumption in the second period depends on her own investment and productivity as well as on the investments of her neighbors in the network. Three ways of agent behavior are possible: passive (no investment), active (a part of the endowment is invested) and hyperactive (the whole endowment is invested). We introduce adjustment dynamics and study the consequences of the junction of two regular networks with different productivities of agents. We use the projection-based method for solving variational inequalities for the description of adjustment dynamics in networks.

Keywords Network · Nash equilibrium · Externality · Productivity · Adjustment dynamics · Variational inequality

V. Matveenko · M. Garmash · A. Korolev
National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia
e-mail: [email protected]

© Springer Nature Switzerland AG 2019
L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_9

9.1 Introduction

Social network analysis has become an important research field, both as a subject area and as a methodological approach applicable to the analysis of interrelations in various complex network structures: not only social, but also political, economic and urban. There is also a permanent exchange of ideas among researchers doing network analysis in the social and natural sciences; see e.g. [5]. A special place in this multidisciplinary research activity is occupied by the approach of network games (e.g. [3, 7, 9, 11, 12, 15]), which assumes that agents in a network act as rational decision makers whose actions are results of solving optimization problems, and the profile of actions of all agents in the network is a game equilibrium. The decision of each agent is supposed to be influenced by the behavior (or by the knowledge) of her neighbors in the network. Such an approach has proven to be very productive analytically.

In the majority of studies of game equilibria in networks (e.g. [2, 4, 17, 18]) the agents are assumed to be homogeneous (except for their positions in the network), and the problem is to study the relation between the agents' positions in the network and their behavior in the game equilibrium. The models demonstrate that the agents' behavior and well-being depend on their position in the network, which is characterized by one or another measure of centrality. However, diversity and heterogeneity have become an important aspect of contemporary social and economic life (international working teams are a typical example; many other examples are described by researchers of inclusiveness and social cohesion, see e.g. [1]). Correspondingly, along with accounting for the position of agents in the network, an important task is to account for the heterogeneity of agents as a factor shaping differences in their behavior and well-being. This direction of research is still forming in the literature; e.g. in [4] agents possess different marginal costs.

In the present paper we add heterogeneity of agents and adjustment dynamics into a two-period consumption-investment model with network externalities (see [19] for the special case of a complete network and [16] for the general network case). The model considers situations in which, at the first stage, each agent in the network, at the expense of diminishing current consumption, may invest some resource (such as money or time) with the goal of increasing her second-stage consumption. The latter depends not only on her investment and productivity but also on the investments of her neighbors in the network. The total utility of each agent depends on her consumption at both stages. Such situations are typical for families, communities, international organizations, innovative industries, etc. In the framework of the model, questions concerning the interrelations between the network structure, incentives and behavior are studied.

We use the concept of 'Nash equilibrium with externalities', similar to the one introduced by Romer [19] and Lucas [14]. As in the common Nash equilibrium, agents maximize their payoffs (utilities), and in equilibrium no agent finds it gainful to change her behavior if the others do not change theirs. However, the agent's maximization problem under the present concept is such that the agent does not change her behavior as 'freely' as under the common Nash equilibrium concept. To some degree, the agent is attached to the equilibrium of the game. Namely, it is assumed that the agent makes her decision being in a definite environment which is formed by herself and by her neighbors in the network. Though she herself participates in the formation of the environment, at the moment of decision-making the agent considers the environment as exogenously given.

We identify conditions under which an agent behaves in equilibrium in a definite way, being 'passive' (not investing), 'active' (investing a part of the available endowment) or 'hyperactive' (investing the whole endowment). We study the influence of the heterogeneity on the game equilibria. We introduce adjustment dynamics into the model and study the dynamics of transition to the new equilibrium.
Such situations are typical for families, communities, international organizations, innovative industries, etc. In the framework of the model, questions concerning interrelations between the network structure, incentives and behavior are studied. We use the concept of ‘Nash equilibrium with externalities’, similar to the one introduced by Romer [19] and Lucas [14]. As in the common Nash equilibrium, agents maximize their payoffs (utilities), and in equilibrium no one agent finds gainful to change her behavior if others do not change their behaviors. However, the agent’s maximization problem under the present concept is such that the agent does not change her behavior so ‘free’ as under the common Nash equilibrium concept. In some degree, the agent is attached to the equilibrium of the game. Namely, it is assumed that the agent makes her decision being in a definite environment which is formed by herself and by her neighbors in the network. Though she participates herself in formation of the environment, the agent in the moment of decision-making considers the environment as exogenously given. We identify conditions under which an agent behaves in equilibrium in a definite way, being ‘passive’ (not investing), ‘active’ (investing a part of the available endowment) or ‘hyperactive’ (investing the whole endowment). We study the influence of the heterogeneity on the game equilibria. We introduce adjustment dynamics into the model and study dynamics of transition to the new equilibrium.


The dynamics pattern and the nature of the resulting equilibrium depend on the parameters characterizing the heterogeneous agents. A question studied in the paper is the consequences of the unification of networks with different types of agents. We study the junction of complete networks and enumerate conditions under which the initial equilibrium holds after unification, as well as conditions under which a transition process starts and the equilibrium changes. We consider the transient processes leading to new equilibria in the unified network and show how the iterative solution of variational inequalities can be applied to describe these processes.

The paper is organized in the following way. The game model is formulated in Sect. 9.2. The agent's behavior in equilibrium is characterized in Sect. 9.3. Section 9.4 introduces the adjustment dynamics, which may start after a small disturbance of an initial inner equilibrium or after a junction of networks, and the notion of dynamic stability; it also explains the relation between game equilibria and variational inequalities, and between adjustment dynamics and the solving of variational inequalities. Section 9.5 considers the consequences of the junction of two complete networks with different types of agents. Section 9.6 describes the adjustment dynamics which start after the junction of networks and demonstrates the use of variational inequalities for finding stable equilibria in the united networks and for describing the transition process; advantages of this approach over the traditional one are discussed. Section 9.7 concludes.

9.2 The Model

There is a network (undirected graph) with $n$ nodes, $i = 1, 2, \ldots, n$; each node represents an agent. In period 1 each agent $i$ possesses an initial endowment of good, $e$, and uses it partly for consumption in the first period of life, $c_1^i$, and partly for investment into knowledge, $k_i$:

$$c_1^i + k_i = e, \quad i = 1, 2, \ldots, n.$$

Investment immediately transforms one-to-one into knowledge, which is used in the production of good for consumption in the second period, $c_2^i$. The preferences of agent $i$ are described by a quadratic utility function:



$$U_i(c_1^i, c_2^i) = c_1^i \big(e - a c_1^i\big) + d_i c_2^i,$$

where $d_i > 0$; $a$ is a satiation coefficient, and $d_i$ is a parameter characterizing the value of comfort and health in the second period of life compared to consumption in the first period. It is assumed that $c_1^i \in [0, e]$ and that the utility increases and is concave (the marginal utility decreases) with respect to $c_1^i$. These assumptions are equivalent to the condition $0 < a < 1/2$.


V. Matveenko et al.

Production in node $i$ is described by the production function $F(k_i, K_i) = g_i k_i K_i$, $g_i > 0$, which depends on the state of knowledge in the $i$-th node, $k_i$, and on the environment, $K_i$; $g_i$ is a technological coefficient. The environment is the sum of the investments of the agent herself and of her neighbors:

$$K_i = k_i + \tilde K_i, \qquad \tilde K_i = \sum_{j \in N(i)} k_j,$$

where $N(i)$ is the set of neighboring nodes of node $i$. The sum of the investments of the neighbors, $\tilde K_i$, will be referred to as the pure externality.

We denote the product $d_i g_i$ by $b_i$ and assume that $a < b_i$. Since an increase of either of the parameters $d_i, g_i$ promotes an increase of the second-period consumption, we call $b_i$ the "productivity". We assume that $b_i \neq 2a$, $i = 1, 2, \ldots, n$. If $b_i > 2a$, we say that the $i$-th agent is productive, and if $b_i < 2a$, we say that the $i$-th agent is unproductive.

Three ways of behavior are possible: agent $i$ is called passive if she makes zero investment, $k_i = 0$ (i.e. consumes the whole endowment in period 1); active if $0 < k_i < e$; hyperactive if she makes the maximal possible investment $e$ (i.e. consumes nothing in period 1).

Let us consider the following game. The players are the agents $i = 1, 2, \ldots, n$. The possible actions (strategies) of player $i$ are values of investment $k_i$ from the segment $[0, e]$. A Nash equilibrium with externalities (for shortness, equilibrium) is a profile of knowledge levels (investments) $(k_1^*, k_2^*, \ldots, k_n^*)$ such that each $k_i^*$ is a solution of the following problem $P(K_i)$ of maximization of the $i$-th player's utility given the environment $K_i$:

$$U_i(c_1^i, c_2^i) \longrightarrow \max_{c_1^i,\, c_2^i,\, k_i}$$

$$\begin{cases} c_1^i \le e - k_i, \\ c_2^i \le F(k_i, K_i), \\ c_1^i \ge 0, \; c_2^i \ge 0, \; k_i \ge 0, \end{cases}$$

where the environment $K_i$ is defined by the profile $(k_1^*, k_2^*, \ldots, k_n^*)$:

$$K_i = k_i^* + \sum_{j \in N(i)} k_j^*.$$

Following Lucas and Romer we use the Jacobian equilibrium. The concept of Jacobian equilibrium, developed by Romer [19] and Lucas [14], supposes that at the moment of decision-making agent $i$ takes her environment $K_i$ as exogenously given. The first two constraints of problem $P(K_i)$ are evidently satisfied as equalities at the optimum point. Substituting them into the objective function, we obtain a new function (the payoff function):



$$V_i(k_i, K_i) = U_i\big(e - k_i,\, F(k_i, K_i)\big) = (e - k_i)\big(e - a(e - k_i)\big) + b_i k_i K_i = e^2(1 - a) - k_i e(1 - 2a) - a k_i^2 + b_i k_i K_i. \tag{9.1}$$

If all players' solutions are internal ($0 < k_i^* < e$, $i = 1, 2, \ldots, n$), i.e. all players are active, the equilibrium will be referred to as an inner equilibrium. Clearly, the inner equilibrium (if it exists for the given values of the parameters) is defined by the system

$$D_1 V_i(k_i, K_i) = 0, \quad i = 1, 2, \ldots, n, \tag{9.2}$$

or, explicitly,

$$D_1 V_i(k_i, K_i) = e(2a - 1) - 2a k_i + b_i K_i = 0, \quad i = 1, 2, \ldots, n. \tag{9.3}$$

Note that, using the concept of Jacobian equilibrium, we differentiate $V_i(k_i, K_i)$ with respect to its first argument only, not with respect to every occurrence of $k_i$ in the formula.

We will use the following notation: $\tilde A$ is the diagonal matrix with $b_1, b_2, \ldots, b_n$ on the main diagonal; $I$ is the $n \times n$ identity matrix; $M$ is the adjacency matrix of the network, i.e. $M_{ij} = M_{ji} = 1$ if there is a link between nodes $i$ and $j$ in the network, $M_{ij} = M_{ji} = 0$ otherwise, and $M_{ii} = 0$ for all $i = 1, 2, \ldots, n$. The system of Eq. (9.3) takes the form

$$(\tilde A - 2aI)k + \tilde A M k = \bar e, \tag{9.4}$$

where $k = (k_1, k_2, \ldots, k_n)^T$ and $\bar e = \big(e(1 - 2a), e(1 - 2a), \ldots, e(1 - 2a)\big)^T$.

Theorem 9.1 (Matveenko et al. [17], Theorem 1.1) For a complete network, the system of Eq. (9.4) has a unique solution.

Thus, for a complete network, the system of Eq. (9.4) has a unique solution $k^S$, whose components we shall call the stationary values of investments. In the inner equilibrium $k_i^* = k_i^S$, $i = 1, 2, \ldots, n$.
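As an illustration, the linear system (9.4) can be solved numerically. The sketch below assumes illustrative parameter values ($a$, $e$, $n$ and the productivities $b_i$ are not taken from the text) and verifies the componentwise form (9.3) of the stationary solution:

```python
import numpy as np

# Solving (9.4): (A~ - 2a I) k + A~ M k = e_bar, for a complete network.
a, e, n = 0.3, 1.0, 4
b = np.array([0.8, 0.8, 0.7, 0.9])       # productivities b_i (all != 2a and > a)
A = np.diag(b)                           # A~: diagonal matrix of the b_i
M = np.ones((n, n)) - np.eye(n)          # adjacency matrix of the complete network
e_bar = np.full(n, e * (1 - 2 * a))      # right-hand side, components e(1 - 2a)

k_S = np.linalg.solve(A - 2 * a * np.eye(n) + A @ M, e_bar)  # stationary investments

# Check the componentwise equation (9.3): e(2a - 1) - 2a k_i + b_i K_i = 0,
# where K_i = k_i + (sum of the neighbors' investments).
K = k_S + M @ k_S
residual = e * (2 * a - 1) - 2 * a * k_S + b * K
```

With these particular values all components of `k_S` lie strictly between 0 and $e$, so the solution is an inner equilibrium.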

9.3 Characterization of Agent Behavior Types

We introduce the following notation. Regardless of the type of agent behavior, the root of the equation

$$D_1 V_i(k_i, K_i) = (b_i - 2a) k_i + b_i \tilde K_i - e(1 - 2a) = 0$$

will be denoted by $\tilde k_i^S$. In this way

$$\tilde k_i^S = \frac{e(2a - 1) + b_i \tilde K_i}{2a - b_i},$$

where $\tilde K_i$ denotes the pure externality of agent $i$. It is obvious that, in equilibrium, if agent $i$ is active, then her investment equals $\tilde k_i^S$. The following statement plays a central role in the analysis of equilibria.

Proposition 9.1 (Matveenko et al. [17], Lemma 2.1 and Corollary 2.1) A profile of agents' investments $(k_1, k_2, \ldots, k_n)$ can be an equilibrium only if for each $i = 1, 2, \ldots, n$ it is true that:

1. if $k_i = 0$, then $\tilde K_i \le \frac{e(1 - 2a)}{b_i}$;
2. if $0 < k_i < e$, then $k_i = \tilde k_i^S$;
3. if $k_i = e$, then $\tilde K_i \ge \frac{e(1 - b_i)}{b_i}$.

Lemma 9.1 (Matveenko et al. [17], Lemma 2.2) In equilibrium the $i$-th agent is passive iff

$$\tilde K_i \le \frac{e(1 - 2a)}{b_i}; \tag{9.5}$$

the $i$-th agent is active iff

$$\frac{e(1 - 2a)}{b_i} < K_i < \frac{e}{b_i}; \tag{9.6}$$

the $i$-th agent is hyperactive iff

$$K_i \ge \frac{e}{b_i}. \tag{9.7}$$
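A minimal sketch of Lemma 9.1, under assumed parameter values $e = 1$, $a = 0.3$ (not from the text): since the agent treats her environment $K_i$ as given, her optimal investment is the root of $e(2a-1) - 2a k_i + b_i K_i = 0$ clipped to $[0, e]$, and the three clauses of the lemma are exactly the three clipping regimes:

```python
def best_response(b_i, K_i, e=1.0, a=0.3):
    # Root of the first-order condition e(2a - 1) - 2a k + b_i K_i = 0 ...
    k = (e * (2 * a - 1) + b_i * K_i) / (2 * a)
    # ... clipped to the strategy segment [0, e].
    return min(max(k, 0.0), e)

# With b_i = 0.8: passive for K_i <= e(1 - 2a)/b_i = 0.5,
# hyperactive for K_i >= e/b_i = 1.25, active in between.
assert best_response(0.8, 0.4) == 0.0
assert best_response(0.8, 1.3) == 1.0
assert 0.0 < best_response(0.8, 1.0) < 1.0
```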

In any complete network the environment is the same for all agents. This implies the following corollary.

Corollary 9.1 (Matveenko et al. [17], Corollary 2.3) In a complete network, in equilibrium agents with the same productivity make the same investments.

If all agents have the same productivity, then homophily takes place: everyone behaves in the same way.

Corollary 9.2 (Matveenko et al. [17], Comment 2.3) In a complete network there cannot be an equilibrium in which an agent with a higher productivity is active while an agent with a lower productivity is hyperactive, or in which an agent with a higher productivity is passive while an agent with a lower productivity is active or hyperactive.


Speaking about a complete network, we omit the index $i$ in the notation for the $i$-th agent's environment, because in a complete network the environment is the same for all agents. In other words, $K$ denotes the sum of the investments of all agents of the complete network.

Corollary 9.3 (Matveenko et al. [17], Corollary 2.4) In a complete network, an equilibrium with all hyperactive agents exists iff

$$\min_i b_i \ge \frac{1}{n}.$$

In this case

$$K \ge \frac{e}{\min_i b_i}.$$

In a complete network, an equilibrium with all active agents exists iff the common environment $K$ satisfies $\frac{e(1-2a)}{b_i} < K < \frac{e}{b_i}$ for every $i$ (cf. Lemma 9.1).

In what follows, $\langle \cdot, \cdot \rangle$ is the standard scalar product in $R^n$. Under the hypothesis of differentiability of the payoff functions, it is well known (see, e.g., [13] or [10]) that Nash equilibrium problems are equivalent to variational inequalities. Thus, consider the closed convex set

$$\Psi^n = \{k : 0 \le k_i \le e, \; i = 1, 2, \ldots, n\},$$


where $k = (k_1, k_2, \ldots, k_n)$, and let $F : R^n \to R^n$ be defined by

$$F_i(k) = -D_1 V_i(k_i, K_i) = -\frac{\partial V_i(k_i, K_i)}{\partial k_i}.$$

Thus, we can consider the following variational inequality problem: find $k \in \Psi^n$ such that

$$\sum_{i=1}^{n} F_i(k)(x_i - k_i) \ge 0 \quad \forall x \in \Psi^n.$$

Theorem 9.2 (Gemp and Mahadevan [8]) The vector $k^*$ is a solution of $VI(F, \Psi)$ if and only if, for any $\alpha > 0$, $k^*$ is also a fixed point of the map

$$k^* = P_\Psi\big(k^* - \alpha F(k^*)\big),$$

where $P_\Psi$ is the projector onto the convex set $\Psi$. Thus, we shall use the following algorithm:

1: Set $k = 0$ and choose $x_0 \in \Psi$
2: repeat
3: Set $x_{k+1} = P_\Psi\big(x_k - \alpha F(x_k)\big)$
4: Set $k = k + 1$
5: until $x_k = P_\Psi\big(x_k - \alpha F(x_k)\big)$
6: return $x_k$
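A sketch of this projection algorithm for the present game. The map $F$ and the clipping projector onto $\Psi = [0,e]^n$ follow the definitions above; the step size, tolerance and the 3-node example are illustrative assumptions:

```python
import numpy as np

def solve_vi(b, M, x0, e=1.0, a=0.3, alpha=0.05, tol=1e-10, max_iter=100_000):
    """Projection iteration x <- P_Psi(x - alpha F(x)) for VI(F, Psi)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        K = x + M @ x                               # environment K_i = k_i + neighbors' sum
        F = -(e * (2 * a - 1) - 2 * a * x + b * K)  # F_i(k) = -D1 V_i(k_i, K_i)
        x_next = np.clip(x - alpha * F, 0.0, e)     # projection onto Psi = [0, e]^n
        if np.max(np.abs(x_next - x)) < tol:        # fixed point of Theorem 9.2
            break
        x = x_next
    return x_next

# Complete 3-node network with b_i = 0.8 >= 1/n: starting near the
# all-hyperactive profile, the iteration settles at k_i = e for all i.
n = 3
M = np.ones((n, n)) - np.eye(n)
k = solve_vi(b=np.full(n, 0.8), M=M, x0=np.full(n, 0.9))
```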

Now we introduce the adjustment dynamics, which may start after a small deviation from equilibrium or after a junction of networks each of which was initially in equilibrium. We model the adjustment dynamics in the following way.

Definition 9.2 Each agent $i$, $i = 1, 2, \ldots, n$, maximizes her utility by choosing a level of investment; at the moment of decision-making she considers her environment as exogenously given. Correspondingly, for time periods $t = 0, 1, 2, \ldots$: if $k_i^t = 0$ and $D_1 V_i(k_i, K_i)|_{k_i = 0} \le 0$, then $k_i^{t+1} = 0$; if $k_i^t = e$ and $D_1 V_i(k_i, K_i)|_{k_i = e} \ge 0$, then $k_i^{t+1} = e$; in all other cases $k_i^{t+1}$ solves the difference equation

$$-2a k_i^{t+1} + b_i K_i^t - e(1 - 2a) = 0.$$

Definition 9.3 The equilibrium is called dynamically stable if, after a small deviation of one of the agents from the equilibrium, dynamics starts which returns the system to the initial equilibrium state. In the opposite case the equilibrium is called dynamically unstable.

Clearly, the system of dynamics equations depends on which agents are active, passive and hyperactive. Hence one must track the transition process and correct the system of equations all the time, which is inconvenient and error-prone. At the same time, the iterative processes for solving variational inequalities describe the transitional dynamics on the way to a stable equilibrium (see, e.g., [6]). Hence the use of variational inequalities allows us to obtain the results of the traditional dynamics automatically.

9.5 A Case of Networks Unification

Let us consider the following situation (the model of [17]). Let a complete network consist of $p$ agents with productivity $b_1$ (these agents will be referred to as type 1) and $q$ agents with productivity $b_2$ (type 2); $b_1 > b_2$. In the initial time period each 1st type agent invests $k_0^1$ and each 2nd type agent invests $k_0^2$. The junction of the networks takes place in time period 0. Correspondingly, the environment (common for all agents) in the initial period is $K = p k_0^1 + q k_0^2$. The following statement lists all possible equilibria and the conditions of their existence.

Proposition 9.2 ([17], Proposition 3.1) In a complete network with 2 types of agents the following equilibria exist.

(1) An equilibrium with all hyperactive agents exists if
$$b_1 > b_2 \ge \frac{1}{p+q}.$$
(2) An equilibrium in which 1st type agents are hyperactive and 2nd type agents are active exists if
$$0 < \frac{1 - 2a - p b_2}{q b_2 - 2a} < 1, \qquad p + \frac{q(1 - 2a - p b_2)}{q b_2 - 2a} \ge \frac{1}{b_1}.$$
(3) An equilibrium in which 1st type agents are hyperactive and 2nd type agents are passive exists if
$$b_1 \ge \frac{1}{p}, \qquad b_2 \le \frac{1 - 2a}{p}.$$
(4) An equilibrium in which 1st type agents are active and 2nd type agents are passive exists if
$$b_1 > \frac{1}{p}, \qquad b_2 \le \frac{p b_1 - 2a}{p}.$$


(5) An equilibrium with all passive agents always exists.
(6) An equilibrium in which agents of both types are active exists if
$$p(b_1 - b_2) < 2a, \qquad 2a b_1 (p + q) > 2a + q(b_1 - b_2).$$

Proposition 9.3 (Matveenko et al. [17], Proposition 4.2) The conditions of dynamic stability/instability of the equilibria listed in Proposition 9.2 (in case of their existence) are the following.

1. The equilibrium in which both types of agents are hyperactive is stable iff
$$b_1 > \frac{1}{p+q}, \qquad b_2 > \frac{1}{p+q}.$$
2. The equilibrium in which agents of the 1st type are hyperactive and agents of the 2nd type are active is stable iff
$$p + \frac{1 - 2a - p b_2}{b_2 - 2a} > \frac{1}{b_1}, \qquad 0 < \frac{e(1 - 2a - p b_2)}{b_2 - 2a} < e, \qquad q = 1, \quad b_2 < 2a.$$
3. The equilibrium in which agents of the 1st type are hyperactive and agents of the 2nd type are passive is stable iff
$$b_1 > \frac{1}{p}, \qquad b_2 < \frac{1 - 2a}{p}.$$
4. The equilibrium in which agents of the 1st type are active and agents of the 2nd type are passive is always unstable.
5. The equilibrium with all passive agents is always stable.
6. The equilibrium with all active agents is always unstable.

Let us give a summary analysis of these results. An equilibrium with all passive agents is always possible, even at very low productivities, and such an equilibrium is always dynamically stable. At a higher productivity of the first type agents and a low productivity of the second type agents, an equilibrium in which agents of the first type are active and agents of the second type are passive is possible. This equilibrium is always unstable, as is the equilibrium in which all agents of both types are active, which is possible at higher productivities of agents of both types. With a sufficiently high value of the first type agents' productivity and a low value of the second type agents' productivity, the equilibrium in which the agents of the first type are hyperactive while the agents of the second type are passive is possible. Such an equilibrium is almost always dynamically stable (with the exception of a set of productivity values of Lebesgue measure zero). At rather high values of the productivity of the first type agents and in a certain range of productivity values of the second type agents, the equilibrium in which agents of the first type are hyperactive and agents of the second type are active is possible, and when


the productivity of the second type agents is not too high, such an equilibrium is dynamically stable. Finally, for very large values of the productivities of the agents of both types, an equilibrium in which agents of both types are hyperactive is possible, and such an equilibrium is almost always dynamically stable (except for a set of values of Lebesgue measure zero). Thus, higher productivity values encourage agents to invest more actively in knowledge. The implication of Propositions 9.2 and 9.3 is that increasing the productivity of agents leads to the establishment of stable equilibria with a higher amount of investment in knowledge. At the same time, Propositions 9.2 and 9.3 define the specific thresholds of the corresponding values of the agents' productivities.

9.6 Transitional Dynamics in Our Case

Before the merger, let both complete networks be in equilibrium. After the unification a transient dynamics arises, and which stable equilibrium the united network moves to depends on the relations between $k_i^0$ and $k_i^*$, which in turn depend on the values of the parameters. Following the changes by writing the equations of dynamics directly is rather laborious, because one has to change the system of equations all the time, depending on the behavior types of the agents. For example, as long as all agents are active, the dynamics (see Definition 9.2) is described by the system of difference equations:

$$\begin{cases} k_1^{t+1} = \dfrac{p b_1}{2a} k_1^t + \dfrac{q b_1}{2a} k_2^t + \dfrac{e(2a-1)}{2a}, \\[2mm] k_2^{t+1} = \dfrac{p b_2}{2a} k_1^t + \dfrac{q b_2}{2a} k_2^t + \dfrac{e(2a-1)}{2a}, \end{cases}$$
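A sketch of this system under assumed parameters ($p = q = 2$, $b_1 = 0.8$, $b_2 = 0.7$, $e = 1$, $a = 0.3$; all values are illustrative), with each investment clipped to $[0, e]$ once an agent reaches a boundary, as prescribed by Definition 9.2:

```python
def adjust(k1, k2, p, q, b1, b2, e=1.0, a=0.3, steps=200):
    # Iterate k_i^{t+1} = (b_i K^t - e(1 - 2a)) / (2a), K^t = p k1 + q k2,
    # clipping each investment to [0, e] (Definition 9.2).
    for _ in range(steps):
        K = p * k1 + q * k2
        k1_new = min(max((b1 * K - e * (1 - 2 * a)) / (2 * a), 0.0), e)
        k2_new = min(max((b2 * K - e * (1 - 2 * a)) / (2 * a), 0.0), e)
        k1, k2 = k1_new, k2_new          # simultaneous update of both groups
    return k1, k2

# Since b2 = 0.7 > 1/(p + q), the all-hyperactive equilibrium is stable:
# starting near it, the dynamics settle at k1 = k2 = e.
k1, k2 = adjust(0.9, 0.9, p=2, q=2, b1=0.8, b2=0.7)
```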

where $t = 0, 1, 2, \ldots$. However, as soon as one of the agents reaches the limit level of investment, $0$ or $e$, the system of equations must be modified. Matveenko et al. [17] gave a detailed description of all the situations which are possible under unification of the considered complete networks.

Proposition 9.4 (Matveenko et al. [17], Proposition 5.1) After the junction all agents hold their initial behavior (make the same investments as before the junction) in the following four cases:

(1) if $b_1 \ge \frac{1}{p}$, $b_2 \ge \frac{1}{q}$, and initially the agents in both networks are hyperactive;
(2) if $b_2 \le \frac{1 - 2a}{p}$,


and initially the agents in the 1st network are hyperactive, and the agents in the 2nd network are passive;
(3) if $b_1 > \frac{1}{p}$, $b_1 - b_2 \ge \frac{2a}{p}$, and initially the agents in the 1st network are active, and the agents in the 2nd network are passive;
(4) if initially the agents in both networks are passive.

In all other cases the equilibrium changes.

Proposition 9.5 ([17], Proposition 5.2) Let the agents in the 1st network before the junction be hyperactive (hence $b_1 \ge \frac{1}{p}$ by Corollary 9.3) and the agents in the 2nd network be passive. Then the following cases are possible.

1. If $b_2 \le \frac{1-2a}{p}$, then after the junction all agents hold their initial behavior, and there is no transition process in the unified network. The unified network is in the equilibrium $\{k_1 = e, k_2 = 0\}$.
2. If $b_2 > \frac{1-2a}{p}$ and $b_2 \ge \frac{2a}{q}$, then the 1st group agents stay hyperactive; the investments of the 2nd group agents increase until they also become hyperactive. The unified network comes to the equilibrium $\{k_1 = e, k_2 = e\}$.
3. If $\frac{2a}{q} > b_2 > \frac{1-2a}{p}$, then the 1st group agents stay hyperactive; the investments of the 2nd group agents increase. The unified network comes to the state $\{k_1 = e,\, k_2 = \frac{e(p b_2 + 2a - 1)}{2a - q b_2}\}$ if $b_2 < \frac{1}{p+q}$, and to the state $\{k_1 = e, k_2 = e\}$ if $b_2 \ge \frac{1}{p+q}$.

Proposition 9.6 (Matveenko et al. [17], Proposition 5.3) Let the agents of the 1st network before the junction be hyperactive (which implies $b_1 \ge \frac{1}{p}$ by Corollary 9.3), and the agents of the 2nd network be active (which implies $b_2 > \frac{1}{q}$). The unified network moves to the equilibrium with all hyperactive agents. The utilities of all agents increase.

Proposition 9.7 (Matveenko et al. [17], Proposition 5.4) If before the junction the agents of both networks are hyperactive (which implies $b_1 \ge \frac{1}{p}$, $b_2 \ge \frac{1}{q}$ by Corollary 9.3), they stay hyperactive after the junction: there is no transition dynamics, and the utilities of all agents do increase.

Proposition 9.8 (Matveenko et al. [17], Proposition 5.5) If before the junction the agents of both networks are passive, they stay passive after the junction: there is no transition dynamics, and the agents' utilities do not change.

Proposition 9.9 (Matveenko et al.
[17], Proposition 5.6) Let the agents of the 1st network before the junction be active (which implies $b_1 > \frac{1}{p}$ by Corollary 9.3), $k_1^0 = \frac{e(1-2a)}{p b_1 - 2a}$, and the agents of the 2nd network be passive. Then the following cases are possible.


1. Under $p b_1 \ge p b_2 + 2a$, all agents hold their initial behavior, and there is no transition process.
2. Let $p b_1 < p b_2 + 2a$. If $b_2 \ge \frac{2a}{q}$ and $\frac{e - D_1 - k_1^0}{b_1} \le \frac{e - D_2}{b_2}$, then the network moves …; if $\frac{e - D_1 - k_1^0}{b_1} > \frac{e - D_2}{b_2}$ and $b_2 < \frac{2a}{q}$, then …

If before the junction the agents of both networks are active, which implies $b_1 > \frac{1}{p}$, $b_2 > \frac{1}{q}$ by Corollary 9.3, then after the junction all agents become hyperactive; their utilities increase.

We summarize these results in Table 9.1. Since the description of the dynamics in all the situations possible under unification of the networks is so labor-intensive, let us describe the adjustment dynamics …
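The junction scenarios can be checked numerically. A sketch for case 2 of Proposition 9.5 under assumed parameters $p = q = 2$, $a = 0.3$, $e = 1$, $b_1 = 0.6 \ge 1/p$ and $b_2 = 0.35$ (so that $b_2 > \frac{1-2a}{p} = 0.2$ and $b_2 \ge \frac{2a}{q} = 0.3$): before the junction the 1st network is all-hyperactive and the 2nd all-passive, and the adjustment dynamics of Definition 9.2 pull the 2nd group up to hyperactivity:

```python
p, q, b1, b2, e, a = 2, 2, 0.6, 0.35, 1.0, 0.3
k1, k2 = e, 0.0                    # pre-junction equilibria of the two networks
for _ in range(100):               # adjustment dynamics of Definition 9.2
    K = p * k1 + q * k2
    k1, k2 = (min(max((b1 * K - e * (1 - 2 * a)) / (2 * a), 0.0), e),
              min(max((b2 * K - e * (1 - 2 * a)) / (2 * a), 0.0), e))
```

The iteration settles at $k_1 = k_2 = e$, the equilibrium $\{k_1 = e, k_2 = e\}$ predicted by case 2.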
Table 9.1 The results of adjustment dynamics that occur after a junction of two regular networks with different productivities of agents. Rows correspond to the behavior of the 1st type agents before the unification (passive; active with $k_1^0 = \frac{e(1-2a)}{p b_1 - 2a}$; hyperactive) and columns to the behavior of the 2nd type agents; the entries give the resulting equilibria, e.g. $k_1 = k_2 = 0$ when both types were passive, $k_1 = k_2 = e$ when both types become hyperactive, and $k_1 = k_2 = e$ when $p b_1 < p b_2 + 2a$; the remaining entries repeat the case conditions of Propositions 9.4–9.9 (the thresholds $\frac{1-2a}{p}$ and $\frac{2a}{q}$, and the comparison of $\frac{e - D_1 - k_1^0}{b_1}$ with $\frac{e - D_2}{b_2}$).

The componentwise projection onto $[0, e]^2$ inside the iteration loop was implemented in Matlab as follows (fragment):

if w(1) > e
    p1 = e;
else if w(1) < 0
    p1 = 0;
else
    p1 = w(1);
end; end;
if w(2) > e
    p2 = e;
else if w(2) < 0
    p2 = 0;
else
    p2 = w(2);
end; end;
x2 = [p1; p2]
eps = abs((x1(1) - x2(1))^2 + (x1(2) - x2(2))^2)
end;

References

1. Acemoglu, D., Robinson, J.A.: Why Nations Fail: The Origins of Power, Prosperity, and Poverty. Crown Publishers, New York (2012)
2. Ballester, C., Calvó-Armengol, A., Zenou, Y.: Who's who in networks. Wanted: the key player. Econometrica 74(5), 1403–1417 (2006)
3. Bramoullé, Y., Kranton, R.: Public goods in networks. J. Econ. Theory 135, 478–494 (2007)
4. Bramoullé, Y., Kranton, R., D'Amours, M.: Strategic interaction and networks. Am. Econ. Rev. 104(3), 898–930 (2014)
5. Estrada, E.: The Structure of Complex Networks: Theory and Applications. Oxford University Press, Oxford (2011)
6. Gabay, D., Moulin, H.: On the uniqueness and stability of Nash equilibria in noncooperative games. In: Applied Stochastic Control in Econometrics and Management Science, pp. 271–293. North-Holland Publishing Company, Amsterdam (1980). https://www.researchgate.net/publication/248570672
7. Galeotti, A., Goyal, S., Jackson, M.O., Vega-Redondo, F., Yariv, L.: Network games. Rev. Econ. Stud. 77, 218–244 (2010)


8. Gemp, I., Mahadevan, S.: Finding Equilibria in Large Games Using Variational Inequalities. Association for the Advancement of Artificial Intelligence, Palo Alto (2015). http://www.aaai.org
9. Goyal, S.: Connections: An Introduction to the Economics of Networks. Princeton University Press, Princeton (2010)
10. Harker, P.T., Pang, J.S.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48, 161–220 (1990)
11. Jackson, M.O.: Social and Economic Networks. Princeton University Press, Princeton (2008)
12. Jackson, M.O., Zenou, Y.: Games on networks. In: Young, P., Zamir, S. (eds.) Handbook of Game Theory, vol. 4, pp. 95–163. Elsevier, Amsterdam (2014)
13. Jadamba, B., Raciti, F.: A variational inequality approach to a class of environmental equilibrium problems. Appl. Math. 3, 1723–1728 (2012)
14. Lucas, R.: On the mechanics of economic development. J. Monet. Econ. 22(1), 3–42 (1988)
15. Martemyanov, Y.P., Matveenko, V.D.: On the dependence of the growth rate on the elasticity of substitution in a network. Int. J. Process Manag. Benchmarking 4(4), 475–492 (2014)
16. Matveenko, V.D., Korolev, A.V.: Knowledge externalities and production in network: game equilibria, types of nodes, network formation. Int. J. Comput. Econ. Econom. 7(4), 323–358 (2017)
17. Matveenko, V., Korolev, A., Zhdanova, M.: Game equilibria and unification dynamics in networks with heterogeneous agents. Int. J. Eng. Bus. Manag. 9, 1–17 (2017)
18. Naghizadeh, P., Liu, M.: Provision of public goods on networks: on existence, uniqueness, and centralities. IEEE Trans. Netw. Sci. Eng. 5(3), 225–236 (2018)
19. Romer, P.M.: Increasing returns and long-run growth. J. Polit. Econ. 94, 1002–1037 (1986)

Chapter 10

About the Looking Forward Approach in Cooperative Differential Games with Transferable Utility Ovanes Petrosian and Ildus Kuchkarov

Abstract This paper presents a complete description and the results of the Looking Forward Approach for cooperative differential games with transferable utility. The approach is used for constructing game-theoretical models and defining solutions for conflict-controlled processes where information about the process updates dynamically, i.e. for differential games with dynamic updating. It is supposed that players lack certain information about the dynamical system and payoff functions over the whole time interval on which the game is played. At each instant the information about the game structure updates, and players receive new updated information about the dynamical system and payoff functions. A resource extraction game serves as an illustration in order to compare the cooperative trajectory, the imputations, and the imputation distribution procedure in a game with the Looking Forward Approach and in the original game with a prescribed duration.

Keywords Differential games · Differential cooperative games · Looking forward approach · Time consistency

10.1 Introduction

This research examines an n-player cooperative differential game with transferable utility [31] in which the game structure can change or update with time (a time-dependent formulation), and it is assumed that the players do not have information about the change of the game structure over the full time interval, but they have

O. Petrosian
St. Petersburg State University, St. Petersburg, Russia
National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia
e-mail: [email protected]; [email protected]

I. Kuchkarov
St. Petersburg State University, St. Petersburg, Russia

© Springer Nature Switzerland AG 2019
L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_10



O. Petrosian and I. Kuchkarov

certain information about the game structure over a truncated time interval. By information about the game structure we understand information about the dynamical system and payoff functions. The interpretation can be given as follows: players have certain information about the game structure, but the duration of this information is less than the length of the initial game. Evidently, this truncated information is valid only for a certain time and has to be updated. In order to define the best possible behavior for players in this type of cooperative differential game, a special approach is needed, which is called the Looking Forward Approach. This approach brings up the following questions: how to define a cooperative trajectory, how to define a cooperative solution and allocate the cooperative payoff, and what properties the obtained solution will have. This paper answers these questions and gives an overview of the results corresponding to the Looking Forward Approach. It is demonstrated that the newly built arbitrary solution (an arbitrary subset of the imputation set) for a class of differential games with dynamic updating is not only time consistent (which is very rare in cooperative differential games), but is also strongly time consistent. Strong time consistency of a solution means that the solution obtained by 'optimally' reconsidering the initial solution at any time instant during the game will belong to the initial solution. Haurie analyzed the problem of the dynamic instability of Nash bargaining solutions in differential games [7]. The notion of the time consistency of differential game solutions was formalized mathematically by Petrosyan [21], who presented results related to the connection between solutions chosen by the players in truncated subgames and in the overall game.
The notion of the characteristic function for the general game is introduced; with the help of it and of the solution for the general game, it is proved that the solutions chosen by the players in the truncated subgames correspond to the resulting solution based on the new characteristic function. The concept of the Looking Forward Approach is new in game theory, especially in cooperative differential games, and gives a foundation for the further study of differential games with dynamic updating. There are currently almost no results on constructing approaches for modeling conflict-controlled processes where information about the process updates in real time. To get more information about the approach, one may read the following papers: [6, 12–16, 19, 20, 29, 30]. In [12] the Looking Forward Approach was applied to a cooperative differential game with a finite horizon. The notion of a truncated subgame, the procedure for defining optimal strategies, a conditionally cooperative trajectory and solution concept, and the solution property of Δt-time consistency for a fixed information horizon were determined. [14] focuses on the study of the Looking Forward Approach with a stochastic forecast and dynamic adaptation when information about the conflicting process can change during the game. In [6] the Looking Forward Approach was applied to a cooperative differential game of pollution control; the paper studies the dependency of the resulting solution on the value of the information horizon, and the corresponding optimization problem was formulated and solved. In [13] the Looking Forward Approach was applied to a cooperative differential game with an infinite horizon. In [16] the Looking Forward Approach with a random horizon was presented, which is one of the variations of the Looking Forward Approach


introduced in [12]. Papers [29] and [30] study cooperative differential games with an infinite horizon where information about the process updates dynamically; the focus of these papers is a profound formulation of the Hamilton–Jacobi–Bellman equations for different types of forecasts and information structures. In [17] and [18] an imputation distribution procedure (IDP) core was used as a cooperative solution, and it was proved that the resulting solution is strongly time consistent. The most recent paper on the Looking Forward Approach [19] is devoted to studying the Looking Forward Approach for cooperative differential games with non-transferable utility and to a real-life application of the Looking Forward Approach to economic simulations. The set of all players is denoted by N (|N| = n). The characteristic function of a coalition is an essential concept in the theory of differential games. This function is defined in [3] as the total payoff of the players from coalition S in a Nash equilibrium in a game with the following set of players: coalition S (acting as one player) and the players from the set N\S. A computation of the Nash equilibrium, fully described in [1], is necessary for this approach. A set of imputations, or a solution of the game, is determined by the characteristic function at the beginning of each subinterval. For any set of imputations, the IDP first introduced by Petrosyan in [23] is analyzed; see recent publications on this topic in [8, 9, 24]. In order to determine a solution for the whole game, combining the partial solutions and their IDPs on the subintervals is required. The properties of time consistency and strong time consistency, introduced by Petrosyan in [21, 22], are also examined for the proposed solution. The Looking Forward Approach has similarities with the Model Predictive Control (MPC) theory worked out within the framework of numerical optimal control; recent results in this area can be found in [5, 11, 25, 28].
MPC is a control method in which the current control action is obtained by solving, at each sampling instant, a finite-horizon open-loop optimal control problem using the current state of the object as the initial state. This type of control is able to cope with strict limitations on controls and states, which is an advantage over other methods. It is therefore widely applied in the petrochemical and related industries, where key operating points are located close to the boundary of the set of admissible states and controls. The main problem solved in MPC is ensuring movement along a target trajectory under conditions of random perturbations and an unknown dynamical system. At each time step the optimal control problem is solved to define the controls which will lead the system to the target trajectory. The Looking Forward Approach, on the other hand, solves the problem of modeling player behavior when information about the process updates dynamically. This means that the Looking Forward Approach does not use a target trajectory, but answers the question of composing the trajectory which will be used by the players, and the question of allocating the cooperative payoff along the composed trajectory. To demonstrate the Looking Forward Approach we present an example of a cooperative resource extraction game with a finite horizon. The original example was introduced in [8], and the problem of time consistency in this game was examined in [31]. We present both analytic and numerical solutions for specific parameters. A comparison between the original game and the game with the Looking Forward Approach is presented. In the final part of the example model we demonstrate


the strong time consistency of the solution. The structure of the article is as follows. In Sect. 10.2 the description of the original game is presented. In Sect. 10.3 the definition of a truncated subgame is presented. In Sect. 10.4 the solution of the truncated subgame is described, and the conditionally cooperative trajectory is constructed. In Sect. 10.5, based on the results of Sect. 10.4, a solution is constructed in the game with dynamic updating, and the theorem of strong Δt-time consistency is presented. Section 10.6 is devoted to the construction of the characteristic function in a game with dynamic updating. In Sect. 10.7 the connection between the solutions in the truncated subgames and the resulting solution in the original game with dynamic updating is described and formalized mathematically. In Sect. 10.8 the Looking Forward Approach is applied to a three-player cooperative resource extraction game. The results of a numerical simulation in Matlab are presented.

10.2 The Original Game

Consider an n-player differential game Γ(x₀, T − t₀) with a finite horizon T − t₀, an initial state x₀ ∈ X ⊂ R^m, where X is the state space, and an initial time instant t₀ (t₀ and T are fixed values). Denote the set of players by N = {1, …, n}. At each time instant player i ∈ N chooses a control or strategy u_i ∈ U_i ⊂ Comp R^k; the joint strategy (control) space is denoted by U = U₁ × … × U_n. The payoff function of player i ∈ N is defined by

$$K_i(x_0, T - t_0; u) = \int_{t_0}^{T} h_i(\tau, x(\tau), u(\tau, x))\, e^{-r(\tau - t_0)}\, d\tau \tag{10.1}$$

subject to the dynamical system

$$\dot{x} = g(t, x(t), u(t, x)), \quad x(t_0) = x_0, \tag{10.2}$$

where x(t) ∈ X ⊂ R^m is the trajectory (the solution) of the system (10.2) under the control input u = (u₁, …, u_n) ∈ U, r ≥ 0, and the functions h_i(t, x, u): [t₀, T] × X × U → R and g(t, x, u): [t₀, T] × X × U → R^m are differentiable. The function h_i(t, x, u) is the instantaneous payoff received by player i ∈ N in state x(t) ∈ R^m under the control input u = (u₁, …, u_n); the function e^{−r(t−t₀)} is the discount factor for the players' payoffs. The solution of the system (10.2) determines the trajectory of the game. When open-loop strategies are used, we require piecewise continuity with a finite number of breaks. For feedback strategies we follow [1]. We require that for any n-tuple of strategies u(t, x) = (u₁(t, x), …, u_n(t, x)) the solution of the Cauchy problem for (10.2) exists and is unique on the time interval [t₀, T]. For a more sophisticated definition of feedback strategies in zero-sum differential games see [10].


10.2.1 Truncated Subgame

As mentioned above, information about the functions g(t, x, u) and h_i(t, x, u) is updated at fixed time instants. Suppose that the interval [t₀, T] ⊂ R is divided into subintervals [t₀ + jΔt, t₀ + (j+1)Δt], j = 0, …, l, of equal length Δt > 0, where l = (T − t₀)/Δt − 1. During the time interval [t₀ + jΔt, t₀ + (j+1)Δt] players have full information about the game dynamics g(t, x, u) and the payoff functions h_i(t, x, u) on the time interval [t₀ + jΔt, t₀ + jΔt + T̄], where T̄ is a fixed value, namely the information horizon. More precisely, the information is described in the following way: during the interval [t₀ + jΔt, t₀ + jΔt + T̄] the game dynamics are g(t, x, u) = g_k(t, x, u) and the payoffs are h_i(t, x, u) = h_i^k(t, x, u), where the functions g_k(t, x, u), h_i^k(t, x, u): [t₀ + kΔt, t₀ + (k+1)Δt] × X × U → R, k = j, …, j + T̄/Δt. At the time instant t = t₀ + (j+1)Δt the information about the game dynamics g(t, x, u) and the payoffs h_i(t, x, u) is updated: on the next time interval (t₀ + (j+1)Δt, t₀ + (j+2)Δt] players have full information about g(t, x, u) and h_i(t, x, u) on the time interval (t₀ + (j+1)Δt, t₀ + (j+1)Δt + T̄]. That is, at the time instant t = t₀ + (j+1)Δt players receive additional information about the motion equations g(t, x, u) = g_k(t, x, u) and the payoffs h_i(t, x, u) = h_i^k(t, x, u) on the newly revealed interval (t₀ + kΔt, t₀ + (k+1)Δt], where k = j + 1 + T̄/Δt. This new information about the functions g(t, x, u) and h_i(t, x, u) constitutes the dynamic updating: the newly received information is the update itself. To model this type of information structure we introduce the following definition (Fig. 10.1). Denote the vector x_{j,0} = x(t₀ + jΔt).


Fig. 10.1 Each oval represents random truncated information, which is known to players during the time interval [t0 + j Δt, t0 + (j + 1)Δt], j = 0, . . . , l
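The partition of [t₀, T] into update steps and sliding information windows can be sketched numerically; a minimal illustration (all numerical values of t₀, T, Δt and T̄ are hypothetical, chosen so that T̄/Δt and (T − t₀)/Δt are integers; near the end of the game the windows simply extend past T):

```python
# Sketch of the Looking Forward Approach time grid: the interval [t0, T] is
# split into l + 1 steps of length dt, and at the start of step j players
# know the game structure on the window [t0 + j*dt, t0 + j*dt + Tbar].

t0, T = 0.0, 6.0      # hypothetical start and terminal times
dt, Tbar = 1.0, 2.0   # update step and information horizon (Tbar >= dt)

l = int((T - t0) / dt) - 1          # last subinterval index, j = 0, ..., l
windows = [(t0 + j * dt, t0 + j * dt + Tbar) for j in range(l + 1)]

for j, (a, b) in enumerate(windows):
    print(f"stage {j}: play on [{a}, {a + dt}], known horizon [{a}, {b}]")
```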


Definition 10.1 Let j = 0, …, l. A truncated subgame Γ̄_j(x_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) is defined on the time interval [t₀ + jΔt, t₀ + jΔt + T̄]. The dynamical system and the payoff functions on the time interval [t₀ + jΔt, t₀ + jΔt + T̄] coincide with those of the game Γ(x₀, T − t₀) on the same time interval. The payoff function of player i ∈ N in the truncated subgame j is

$$K_i^j(x_{j,0}, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}; u) = \int_{t_0 + j\Delta t}^{t_0 + j\Delta t + \overline{T}} h_i(\tau, x(\tau), u(\tau))\, e^{-r(\tau - t_0)}\, d\tau \tag{10.3}$$

subject to the dynamical system

$$\dot{x} = g(t, x, u), \quad x(t_0 + j\Delta t) = x_{j,0} \tag{10.4}$$

with the initial condition x_{j,0} of the truncated subgame Γ̄_j(x_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) (Fig. 10.2).


Fig. 10.2 Behavior of players in the game with truncated information can be modeled using the truncated subgames Γ¯j (xj,0 , t0 + j Δt, t0 + j Δt + T ), j = 0, . . . , l


10.3 Solution of a Cooperative Truncated Subgame

Consider a truncated cooperative subgame Γ̄_j^c(x_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) defined on the time interval [t₀ + jΔt, t₀ + jΔt + T̄] with the initial condition x(t₀ + jΔt) = x_{j,0}. The total payoff of the players to be maximized in this game is

$$\sum_{i \in N} K_i^j(x_{j,0}, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}; u) = \sum_{i \in N} \int_{t_0 + j\Delta t}^{t_0 + j\Delta t + \overline{T}} h_i(\tau, x(\tau), u(\tau))\, e^{-r(\tau - t_0)}\, d\tau \tag{10.5}$$

subject to

$$\dot{x} = g(t, x, u), \quad x(t_0 + j\Delta t) = x_{j,0}. \tag{10.6}$$

This is an optimal control problem. Sufficient conditions for its solution and the optimal feedback are given by the following assertion [2]. Denote the maximum value of the joint payoff (10.5) by the function W^{(jΔt)}(t, x):

$$W^{(j\Delta t)}(t, x) = \max_{u \in U} \Big\{ \sum_{i \in N} K_i^j(x, t, t_0 + j\Delta t + \overline{T}; u) \Big\}, \tag{10.7}$$

where x, t are the initial state and time of a subgame of the truncated game, respectively, and U = U₁ × … × U_n.

where x, t are the initial state and time of the subgame of the truncated game respectively and U = U1 × . . . × Un . Theorem 10.1 Assume there exists a continuously differential function W (j Δt ) (t, x) : [t0 + j Δt, t0 + j Δt + T ] × R m → R satisfying the partial differential equation (j Δt ) (t, x) −Wt

= max u∈U

& n

hi (t, x, u)e

−r(t −t0 )

(j Δt ) + Wx (t, x)g(t, x, u)

' ,

i=1

(10.8) where

lim

W (j Δt ) (t, x) = 0 and the maximum in (10.8) is achieved under

t →−t0 +j Δt +T controls u∗j (t, x). Then

(10.6).

u∗j (t, x) is optimal in the control problem defined by (10.5),


Theorem 10.1 requires that the function W^{(jΔt)} be C¹. It is possible to assume only continuity by considering viscosity solutions in the sense of the Subbotin approach [26, 27], but for reasons of space we do not introduce and define this solution concept in this paper. In the example model we define and obtain a solution W^{(jΔt)} of class C¹.
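As a sanity check of the HJB condition (10.8), one can verify Theorem 10.1 on a toy single-player problem that is not taken from the chapter: for ẋ = u, instantaneous payoff −x² − u² and r = 0, the ansatz W(t, x) = −a(t)x² reduces (10.8) to the Riccati equation a′(t) = a(t)² − 1, a(T̄) = 0, with closed-form solution a(t) = tanh(T̄ − t). A short numerical confirmation:

```python
import math

# Toy check of the HJB sufficiency condition (Theorem 10.1) for one player:
# dynamics xdot = u, instantaneous payoff -x^2 - u^2, discount r = 0.
# The ansatz W(t, x) = -a(t) x^2 turns the HJB equation into the Riccati
# ODE a'(t) = a(t)^2 - 1 with a(Tbar) = 0, solved by a(t) = tanh(Tbar - t).

Tbar = 2.0
n_steps = 2000
h = Tbar / n_steps

# integrate a' = a^2 - 1 backwards from t = Tbar with classical RK4
a = 0.0
for k in range(n_steps):
    f = lambda y: y * y - 1.0
    k1 = f(a); k2 = f(a - 0.5 * h * k1)
    k3 = f(a - 0.5 * h * k2); k4 = f(a - h * k3)
    a -= (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)   # step toward t = 0

print(a, math.tanh(Tbar))   # numerical a(0) vs. the closed form tanh(Tbar)
```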

10.3.1 Conditionally Cooperative Trajectory

During the game Γ(x₀, T − t₀) players possess only truncated information about its structure. Obviously, this is not enough to construct an optimal control and the corresponding trajectory for the game Γ(x₀, T − t₀). As a cooperative trajectory in the game Γ(x₀, T − t₀) we propose using a conditionally cooperative trajectory defined in the following way:

Definition 10.2 A conditionally cooperative trajectory {x̂*(t)}_{t=t₀}^T is defined as a composition of the cooperative trajectories x_j^*(t) of the truncated cooperative subgames Γ̄_j^c(x_{j−1}^*(t₀ + jΔt), t₀ + jΔt, t₀ + jΔt + T̄) defined on the successive time intervals [t₀ + jΔt, t₀ + (j+1)Δt] (Fig. 10.3):

$$\{\hat{x}^*(t)\}_{t=t_0}^{T} = \begin{cases} x_0^*(t), & t \in [t_0, t_0 + \Delta t),\\ \;\vdots\\ x_j^*(t), & t \in [t_0 + j\Delta t,\, t_0 + (j+1)\Delta t),\\ \;\vdots\\ x_l^*(t), & t \in [t_0 + l\Delta t,\, t_0 + (l+1)\Delta t]. \end{cases} \tag{10.9}$$

On the time interval [t₀ + jΔt, t₀ + (j+1)Δt] the conditionally cooperative trajectory coincides with the cooperative trajectory x_j^*(t) of the truncated cooperative subgame Γ̄_j^c(x_{j−1}^*(t₀ + jΔt), t₀ + jΔt, t₀ + jΔt + T̄), where x_{j−1}^*(t₀ + jΔt) is the position of the game at the instant t = t₀ + jΔt on the cooperative trajectory of the previous subgame Γ̄_{j−1}^c(x_{j−2}^*(t₀ + (j−1)Δt), t₀ + (j−1)Δt, t₀ + (j−1)Δt + T̄). In the position x_j^*(t₀ + (j+1)Δt), at the time instant t = t₀ + (j+1)Δt, the information about the game structure is updated. On the time interval (t₀ + (j+1)Δt, t₀ + (j+2)Δt] the trajectory x̂*(t) coincides with the cooperative trajectory x_{j+1}^*(t) of the truncated cooperative subgame Γ̄_{j+1}^c(x_j^*(t₀ + (j+1)Δt), t₀ + (j+1)Δt, t₀ + (j+1)Δt + T̄), which starts at the time instant t = t₀ + (j+1)Δt in the position x_j^*(t₀ + (j+1)Δt). For j = 0 we set x_{j−1}^*(t₀ + jΔt) = x₀.



Fig. 10.3 The solid line represents the conditionally cooperative trajectory {xˆ ∗ (t)}Tt=t0 . Dashed lines represent parts of cooperative trajectories that are not used in the composition, i.e., each dashed trajectory is no longer optimal in the current random truncated subgame
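The composition (10.9) can be sketched in code; a hypothetical one-dimensional example in which solve_truncated stands in for the cooperative trajectory of a truncated subgame (here simple exponential decay, chosen purely for illustration):

```python
import math

# Compose a conditionally cooperative trajectory: on every interval
# [t0 + j*dt, t0 + (j+1)*dt] follow the cooperative trajectory of the
# truncated subgame started from the end state of the previous piece.

def solve_truncated(x_start, t_start, t_grid):
    # placeholder for the cooperative trajectory of truncated subgame j;
    # here x_j^*(t) = x_start * exp(-(t - t_start)), purely illustrative
    return [x_start * math.exp(-(t - t_start)) for t in t_grid]

t0, dt, l = 0.0, 1.0, 3         # hypothetical grid: intervals j = 0..3
pts = 11                        # samples per interval
x_hat, x_prev = [], 1.0         # initial state x0 = 1.0

for j in range(l + 1):
    a = t0 + j * dt
    grid = [a + dt * k / (pts - 1) for k in range(pts)]
    piece = solve_truncated(x_prev, a, grid)
    x_hat.extend(piece[:-1] if j < l else piece)
    x_prev = piece[-1]          # next subgame starts from x_j^*(a + dt)

print(len(x_hat), x_hat[0], x_hat[-1])
```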

10.3.2 Characteristic Function

For each coalition S ⊆ N and j = 0, …, l define the values of the characteristic function as in [3] or [4]:

$$V_j(S; x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}) = \begin{cases} \sum_{i \in N} K_i^j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}; u_j^*), & S = N,\\ \tilde{V}_j(S, x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}), & S \subset N,\\ 0, & S = \emptyset, \end{cases} \tag{10.10}$$

where Ṽ_j(S, x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) is defined as the total payoff of the players from coalition S in the Nash equilibrium u_j^{NE} = (u_1^{NE,j}, …, u_n^{NE,j}) of the game with the following set of players: coalition S (acting as one player) and the players from the set N∖S, i.e. the game with |N∖S| + 1 players.

An imputation ξ_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) for each truncated cooperative subgame Γ̄_j^c(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) is defined as an arbitrary vector which satisfies the conditions

$$\xi_i^j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}) \ge V_j(\{i\}, x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}), \quad i \in N,$$

$$\sum_{i \in N} \xi_i^j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}) = V_j(N, x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}).$$


Denote the set of all possible imputations in each truncated subgame by E_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄). Suppose that for each truncated subgame a non-empty solution is defined:

$$W_j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}) \subset E_j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}). \tag{10.11}$$

This can be a core, an NM-solution, a nucleolus, or the Shapley value.
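As a static illustration of one of these choices, the Shapley value of a characteristic function given as a table can be computed directly from its combinatorial formula; the three-player coalition values below are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley(v, players):
    # v maps frozenset coalitions to their worth; v must contain all subsets
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v[S | {i}] - v[S])   # weighted marginal contribution
        phi[i] = total
    return phi

# hypothetical 3-player characteristic function (superadditive)
v = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 1, frozenset({3}): 1,
     frozenset({1, 2}): 4, frozenset({1, 3}): 4, frozenset({2, 3}): 4,
     frozenset({1, 2, 3}): 9}

print(shapley(v, [1, 2, 3]))   # symmetric game: each player gets 3.0
```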

10.4 Solution Concept in an Original Game with Dynamic Updating

It is logical to assume that the distribution of the total payoff between the players in the game Γ(x₀, T − t₀) along the conditionally cooperative trajectory {x̂*(t)}_{t=t₀}^T is defined as a combination of the imputations on the time intervals [t₀ + jΔt, t₀ + (j+1)Δt], j = 0, …, l. This construction is a new solution concept, which we call the resulting solution.

The combination of the family of sets W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) does not allow us to obtain a solution in the game Γ(x₀, T − t₀) directly. For each j = 0, …, l the solution in a truncated subgame Γ̂_j^c(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) is defined for the time interval [t₀ + jΔt, t₀ + jΔt + T̄]. But the information about the game is updated with step Δt, so such a solution cannot be used directly on the time interval [t₀ + jΔt, t₀ + (j+1)Δt]. The necessary part of the solution can be obtained by using the imputation distribution procedure (IDP) for each truncated subgame. The IDP also provides the time consistency of the new solution concept and the ability to determine solutions within the time interval [t₀ + jΔt, t₀ + jΔt + T̄].

Definition 10.3 A solution W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) is called time-consistent if for any imputation ξ_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) ∈ W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) there exists an IDP β_j(t, x_j^*) which, for all t ∈ [t₀ + jΔt, t₀ + jΔt + T̄], satisfies

$$\int_{t}^{t_0 + j\Delta t + \overline{T}} \beta_j(\tau, x_j^*)\, d\tau \in W_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}), \tag{10.12}$$

$$\int_{t_0 + j\Delta t}^{t_0 + j\Delta t + \overline{T}} \beta_j(\tau, x_j^*)\, d\tau = \xi_j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}).$$

In order to construct a solution in the game Γ(x₀, T − t₀), one needs to define the IDP for all truncated subgames Γ̂_j^c(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄), j = 0, …, l. We denote the family of subgames along the cooperative trajectory x_j^*(t) by Γ̂_j^c(x_j^*(t), t, t₀ + jΔt + T̄), where t ∈ (t₀ + jΔt, t₀ + jΔt + T̄]


is the initial moment of the subgame. The characteristic function along x_j^*(t) in the family of subgames Γ̂_j^c(x_j^*(t), t, t₀ + jΔt + T̄) is also defined as in (10.10). We denote by E_j(x_j^*(t), t, t₀ + jΔt + T̄) the set of imputations in the subgame Γ̂_j^c(x_j^*(t), t, t₀ + jΔt + T̄).

Suppose that in each truncated subgame Γ̂_j^c(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) a solution W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) ≠ ∅ along the cooperative trajectory x_j^*(t) is selected. Also suppose that for any truncated subgame Γ̂_j^c(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) in the starting position x*_{j,0} an imputation

$$\xi_j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}) \in W_j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T})$$

and the corresponding IDP

$$\beta_j(t, x_j^*) = [\beta_1^j(t, x_j^*), \ldots, \beta_n^j(t, x_j^*)], \quad t \in (t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}],$$

are selected, which guarantees the time consistency of the selected imputation [23]:

$$\xi_j(x_{j,0}^*, t_0 + j\Delta t, t_0 + j\Delta t + \overline{T}) = \int_{t_0 + j\Delta t}^{t_0 + j\Delta t + \overline{T}} \beta_j(\tau, x_j^*)\, e^{-r(\tau - t_0)}\, d\tau. \tag{10.13}$$

The IDP β_j(t, x_j^*) can be obtained by differentiating the imputation ξ_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄); the corresponding theorem is presented in [31]:

Theorem 10.2 If the function ξ_j(x_j^*, t, t₀ + jΔt + T̄) is continuously differentiable in t and x_j^*, then

$$\beta_j(t, x_j^*) = -\xi_t^j(x_j^*, t, t_0 + j\Delta t + \overline{T}) - \xi_{x_j^*}^j(x_j^*, t, t_0 + j\Delta t + \overline{T})\, g\big(t, x_j^*(t), u_1^{*j}(t, x_j^*), \ldots, u_n^{*j}(t, x_j^*)\big). \tag{10.14}$$
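Formula (10.14) and its integral counterpart (10.13) can be checked numerically on a toy example (not from the chapter): with r = 0, trajectory x*(t) = e^{−t} (so g(t, x) = −x) and the hypothetical imputation component ξ(x, t) = (T_e − t)·x, formula (10.14) gives β(t) = (1 + T_e − t)·x*(t), and integrating β over [t, T_e] recovers ξ along the trajectory:

```python
import math

# Numeric check of Theorem 10.2 (r = 0): along x*(t) = exp(-t), i.e.
# g(t, x) = -x, take the toy imputation xi(x, t) = (Te - t) * x.  Then
# beta(t) = -xi_t - xi_x * g = (1 + Te - t) * x*(t), and integrating beta
# from t to Te recovers xi(x*(t), t), as in (10.13).

Te = 2.0
xstar = lambda t: math.exp(-t)
beta = lambda t: (1.0 + Te - t) * xstar(t)

def integral(f, a, b, n=20000):        # simple trapezoid rule
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, n)) + 0.5 * f(b))

t = 0.5
lhs = integral(beta, t, Te)            # IDP integrated over [t, Te]
rhs = (Te - t) * xstar(t)              # xi evaluated along the trajectory
print(lhs, rhs)                        # the two agree up to quadrature error
```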

The new solution concept in the game Γ(x₀, T − t₀) consists of a combination of the solutions W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) (with the corresponding IDPs) in the truncated subgames Γ̂_j^c(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄), j = 0, …, l. Suppose that for each imputation ξ_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) ∈ W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) there exists β_j(t, x_j^*). Define the resulting IDP for the whole game Γ(x₀, T − t₀) as follows:



Fig. 10.4 The combination of the IDPs β_j(t, x_j^*), defined for each ξ_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) ∈ W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄), j = 0, …, l, determines the distribution of the total payoff among the players via β̂(t, x̂*)

Definition 10.4 The resulting IDP β̂(t, x̂*) is defined for each set of ξ_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) ∈ W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄) using the corresponding β_j(t, x_j^*) as follows (Fig. 10.4):

$$\hat{\beta}(t, \hat{x}^*) = \begin{cases} \beta_0(t, x_0^*), & t \in [t_0, t_0 + \Delta t],\\ \;\vdots\\ \beta_j(t, x_j^*), & t \in [t_0 + j\Delta t, t_0 + (j+1)\Delta t],\\ \;\vdots\\ \beta_l(t, x_l^*), & t \in [t_0 + l\Delta t, t_0 + (l+1)\Delta t]. \end{cases} \tag{10.15}$$

Using the resulting IDP β̂(t, x̂*) we define the following vector:

Definition 10.5 The resulting imputation ξ̂(x̂*(t), T − t) is the vector defined by the resulting IDP β̂(t, x̂*) in the following way; let t ∈ [t₀ + jΔt, t₀ + (j+1)Δt]:

$$\hat{\xi}(\hat{x}^*(t), T - t) = \int_{t}^{T} \hat{\beta}(\tau, \hat{x}^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau = \int_{t}^{t_0 + (j+1)\Delta t} \beta_j(\tau, x_j^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau + \sum_{m=j+1}^{l} \int_{t_0 + m\Delta t}^{t_0 + (m+1)\Delta t} \beta_m(\tau, x_m^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau, \tag{10.16}$$


in particular,

$$\hat{\xi}(x_0, T - t_0) = \int_{t_0}^{T} \hat{\beta}(\tau, \hat{x}^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau.$$
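The piecewise integration in (10.16) can be sketched with r = 0 and a constant IDP value on each interval, so that each integral becomes a rectangle area; the per-stage values are hypothetical:

```python
# Resulting imputation from the resulting IDP (r = 0): with a constant
# IDP value b[j] on each interval [t0 + j*dt, t0 + (j+1)*dt], the integral
# in (10.16) from t to T is the remainder of stage j plus the full later stages.

t0, dt = 0.0, 1.0
b = [4.0, 3.0, 2.0, 1.0]               # hypothetical per-stage IDP values

def resulting_imputation(t):
    j = min(int((t - t0) / dt), len(b) - 1)              # current stage index
    head = (t0 + (j + 1) * dt - t) * b[j]                # remainder of stage j
    tail = sum(b[m] * dt for m in range(j + 1, len(b)))  # later stages
    return head + tail

print(resulting_imputation(0.0))   # total payoff allocated from the start
print(resulting_imputation(1.5))
```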

We introduce the concept of the resulting solution in the game Γ(x₀, T − t₀) with dynamic updating:

Definition 10.6 The resulting solution Ŵ(x̂*(t), T − t) is the set of resulting imputations ξ̂(x̂*(t), T − t) (10.16) for all possible resulting IDPs β̂(t, x̂*) (10.15).

In [14] it was proved that using the resulting imputation ξ̂(x̂*(t), T − t) and the corresponding resulting solution Ŵ(x̂*(t), T − t) one can allocate the actual total payoff among the players:

Assertion 1 Any resulting imputation ξ̂(x₀, T − t₀) ∈ Ŵ(x₀, T − t₀) and the corresponding resulting IDP β̂(t, x̂*(t)) allocate the total payoff of the players (10.5) along the conditionally cooperative trajectory x̂*(t) in the game with prescribed duration Γ(x₀, T − t₀):

$$\sum_{i=1}^{n} \int_{t_0}^{t} \hat{\beta}_i(\tau, \hat{x}^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau = \sum_{i=1}^{n} \int_{t_0}^{t} h_i(\tau, \hat{x}^*(\tau), \hat{u}^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau. \tag{10.17}$$

The resulting solution Ŵ(x₀, T − t₀) is time consistent by construction. In [12] it was proved that it also has the property of strong time consistency:

Definition 10.7 The solution W(x₀, T − t₀) is called strongly Δt-time consistent if for any j = 0, …, l and every ξ(x₀, T − t₀) ∈ W(x₀, T − t₀) the corresponding IDP β(t, x*) satisfies the condition

$$\int_{t_0}^{t_0 + j\Delta t} \beta(\tau, x^*(\tau))\, e^{-r(\tau - t_0)}\, d\tau \oplus W(x_{j,0}^*, T - (t_0 + j\Delta t)) \subset W(x_0, T - t_0), \tag{10.18}$$

in which a ⊕ A = {a + a′ : a′ ∈ A}.

Theorem 10.3 An arbitrary resulting solution Ŵ(x₀, T − t₀) is strongly Δt-time consistent in the game Γ(x₀, T − t₀) with prescribed duration.

By an arbitrary resulting solution we mean any resulting solution constructed using solutions W_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄), j = 0, …, l, each an arbitrary subset of the set of imputations E_j(x*_{j,0}, t₀ + jΔt, t₀ + jΔt + T̄). The solutions in the different truncated subgames may differ: in the first truncated subgame the players can choose a core, in the second truncated subgame a Shapley value, etc.


10.5 The Construction of the Characteristic Function in a Game with Dynamic Updating

As the characteristic function of the differential game Γ(x₀, T − t₀) with dynamic updating, the resulting characteristic function is used.

Definition 10.8 The resulting characteristic function V(S; x̂*(t), T − t) of the game Γ(x̂*(t), T − t) with dynamic updating is the function calculated using the values of the characteristic functions V_j(S; x_j^*(t), t, t₀ + jΔt + T̄) of every truncated subgame Γ̂_j^c(x_j^*(t), t, t₀ + jΔt + T̄) along the conditionally cooperative trajectory x̂*(t) for j = 0, …, l, ∀t ∈ [t₀ + jΔt, t₀ + jΔt + T̄]. Let t ∈ [t₀ + jΔt, t₀ + (j+1)Δt]; then

$$V(S; \hat{x}^*(t), T - t) = \sum_{m=j+1}^{l} \big[ V_m(S; x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - V_m(S; x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big], \tag{10.19}$$

where x*_{j,0} = x̂*(t₀ + jΔt), x*_{j,1} = x̂*(t₀ + (j+1)Δt).
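The telescoping structure of (10.19) for a fixed coalition S can be sketched with per-stage values stored in arrays; all numbers below are hypothetical stand-ins for the stage characteristic functions evaluated at the stage start and at the next update instant:

```python
# Resulting characteristic function (10.19) for a fixed coalition S:
# sum over later stages of [V_m at its start minus V_m at the next update]
# plus the same difference for the current stage evaluated at time t.

# hypothetical per-stage values V_j(S) at the stage start and at the update
V_start = [10.0, 8.0, 6.0, 4.0]   # stands in for V_j(S; x*_{j,0}, t0 + j*dt, ...)
V_end   = [7.0, 5.5, 3.5, 0.0]    # stands in for V_j(S; x*_{j,1}, t0 + (j+1)*dt, ...)

def resulting_V(j, V_current_t):
    # V_current_t plays the role of V_j(S; x_j^*(t), t, t0 + j*dt + Tbar)
    tail = sum(V_start[m] - V_end[m] for m in range(j + 1, len(V_start)))
    return tail + (V_current_t - V_end[j])

print(resulting_V(0, V_start[0]))  # value at the very start of stage 0
```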

The following theorem shows that the resulting imputation ξ̂(x₀, T − t₀) can be used as an imputation in the game Γ(x₀, T − t₀) with V(S; x₀, T − t₀) as the characteristic function.

Theorem 10.4 The resulting imputation ξ̂(x₀, T − t₀) is an imputation in the game Γ(x₀, T − t₀) with dynamic updating if for ∀t ∈ [t₀ + jΔt, t₀ + (j+1)Δt], j = 0, …, l, the following condition is satisfied:

$$\xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \ge \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}). \tag{10.20}$$

Proof First let us show that for ∀t ∈ [t₀, T] the following conditions are fulfilled:

$$\sum_{i=1}^{n} \hat{\xi}_i(\hat{x}^*(t), T - t) = V(N; \hat{x}^*(t), T - t), \tag{10.21}$$

$$\hat{\xi}_i(\hat{x}^*(t), T - t) \ge V(\{i\}; \hat{x}^*(t), T - t). \tag{10.22}$$


According to the definitions of ξ̂(x̂*(t), T − t) and V(S; x̂*(t), T − t), the left-hand side of (10.21) can be rewritten as

$$\sum_{i=1}^{n} \hat{\xi}_i(\hat{x}^*(t), T - t) = \sum_{i=1}^{n} \int_{t}^{T} \hat{\beta}_i(\tau, \hat{x}^*(\tau))\, d\tau = \sum_{i=1}^{n} \Big[ \int_{t}^{t_0 + (j+1)\Delta t} \beta_i^j(\tau, x_j^*(\tau))\, d\tau + \sum_{m=j+1}^{l} \int_{t_0 + m\Delta t}^{t_0 + (m+1)\Delta t} \beta_i^m(\tau, x_m^*(\tau))\, d\tau \Big]$$

$$= \sum_{i=1}^{n} \Big\{ \sum_{m=j+1}^{l} \big[ \xi_i^m(x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - \xi_i^m(x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big] \Big\}. \tag{10.23}$$

Since the condition

$$\sum_{i=1}^{n} \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) = V_j(N; x_j^*(t), t, t_0 + j\Delta t + \overline{T}), \quad j = 0, \ldots, l, \tag{10.24}$$

is satisfied in (10.23), equality (10.21) is correct.

Now let us prove (10.22), substituting the expression for ξ̂_i(x̂*(t), T − t) into its left-hand side and V({i}; x̂*(t), T − t) from (10.19) into its right-hand side:

$$\sum_{m=j+1}^{l} \big[ \xi_i^m(x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - \xi_i^m(x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big] \ge$$

$$\ge \sum_{m=j+1}^{l} \big[ V_m(\{i\}; x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - V_m(\{i\}; x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big]. \tag{10.25}$$

Inequality (10.25) is fulfilled for ∀t ∈ [t₀, T] if for ∀m = 0, …, l

$$\xi_i^m(x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - \xi_i^m(x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \ge V_m(\{i\}; x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - V_m(\{i\}; x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \tag{10.26}$$

and for ∀t ∈ [t₀ + jΔt, t₀ + (j+1)Δt], j = 0, …, l,

$$\xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \ge V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}). \tag{10.27}$$

The fulfilment of condition (10.27) for ∀t ∈ [t₀ + jΔt, t₀ + (j+1)Δt], j = 0, …, l, implies the fulfilment of condition (10.26). Rewrite (10.27) as

$$\xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \ge \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}). \tag{10.28}$$

Condition (10.28) means that in every truncated subgame the values of the characteristic function and of the imputation change evenly over time with respect to each other. The theorem is proved.

In this section the notion of the characteristic function V(S; x₀, T − t₀) of the game Γ(x₀, T − t₀) with dynamic updating has been introduced. It is shown that the resulting imputation ξ̂(x₀, T − t₀) is an imputation in the classical sense, satisfying the individual rationality conditions. Note, however, that condition (10.20) is not automatically fulfilled for ∀t ∈ [t₀, T].

10.6 The Relationship of Solutions in Truncated Subgames and the Resulting Solutions

In this section it is shown that if in every truncated subgame the players choose the imputation ξ_j(x_j^*(t), t, t₀ + jΔt + T̄) ∈ E_j(x_j^*(t), t, t₀ + jΔt + T̄), based on V_j(S; x_j^*(t), t, t₀ + jΔt + T̄), j = 0, …, l, by the same rule, then the resulting imputation ξ̂(x̂*(t), T − t) corresponds to the imputation chosen by the same rule using the resulting characteristic function V(S; x̂*(t), T − t). We prove this for a number of optimality principles.

First we show that if in every truncated subgame Γ̂_j(x_j^*, t, t₀ + jΔt + T̄) the players choose the Shapley value Sh_j(x_j^*(t), t, t₀ + jΔt + T̄) as the imputation, then the resulting imputation ξ̂(x̂*(t), T − t) (10.16) coincides with the Shapley value Ŝh(x̂*(t), T − t) calculated using the resulting characteristic function V(S; x̂*(t), T − t) (10.19).

Theorem 10.5 Suppose that in every truncated subgame Γ̂_j(x_j^*, t, t₀ + jΔt + T̄)

$$\xi_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) = Sh_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}),$$

where t ∈ [t₀ + jΔt, t₀ + jΔt + T̄], j = 0, …, l. Then the resulting imputation ξ̂(x̂*(t), T − t) coincides with Ŝh(x̂*(t), T − t):

$$\hat{\xi}(\hat{x}^*(t), T - t) = \widehat{Sh}(\hat{x}^*(t), T - t), \quad \forall t \in [t_0, T],$$

where Ŝh(x̂*(t), T − t) is the Shapley value calculated using the resulting characteristic function V(S; x̂*(t), T − t) (10.19).

Proof In this case the resulting imputation ξ̂(x̂*(t), T − t) is calculated by formulas (10.15), (10.16) using the Shapley values Sh_j(x_j^*(t), t, t₀ + jΔt + T̄) of every truncated subgame Γ̂_j(x_j^*(t), t, t₀ + jΔt + T̄):

$$\hat{\xi}(\hat{x}^*(t), T - t) = \sum_{m=j+1}^{l} \big[ Sh_m(x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - Sh_m(x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ Sh_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - Sh_j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big], \tag{10.29}$$

where Sh_j(x_j^*(t), t, t₀ + jΔt + T̄), t ∈ [t₀ + jΔt, t₀ + jΔt + T̄], j = 0, …, l, is calculated by the formula

$$Sh_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) = \sum_{\substack{S \subset N\\ i \in S}} \frac{(|N| - |S|)!\, (|S| - 1)!}{|N|!} \big[ V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S \setminus \{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \big]. \tag{10.30}$$


The Shapley value Ŝh(x̂*(t), T − t) is calculated using the resulting characteristic function V(S; x̂*(t), T − t). Substituting the expression for V(S; x̂*(t), T − t) into the formula for Ŝh(x̂*(t), T − t), for t ∈ [t₀ + jΔt, t₀ + (j+1)Δt] we obtain

$$\widehat{Sh}_i(\hat{x}^*(t), T - t) = \sum_{\substack{S \subset N\\ i \in S}} \frac{(|N| - |S|)!\, (|S| - 1)!}{|N|!} \cdot \Big\{ \sum_{m=j+1}^{l} \Big( \big[ V_m(S; x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - V_m(S; x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] - \big[ V_m(S \setminus \{i\}; x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - V_m(S \setminus \{i\}; x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] \Big)$$
$$+ \big[ V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big] - \big[ V_j(S \setminus \{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S \setminus \{i\}; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big] \Big\}. \tag{10.31}$$

After substitution of (10.30) into (10.29) we obtain (10.31). The theorem is proved.

The same result can be obtained for the proportional solution. Suppose that the characteristic function V({i}; x_j^*(t), t, t₀ + jΔt + T̄) is differentiable along the cooperative trajectory x_j^*(t). Define the proportional solution via its IDP β_j^{Prop} in the following way:

$$\beta_i^{Prop,j}(t) = \frac{-\frac{d}{dt}\, V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T})}{\sum_{i \in N} \Big( -\frac{d}{dt}\, V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \Big)} \cdot \Big( -\frac{d}{dt}\, V_j(N; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \Big), \quad i \in N. \tag{10.32}$$

The corresponding imputation, obtained by direct integration of β_j^{Prop}(t) using formula (10.13), is denoted by Prop_j(x_j^*(t), t, t₀ + jΔt + T̄).
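The proportional IDP (10.32) splits the decrease rate of the grand-coalition value in proportion to the decrease rates of the individual values; a hypothetical numeric sketch with the derivatives replaced by central finite differences:

```python
# Proportional IDP (10.32): each player's share of -d/dt V(N) is
# proportional to -d/dt V({i}).  Derivatives are approximated by central
# finite differences of hypothetical smooth value functions of time.

V_i = [lambda t: 2.0 * (2.0 - t), lambda t: 1.0 * (2.0 - t)]   # V({1}), V({2})
V_N = lambda t: 4.0 * (2.0 - t)                                 # V(N)

def ddt(f, t, h=1e-6):
    return (f(t + h) - f(t - h)) / (2 * h)

def prop_idp(t):
    num = [-ddt(f, t) for f in V_i]     # individual decrease rates
    total = sum(num)
    dN = -ddt(V_N, t)                   # grand-coalition decrease rate
    return [x / total * dN for x in num]

print(prop_idp(1.0))   # shares 2:1 of the rate 4
```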


We now show that if in every truncated subgame Γ̂_j(x_j^*(t), t, t₀ + jΔt + T̄) the players choose the proportional solution Prop_j(x_j^*(t), t, t₀ + jΔt + T̄) (10.32), then the resulting imputation defined by formula ξ̂(x̂*(t), T − t) (10.16) coincides with the proportional solution P̂rop(x̂*(t), T − t) calculated using the characteristic function V(S; x̂*(t), T − t) (10.19).

Theorem 10.6 Suppose that in every truncated subgame Γ̂_j(x_j^*, t, t₀ + jΔt + T̄)

$$\xi_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) = Prop_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}),$$

where t ∈ [t₀ + jΔt, t₀ + jΔt + T̄], j = 0, …, l. Then the resulting vector ξ̂(x̂*(t), T − t) coincides with P̂rop(x̂*(t), T − t) (10.32):

$$\hat{\xi}(\hat{x}^*(t), T - t) = \widehat{Prop}(\hat{x}^*(t), T - t), \quad \forall t \in [t_0, T],$$

where P̂rop(x̂*(t), T − t) is the proportional solution calculated using the resulting characteristic function V(S; x̂*(t), T − t) (10.19).

Proof In this case the resulting imputation ξ̂(x̂*(t), T − t) is calculated by formula (10.15) for the IDP, combining the values of the IDPs of the proportional solutions (10.32) of the truncated subgames on the intervals [t₀ + jΔt, t₀ + (j+1)Δt], i ∈ N.

We show that the formula for P̂rop(x̂*(t), T − t), or equivalently for its IDP β̂^{Prop}(t) (10.32), in which V(S; x̂*(t), T − t) (10.19) is used as the characteristic function, leads to the right-hand side of (10.32). Substitute the expression for V(S; x̂*(t), T − t) (10.19) into (10.32) and consider one of the addends. Let t ∈ [t₀ + jΔt, t₀ + (j+1)Δt]:

$$-\frac{d}{dt}\, V(\{i\}; \hat{x}^*(t), T - t) = -\frac{d}{dt}\Big\{ \sum_{k=j+1}^{l} \big[ V_k(\{i\}; x_{k,0}^*, t_0 + k\Delta t, t_0 + k\Delta t + \overline{T}) - V_k(\{i\}; x_{k,1}^*, t_0 + (k+1)\Delta t, t_0 + k\Delta t + \overline{T}) \big] + \big[ V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(\{i\}; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big] \Big\}. \tag{10.33}$$

From (10.33) we can see that for t ∈ [t₀ + jΔt, t₀ + (j+1)Δt], j = 0, …, l, only one addend under the derivative sign depends on t; therefore

$$-\frac{d}{dt}\, V(\{i\}; \hat{x}^*(t), T - t) = -\frac{d}{dt}\, V_j(\{i\}; x_j^*(t), t, t_0 + j\Delta t + \overline{T}). \tag{10.34}$$


Substituting (10.34) and the analogous formula for V(N; x̂*(t), T − t) into (10.32), it is easy to see that the right-hand sides of the IDP β̂^{Prop}(t) and of (10.32) are equal. The theorem is proved.

We now show that if in every truncated subgame Γ̂_j(x_j^*, t, t₀ + jΔt + T̄) the players choose the core C_j(x_j^*(t), t, t₀ + jΔt + T̄) as the optimality principle, then the resulting solution, every element of which is a ξ̂(x̂*(t), T − t) (10.16), is a core calculated using the resulting characteristic function V(S; x̂*(t), T − t) (10.19).

Theorem 10.7 Suppose that in every truncated subgame Γ̂_j(x_j^*, t, t₀ + jΔt + T̄)

$$W_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) = C_j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}),$$

where ∀t ∈ [t₀ + jΔt, t₀ + jΔt + T̄], j = 0, …, l. Then for every ξ_j(x_j^*(t), t, t₀ + jΔt + T̄) ∈ C_j(x_j^*(t), t, t₀ + jΔt + T̄) for which the condition

$$\sum_{i \in S} \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \ge \sum_{i \in S} \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \tag{10.35}$$

is satisfied, the following holds:

$$\hat{\xi}(\hat{x}^*(t), T - t) \in \hat{C}(\hat{x}^*(t), T - t), \quad \forall t \in [t_0, T],$$

where Ĉ(x̂*(t), T − t) is the core calculated using the resulting characteristic function V(S; x̂*(t), T − t) (10.19).

Proof The following statements should be proven:

1. If in every truncated subgame the players choose an imputation ξ_j(x_j^*(t), t, t₀ + jΔt + T̄) ∈ C_j(x_j^*(t), t, t₀ + jΔt + T̄) calculated using V_j(S; x_j^*(t), t, t₀ + jΔt + T̄), j = 0, …, l, then the resulting imputation ξ̂(x̂*(t), T − t) belongs to the core Ĉ(x̂*(t), T − t) calculated using the resulting characteristic function V(S; x̂*(t), T − t).
2. The core Ĉ(x̂*(t), T − t) should not contain an imputation ξ̂(x̂*(t), T − t) for which it is impossible to find a set of imputations ξ_j(x_j^*(t), t, t₀ + jΔt + T̄) ∈ C_j(x_j^*(t), t, t₀ + jΔt + T̄) in the truncated subgames.


We prove the first statement: if the set of imputations ξ_i^j(x_j^*(t), t, t₀ + jΔt + T̄) satisfies the system of inequalities

$$\sum_{i \in S} \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \ge V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}), \quad S \subset N,$$

for each t ∈ [t₀ + jΔt, t₀ + jΔt + T̄], j = 0, …, l, i = 1, …, n, then the resulting imputation ξ̂(x̂*(t), T − t) satisfies the system of inequalities

$$\sum_{i \in S} \hat{\xi}_i(\hat{x}^*(t), T - t) \ge V(S; \hat{x}^*(t), T - t), \quad \forall t \in [t_0, T],\ S \subset N. \tag{10.36}$$

Substitute the expression for ξ̂(x̂*(t), T − t) into the left-hand side of (10.36) and (10.19) into its right-hand side. As in Theorem 10.6, one shows that for every S ⊂ N and t ∈ [t₀ + jΔt, t₀ + jΔt + T̄] the fulfilment of (10.35) leads to the fulfilment of (10.36). The first statement is proved.

We further prove the second statement: in the set of imputations ξ_i^j(x_j^*(t), t, t₀ + jΔt + T̄), i = 1, …, n, j = 0, …, l, satisfying the system of inequalities, ∀j = 0, …, l, S ⊂ N,

$$\sum_{m=j+1}^{l} \big[ \xi_i^m(x_m^*(t), t, t_0 + m\Delta t + \overline{T}) - \xi_i^m(x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big] \ge$$

$$\ge \sum_{m=j+1}^{l} \big[ V_m(S; x_{m,0}^*, t_0 + m\Delta t, t_0 + m\Delta t + \overline{T}) - V_m(S; x_{m,1}^*, t_0 + (m+1)\Delta t, t_0 + m\Delta t + \overline{T}) \big] + \big[ V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) \big], \tag{10.37}$$

there exists at least one set of ξ_i^j(x_j^*(t), t, t₀ + jΔt + T̄), i = 1, …, n, j = 0, …, l, satisfying

$$\sum_{i \in S} \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \ge V_j(S; x_j^*, t, t_0 + j\Delta t + \overline{T}), \quad \forall j = 0, \ldots, l,\ S \subset N. \tag{10.38}$$


We argue by contradiction. Suppose that for imputations satisfying (10.35) and (10.38) inequality (10.37) is not satisfied. We show that for ∀j = 0, …, l the following condition is satisfied:

$$\sum_{i \in S} \xi_i^j(x_j^*(t), t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) \ge \sum_{i \in S} \xi_i^j(x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}) - V_j(S; x_{j,1}^*, t_0 + (j+1)\Delta t, t_0 + j\Delta t + \overline{T}). \tag{10.39}$$

Indeed, by (10.38) both the left- and right-hand sides are non-negative, and using (10.35) it follows that (10.39) is satisfied, which contradicts the assumption. The theorem is proved.

The same results can be obtained for an IDP-core, a new cooperative solution presented in [17]. Introduce the following notation:

$$U_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}) = -\frac{d}{dt}\, V_j(S; x_j^*(t), t, t_0 + j\Delta t + \overline{T}), \tag{10.40}$$

where $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$ and $S\subseteq N$. Define $B_j(t,x_j^*)$ as the set of integrable vector functions $\beta_j(t)$ satisfying the following system of inequalities:

$$\begin{aligned}
B_j(t,x_j^*)=\Big\{\beta_j(t)=(\beta_1^j(t),\dots,\beta_n^j(t)):\ &\sum_{i\in S}\beta_i^j(t)\ge U_j(S;x_j^*(t),t,t_0+j\Delta t+T),\\
&\sum_{i\in N}\beta_i^j(t)=U_j(N;x_j^*(t),t,t_0+j\Delta t+T),\ \forall S\subset N\Big\}.
\end{aligned}\tag{10.41}$$

Suppose that $B_j(t,x_j^*)\ne\emptyset$, $\forall t\in[t_0+j\Delta t,t_0+j\Delta t+T]$, $j=0,\dots,l$. Then, using the set $B_j(t,x_j^*)$, it is possible to define the following set of vectors.

Definition 10.9 The set of all $\xi_j(x_j^*(t),t)$ obtained for integrable selectors $\beta_j(t,x_j^*)\in B_j(t,x_j^*)$ is called an IDP-core and denoted by $C^j(x_j^*(t),t_0+j\Delta t+T)$, where

$$C^j(x_j^*(t),t)=\big\{\xi_j(x_j^*(t),t)\big\},\quad t\in[t_0+j\Delta t,t_0+j\Delta t+T],\tag{10.42}$$

and for $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$

$$\xi_j(x_j^*(t),t)=\int_t^{t_0+j\Delta t+T}\beta_j(\tau,x_j^*)\,d\tau.\tag{10.43}$$
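As a numerical illustration of (10.41) and (10.43), the sketch below checks that a candidate selector belongs to $B_j(t, x_j^*)$ and integrates it to obtain an imputation. The values $U_j(S)$ and the constant-in-$t$ selector are illustrative assumptions, not quantities computed from the chapter's model.

```python
# Sketch: checking the selector system (10.41) and computing the
# imputation (10.43) for a 3-player example.  The U values are
# illustrative placeholders, assumed constant in t for simplicity.
import numpy as np

players = (1, 2, 3)
U = {frozenset({1}): 1.0, frozenset({2}): 1.2, frozenset({3}): 0.8,
     frozenset({1, 2}): 2.6, frozenset({1, 3}): 2.2, frozenset({2, 3}): 2.4,
     frozenset(players): 4.5}

def beta(t):
    """One admissible selector: split U(N) equally (validity checked below)."""
    return np.full(3, U[frozenset(players)] / 3)

def in_B(b):
    """Membership test for the set B_j(t, x_j^*) of (10.41)."""
    ok_eq = np.isclose(b.sum(), U[frozenset(players)])
    ok_ineq = all(b[[i - 1 for i in S]].sum() >= U[frozenset(S)] - 1e-12
                  for S in ({1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}))
    return ok_eq and ok_ineq

# Imputation (10.43): integrate beta from t to the subgame horizon
# (left Riemann sum).
t, horizon, n = 0.5, 2.0, 1000
step = (horizon - t) / n
xi = sum(beta(t + k * step) * step for k in range(n))
```

Any integrable selector passing the membership test generates one element of the IDP-core; varying the selector traces out the whole set.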


In [17] it was also proved that the IDP-core is strongly time consistent. Show that if in every truncated subgame $\hat\Gamma_j(x_j^*,t,t_0+j\Delta t+T)$ the players choose the IDP-core $C^j(x_j^*(t),t,t_0+j\Delta t+T)$ (10.42) as the optimality principle, then the resulting solution, every element of which is defined by formula (10.16), is an IDP-core calculated using the resulting characteristic function $V(S;\hat x^*(t),T-t)$ (10.19).

Theorem 10.8 Suppose that in every truncated subgame $\hat\Gamma_j(x_j^*,t,t_0+j\Delta t+T)$

$$W_j(x_j^*(t),t,t_0+j\Delta t+T)=C^j(x_j^*(t),t,t_0+j\Delta t+T)\ne\emptyset,\quad\forall t\in[t_0+j\Delta t,t_0+j\Delta t+T],\ j=0,\dots,l.$$

Then

$$\hat W(\hat x^*(t),T-t)=\hat C(\hat x^*(t),T-t),\quad\forall t\in[t_0,T],$$

where $\hat C(\hat x^*(t),T-t)$ is the IDP-core calculated using the resulting characteristic function $V(S;\hat x^*(t),T-t)$ (10.19).

Proof The resulting solution $\hat W(\hat x^*(t),T-t)$ consists of the imputations $\hat\xi(\hat x^*(t),T-t)$, each of which is defined via formula (10.15) by imputations $\xi_j(x_j^*(t),t,t_0+j\Delta t+T)\in C^j(x_j^*(t),t,t_0+j\Delta t+T)$, $j=0,\dots,l$. According to the definition of the IDP-core, each imputation from the IDP-core satisfies the system of inequalities (10.41):

$$\begin{aligned}
\sum_{i\in S}\beta_i^j(t,\hat x^*(t))&\ge-\frac{d}{dt}V_j(S;x_j^*(t),t,t_0+j\Delta t+T),\\
\sum_{i\in N}\beta_i^j(t,\hat x^*(t))&=-\frac{d}{dt}V_j(N;x_j^*(t),t,t_0+j\Delta t+T),\quad\forall S\subset N.
\end{aligned}$$

Thus the resulting solution $\hat W(\hat x^*(t),T-t)$ is defined by (10.41) for $t\in[t_0+j\Delta t,t_0+(j+1)\Delta t]$, $j=0,\dots,l$. Write out the expression for $\hat C(\hat x^*(t),T-t)$ with the resulting characteristic function $V(S;\hat x^*(t),T-t)$ (10.19) and show that it leads to (10.41). Consider one of its constraints separately and substitute the expression for $V(S;\hat x^*(t),T-t)$ (10.19); let $t\in[t_0+j\Delta t,t_0+(j+1)\Delta t]$:

$$\begin{aligned}
-\frac{d}{dt}V(S;\hat x^*(t),T-t)=-\frac{d}{dt}\Bigg\{&\sum_{k=j+1}^{l}\Big[V_k(S;x_{k,0}^*,t_0+k\Delta t,t_0+k\Delta t+T)\\
&\qquad-V_k(S;x_{k,1}^*,t_0+(k+1)\Delta t,t_0+k\Delta t+T)\Big]\\
&+\Big[V_j(S;x_j^*(t),t,t_0+j\Delta t+T)-V_j(S;x_{j,1}^*,t_0+(j+1)\Delta t,t_0+j\Delta t+T)\Big]\Bigg\}.
\end{aligned}\tag{10.44}$$


From (10.44) it follows that for $t\in[t_0+j\Delta t,t_0+(j+1)\Delta t]$, $j=0,\dots,l$, only one addend under the derivative sign depends on $t$, so

$$-\frac{d}{dt}V(S;\hat x^*(t),T-t)=-\frac{d}{dt}V_j(S;x_j^*(t),t,t_0+j\Delta t+T).\tag{10.45}$$

Substitute (10.45) into the formula for $\hat C(\hat x^*(t),T-t)$:

$$\begin{aligned}
\sum_{i\in S}\hat\beta_i(t,\hat x^*(t))&\ge-\frac{d}{dt}V_j(S;x_j^*(t),t,t_0+j\Delta t+T),\\
\sum_{i\in N}\hat\beta_i(t,\hat x^*(t))&=-\frac{d}{dt}V_j(N;x_j^*(t),t,t_0+j\Delta t+T),\quad\forall S\subset N.
\end{aligned}$$

Thus the IDP-core $\hat C(\hat x^*(t),T-t)$ with the resulting characteristic function $V(S;\hat x^*(t),T-t)$ coincides with the resulting solution $\hat W(\hat x^*(t),T-t)$ calculated using the combination of the solutions $C^j(x_j^*(t),t,t_0+j\Delta t+T)$ of the truncated subgames. The theorem is proved.

In this section it was shown that if in every truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ the players choose a proportional solution, the Shapley value, an imputation from the core, or an imputation from the IDP-core as the optimality principle, then the resulting imputation is, respectively, a proportional solution, the Shapley value, an imputation from the core, or an imputation from the IDP-core of the game $\Gamma(x_0,T-t_0)$ with dynamic updating. The theorems proved in this section give an approach for directly calculating the resulting solution.

10.7 The Cooperative Limited Resource Extraction Game with Dynamic Updating

Consider the resource extraction game defined on a closed time interval. The solution of the two-person game in the classical form is presented in [8]. The problem of time consistency was studied in [31]. In this example, a game of limited resource extraction with dynamic updating for three players is presented. The core is used as the optimality principle. The characteristic function is calculated as in [3]. In the last part of the example the property of strong time consistency is discussed.


10.7.1 Initial Game

The following dynamical system describes the change in the stock of resources $x(t)\in X\subset R$:

$$\dot x=a\sqrt{x(t)}-bx(t)-\sum_{i=1}^{3}u_i,\quad x(t_0)=x_0,$$

where $u_i$ is the production level of player $i=1,2,3$. The payoff of player $i$ is

$$K_i(x_0,t_0;u)=\int_{t_0}^{T}h_i(\tau,x(\tau),u(\tau))\,d\tau,$$

where

$$h_i(\tau,x(\tau),u(\tau))=\sqrt{u_i(\tau)}-\frac{c_i}{\sqrt{x(\tau)}}\,u_i(\tau),\quad i=1,2,3,$$

and $c_i$ is a constant with $c_i\ne c_k$ for all $i\ne k$, $i,k=1,2,3$.
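For intuition about the state dynamics above, the following Euler sketch simulates the stock equation; the parameters $a$, $b$, $x_0$ match the numerical example of Sect. 10.7.8, while the constant production levels are an illustrative assumption (the optimal controls are in fact state-dependent).

```python
# Euler simulation of x' = a*sqrt(x) - b*x - (u1 + u2 + u3).
# a, b, x0 follow the chapter's numerical example; the constant
# production levels u are an illustrative assumption.
import math

a, b = 5.0, 0.3
u = (2.0, 1.5, 1.0)            # assumed constant production levels
x, t, dt, t_end = 250.0, 0.0, 0.001, 4.0

while t < t_end - 1e-12:
    x += dt * (a * math.sqrt(x) - b * x - sum(u))
    t += dt
```

With this light total extraction the stock drifts slowly toward its interior steady state, where $a\sqrt{x} = bx + \sum_i u_i$.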

10.7.2 Truncated Subgame

The initial game $\Gamma(x_0,T-t_0)$ is defined on the time interval $[t_0,T]$. Suppose that for any $t\in[t_0+j\Delta t,t_0+(j+1)\Delta t]$, $j=0,\dots,l$, the players have truncated information about the game, namely information about the dynamical system and the payoff functions on the time interval $[t_0+j\Delta t,t_0+j\Delta t+T]$. This model is constructed using the truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$. The dynamical system and the initial data have the following form:

$$\dot x=a\sqrt{x(t)}-bx(t)-\sum_{i=1}^{3}u_i,\quad x(t_0+j\Delta t)=x_{j,0}.\tag{10.46}$$

The payoff function of player $i$ is

$$K_i^j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T;u)=\int_{t_0+j\Delta t}^{t_0+j\Delta t+T}h_i(\tau,x(\tau),u(\tau))\,d\tau.$$

Consider the case when the players agree to cooperate in the truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$. Then the players act so as to maximize their total payoff.


10.7.3 Cooperative Trajectory

Suppose that the maximum joint payoff of the players in each truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ has the following form [8]:

$$W^j(t,x)=A^j(t)\sqrt{x}+C^j(t),\tag{10.47}$$

where the functions $A^j(t)$, $C^j(t)$ satisfy the system of differential equations

$$\dot A^j(t)=\frac{b}{2}A^j(t)-\sum_{i=1}^{3}\frac{1}{4\left(c_i+\frac{A^j(t)}{2}\right)},\qquad \dot C^j(t)=-\frac{a}{2}A^j(t),$$

with boundary conditions $A^j(t_0+j\Delta t+T)=0$, $C^j(t_0+j\Delta t+T)=0$.

The cooperative trajectory $x_j^*(t)$ in each truncated subgame can be calculated as follows [8]:

$$x_j^*(t)=\varpi_j^2(t_0+j\Delta t,t)\left[\sqrt{x_{j,0}^*}+\frac{a}{2}\int_{t_0+j\Delta t}^{t}\varpi_j(t_0+j\Delta t,\tau)^{-1}\,d\tau\right]^2,$$

where $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$ and

$$\varpi_j(t_0+j\Delta t,t)=\exp\left[-\int_{t_0+j\Delta t}^{t}\frac{1}{2}\left(b+\sum_{i=1}^{3}\frac{1}{4\left(c_i+\frac{A^j(\tau)}{2}\right)^2}\right)d\tau\right].$$

The initial position for the cooperative trajectory in each truncated subgame is defined from the previous truncated subgame: $x_{0,0}^*=x_0$ and $x_{j,0}^*=x_{j-1}^*(t_0+j\Delta t)$ for $1\le j\le l$. The conditionally cooperative trajectory $\hat x^*(t)$ is defined as follows:

$$\hat x^*(t)=x_j^*(t),\quad t\in[t_0+j\Delta t,t_0+(j+1)\Delta t],\quad j=0,\dots,l.$$
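Numerically, the construction above amounts to a backward solve for $A^j(t)$ followed by a forward integration of the closed-loop dynamics with the cooperative controls $u_i = x/(4(c_i + A^j(t)/2)^2)$. The sketch below uses the parameters of the numerical example in Sect. 10.7.8; the step size is an assumption.

```python
# Backward Euler for A'(t) = (b/2) A - sum_i 1/(4 (c_i + A/2)),
# then forward Euler for the cooperative trajectory of one subgame.
import math

a, b = 5.0, 0.3
c = (0.15, 0.65, 0.45)
horizon, dt = 2.0, 1e-4        # one truncated subgame of length 2
n = round(horizon / dt)

A = [0.0] * (n + 1)            # terminal condition A(horizon) = 0
for k in range(n, 0, -1):
    dA = (b / 2) * A[k] - sum(1.0 / (4 * (ci + A[k] / 2)) for ci in c)
    A[k - 1] = A[k] - dt * dA

x = 250.0                      # initial position x_{j,0}
for k in range(n):
    u_total = sum(x / (4 * (ci + A[k] / 2) ** 2) for ci in c)
    x += dt * (a * math.sqrt(x) - b * x - u_total)
```

Solving forward via the closed-loop controls is equivalent, up to discretization error, to evaluating the explicit formula with $\varpi_j$.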

10.7.4 Characteristic Function

In order to allocate the cooperative payoff between the players in each truncated subgame, it is necessary to determine the values of the characteristic function $V_j(S;x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ for each coalition $S\subset N$. In accordance with the


formula (10.10), the maximum total payoff of the players $W_j(t_0+j\Delta t,x_{j,0})$ (10.47) corresponds to the value of the characteristic function $V_j(N;x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ of the coalition $S=N$ in the truncated subgame $\hat\Gamma_j^c(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$:

$$V_j(N;x_j^*(t),t,t_0+j\Delta t+T)=W_j(t,x_j^*(t)),\tag{10.48}$$

where $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$, $j=0,\dots,l$. Next, we need to determine the values of the characteristic function for the coalitions $\{1\}$, $\{2\}$, $\{3\}$, $\{1,2\}$, $\{1,3\}$, $\{2,3\}$. For each coalition $\{i\}$, $i=1,2,3$, we determine the Nash equilibrium in the truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ and, as a result, $V_j(\{i\};x_j^*(t),t,t_0+j\Delta t+T)$.

10.7.5 One-Player Coalitions

The Nash equilibrium in the truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ is determined by the following strategies of the players:

$$u_i^j(t,x)=\frac{x}{4\left[c_i+A_i^j(t)/2\right]^2},\quad i=1,2,3,$$

where the functions $A_i^j(t)$, $i=1,2,3$, are defined by the system of differential equations

$$\dot A_i^j(t)=A_i^j(t)\left[\frac{b}{2}+\sum_{k\ne i}\frac{1}{8\left(c_k+A_k^j(t)/2\right)^2}\right]-\frac{1}{4\left(c_i+A_i^j(t)/2\right)},\qquad \dot C_i^j(t)=-\frac{a}{2}A_i^j(t),$$

with boundary conditions $A_i^j(t_0+j\Delta t+T)=0$, $C_i^j(t_0+j\Delta t+T)=0$. The corresponding payoff of player $i=1,2,3$ in the Nash equilibrium is determined by the function

$$V_i^j(t,x)=A_i^j(t)\sqrt{x}+C_i^j(t),\quad i=1,2,3.$$
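The coupled system for the functions $A_i^j(t)$ can be solved by a backward sweep from the zero terminal conditions; the sketch below uses the cost parameters of the numerical example and an assumed step size.

```python
# Backward Euler sweep for the coupled Nash-equilibrium functions A_i(t):
# A_i' = A_i [ b/2 + sum_{k != i} 1/(8 (c_k + A_k/2)^2) ] - 1/(4 (c_i + A_i/2)).
b = 0.3
c = (0.15, 0.65, 0.45)          # c_1 < c_3 < c_2 as in the example
horizon, dt = 2.0, 1e-4
n = round(horizon / dt)

A = [0.0, 0.0, 0.0]             # terminal conditions at the subgame horizon
for _ in range(n):
    dA = [A[i] * (b / 2 + sum(1.0 / (8 * (c[k] + A[k] / 2) ** 2)
                              for k in range(3) if k != i))
          - 1.0 / (4 * (c[i] + A[i] / 2))
          for i in range(3)]
    A = [A[i] - dt * dA[i] for i in range(3)]   # one step backward in time
```

The player with the smallest extraction cost ends up with the largest coefficient $A_i^j$, i.e., the largest equilibrium value of the resource stock.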


The value of the characteristic function for coalitions consisting of one player, $S=\{i\}$, $i\in N$, is calculated as follows:

$$V_j(\{i\};x_j^*(t),t,t_0+j\Delta t+T)=V_i^j(t,x_j^*(t)),\tag{10.49}$$

where $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$, $j=0,\dots,l$.

10.7.6 Two-Player Coalitions

In accordance with formula (10.10), the characteristic function $V_j(S;x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ for coalitions consisting of two players, $S=\{1,2\},\{1,3\},\{2,3\}$, is defined as the best reply of coalition $S$ against the Nash equilibrium strategies $u_j^{NE}=(u_1^{NE,j},u_2^{NE,j},u_3^{NE,j})$ in the truncated subgame $\hat\Gamma_j(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ used by the players from $N\setminus S$. In our case, this means that the players from coalition $S$ act as one player and maximize their total payoff. Using this approach, we determine the equilibrium between two players: the combined player (coalition $S$) and the player not included in the coalition ($N\setminus S$).

Consider the formula for $V_j(S;x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ in the case $S=\{1,2\}$; the formulas for the remaining coalitions can be obtained by the same principle:

$$V_{\{1,2\}}^j(t,x)=A_{\{1,2\}}^j(t)\sqrt{x}+C_{\{1,2\}}^j(t),\qquad V_3^j(t,x)=A_3^j(t)\sqrt{x}+C_3^j(t),$$

where the functions $A_{\{1,2\}}^j(t)$, $A_3^j(t)$, $C_{\{1,2\}}^j(t)$, $C_3^j(t)$ satisfy the system of differential equations

$$\begin{aligned}
\dot A_{\{1,2\}}^j(t)&=A_{\{1,2\}}^j(t)\left[\frac{b}{2}+\frac{1}{8\left(c_3+A_3^j(t)/2\right)^2}\right]-\sum_{k\in S}\frac{1}{4\left(c_k+A_{\{1,2\}}^j(t)/2\right)},\\
\dot A_3^j(t)&=A_3^j(t)\left[\frac{b}{2}+\sum_{k\in S}\frac{1}{8\left(c_k+A_{\{1,2\}}^j(t)/2\right)^2}\right]-\frac{1}{4\left(c_3+A_3^j(t)/2\right)},\\
\dot C_{\{1,2\}}^j(t)&=-\frac{a}{2}A_{\{1,2\}}^j(t),\qquad \dot C_3^j(t)=-\frac{a}{2}A_3^j(t),
\end{aligned}$$

with boundary conditions $A_{\{1,2\}}^j(t_0+j\Delta t+T)=0$, $A_3^j(t_0+j\Delta t+T)=0$, $C_{\{1,2\}}^j(t_0+j\Delta t+T)=0$, $C_3^j(t_0+j\Delta t+T)=0$.


The value of the characteristic function of coalition $S=\{1,2\}$ is calculated as follows:

$$V_j(\{1,2\};x_j^*(t),t,t_0+j\Delta t+T)=V_{\{1,2\}}^j(t,x_j^*(t)),\tag{10.50}$$

where $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$, $j=0,\dots,l$.
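The pair $(A_{\{1,2\}}^j, A_3^j)$ can be computed by the same kind of backward sweep; again the parameters follow the numerical example and the step size is an assumption.

```python
# Backward sweep for the coalition {1,2} versus player 3 system.
b = 0.3
c = (0.15, 0.65, 0.45)
horizon, dt = 2.0, 1e-4

A12, A3 = 0.0, 0.0                       # terminal conditions at the horizon
for _ in range(round(horizon / dt)):
    dA12 = A12 * (b / 2 + 1.0 / (8 * (c[2] + A3 / 2) ** 2)) \
           - sum(1.0 / (4 * (ck + A12 / 2)) for ck in c[:2])
    dA3 = A3 * (b / 2 + sum(1.0 / (8 * (ck + A12 / 2) ** 2) for ck in c[:2])) \
          - 1.0 / (4 * (c[2] + A3 / 2))
    A12, A3 = A12 - dt * dA12, A3 - dt * dA3   # one step backward in time
```

The coalition coefficient $A_{\{1,2\}}^j$ exceeds the outsider's coefficient $A_3^j$, reflecting the combined player's larger best-reply value.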

10.7.7 The Concept of the Solution

Suppose that in each cooperative truncated subgame $\hat\Gamma_j^c(x_{j,0},t_0+j\Delta t,t_0+j\Delta t+T)$ the players use the core as the optimality principle. This means that in each truncated subgame the players choose an imputation $\xi_j(x_j^*,t,t_0+j\Delta t+T)\in C_j(x_j^*(t),T-t)$ according to the following rule:

$$\sum_{i\in S}\xi_i^j(x_j^*,t,t_0+j\Delta t+T)\ \ge\ V_j(S;x_j^*(t),t,t_0+j\Delta t+T),\quad S\subset N,$$

for any $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$, $j=0,\dots,l$. The resulting imputation $\hat\xi(\hat x^*(t),T-t)$ for any set of imputations in the truncated subgames $\xi_j(x_j^*,t,t_0+j\Delta t+T)\in C_j(x_j^*(t),T-t)$, $t\in[t_0+j\Delta t,t_0+j\Delta t+T]$, $j=0,\dots,l$, can be calculated by formula (10.16). We denote by $\hat C(\hat x^*(t),T-t)$ the set of imputations $\hat\xi(\hat x^*(t),T-t)$ constructed using (10.15) and (10.16).

Using the results obtained in Sects. 10.4 and 10.5, the solution $\hat C(\hat x^*(t),T-t)$ can be constructed according to the following rule:

$$\sum_{i\in S}\hat\xi_i(\hat x^*(t),T-t)\ \ge\ V(S;\hat x^*(t),T-t),\quad S\subset N,\tag{10.51}$$

where $V(S;\hat x^*(t),T-t)$ is calculated using formula (10.19).

Further, using an example of a particular deviation from $\hat C(\hat x^*(t),T-t)$, we show that the constructed solution is strongly $\Delta t$-time consistent in the game $\Gamma(x_0,T-t_0)$.

10.7.8 Numerical Simulation

Consider a numerical example of the resource extraction game defined on the time interval $T-t_0=4$, in which information about the game is known on a time interval of duration $T=2$ and is updated after every $\Delta t=1$. The following parameters are fixed: $a=5$, $b=0.3$ for the equation of motion; $c_1=0.15$, $c_2=0.65$, $c_3=0.45$ for the payoff functions; and $t_0=0$, $x_0=250$ for the initial conditions.

Figure 10.5 shows the optimal strategies of the first player in the game with dynamic updating (solid line) and the optimal strategies in the original game [8] with prescribed duration (dotted line). The conditionally cooperative trajectory $\hat x^*(t)$ is constructed using the cooperative trajectories in the truncated subgames $\hat\Gamma_j(x_{j,0}^*,t_0+j\Delta t,t_0+j\Delta t+T)$ with the dynamical system (10.46) (Fig. 10.6). Figure 10.7 compares the conditionally cooperative trajectory $\hat x^*(t)$ (solid line) in the game with dynamic updating with the cooperative trajectory $x^*(t)$ (dashed line) in the original game $\Gamma(x_0,T-t_0)$ [8]. Under limited information, resource extraction proceeds faster, because the players are guided by a reduced time interval. The abscissa axis in Fig. 10.7 represents the time $t$, and the ordinate axis represents the resource stock $x$.
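The conditionally cooperative trajectory described above can be assembled numerically: in each of the four truncated subgames the cooperative controls are recomputed for the two-unit information horizon, but followed only over the one-unit update interval. The sketch below implements this stitching with an Euler scheme (step size assumed).

```python
# Stitching the conditionally cooperative trajectory from truncated
# subgames: total length 4, information horizon 2, update step 1.
import math

a, b = 5.0, 0.3
c = (0.15, 0.65, 0.45)
t_total, T_info, dt_update, h = 4.0, 2.0, 1.0, 1e-3

def solve_A(span):
    """Backward Euler for A'(t) = (b/2)A - sum_i 1/(4(c_i + A/2)) on [0, span]."""
    n = round(span / h)
    A = [0.0] * (n + 1)                  # A(span) = 0
    for k in range(n, 0, -1):
        dA = (b / 2) * A[k] - sum(1.0 / (4 * (ci + A[k] / 2)) for ci in c)
        A[k - 1] = A[k] - h * dA
    return A

x_hat, x = [], 250.0
for j in range(round(t_total / dt_update)):      # subgames j = 0, ..., l
    A = solve_A(T_info)
    # Follow the subgame-cooperative controls only over one update interval.
    for k in range(round(dt_update / h)):
        u = sum(x / (4 * (ci + A[k] / 2) ** 2) for ci in c)
        x += h * (a * math.sqrt(x) - b * x - u)
        x_hat.append(x)
```

Because each subgame's horizon is short, the controls are more aggressive than in the full-horizon game, which is the faster resource development visible in Fig. 10.7.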

Fig. 10.5 Optimal strategy for player 1 in the game with dynamic updating (solid line) and optimal strategies in the original game [8] (dotted line) with prescribed duration

Fig. 10.6 Conditionally cooperative trajectory xˆ ∗ (t) (solid line) in a game with dynamic updating and the corresponding cooperative trajectories in truncated subgames (dashed lines)


Fig. 10.7 Conditionally cooperative trajectory xˆ ∗ (t) (solid line) in a game with dynamic updating and cooperative trajectory x ∗ (t) (dashed line) in the initial game [8]

Based on the values of the characteristic functions $V_j(S;x_j^*(t),t,t_0+j\Delta t+T)$, $t\in[t_0+j\Delta t,t_0+(j+1)\Delta t]$, $S\subset N$, $j=0,\dots,l$, calculated using (10.48), (10.49) and (10.50), the expression for the resulting characteristic function $V(S;\hat x^*(t),T-t)$ (10.19), $t\in[t_0,T]$, is obtained. Using (10.51), $\hat C(\hat x^*(t),T-t)$ is constructed in the game with dynamic updating $\Gamma(x_0,T-t_0)$ (see Fig. 10.9).

Demonstrate the property of strong $\Delta t$-time consistency of the solution $\hat C(\hat x^*(t),T-t)$. Suppose that at the beginning of the game $\Gamma(x_0,T-t_0)$ the players agree to use the proportional solution $Prop(\hat x^*(t),T-t)$ (10.32) (further it is shown that for the given parameters $Prop(\hat x^*(t),T-t)\in\hat C(\hat x^*(t),T-t)$). Now suppose that at some instant $t_{br}=t_0+m\Delta t\in[t_0,T]$ the players decide to choose another imputation from $\hat C(\hat x^*(t_{br}),T-t_{br})$ instead of the proportional solution, for example, the Shapley value $Sh(\hat x^*(t),T-t)$, $t\in[t_{br},T]$ (10.30). The IDPs for the proportional solution and the Shapley value are calculated using formula (10.14). Suppose that $m=2$; then the resulting IDP (10.15) for the combined solution has the following form:

$$\hat\beta(t,\hat x^*)=\begin{cases}\hat\beta^{Prop}(t,\hat x^*),&t\in[t_0,t_{br}],\\[2pt]\hat\beta^{Sh}(t,\hat x^*),&t\in(t_{br},T].\end{cases}\tag{10.52}$$

In Fig. 10.8 the resulting IDP $\hat\beta^{Prop}(t,\hat x^*)$ for the proportional solution (solid line) and $\hat\beta(t,\hat x^*)$ for the combined solution (10.52) (dashed line) are displayed. In order to obtain the imputation (10.16) corresponding to the combined solution, $\hat\beta(t,\hat x^*)$ (10.52) is integrated with respect to $t$. Denote the result of the integration by $\hat\xi(\hat x^*(t),T-t)$. In accordance with the resulting imputation $\hat\xi(\hat x^*(t),T-t)$ the players


Fig. 10.8 IDP $\hat\beta^{Prop}(t,\hat x^*)$ for the proportional solution (solid line), IDP $\hat\beta(t,\hat x^*)$ for the combined solution (10.52) (dashed line)

Fig. 10.9 Axes: $\xi_1$, $\xi_3$, $t$; $\xi_2$ can be calculated using the normalization condition

allocate the joint payoff in the game $\Gamma(x_0,T-t_0)$ with dynamic updating in the following way: $\hat\xi(\hat x^*(t_0),T-t_0)=(12.3,\,30.2,\,16.8)$. In Fig. 10.9 one can observe that the imputation corresponding to the combined solution $\hat\xi(\hat x^*(t),T-t)$ (dashed line) belongs to $\hat C(\hat x^*(t),T-t)$ (the selected region) for all $t\in[t_0,T]$. This demonstrates the property of strong $\Delta t$-time consistency of $\hat C(\hat x^*(t),T-t)$, since the imputation $\hat\xi(\hat x^*(t),T-t)$ was constructed by the deviation of the players from the proportional solution $Prop(\hat x^*(t),T-t)$ (solid line) at the instant $t_{br}=t_0+m\Delta t$ in favor of the Shapley value $Sh(\hat x^*(t_{br}),T-t_{br})$.

Acknowledgement Research was supported by a grant from the Russian Science Foundation (Project No. 18-71-00081).
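The switching construction (10.52) is easy to mimic numerically. In the sketch below the two IDP functions are synthetic constant stand-ins (the chapter's actual $\hat\beta^{Prop}$ and $\hat\beta^{Sh}$ are state-dependent); the point is only that integrating the piecewise selector yields the combined imputation.

```python
# Combined IDP (10.52): switch selectors at t_br and integrate over [t0, T].
# The two beta functions are synthetic stand-ins, not the chapter's values.
import numpy as np

t0, T, t_br = 0.0, 4.0, 2.0

def beta_prop(t):                 # stand-in for the proportional-solution IDP
    return np.array([0.30, 0.45, 0.25])

def beta_sh(t):                   # stand-in for the Shapley-value IDP
    return np.array([0.25, 0.50, 0.25])

def beta_hat(t):
    return beta_prop(t) if t <= t_br else beta_sh(t)

grid = np.linspace(t0, T, 4001)
step = grid[1] - grid[0]
# Left Riemann sum of beta_hat over [t0, T] gives the resulting imputation.
xi_hat = sum(beta_hat(t) * step for t in grid[:-1])
```

Since both stand-ins distribute one unit per time instant, the components of the combined imputation sum to the length of the game.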


References

1. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory. Academic Press, London (1995)
2. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton (1957)
3. Chander, P., Tulkens, H.: A core-theoretic solution for the design of cooperative agreements on transfrontier pollution. Int. Tax Public Financ. 2(2), 279–293 (1995)
4. Chander, P.: The gamma-core and coalition formation. Int. J. Game Theory 35(4), 539–556 (2007)
5. Goodwin, G.C., Seron, M.M., Dona, J.A.: Constrained Control and Estimation: An Optimisation Approach. Springer, New York (2005)
6. Gromova, E.V., Petrosian, O.L.: Control of information horizon for cooperative differential game of pollution control. In: 2016 International Conference Stability and Oscillations of Nonlinear Control Systems (Pyatnitskiy's Conference). IEEE, Piscataway (2016). https://doi.org/10.1109/STAB.2016.7541187
7. Haurie, A.: A note on nonzero-sum differential games with bargaining solutions. J. Optim. Theory Appl. 18, 31–39 (1976)
8. Jorgensen, S., Yeung, D.W.K.: Inter- and intragenerational renewable resource extraction. Ann. Oper. Res. 88, 275–289 (1999)
9. Jorgensen, S., Martin-Herran, G., Zaccour, G.: Agreeability and time consistency in linear-state differential games. J. Optim. Theory Appl. 119, 49–63 (2003)
10. Krasovskii, N.N., Kotel'nikova, A.N.: On a differential interception game. Proc. Steklov Inst. Math. 268, 161–206 (2010)
11. Kwon, W.H., Han, S.H.: Receding Horizon Control: Model Predictive Control for State Models. Springer, New York (2005)
12. Petrosian, O.L.: Looking forward approach in cooperative differential games. Int. Game Theory Rev. 18, 1–14 (2016). https://doi.org/10.1142/S0219198916400077
13. Petrosian, O.L.: Looking forward approach in cooperative differential games with infinite horizon. Vestnik S. Petersburg Univ. Ser. 10. Prikl. Mat. Inform. Prots. Upr. 4, 18–30 (2016)
14. Petrosian, O.L., Barabanov, A.E.: Looking forward approach in cooperative differential games with uncertain-stochastic dynamics. J. Optim. Theory Appl. 172, 328–347 (2017)
15. Petrosian, O.L., Gromova, E.V.: Cooperative differential games with dynamic updating. IFAC-PapersOnLine 51(32), 413–417 (2018)
16. Petrosian, O.L., Pogozhev, S.V.: Looking forward approach with random horizon in cooperative differential games. Automatica (2017, to be published)
17. Petrosian, O.L., Gromova, E.V., Pogozhev, S.V.: Strong time-consistent subset of core in cooperative differential games with finite time horizon. Mat. Teor. Igr Pril. 8(4), 79–106 (2016)
18. Petrosian, O.L., Gromova, E.V., Pogozhev, S.V.: Strong time-consistent subset of the core in cooperative differential games with finite time horizon. Autom. Remote Control 79(10), 1912–1928 (2018)
19. Petrosian, O.L., Nastych, M.A., Volf, D.A.: Differential game of oil market with moving informational horizon and non-transferable utility. In: 2017 Constructive Nonsmooth Analysis and Related Topics (dedicated to the memory of V.F. Demyanov) (CNSA), pp. 1–4. IEEE, Piscataway (2017). https://doi.org/10.1109/CNSA.2017.7974002
20. Petrosian, O.L., Nastych, M.A., Volf, D.A.: Non-cooperative differential game model of oil market with looking forward approach. In: Petrosyan, L.A., Mazalov, V.V., Zenkevich, N. (eds.) Frontiers of Dynamic Games, Game Theory and Management, St. Petersburg, 2017. Birkhauser, Basel (2018)
21. Petrosyan, L.A.: Time-consistency of solutions in multi-player differential games. Vestnik Leningr. State Univ. 4, 46–52 (1977)
22. Petrosjan, L.A.: Strongly time-consistent differential optimality principles. Vestnik St. Petersburg Univ. Math. 26, 40–46 (1993)


23. Petrosyan, L.A., Danilov, N.N.: Stability of solutions in non-zero sum differential games with transferable payoffs. Vestnik Leningr. State Univ. 1, 52–59 (1979)
24. Petrosyan, L.A., Yeung, D.W.K.: Dynamically stable solutions in randomly-furcating differential games. Trans. Steklov Inst. Math. 253, 208–220 (2006)
25. Rawlings, J.B., Mayne, D.Q.: Model Predictive Control: Theory and Design. Nob Hill Publishing, Madison (2009)
26. Subbotin, A.I.: Generalization of the main equation of differential game theory. J. Optim. Theory Appl. 43, 103–133 (1984)
27. Subbotin, A.I.: Generalized Solutions of First Order PDEs. Birkhauser, Basel (1995)
28. Wang, L.: Model Predictive Control System Design and Implementation Using MATLAB. Springer, New York (2005)
29. Yeung, D., Petrosyan, O.: Cooperative stochastic differential games with information adaptation. In: International Conference on Communication and Electronic Information Engineering (CEIE 2016). Atlantis Press (2017). https://doi.org/ceie-16.2017.47
30. Yeung, D., Petrosian, O.: Infinite horizon dynamic games: a new approach via information updating. Int. Game Theory Rev. 19, 1–23 (2017)
31. Yeung, D.W.K., Petrosyan, L.A.: Subgame-Consistent Economic Optimization. Springer, New York (2012)

Chapter 11

Dynamically Consistent Bi-level Cooperation of a Dynamic Game with Coalitional Blocs Leon A. Petrosyan and David W. K. Yeung

Abstract In many real-life scenarios, groups or nations with common interests form coalition blocs by agreement for mutual support and joint actions. This paper considers two levels of cooperation: cooperation among members within a coalition bloc and cooperation between the coalition blocs. Coalition blocs are formed by players with common interests to enhance their gains through cooperation. To increase their gains further, the coalition blocs negotiate to form a grand coalition. Grand coalition cooperation of the coalitional blocs is studied. The gains of each coalition are defined as components of the Shapley value. Dynamically consistent payoff distributions between coalitions and among players are derived for this double-level cooperation scheme. For the definition of players' gains inside each coalition, the proportional solution is used.

Keywords Coalition · Imputation distribution procedure · Shapley value · Proportional solution · Dynamically consistent solution

11.1 Introduction

In many real-life scenarios, combinations of groups or nations with common interests form coalition blocs by treaty or agreement for mutual support and joint actions. Countries establish trading blocs because they believe free trade benefits them, and frequently such blocs tend to be regional because it is more convenient to reach an agreement with nearby neighbors than with remote partners. Trade blocs, currency blocs and political and economic unions have been formed for more than a

L. A. Petrosyan () St. Petersburg State University, St. Petersburg, Russia e-mail: [email protected] D. W. K. Yeung Shue Yan University, North Point, Hong Kong e-mail: [email protected] © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_11


century. Studies of various kinds of blocs are not rare. Schott [9] examined trading blocs and the world trading system. Kandogan [4] obtained consistent estimates of the trade effects of regional blocs. Eichengreen and Irwin [2], Frankel and Rose [3], McDonald et al. [6] and Wolf and Ritschl [10] provided studies on currency areas and trade blocs. Mansfield and Pevehouse [5] present some quantitative results pertaining to the link between trade blocs and military disputes. Aziz et al. [1] studied the effects of political institutions on trade blocs from the ASEAN perspective. Petrosyan and Gromova [7] considered a differential game with a given coalitional partition of the set of players. The cooperative game proceeds in two stages. At the first stage the players (coalitions) maximize the total payoff and then distribute it according to the Shapley value. At the second stage the components of the Shapley value are distributed inside the fixed coalitions. An illustration with a differential game of emission reduction is given; in [8], at the first stage, players (coalitions) use a Pareto optimal solution.

This paper presents a dynamic game with coalitional blocs. The analysis involves two levels of cooperation: cooperation among members within a coalition bloc and cooperation between the coalition blocs. Coalition blocs are formed by players with common interests to enhance their gains through cooperation. Common interests of players within a bloc are reflected through (i) their common preferences, which are not shared by players outside the bloc, (ii) resources exclusively owned by players in the bloc, and (iii) some state dynamics that can only be affected by controls of players inside the bloc. At the same time there is another bloc with players sharing their own common interests, and a non-cooperative outcome among the coalitions would result. To increase their gains the coalition blocs would negotiate to form a grand coalition.
The paper first presents the basic structure of a dynamic game with coalitional blocs. The coalitional Nash equilibrium, which reflects the outcome of competition among coalitions, is presented. Intra-coalition payoff distribution using the proportional solution is derived. As a result we obtain a new solution concept for a class of games with two levels of activities. On the first level, a Nash equilibrium of coalitions (blocs) is considered. On the second level, the Nash equilibrium payoffs of the blocs are allocated among the members of the blocs using the so-called proportional solution. The solution for grand coalition cooperation of the coalitional blocs, the Pareto optimum through maximization of the coalitions' joint payoff, and a dynamically consistent payoff distribution procedure are derived. Dynamically consistent payoff distributions between the two coalitions and among the players within each coalition are derived in this double-level cooperation scheme.

11.2 Basic Structure of the Game

We consider a $T$-stage nonzero-sum discrete-time dynamic game in which there are two coalitional blocs. We use $u_k^i\in U^i\subset R^{m_i}$ to denote the strategy vector of player $i\in S_1$ at stage $k$, where $S_1$ is the set of players in coalition bloc 1. We use


$v_k^j\in\bar U^j\subset R^{m_j}$ to denote the strategy vector of player $j\in S_2$ at stage $k$, where $S_2$ is the set of players in coalition bloc 2. There are $n_1$ players in $S_1$ and $n_2$ players in $S_2$.

The payoff of player $i$ in $S_1$ is

$$\begin{aligned}
\sum_{k=1}^{T}\Big[&P_k^i u_k^i-\frac{c_k^i(u_k^i)^2}{x_k}+p_k^i v_k^i(x_k)^{1/2}-\gamma_k^i(v_k^i)^2-c_k^{(a)i}(\mu_k^i)^2-h_k^i\bar x_k-\lambda_k^i y_k\Big]\left(\frac{1}{1+r}\right)^{k-1}\\
&+\Big[q_{T+1}^{(x)i}x_{T+1}+q_{T+1}^{(\bar x)i}\bar x_{T+1}+q_{T+1}^{(y)i}y_{T+1}+q_{T+1}^i\Big]\left(\frac{1}{1+r}\right)^{T}
\end{aligned}\tag{11.1}$$

for $i\in S_1$, and the payoff of player $j$ in $S_2$ is

$$\begin{aligned}
\sum_{k=1}^{T}\Big[&\bar P_k^j\bar u_k^j-\frac{\bar c_k^j(\bar u_k^j)^2}{\bar x_k}+\bar p_k^j\bar v_k^j(\bar x_k)^{1/2}-\bar\gamma_k^j(\bar v_k^j)^2-\bar c_k^{(a)j}(\bar\mu_k^j)^2-\bar h_k^j x_k-\bar\lambda_k^j y_k\Big]\left(\frac{1}{1+r}\right)^{k-1}\\
&+\Big[\bar q_{T+1}^{(\bar x)j}\bar x_{T+1}+\bar q_{T+1}^{(x)j}x_{T+1}+\bar q_{T+1}^{(y)j}y_{T+1}+\bar q_{T+1}^j\Big]\left(\frac{1}{1+r}\right)^{T}
\end{aligned}\tag{11.2}$$

for $j\in S_2$, where $x_k\in X\subset R^+$ is the stock of productive resources of the players in $S_1$, $\bar x_k\in\bar X\subset R^+$ is the stock of productive resources of the players in $S_2$, and $y_k\in Y\subset R^+$ is the pollution stock. The state $x_k$ can be pools of resources, public capital, infrastructure, technology or defense capacity. The term $u_k^i\in U^i\subset R^+$ is the direct resource use of player $i\in S_1$ at stage $k$, $v_k^i\in\Lambda^i\subset R^+$ is the input used in a productive activity related to the resource stock, and $\mu_k^i\in\Theta^i\subset R^+$ is the pollution abatement effort. The net economic gain from the direct use of the resource is $P_k^i u_k^i-\frac{c_k^i(u_k^i)^2}{x_k}$. The gain from the productive activity related to the size of the resource stock $x_k$ is $p_k^i v_k^i(x_k)^{1/2}$, and $\gamma_k^i(v_k^i)^2$ is the corresponding cost. The cost of the pollution abatement effort is $c_k^{(a)i}(\mu_k^i)^2$ and the damage from pollution is $\lambda_k^i y_k$. There is a non-positive effect/threat (economic or defense) to the players in $S_1$ from coalition $S_2$, whose magnitude is calibrated as $h_k^i\bar x_k$. Finally, player $i\in S_1$ will receive a terminal payment $q_{T+1}^{(x)i}x_{T+1}+q_{T+1}^{(\bar x)i}\bar x_{T+1}+q_{T+1}^{(y)i}y_{T+1}+q_{T+1}^i$ at stage $T+1$.

212

L. A. Petrosyan and D. W. K. Yeung j j

j

j

size is p¯ k v¯k (xk )1/2 and γ¯k (v¯k )2 is the corresponding cost. The cost of pollution (a)j j j abatement effort is c¯k (μ¯ k )2 and the damage of pollution is λ¯ k xk . There is a nonpositive effect/threat (economic or defense) to players in S2 from coalition S1 which magnitude is calibrated as. Finally, player j ∈ S2 will receive a terminal payment

(x)j ¯ (x)j (y)j j q¯T +1 x¯ T +1 + q¯T +1 xT +1 + q¯T +1 yT +1 + q¯T +1 at stage T + 1. The dynamics of the productive resource stock xk is governed by the difference equation  xk+1 = xk + a − bxk − uik , x1 = x10 , for k ∈ K. (11.3) i∈S1

The dynamics of the productive resource stock x¯k is governed by the difference equation  u¯ ik , x1 = x10 , for k ∈ K. (11.4) x¯k+1 = x¯k + a¯ − b¯ x¯k − i∈S1

The accumulation of the pollution stock is governed by the difference equation   j j   j j αki uik + α¯ k u¯ k − ωki μik (yk )1/2 − ω¯ k μ¯ k (yk )1/2 − σyk , yk+1 = y1 =

i∈S1 0 y1 , for

j ∈S2

i∈S1

j ∈S2

k ∈ K. (11.5)
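A minimal simulation of the state transitions (11.3)–(11.5) is sketched below; all parameter values and the constant controls are illustrative assumptions, and a single emission coefficient $\alpha$ and abatement coefficient $\omega$ are shared across players for brevity.

```python
# Sketch of the state dynamics (11.3)-(11.5) for the two-bloc game.
# All parameters and the constant controls are illustrative assumptions.
import math

a, b = 10.0, 0.2           # bloc-1 resource parameters (assumed)
a_bar, b_bar = 8.0, 0.25   # bloc-2 resource parameters (assumed)
sigma = 0.1                # pollution decay rate (assumed)
alpha, omega = 0.3, 0.2    # shared emission/abatement coefficients (assumed)

u = [1.0, 1.2]             # controls of the two players in S1
u_bar = [0.8, 0.9, 1.1]    # controls of the three players in S2
mu = [0.5, 0.5]            # abatement efforts in S1
mu_bar = [0.4, 0.4, 0.4]   # abatement efforts in S2

x, x_bar, y = 20.0, 15.0, 5.0
for k in range(10):                                   # stages k = 1..10
    x, x_bar, y = (
        x + a - b * x - sum(u),
        x_bar + a_bar - b_bar * x_bar - sum(u_bar),
        y + alpha * sum(u) + alpha * sum(u_bar)
          - omega * sum(mu) * math.sqrt(y)
          - omega * sum(mu_bar) * math.sqrt(y) - sigma * y,
    )
```

With these values both resource stocks grow monotonically toward their steady states $(a-\sum u)/b$ and $(\bar a-\sum\bar u)/\bar b$, while the pollution stock hovers near its balance level.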

Note that (i) $x_k$ is available only to the players in $S_1$ for productive uses and $\bar x_k$ is available only to the players in $S_2$ for productive uses, (ii) only actions of the players in $S_1$ can affect the dynamics of $x_k$ and only actions of the players in $S_2$ can affect the dynamics of $\bar x_k$, and (iii) $\bar x_k$ has non-positive impacts on the payoffs of the players in $S_1$ and $x_k$ has non-positive impacts on the payoffs of the players in $S_2$. Given these common interests, player $i\in S_1$ has an incentive to form coalition bloc $S_1$, and player $j\in S_2$ has an incentive to form coalition bloc $S_2$.

11.3 A Two-Coalition Game

Now consider the case when the two coalitions $S_1$ and $S_2$ are formed. Coalition $S_1$ will attempt to maximize the value

$$\begin{aligned}
\sum_{i\in S_1}\Bigg\{\sum_{k=1}^{T}\Big[&P_k^i u_k^i-\frac{c_k^i(u_k^i)^2}{x_k}+p_k^i v_k^i(x_k)^{1/2}-\gamma_k^i(v_k^i)^2-c_k^{(a)i}(\mu_k^i)^2-h_k^i\bar x_k-\lambda_k^i y_k\Big]\left(\frac{1}{1+r}\right)^{k-1}\\
&+\Big[q_{T+1}^{(x)i}x_{T+1}+q_{T+1}^{(\bar x)i}\bar x_{T+1}+q_{T+1}^{(y)i}y_{T+1}+q_{T+1}^i\Big]\left(\frac{1}{1+r}\right)^{T}\Bigg\},
\end{aligned}\tag{11.6}$$

11 Dynamically Consistent Bi-level Cooperation

213

and coalition will attempt to maximize the value , T $  

" c¯k (u¯ k )2 ! j j j j (a)j j j j + p¯ k v¯ k (x¯ k )1/2 − cγ¯k (v¯ k )2 − c¯k (μ¯ k )2 − h¯ k xk − λ¯ k yk xk j ∈S2 k=1 ; k−1

  1 T 1 (x)j ¯ (x)j (y)j j + q¯T +1 x¯T +1 + q¯T +1 xT +1 + q¯T +1 yT +1 + q¯T +1 × , 1+r 1+r j j P¯k u¯ k −

j

j

%

(11.7) subject to (11.3)–(11.5). In particular, a coalition would dissolve and the players in the coalition will revert to individual actions if any of the players deviate from the coalition action plan.

11.3.1 Coalitional Equilibrium

A feedback Nash equilibrium of the two-coalition game with payoffs (11.6)–(11.7) and dynamics (11.3)–(11.5) can be characterized by the following theorem.

Theorem 11.1 A set of strategies {φ_k^{(u)i}(x, x̄, y), φ_k^{(v)i}(x, x̄, y), φ_k^{(μ)i}(x, x̄, y)}, for k ∈ K and i ∈ S1, and {φ̄_k^{(u)j}(x, x̄, y), φ̄_k^{(v)j}(x, x̄, y), φ̄_k^{(μ)j}(x, x̄, y)}, for k ∈ K and j ∈ S2, provides a feedback Nash equilibrium solution to the game (11.3)–(11.5) and (11.6)–(11.7) if there exist functions W^{S1}(k, x, x̄, y) and W^{S2}(k, x, x̄, y), for k ∈ K, such that the following recursive relations are satisfied:

W^{S1}(T+1, x, x̄, y) = Σ_{i∈S1} [ q_{T+1}^{(x)i} x + q_{T+1}^{(x̄)i} x̄ + q_{T+1}^{(y)i} y + q_{T+1}^i ] (1/(1+r))^T,

W^{S1}(k, x, x̄, y) = max_{u_k^i, v_k^i, μ_k^i, i∈S1} { Σ_{i∈S1} [ P_k^i u_k^i − c_k^i (u_k^i)²/x + p_k^i v_k^i (x)^{1/2} − γ_k^i (v_k^i)² − c_k^{(a)i} (μ_k^i)² − h_k^i x̄ − λ_k^i y ] (1/(1+r))^{k−1}
+ W^{S1}[k+1, f_k^{(x)}(x, u_k), f_k^{(x̄)}(x̄, φ̄_k^{(u)}), f_k^{(y)}(y, u_k, φ̄_k^{(u)}, μ_k, φ̄_k^{(μ)})] }, for k ∈ K;   (11.8)

W^{S2}(T+1, x, x̄, y) = Σ_{j∈S2} [ q̄_{T+1}^{(x̄)j} x̄ + q̄_{T+1}^{(x)j} x + q̄_{T+1}^{(y)j} y + q̄_{T+1}^j ] (1/(1+r))^T,

W^{S2}(k, x, x̄, y) = max_{ū_k^j, v̄_k^j, μ̄_k^j, j∈S2} { Σ_{j∈S2} [ P̄_k^j ū_k^j − c̄_k^j (ū_k^j)²/x̄ + p̄_k^j v̄_k^j (x̄)^{1/2} − γ̄_k^j (v̄_k^j)² − c̄_k^{(a)j} (μ̄_k^j)² − h̄_k^j x − λ̄_k^j y ] (1/(1+r))^{k−1}
+ W^{S2}[k+1, f_k^{(x)}(x, φ_k^{(u)}), f_k^{(x̄)}(x̄, ū_k), f_k^{(y)}(y, φ_k^{(u)}, ū_k, φ_k^{(μ)}, μ̄_k)] }, for k ∈ K,   (11.9)

where

f_k^{(x)}(x, u_k) = x + a − b x − Σ_{i∈S1} u_k^i,

f_k^{(x)}(x, φ_k^{(u)}) = x + a − b x − Σ_{i∈S1} φ_k^{(u)i}(x, x̄, y),

f_k^{(x̄)}(x̄, ū_k) = x̄ + ā − b̄ x̄ − Σ_{j∈S2} ū_k^j,

f_k^{(x̄)}(x̄, φ̄_k^{(u)}) = x̄ + ā − b̄ x̄ − Σ_{j∈S2} φ̄_k^{(u)j}(x, x̄, y),

f_k^{(y)}(y, u_k, φ̄_k^{(u)}, μ_k, φ̄_k^{(μ)}) = y + Σ_{i∈S1} α_k^i u_k^i + Σ_{j∈S2} ᾱ_k^j φ̄_k^{(u)j}(x, x̄, y) − Σ_{i∈S1} ω_k^i μ_k^i (y)^{1/2} − Σ_{j∈S2} ω̄_k^j φ̄_k^{(μ)j}(x, x̄, y) (y)^{1/2} − σ y,

f_k^{(y)}(y, φ_k^{(u)}, ū_k, φ_k^{(μ)}, μ̄_k) = y + Σ_{i∈S1} α_k^i φ_k^{(u)i}(x, x̄, y) + Σ_{j∈S2} ᾱ_k^j ū_k^j − Σ_{i∈S1} ω_k^i φ_k^{(μ)i}(x, x̄, y) (y)^{1/2} − Σ_{j∈S2} ω̄_k^j μ̄_k^j (y)^{1/2} − σ y.

Proof Invoking the discrete-time dynamic programming technique, W^{S1}(k, x, x̄, y) is the maximized payoff of coalition S1 given the maximizing strategies {φ̄_k^{(u)j}(x, x̄, y), φ̄_k^{(v)j}(x, x̄, y), φ̄_k^{(μ)j}(x, x̄, y)} of coalition S2. Similarly, W^{S2}(k, x, x̄, y) is the maximized payoff of coalition S2 given the maximizing strategies {φ_k^{(u)i}(x, x̄, y), φ_k^{(v)i}(x, x̄, y), φ_k^{(μ)i}(x, x̄, y)} of coalition S1. Hence a Nash equilibrium appears. □


Performing the indicated maximization in (11.8)–(11.9) yields the game equilibrium strategies:

u_k^i = [ P_k^i − (1+r)^{k−1} W_{x_{k+1}}^{S1} + (1+r)^{k−1} α_k^i W_{y_{k+1}}^{S1} ] x / (2 c_k^i),

v_k^i = p_k^i (x)^{1/2} / (2 γ_k^i),   (11.10)

μ_k^i = − (1+r)^{k−1} ω_k^i (y)^{1/2} W_{y_{k+1}}^{S1} / (2 c_k^{(a)i}), for i ∈ S1 and k ∈ K;

ū_k^j = [ P̄_k^j − (1+r)^{k−1} W_{x̄_{k+1}}^{S2} + (1+r)^{k−1} ᾱ_k^j W_{y_{k+1}}^{S2} ] x̄ / (2 c̄_k^j),

v̄_k^j = p̄_k^j (x̄)^{1/2} / (2 γ̄_k^j),   (11.11)

μ̄_k^j = − (1+r)^{k−1} ω̄_k^j (y)^{1/2} W_{y_{k+1}}^{S2} / (2 c̄_k^{(a)j}), for j ∈ S2 and k ∈ K,

where W_{x_{k+1}}^{S1} denotes the partial derivative of W^{S1}(k+1, x, x̄, y) with respect to x evaluated at the stage-(k+1) state, and similarly for the other derivatives.
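The first-order conditions behind (11.10)–(11.11) can be sanity-checked numerically for a single player. The sketch below (all parameter values, the single-player simplification, and the linear continuation value W_{k+1} = A x + B x̄ + C y + D are hypothetical, in line with the linear value functions of Proposition 11.1) compares the closed-form maximizers with perturbed controls:

```python
# Sketch: verify the first-order conditions behind (11.10) for one player in S1.
# All parameter values are hypothetical; the continuation value is taken linear.
import math

P, c, p, gam, c_a = 4.0, 1.0, 2.0, 0.5, 1.0
alpha, omega, a, b, sigma = 0.5, 0.3, 10.0, 0.2, 0.1
h, lam = 0.2, 0.4
A, B, C, D = 3.0, -0.5, -2.0, 1.0      # linear continuation value; C < 0 (pollution hurts)
r, k = 0.05, 2
rho = 1.0 / (1.0 + r)
x, xbar, y = 50.0, 40.0, 5.0

def objective(u, v, mu):
    """Stage payoff (discounted to stage 1) plus the linear continuation value."""
    stage = (P * u - c * u * u / x + p * v * math.sqrt(x) - gam * v * v
             - c_a * mu * mu - h * xbar - lam * y)
    x_next = x + a - b * x - u             # single-player simplification of (11.3)
    y_next = y + alpha * u - omega * mu * math.sqrt(y) - sigma * y
    # xbar is unaffected by this player's controls, so it is held fixed here.
    return stage * rho ** (k - 1) + (A * x_next + B * xbar + C * y_next + D) * rho ** k

# Closed-form maximizers implied by the first-order conditions (cf. (11.10)):
u_star = (P - rho * A + rho * alpha * C) * x / (2.0 * c)
v_star = p * math.sqrt(x) / (2.0 * gam)
mu_star = -rho * C * omega * math.sqrt(y) / (2.0 * c_a)

best = objective(u_star, v_star, mu_star)
for d in (-0.5, 0.5):                      # strict concavity: any perturbation is worse
    assert objective(u_star + d, v_star, mu_star) <= best
    assert objective(u_star, v_star + d, mu_star) <= best
    assert objective(u_star, v_star, mu_star + d) <= best
print(round(u_star, 4), round(v_star, 4), round(mu_star, 4))
```

Strict concavity of the objective in each control makes the perturbation check conclusive; note that μ* > 0 precisely because C < 0, i.e. abatement is worthwhile only when pollution lowers future payoffs.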

The game equilibrium payoffs of coalition S1 and coalition S2 can be obtained as follows.

Proposition 11.1 The payoff of coalition S1 is

W^{S1}(k, x, x̄, y) = [ A_k^{S1} x + B_k^{S1} x̄ + C_k^{S1} y + D_k^{S1} ] (1/(1+r))^{k−1}   (11.12)

and the payoff of coalition S2 is

W^{S2}(k, x, x̄, y) = [ Ā_k^{S2} x + B̄_k^{S2} x̄ + C̄_k^{S2} y + D̄_k^{S2} ] (1/(1+r))^{k−1},   (11.13)

where A_k^{S1}, B_k^{S1}, C_k^{S1}, D_k^{S1}, Ā_k^{S2}, B̄_k^{S2}, C̄_k^{S2} and D̄_k^{S2}, for k ∈ K, are constants in terms of the parameters of the dynamic game problem (11.3)–(11.5) and (11.6)–(11.7).

Proof Using Proposition 11.1 and (11.10)–(11.11), the game equilibrium strategies of the coalitions in stage k can be obtained as:

φ_k^{(u)i}(x, x̄, y) = [ P_k^i − (1+r)^{−1} A_{k+1}^{S1} + (1+r)^{−1} α_k^i C_{k+1}^{S1} ] x / (2 c_k^i),

φ_k^{(v)i}(x, x̄, y) = p_k^i (x)^{1/2} / (2 γ_k^i),   (11.14)

φ_k^{(μ)i}(x, x̄, y) = − (1+r)^{−1} ω_k^i (y)^{1/2} C_{k+1}^{S1} / (2 c_k^{(a)i}),

for i ∈ S1 and k ∈ K;

φ̄_k^{(u)j}(x, x̄, y) = [ P̄_k^j − (1+r)^{−1} B̄_{k+1}^{S2} + (1+r)^{−1} ᾱ_k^j C̄_{k+1}^{S2} ] x̄ / (2 c̄_k^j),

φ̄_k^{(v)j}(x, x̄, y) = p̄_k^j (x̄)^{1/2} / (2 γ̄_k^j),   (11.15)

φ̄_k^{(μ)j}(x, x̄, y) = − (1+r)^{−1} ω̄_k^j (y)^{1/2} C̄_{k+1}^{S2} / (2 c̄_k^{(a)j}),

for j ∈ S2 and k ∈ K. Substituting (11.14)–(11.15) into (11.8)–(11.9) yields a system of equations with the left-hand side being

[ A_k^{S1} x + B_k^{S1} x̄ + C_k^{S1} y + D_k^{S1} ] (1/(1+r))^{k−1}  or  [ Ā_k^{S2} x + B̄_k^{S2} x̄ + C̄_k^{S2} y + D̄_k^{S2} ] (1/(1+r))^{k−1}

and the right-hand side being a function linear in (x, x̄, y) with coefficients containing the parameters of the game problem (11.6)–(11.7). Hence Proposition 11.1 follows. □
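The proof's method of undetermined coefficients can be illustrated on a stripped-down one-player, one-state analogue (the payoff, dynamics and all values below are hypothetical simplifications, not the chapter's model). With W(k, x) = [A_k x + D_k](1/(1+r))^{k−1}, dynamics x_{k+1} = x_k + a − b x_k − u_k, stage payoff P u − c u²/x and terminal value q x + q₀, matching coefficients in the Bellman equation gives A_k = (P − ρ A_{k+1})²/(4c) + ρ A_{k+1}(1 − b) and D_k = ρ (A_{k+1} a + D_{k+1}), where ρ = 1/(1+r). The sketch checks these recursions against a brute-force maximization of the Bellman right-hand side:

```python
# Toy one-player, one-state analogue of the Proposition 11.1 argument (hypothetical
# simplification): payoff P*u - c*u^2/x per stage, dynamics x' = x + a - b*x - u,
# terminal value q*x + q0, discount factor rho = 1/(1+r).
P, c, a, b, q, q0 = 3.0, 1.0, 5.0, 0.2, 1.0, 0.5
r, T = 0.05, 4
rho = 1.0 / (1.0 + r)

# Backward recursion for W(k, x) = (A[k]*x + D[k]) * rho**(k-1), from A[T+1]=q, D[T+1]=q0.
A, D = {T + 1: q}, {T + 1: q0}
for k in range(T, 0, -1):
    At = rho * A[k + 1]                     # discounted next-stage slope
    A[k] = (P - At) ** 2 / (4.0 * c) + At * (1.0 - b)
    D[k] = rho * (A[k + 1] * a + D[k + 1])

def bellman_rhs(k, x):
    """Brute-force grid maximization of the Bellman right-hand side at stage k."""
    best, u = -float("inf"), 0.0
    while u <= P * x:                       # coarse grid over admissible controls
        value = (P * u - c * u * u / x
                 + rho * (A[k + 1] * (x + a - b * x - u) + D[k + 1]))
        best = max(best, value)
        u += 0.001
    return best

x = 20.0
print(abs(A[1] * x + D[1] - bellman_rhs(1, x)) < 1e-3)  # True: coefficients match
```

The grid maximum can only undershoot the exact maximum by a term of order (Δu)², so agreement to three decimals confirms the coefficient matching.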

11.3.2 Intra-coalition Payoff Distribution

Players in each coalition agree to distribute the coalition payoff among themselves according to an optimality principle. To accommodate the possibility of asymmetric payoff sizes, the players within each coalition agree to share the coalition payoff proportionally to their non-cooperative payoffs (that is, the payoffs from not forming a coalition). To obtain the imputations of the players under coalition S1 we first consider the case where players in coalition S1 act individually against coalition S2. Then each player i in S1 will attempt to maximize his payoff

Σ_{k=1}^{T} [ P_k^i u_k^i − c_k^i (u_k^i)²/x_k + p_k^i v_k^i (x_k)^{1/2} − γ_k^i (v_k^i)² − c_k^{(a)i} (μ_k^i)² − h_k^i x̄_k − λ_k^i y_k ] (1/(1+r))^{k−1}
+ ( q_{T+1}^{(x)i} x_{T+1} + q_{T+1}^{(x̄)i} x̄_{T+1} + q_{T+1}^{(y)i} y_{T+1} + q_{T+1}^i ) (1/(1+r))^T,   (11.16)

for i ∈ S1, and coalition S2 will attempt to maximize the value

Σ_{j∈S2} { Σ_{k=1}^{T} [ P̄_k^j ū_k^j − c̄_k^j (ū_k^j)²/x̄_k + p̄_k^j v̄_k^j (x̄_k)^{1/2} − γ̄_k^j (v̄_k^j)² − c̄_k^{(a)j} (μ̄_k^j)² − h̄_k^j x_k − λ̄_k^j y_k ] (1/(1+r))^{k−1}
+ ( q̄_{T+1}^{(x̄)j} x̄_{T+1} + q̄_{T+1}^{(x)j} x_{T+1} + q̄_{T+1}^{(y)j} y_{T+1} + q̄_{T+1}^j ) (1/(1+r))^T },   (11.17)

subject to (11.3)–(11.5). This is a dynamic game with n_1 individual players and one coalition. A feedback Nash equilibrium of the game (11.3)–(11.5) and (11.16)–(11.17) can be characterized by the following proposition.

(μ)i

(v)i

Proposition 11.2 A set of strategies {ϕk (x, x, ¯ y), ϕk (x, x, ¯ y), ϕk (x, x, ¯ y)}, (u)j ¯ (v)j ¯ (μ)j ¯ for k ∈ K and i ∈ S1 and {ϕ¯ k (x, x, ¯ y), ϕ¯ k (x, x, ¯ y), ϕ¯k (x, x, ¯ y)}, for k ∈ K and j ∈ S2 provides a feedback Nash equilibrium solution to the game (11.3)– (11.5) and (11.16)–(11.17) if there exist functions W˜ S2 (k, x, x, ¯ y) for k ∈ K, and V i (k, x, x, ¯ y), for i ∈ S1 and k ∈ K, such that the following recursive relations are satisfied: (x)i ¯ i V i (T + 1, x, x, ¯ y) = (qT(x)i +1 xT +1 + qT +1 x¯ T +1 + qT +1 yT +1 + qT +1 ) (y)i

,$ ¯ y) = max V (k, x, x,

Pki uik −

i

uik ,vki ,μik

hik x¯ − λik y



1 1+r

T ,

cki (uik )2 (α)i + [pki vki (x)1/2 − γki (vki )2 ] − ck (μik )2 − x

"  1 k−1 (u)S \i (x) (x) ¯ (u) ¯ + V i [k + 1, fk (x, uik , ϕk 1 ), fk (x, ¯ ϕ¯k ), 1+r 8 (y) (u)S \i (μ)S \i (μ) ¯ fk (y, uik , ϕk 1 , ϕ¯ku¯ , μik , ϕk 1 ϕ¯k )] ,

for i ∈ S1 and k ∈ K; W˜ S2 (T +1, x, x, ¯ y) =

 j ∈S2

(x)j ¯

 (x)j

,$ W˜ S2 (k, x, x, ¯ y) =

j

max

j

j

(y)j

j

(q¯T +1 xT +1 +q¯T +1 xT +1 +q¯T +1 yT +1 +q¯T +1 )

u¯ k ,v¯ k ,μ¯ k ,j ∈S2

j j P¯k u¯ k −

j

1 1+r

T ,

j

c¯k (u¯ k )2 j j j j + [p¯ k v¯k (x) ¯ 1/2 − cγ¯k (v¯k )2 ]− x¯

"  1 k−1 + 1+r 8 (y) (μ) (x) (u) ( x) ¯ (u) ¯ u¯ k ), fk (y, ϕk , u¯ k , ϕk μ¯ k )] , W˜ S2 [k + 1, fk (x, ϕk ), fk (x, (α)j j c¯k (μ¯ k )2

j j − h¯ k x − λ¯ k yk

(11.18)


where

f_k^{(x)}(x, u_k^i, φ_k^{(u)S1\i}) = x + a − b x − u_k^i − Σ_{l∈S1, l≠i} φ_k^{(u)l}(x, x̄, y);

f_k^{(y)}(y, u_k^i, φ_k^{(u)S1\i}, φ̄_k^{(u)}, μ_k^i, φ_k^{(μ)S1\i}, φ̄_k^{(μ)}) = y + α_k^i u_k^i + Σ_{l∈S1, l≠i} α_k^l φ_k^{(u)l}(x, x̄, y) + Σ_{j∈S2} ᾱ_k^j φ̄_k^{(u)j}(x, x̄, y) − ω_k^i μ_k^i (y)^{1/2} − Σ_{l∈S1, l≠i} ω_k^l φ_k^{(μ)l}(x, x̄, y) (y)^{1/2} − Σ_{j∈S2} ω̄_k^j φ̄_k^{(μ)j}(x, x̄, y) (y)^{1/2} − σ y.

Proof Invoking the discrete-time dynamic programming technique, V^i(k, x, x̄, y) is the maximized payoff of player i given the maximizing strategies {φ_k^{(u)l}(x, x̄, y), φ_k^{(v)l}(x, x̄, y), φ_k^{(μ)l}(x, x̄, y)} of the players l ∈ S1, l ≠ i, and the maximizing strategies {φ̄_k^{(u)j}(x, x̄, y), φ̄_k^{(v)j}(x, x̄, y), φ̄_k^{(μ)j}(x, x̄, y)} of coalition S2. In addition, W̃^{S2}(k, x, x̄, y) is the maximized payoff of coalition S2 given the maximizing strategies {φ_k^{(u)l}(x, x̄, y), φ_k^{(v)l}(x, x̄, y), φ_k^{(μ)l}(x, x̄, y)} of the players in S1. Hence a Nash equilibrium appears. □

Performing the indicated maximization in (11.18) yields the game equilibrium strategies:

u_k^i = [ P_k^i − (1+r)^{k−1} V_{x_{k+1}}^i + (1+r)^{k−1} α_k^i V_{y_{k+1}}^i ] x / (2 c_k^i),

v_k^i = p_k^i (x)^{1/2} / (2 γ_k^i),   (11.19)

μ_k^i = − (1+r)^{k−1} ω_k^i (y)^{1/2} V_{y_{k+1}}^i / (2 c_k^{(a)i}),

for i ∈ S1 and k ∈ K;

ū_k^j = [ P̄_k^j − (1+r)^{k−1} W̃_{x̄_{k+1}}^{S2} + (1+r)^{k−1} ᾱ_k^j W̃_{y_{k+1}}^{S2} ] x̄ / (2 c̄_k^j),

v̄_k^j = p̄_k^j (x̄)^{1/2} / (2 γ̄_k^j),   (11.20)

μ̄_k^j = − (1+r)^{k−1} ω̄_k^j (y)^{1/2} W̃_{y_{k+1}}^{S2} / (2 c̄_k^{(a)j}),

for j ∈ S2 and k ∈ K.


The game equilibrium payoffs of the players in S1 and the coalition payoff of S2 can be obtained as follows.

Proposition 11.3 The payoff of player i ∈ S1 is

V^i(k, x, x̄, y) = [ A_k^i x + B_k^i x̄ + C_k^i y + D_k^i ] (1/(1+r))^{k−1},   (11.21)

for k ∈ K and i ∈ S1, and the payoff of coalition S2 is

W̃^{S2}(k, x, x̄, y) = [ Ā_k^{S2} x + B̄_k^{S2} x̄ + C̄_k^{S2} y + D̄_k^{S2} ] (1/(1+r))^{k−1},   (11.22)

for k ∈ K, where A_k^i, B_k^i, C_k^i, D_k^i, for i ∈ S1 and k ∈ K, and Ā_k^{S2}, B̄_k^{S2}, C̄_k^{S2}, D̄_k^{S2}, for k ∈ K, are constants in terms of the parameters of the dynamic game problem (11.3)–(11.5) and (11.16)–(11.17).

Proof Using Proposition 11.2 and (11.19)–(11.20), the game equilibrium strategies of player i ∈ S1 in stage k can be obtained as:

φ_k^{(u)i}(x, x̄, y) = [ P_k^i − (1+r)^{−1} A_{k+1}^i + (1+r)^{−1} α_k^i C_{k+1}^i ] x / (2 c_k^i),

φ_k^{(v)i}(x, x̄, y) = p_k^i (x)^{1/2} / (2 γ_k^i),   (11.23)

φ_k^{(μ)i}(x, x̄, y) = − (1+r)^{−1} ω_k^i (y)^{1/2} C_{k+1}^i / (2 c_k^{(a)i}),

for k ∈ K and i ∈ S1;

φ̄_k^{(u)j}(x, x̄, y) = [ P̄_k^j − (1+r)^{−1} B̄_{k+1}^{S2} + (1+r)^{−1} ᾱ_k^j C̄_{k+1}^{S2} ] x̄ / (2 c̄_k^j),

φ̄_k^{(v)j}(x, x̄, y) = p̄_k^j (x̄)^{1/2} / (2 γ̄_k^j),   (11.24)

φ̄_k^{(μ)j}(x, x̄, y) = − (1+r)^{−1} ω̄_k^j (y)^{1/2} C̄_{k+1}^{S2} / (2 c̄_k^{(a)j}),

for j ∈ S2 and k ∈ K. Substituting (11.23)–(11.24) into (11.18) yields a system of equations with the left-hand side being [ A_k^i x + B_k^i x̄ + C_k^i y + D_k^i ] (1/(1+r))^{k−1} or [ Ā_k^{S2} x + B̄_k^{S2} x̄ + C̄_k^{S2} y + D̄_k^{S2} ] (1/(1+r))^{k−1} and the right-hand side being a function linear in (x, x̄, y) with coefficients containing the parameters of the game problem (11.16)–(11.17). Hence Proposition 11.3 follows. □


Given that players in coalition S1 agree to share the coalition payoff proportionally to their non-cooperative payoffs, the imputation for player i ∈ S1 becomes

Condition 1

ξ^i(k, x_k, x̄_k, y_k) = W^{S1}(k, x_k, x̄_k, y_k) · V^i(k, x_k, x̄_k, y_k) / Σ_{l∈S1} V^l(k, x_k, x̄_k, y_k),   (11.25)

for k ∈ K and i ∈ S1.

To obtain the imputations of the players under coalition S2 we first consider the case where players in coalition S2 act individually against coalition S1. Then each player j in S2 will attempt to maximize his payoff

Σ_{k=1}^{T} [ P̄_k^j ū_k^j − c̄_k^j (ū_k^j)²/x̄_k + p̄_k^j v̄_k^j (x̄_k)^{1/2} − γ̄_k^j (v̄_k^j)² − c̄_k^{(a)j} (μ̄_k^j)² − h̄_k^j x_k − λ̄_k^j y_k ] (1/(1+r))^{k−1}
+ ( q̄_{T+1}^{(x̄)j} x̄_{T+1} + q̄_{T+1}^{(x)j} x_{T+1} + q̄_{T+1}^{(y)j} y_{T+1} + q̄_{T+1}^j ) (1/(1+r))^T,   (11.26)

for j ∈ S2, and coalition S1 will attempt to maximize the value

Σ_{i∈S1} { Σ_{k=1}^{T} [ P_k^i u_k^i − c_k^i (u_k^i)²/x_k + p_k^i v_k^i (x_k)^{1/2} − γ_k^i (v_k^i)² − c_k^{(a)i} (μ_k^i)² − h_k^i x̄_k − λ_k^i y_k ] (1/(1+r))^{k−1}
+ ( q_{T+1}^{(x)i} x_{T+1} + q_{T+1}^{(x̄)i} x̄_{T+1} + q_{T+1}^{(y)i} y_{T+1} + q_{T+1}^i ) (1/(1+r))^T },   (11.27)

subject to (11.3)–(11.5). Following the analysis from (11.16) to (11.23), the game equilibrium payoffs of the players in S2 and the coalition payoff of S1 can be obtained as follows.

Proposition 11.4 The payoff of player j ∈ S2 is

V̄^j(k, x, x̄, y) = [ Ā_k^j x + B̄_k^j x̄ + C̄_k^j y + D̄_k^j ] (1/(1+r))^{k−1},   (11.28)

for k ∈ K and j ∈ S2, and the payoff of coalition S1 is

W̃^{S1}(k, x, x̄, y) = [ A_k^{S1} x + B_k^{S1} x̄ + C_k^{S1} y + D_k^{S1} ] (1/(1+r))^{k−1},   (11.29)


for k ∈ K, where A_k^{S1}, B_k^{S1}, C_k^{S1}, D_k^{S1}, for k ∈ K, and Ā_k^j, B̄_k^j, C̄_k^j, D̄_k^j, for j ∈ S2 and k ∈ K, are constants in terms of the parameters of the dynamic game problem (11.3)–(11.5) and (11.26)–(11.27).

Proof Follow the proof of Proposition 11.3. □

Given that players in coalition S2 agree to share the coalition payoff proportionally to their non-cooperative payoffs, the imputation for player j ∈ S2 becomes

Condition 2

ξ̄^j(k, x_k, x̄_k, y_k) = W^{S2}(k, x_k, x̄_k, y_k) · V̄^j(k, x_k, x̄_k, y_k) / Σ_{l∈S2} V̄^l(k, x_k, x̄_k, y_k),   (11.30)

for k ∈ K and j ∈ S2.

11.4 Grand Coalition Cooperation

Now consider the case when coalitions S1 and S2 agree to form a grand coalition and maximize the joint payoff

Σ_{i∈S1} { Σ_{k=1}^{T} [ P_k^i u_k^i − c_k^i (u_k^i)²/x_k + p_k^i v_k^i (x_k)^{1/2} − γ_k^i (v_k^i)² − c_k^{(a)i} (μ_k^i)² − h_k^i x̄_k − λ_k^i y_k ] (1/(1+r))^{k−1}
+ ( q_{T+1}^{(x)i} x_{T+1} + q_{T+1}^{(x̄)i} x̄_{T+1} + q_{T+1}^{(y)i} y_{T+1} + q_{T+1}^i ) (1/(1+r))^T }

+ Σ_{j∈S2} { Σ_{k=1}^{T} [ P̄_k^j ū_k^j − c̄_k^j (ū_k^j)²/x̄_k + p̄_k^j v̄_k^j (x̄_k)^{1/2} − γ̄_k^j (v̄_k^j)² − c̄_k^{(a)j} (μ̄_k^j)² − h̄_k^j x_k − λ̄_k^j y_k ] (1/(1+r))^{k−1}
+ ( q̄_{T+1}^{(x̄)j} x̄_{T+1} + q̄_{T+1}^{(x)j} x_{T+1} + q̄_{T+1}^{(y)j} y_{T+1} + q̄_{T+1}^j ) (1/(1+r))^T },   (11.31)

subject to (11.3)–(11.5).


11.4.1 Pareto Optimum and Imputation Sharing

The optimal control strategies under a grand coalition can be characterized by the following theorem.

Theorem 11.2 A set of strategies {ψ_k^{(u)i}(x, x̄, y), ψ_k^{(v)i}(x, x̄, y), ψ_k^{(μ)i}(x, x̄, y)}, for k ∈ K and i ∈ S1, and {ψ̄_k^{(u)j}(x, x̄, y), ψ̄_k^{(v)j}(x, x̄, y), ψ̄_k^{(μ)j}(x, x̄, y)}, for k ∈ K and j ∈ S2, provides an optimal solution to the dynamic optimization problem (11.3)–(11.5) and (11.31) if there exists a function W(k, x, x̄, y), for k ∈ K, such that the following recursive relations are satisfied:

W(T+1, x, x̄, y) = Σ_{i∈S1} ( q_{T+1}^{(x)i} x + q_{T+1}^{(x̄)i} x̄ + q_{T+1}^{(y)i} y + q_{T+1}^i ) (1/(1+r))^T
+ Σ_{j∈S2} ( q̄_{T+1}^{(x̄)j} x̄ + q̄_{T+1}^{(x)j} x + q̄_{T+1}^{(y)j} y + q̄_{T+1}^j ) (1/(1+r))^T,

W(k, x, x̄, y) = max_{u_k^i, v_k^i, μ_k^i, i∈S1; ū_k^j, v̄_k^j, μ̄_k^j, j∈S2} { [ Σ_{i∈S1} ( P_k^i u_k^i − c_k^i (u_k^i)²/x + p_k^i v_k^i (x)^{1/2} − γ_k^i (v_k^i)² − c_k^{(a)i} (μ_k^i)² − h_k^i x̄ − λ_k^i y )
+ Σ_{j∈S2} ( P̄_k^j ū_k^j − c̄_k^j (ū_k^j)²/x̄ + p̄_k^j v̄_k^j (x̄)^{1/2} − γ̄_k^j (v̄_k^j)² − c̄_k^{(a)j} (μ̄_k^j)² − h̄_k^j x − λ̄_k^j y ) ] (1/(1+r))^{k−1}
+ W[k+1, f_k^{(x)}(x, u_k), f_k^{(x̄)}(x̄, ū_k), f_k^{(y)}(y, u_k, ū_k, μ_k, μ̄_k)] }, for k ∈ K,   (11.32)

where

f_k^{(y)}(y, u_k, ū_k, μ_k, μ̄_k) = y + Σ_{i∈S1} α_k^i u_k^i + Σ_{j∈S2} ᾱ_k^j ū_k^j − Σ_{i∈S1} ω_k^i μ_k^i (y)^{1/2} − Σ_{j∈S2} ω̄_k^j μ̄_k^j (y)^{1/2} − σ y.

Proof Condition (11.32) follows from the standard optimality results of discrete-time dynamic programming. □


Performing the indicated maximization in (11.32) yields the optimal cooperative strategies:

ψ_k^{(u)i}(x, x̄, y) = [ P_k^i − (1+r)^{k−1} W_{x_{k+1}} + (1+r)^{k−1} α_k^i W_{y_{k+1}} ] x / (2 c_k^i),

ψ_k^{(v)i}(x, x̄, y) = p_k^i (x)^{1/2} / (2 γ_k^i),   (11.33)

ψ_k^{(μ)i}(x, x̄, y) = − (1+r)^{k−1} ω_k^i (y)^{1/2} W_{y_{k+1}} / (2 c_k^{(a)i}),

for i ∈ S1 and k ∈ K;

ψ̄_k^{(u)j}(x, x̄, y) = [ P̄_k^j − (1+r)^{k−1} W_{x̄_{k+1}} + (1+r)^{k−1} ᾱ_k^j W_{y_{k+1}} ] x̄ / (2 c̄_k^j),

ψ̄_k^{(v)j}(x, x̄, y) = p̄_k^j (x̄)^{1/2} / (2 γ̄_k^j),   (11.34)

ψ̄_k^{(μ)j}(x, x̄, y) = − (1+r)^{k−1} ω̄_k^j (y)^{1/2} W_{y_{k+1}} / (2 c̄_k^{(a)j}),

for j ∈ S2 and k ∈ K.

The payoff of the grand coalition can be obtained as follows.

Proposition 11.5 The grand coalition payoff is

W(k, x, x̄, y) = [ A_k x + B_k x̄ + C_k y + D_k ] (1/(1+r))^{k−1},   (11.35)

for k ∈ K, where A_k, B_k, C_k and D_k are constants in terms of the parameters of the dynamic optimization problem (11.3)–(11.5) and (11.31).

Proof Using Proposition 11.5 and (11.33)–(11.34), the optimal strategies of the grand coalition in stage k can be obtained as:

ψ_k^{(u)i}(x, x̄, y) = [ P_k^i − (1+r)^{−1} A_{k+1} + (1+r)^{−1} α_k^i C_{k+1} ] x / (2 c_k^i),

ψ_k^{(v)i}(x, x̄, y) = p_k^i (x)^{1/2} / (2 γ_k^i),   (11.36)

ψ_k^{(μ)i}(x, x̄, y) = − (1+r)^{−1} ω_k^i (y)^{1/2} C_{k+1} / (2 c_k^{(a)i}),

for i ∈ S1 and k ∈ K;

ψ̄_k^{(u)j}(x, x̄, y) = [ P̄_k^j − (1+r)^{−1} B_{k+1} + (1+r)^{−1} ᾱ_k^j C_{k+1} ] x̄ / (2 c̄_k^j),

ψ̄_k^{(v)j}(x, x̄, y) = p̄_k^j (x̄)^{1/2} / (2 γ̄_k^j),   (11.37)

ψ̄_k^{(μ)j}(x, x̄, y) = − (1+r)^{−1} ω̄_k^j (y)^{1/2} C_{k+1} / (2 c̄_k^{(a)j}),

for j ∈ S2 and k ∈ K. Substituting (11.36)–(11.37) into (11.32) yields a system of equations with the left-hand side being [ A_k x + B_k x̄ + C_k y + D_k ] (1/(1+r))^{k−1} and the right-hand side being a function linear in (x, x̄, y) with coefficients containing the parameters of the dynamic optimization problem (11.3)–(11.5) and (11.31). Hence Proposition 11.5 follows. □

Substituting the optimal strategies in (11.36)–(11.37) into (11.3)–(11.5) we obtain the grand coalition state dynamics:

x_{k+1} = x_k + a − b x_k − Σ_{i∈S1} ψ_k^{(u)i}(x_k, x̄_k, y_k),   x_1 = x_1^0, for k ∈ K,

x̄_{k+1} = x̄_k + ā − b̄ x̄_k − Σ_{j∈S2} ψ̄_k^{(u)j}(x_k, x̄_k, y_k),   x̄_1 = x̄_1^0, for k ∈ K,

y_{k+1} = y_k + Σ_{i∈S1} α_k^i ψ_k^{(u)i}(x_k, x̄_k, y_k) + Σ_{j∈S2} ᾱ_k^j ψ̄_k^{(u)j}(x_k, x̄_k, y_k) − Σ_{i∈S1} ω_k^i ψ_k^{(μ)i}(x_k, x̄_k, y_k) (y_k)^{1/2} − Σ_{j∈S2} ω̄_k^j ψ̄_k^{(μ)j}(x_k, x̄_k, y_k) (y_k)^{1/2} − σ y_k,   y_1 = y_1^0, for k ∈ K.   (11.38)

System (11.38) is a system of first-order linear difference equations whose solution can be obtained with standard techniques. We use {x_k*, x̄_k*, y_k*}_{k=1}^{T+1} to denote the solution to (11.38). The two coalitions agree to share the excess of the grand coalition gains over the sum of the non-cooperative coalition gains equally between the coalitions.

Condition 3 Under grand coalition cooperation, coalition S1 will receive the coalition imputation

Ω^{S1}(k, x_k*, x̄_k*, y_k*) = W^{S1}(k, x_k*, x̄_k*, y_k*) + (1/2){ W(k, x_k*, x̄_k*, y_k*) − [ W^{S1}(k, x_k*, x̄_k*, y_k*) + W^{S2}(k, x_k*, x̄_k*, y_k*) ] }
= (1/2)( W^{S1}(k, x_k*, x̄_k*, y_k*) + W(k, x_k*, x̄_k*, y_k*) − W^{S2}(k, x_k*, x̄_k*, y_k*) ), for k ∈ K,   (11.39)


and coalition S2 will receive the coalition imputation

Ω^{S2}(k, x_k*, x̄_k*, y_k*) = (1/2)( W^{S2}(k, x_k*, x̄_k*, y_k*) + W(k, x_k*, x̄_k*, y_k*) − W^{S1}(k, x_k*, x̄_k*, y_k*) ), for k ∈ K.   (11.40)

Condition 3 shows that along the grand coalition cooperative state trajectory, each coalition will receive a coalition payoff under which the excess of the grand coalition payoff over the sum of the non-cooperative coalitional payoffs is shared equally by the coalitions. In addition, according to Condition 1 and Condition 2, the cooperative payoffs of the individual players in these coalitions will be as follows.

Condition 4 Under cooperation, the payoff distribution procedure will assign player i ∈ S1 an imputation

ξ^{i*}(k, x_k*, x̄_k*, y_k*) = Ω^{S1}(k, x_k*, x̄_k*, y_k*) · V^i(k, x_k*, x̄_k*, y_k*) / Σ_{l∈S1} V^l(k, x_k*, x̄_k*, y_k*), for k ∈ K,   (11.41)

and assign player j ∈ S2 an imputation

ξ̄^{j*}(k, x_k*, x̄_k*, y_k*) = Ω^{S2}(k, x_k*, x̄_k*, y_k*) · V̄^j(k, x_k*, x̄_k*, y_k*) / Σ_{l∈S2} V̄^l(k, x_k*, x̄_k*, y_k*), for k ∈ K.   (11.42)

The imputation guides a dynamically consistent solution to the grand coalition which satisfies

(i) Group optimality:

Σ_{i∈S1} ξ^{i*}(k, x_k*, x̄_k*, y_k*) + Σ_{j∈S2} ξ̄^{j*}(k, x_k*, x̄_k*, y_k*) = W(k, x_k*, x̄_k*, y_k*), for k ∈ K;

(ii) Individual coalition rationality:

Ω^{S1}(k, x_k*, x̄_k*, y_k*) ≥ W^{S1}(k, x_k*, x̄_k*, y_k*) and Ω^{S2}(k, x_k*, x̄_k*, y_k*) ≥ W^{S2}(k, x_k*, x̄_k*, y_k*), for k ∈ K;

(iii) Individual player rationality:

ξ^{i*}(k, x_k*, x̄_k*, y_k*) ≥ ξ^i(k, x_k*, x̄_k*, y_k*), for i ∈ S1 and k ∈ K,
ξ̄^{j*}(k, x_k*, x̄_k*, y_k*) ≥ ξ̄^j(k, x_k*, x̄_k*, y_k*), for j ∈ S2 and k ∈ K.   (11.43)

In particular, group optimality in (i) ensures that the sum of imputations to all players equals the Pareto optimal payoff of the grand coalition. Individual coalition rationality in (ii) ensures that the payoff of each coalition under cooperation is no less than its payoff under non-cooperation, which implies that both coalitions have no incentive to deviate from cooperation. Individual player rationality in (iii) guarantees that the payoff of each player under grand coalition cooperation is no less than his payoff when the coalitions act non-cooperatively.
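The sharing rule in Condition 3 and the rationality checks (i)–(ii) can be sketched numerically (all payoff values below are hypothetical placeholders for the value functions evaluated along the cooperative trajectory):

```python
# Sketch of Condition 3 (equal sharing of the cooperative surplus between the two
# coalitions) and the checks (i)-(ii). The W values are hypothetical.
W_grand, W_S1, W_S2 = 200.0, 80.0, 90.0    # grand payoff; S1, S2 non-cooperative payoffs

surplus = W_grand - (W_S1 + W_S2)
omega_S1 = W_S1 + surplus / 2              # = (W_S1 + W_grand - W_S2) / 2, cf. (11.39)
omega_S2 = W_S2 + surplus / 2              # = (W_S2 + W_grand - W_S1) / 2, cf. (11.40)

assert abs(omega_S1 + omega_S2 - W_grand) < 1e-9   # (i)  group optimality
assert omega_S1 >= W_S1 and omega_S2 >= W_S2       # (ii) holds whenever surplus >= 0
print(omega_S1, omega_S2)  # 95.0 105.0
```

Coalition rationality (ii) reduces to non-negativity of the cooperative surplus, which is guaranteed here because the grand coalition can always replicate the non-cooperative strategies.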

11.4.2 Payoff Distribution Procedure

In a dynamically consistent solution, the specific agreed-upon optimality principle must remain effective at any stage of the game along the optimal state trajectory. Since at any stage of the game the players are guided by the same optimality principle, they have no ground for deviating from the previously adopted optimal behavior throughout the game. Therefore, for dynamic consistency to be satisfied, the imputations in Conditions 3 and 4 have to be maintained at all stages along the cooperative trajectory {x_k*, x̄_k*, y_k*}_{k=1}^{T+1}. Crucial to the analysis is the formulation of a payment mechanism so that the imputations satisfying Conditions 3 and 4 can be realized. We follow Yeung and Petrosyan [11, 12] and formulate a Payoff Distribution Procedure (PDP) using

(i) β_k^i(x_k*, x̄_k*, y_k*) to denote the payment that player i ∈ S1 will receive at stage k under the cooperative agreement along the cooperative trajectory {x_k*, x̄_k*, y_k*}_{k=1}^{T+1}, and
(ii) β̄_k^j(x_k*, x̄_k*, y_k*) to denote the payment that player j ∈ S2 will receive at stage k under the cooperative agreement along the cooperative trajectory {x_k*, x̄_k*, y_k*}_{k=1}^{T+1}.

The payment scheme involving β_k^i(x_k*, x̄_k*, y_k*) constitutes a PDP in the sense that the imputation to player i over the stages from k to T+1 can be expressed as:

ξ^{i*}(k, x_k*, x̄_k*, y_k*) = β_k^i(x_k*, x̄_k*, y_k*) (1/(1+r))^{k−1}
+ { Σ_{ζ=k+1}^{T} β_ζ^i(x_ζ*, x̄_ζ*, y_ζ*) (1/(1+r))^{ζ−1} + q_{T+1}^i(x_{T+1}*, x̄_{T+1}*, y_{T+1}*) (1/(1+r))^T },

for i ∈ S1 and k ∈ K.   (11.44)


Similarly, the payment scheme involving β̄_k^j(x_k*, x̄_k*, y_k*) constitutes a PDP in the sense that the imputation to player j over the stages from k to T+1 can be expressed as:

ξ̄^{j*}(k, x_k*, x̄_k*, y_k*) = β̄_k^j(x_k*, x̄_k*, y_k*) (1/(1+r))^{k−1}
+ { Σ_{ζ=k+1}^{T} β̄_ζ^j(x_ζ*, x̄_ζ*, y_ζ*) (1/(1+r))^{ζ−1} + q̄_{T+1}^j(x_{T+1}*, x̄_{T+1}*, y_{T+1}*) (1/(1+r))^T },

for j ∈ S2 and k ∈ K.   (11.45)

A theorem characterizing a formula for β_k^i(x_k*, x̄_k*, y_k*), for k ∈ K and i ∈ S1, and β̄_k^j(x_k*, x̄_k*, y_k*), for k ∈ K and j ∈ S2, which yields Condition 4 is provided below.

Theorem 11.3 A payment equalling

β_k^i(x_k*, x̄_k*, y_k*) = (1+r)^{k−1} [ ξ^{i*}(k, x_k*, x̄_k*, y_k*) − ξ^{i*}(k+1, x_{k+1}*, x̄_{k+1}*, y_{k+1}*) ]   (11.46)

given to player i ∈ S1 at stage k ∈ K, and a payment equalling

β̄_k^j(x_k*, x̄_k*, y_k*) = (1+r)^{k−1} [ ξ̄^{j*}(k, x_k*, x̄_k*, y_k*) − ξ̄^{j*}(k+1, x_{k+1}*, x̄_{k+1}*, y_{k+1}*) ]   (11.47)

given to player j ∈ S2 at stage k ∈ K, where

x_{k+1}* = x_k* + a − b x_k* − Σ_{i∈S1} ψ_k^{(u)i}(x_k*, x̄_k*, y_k*),

x̄_{k+1}* = x̄_k* + ā − b̄ x̄_k* − Σ_{j∈S2} ψ̄_k^{(u)j}(x_k*, x̄_k*, y_k*),   (11.48)

y_{k+1}* = y_k* + Σ_{i∈S1} α_k^i ψ_k^{(u)i}(x_k*, x̄_k*, y_k*) + Σ_{j∈S2} ᾱ_k^j ψ̄_k^{(u)j}(x_k*, x̄_k*, y_k*) − Σ_{i∈S1} ω_k^i ψ_k^{(μ)i}(x_k*, x̄_k*, y_k*) (y_k*)^{1/2} − Σ_{j∈S2} ω̄_k^j ψ̄_k^{(μ)j}(x_k*, x̄_k*, y_k*) (y_k*)^{1/2} − σ y_k*,

would lead to the realization of the imputations in Condition 4.
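The telescoping logic behind Theorem 11.3 is easy to verify numerically: payments defined by (11.46) make the discounted payment stream reproduce the imputation at every stage. The sketch below uses an arbitrary (hypothetical) imputation sequence along the cooperative trajectory:

```python
# Sketch: payments beta_k = (1+r)^(k-1) * (xi_k - xi_{k+1})  (cf. (11.46)) satisfy the
# PDP identity (11.44). The imputation values xi_1, ..., xi_{T+1} are hypothetical;
# xi_{T+1} plays the role of the discounted terminal payment q_{T+1} * (1/(1+r))^T.
r, T = 0.05, 5
rho = 1.0 / (1.0 + r)
xi = [100.0, 84.0, 69.0, 52.0, 33.0, 17.0]   # xi_1, ..., xi_{T+1}

beta = [(1.0 + r) ** k * (xi[k] - xi[k + 1]) for k in range(T)]  # index k -> stage k+1

for k in range(T):  # check (11.44) at every stage
    stream = sum(beta[z] * rho ** z for z in range(k, T)) + xi[T]
    assert abs(stream - xi[k]) < 1e-9
print("PDP identity holds at every stage")
```

Each discounted payment beta_k (1/(1+r))^{k−1} equals ξ(k) − ξ(k+1), so the tail sum telescopes to ξ(k) exactly, which is the content of the appendix's equations (11.52) and (11.54).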


Proof See Appendix A. □

When players in both coalitions are using the cooperative strategies along the cooperative trajectory {x_k*, x̄_k*, y_k*}_{k=1}^{T+1}, player i ∈ S1 will receive in stage k the amount

π_k^i(x_k*, x̄_k*, y_k*) = P_k^i ψ_k^{(u)i}(x_k*, x̄_k*, y_k*) − c_k^i [ψ_k^{(u)i}(x_k*, x̄_k*, y_k*)]² / x_k* + p_k^i ψ_k^{(v)i}(x_k*, x̄_k*, y_k*) (x_k*)^{1/2} − γ_k^i [ψ_k^{(v)i}(x_k*, x̄_k*, y_k*)]² − c_k^{(a)i} [ψ_k^{(μ)i}(x_k*, x̄_k*, y_k*)]² − h_k^i x̄_k* − λ_k^i y_k*.

However, according to the agreed-upon imputation, player i should receive β_k^i(x_k*, x̄_k*, y_k*) at stage k. Therefore a transfer payment

TP_k^i(x_k*, x̄_k*, y_k*) = β_k^i(x_k*, x̄_k*, y_k*) − π_k^i(x_k*, x̄_k*, y_k*), for i ∈ S1 and k ∈ K,   (11.49)

will be given to player i to yield a cooperative imputation to player i ∈ S1 according to Conditions 3 and 4. Similarly, a transfer payment

TP̄_k^j(x_k*, x̄_k*, y_k*) = β̄_k^j(x_k*, x̄_k*, y_k*) − π̄_k^j(x_k*, x̄_k*, y_k*), for j ∈ S2 and k ∈ K,   (11.50)

will be given to player j to yield a cooperative imputation to player j ∈ S2 according to Conditions 3 and 4.

In this section we considered the problem of dynamic stability (time consistency) only for the cooperative solution imputations (11.41) and (11.42) inside the coalitions S1 and S2. The same problem arises on the first level of cooperation, when the grand coalition is formed between coalitions S1 and S2 acting as individual players, in which coalitions S1 and S2 allocate the joint payoff according to (11.39) and (11.40). The analysis of this problem is similar to the one considered in this section. The same analysis can be made to investigate the dynamic stability (time consistency) of the solution imputation (11.25), where on the first level of cooperation the coalitions S1 and S2 use Nash equilibrium strategies.

11.5 Concluding Remarks

This paper presents a dynamic game with coalitional blocs. A dynamically consistent cooperative solution involving a grand coalition is provided. It involves a dynamically consistent coalition payoff distribution to the coalitions and a dynamically consistent payoff distribution to the players within each coalition. The sharing mechanism adopted above can reflect the case where, without coalitions, the individual non-cooperative payoffs of the players from S1 are lower than the individual non-cooperative payoffs of the players from S2. After the coalitions are formed, the coalitional payoff of S1 becomes close to the coalitional payoff of S2. Hence, coalition S1 has the bargaining power to negotiate for equal sharing of the excess of the grand coalition payoff over the sum of coalitional payoffs. The analysis can be applied to different scenarios and different agreed-upon gain-sharing optimality principles. Finally, the game can be extended to situations with more than two coalitional blocs. Given that the formation of coalitional blocs is becoming more prevalent, further analysis along this line is expected.

Acknowledgements Supported by the Russian Science Foundation grant "Optimal Behavior in Conflict-Controlled Systems" (No. 17-11-01079).

Appendix

From (11.44) one can obtain

ξ^{i*}(k+1, x_{k+1}*, x̄_{k+1}*, y_{k+1}*) = β_{k+1}^i(x_{k+1}*, x̄_{k+1}*, y_{k+1}*) (1/(1+r))^k
+ { Σ_{ζ=k+2}^{T} β_ζ^i(x_ζ*, x̄_ζ*, y_ζ*) (1/(1+r))^{ζ−1} + q_{T+1}^i(x_{T+1}*, x̄_{T+1}*, y_{T+1}*) (1/(1+r))^T },

for i ∈ S1 and k ∈ K.   (11.51)

Substituting (11.51) into (11.44) yields

ξ^{i*}(k, x_k*, x̄_k*, y_k*) = β_k^i(x_k*, x̄_k*, y_k*) (1/(1+r))^{k−1} + ξ^{i*}(k+1, x_{k+1}*, x̄_{k+1}*, y_{k+1}*),   (11.52)

with ξ^{i*}(T+1, x_{T+1}*, x̄_{T+1}*, y_{T+1}*) = q_{T+1}^i(x_{T+1}*, x̄_{T+1}*, y_{T+1}*) (1/(1+r))^T.

From (11.45) one can obtain

ξ̄^{j*}(k+1, x_{k+1}*, x̄_{k+1}*, y_{k+1}*) = β̄_{k+1}^j(x_{k+1}*, x̄_{k+1}*, y_{k+1}*) (1/(1+r))^k
+ { Σ_{ζ=k+2}^{T} β̄_ζ^j(x_ζ*, x̄_ζ*, y_ζ*) (1/(1+r))^{ζ−1} + q̄_{T+1}^j(x_{T+1}*, x̄_{T+1}*, y_{T+1}*) (1/(1+r))^T },

for j ∈ S2 and k ∈ K.   (11.53)

Substituting (11.53) into (11.45) yields

ξ̄^{j*}(k, x_k*, x̄_k*, y_k*) = β̄_k^j(x_k*, x̄_k*, y_k*) (1/(1+r))^{k−1} + ξ̄^{j*}(k+1, x_{k+1}*, x̄_{k+1}*, y_{k+1}*),   (11.54)

with ξ̄^{j*}(T+1, x_{T+1}*, x̄_{T+1}*, y_{T+1}*) = q̄_{T+1}^j(x_{T+1}*, x̄_{T+1}*, y_{T+1}*) (1/(1+r))^T.

Hence Theorem 11.3 follows. □

References

1. Aziz, N., Hossain, B., Mowlah, I.: Does the quality of political institutions affect intra-industry trade within trade blocs? The ASEAN perspective. Appl. Econ. 50(33) (2018). https://doi.org/10.1080/00036846.2018.1430336
2. Eichengreen, B., Irwin, D.: Trade blocs, currency blocs, and the reorientation of trade in the 1930s. J. Int. Econ. 38, 1–24 (1995)
3. Frankel, J.A., Rose, A.: The endogeneity of the optimum currency area criteria. Econ. J. 108, 1009–1025 (1998)
4. Kandogan, Y.: Consistent estimates of regional blocs' trade effects. Rev. Int. Econ. 16(2), 301–314 (2008)
5. Mansfield, E.D., Pevehouse, J.C.: Trade blocs, trade flows, and international conflict. Int. Organ. 54(4), 775–808 (2000)
6. McDonald, F., Tuselmann, J.H., Voronkova, S., Golesorkhi, S.: The strategic development of subsidiaries in regional trade blocs. Multinatl. Bus. Rev. 19(3), 256–271 (2011)
7. Petrosyan, L.A., Gromova, E.V.: Two-level cooperation in coalitional differential games. Ann. Ekaterinburg Math. Inst. 20(3), 193–203 (2014)
8. Petrosyan, L.A., Yeung, D.W.K.: Two-level cooperation in a class of n-person differential games. IFAC-PapersOnLine 51(32), 585–587 (2018)
9. Schott, J.J.: Trading blocs and the world trading system. World Econ. 14(1), 1–17 (1991). https://doi.org/10.1111/j.1467-9701.1991.tb00748.x
10. Wolf, N., Ritschl, A.O.: Endogeneity of currency areas and trade blocs: evidence from a natural experiment. KYKLOS 64(2), 291–312 (2011)
11. Yeung, D.W.K., Petrosyan, L.A.: Subgame consistent solutions for cooperative stochastic dynamic games. J. Optim. Theory Appl. 145(3), 579–596 (2010)
12. Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Cooperation: A Comprehensive Treatise. Springer, Singapore (2016)

Chapter 12

Optimal Incentive Strategy in a Markov Game with Multiple Followers Dmitry B. Rokhlin and Gennady A. Ougolnitsky

Abstract We consider a dynamic stochastic incentive problem for the case of several followers who play a Markov game. The leader's ε-optimal strategy is determined via a stochastic control problem. This result generalizes a similar result of the authors for the case of one follower and known results from the static theory of control in organizational systems. An illustrative example is given.

Keywords Leader · Incentive strategy · Multiple followers · Production management

12.1 Introduction

The theory of incentives has been developed very intensively over the past four decades [8, 12–15]. Indeed, adequate incentive mechanisms play a key role in any successful management. The most widespread mathematical model of an incentive mechanism is the so-called inverse Stackelberg game [11, 18, 19]. In this model the leader (principal) reports to one or several followers (agents) her strategy as a function of their control actions. The optimal leader's strategy maximizes her payoff on the set of best responses of the followers. If there are several followers, then their best response is usually defined as the set of Nash equilibria in the game of the followers. In fact, the most comprehensive mathematical theory of inverse Stackelberg games was proposed by Germeier (1976) in the static case (English translation 1986: [4]) and developed by Kononenko in the dynamic case [6, 7]. According to this approach, the leader rewards the follower if the latter accepts her optimal plan, and punishes him otherwise. It was proved that such a control mechanism forms an

D. B. Rokhlin · G. A. Ougolnitsky () I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences of Southern Federal University, Rostov-on-Don, Russia e-mail: [email protected]; [email protected] © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_12

231

232

D. B. Rokhlin and G. A. Ougolnitsky

ε-optimal strategy of the leader. In management this idea was specified by Novikov in his theory of organizational systems [16, 17]. The respective incentive model uses specific payoff functions. The leader (center) maximizes the difference between her revenue and incentive payments, meanwhile the follower (agent) maximizes the difference between the incentive payments and his labor costs. For the static case it is shown that in the optimal control mechanism the leader compensates the follower his cost, if the latter accepts leader’s optimal plan, and pays him nothing, otherwise. In turn, the optimal plan maximizes the difference between leader’s revenue and follower’s cost. This result was generalized for multiple followers with different patterns of interaction. In [21] we considered an incentive model with Markov dynamics and discounted optimality criteria in the case of complete information, discrete time and infinite planning horizon. In this model, the leader economically influences the follower by selecting an incentive function that depends on the system state and the actions of the follower, who employs closed-loop control strategies. System dynamics, revenues of the leader and costs of the follower depend on the system state and the leader’s actions. We showed that finding an approximate solution of the inverse Stackelberg game reduces to solving the optimal control problem with the objective function equal to the difference between the revenue of the leader and the costs of the follower. Here an ε-optimal strategy of the leader is an economic incentive for the follower to follow this optimal control strategy. In this paper the approach of [21] is developed for the case of several followers. Section 12.2 contains the problem formulation. The main result is exposed in Sect. 12.3. Section 12.4 describes an illustrative example. Section 12.5 concludes.

12.2 Problem Formulation

Consider a Markov game determined by
• a finite state space X;
• a finite action set A = A_1 × ··· × A_N;
• set-valued mappings x → Γ_i(x) ⊂ A_i, describing the admissible actions of player i at a state x ∈ X;
• a transition kernel q(y|x, a):
$$\sum_{y \in X} q(y \mid x, a) = 1, \qquad q(y \mid x, a) \ge 0, \qquad x \in X,\ a \in A;$$
• the players' one-step rewards r_i : X × A → R;
• a discounting factor β ∈ [0, 1).


It is assumed that the players utilize stationary Markov strategies, determined by the distributions π_i(a^i|x):
$$\sum_{a^i \in \Gamma_i(x)} \pi_i(a^i \mid x) = 1, \qquad \pi_i(a^i \mid x) \ge 0, \qquad x \in X.$$

Any tuple π = (π_1, …, π_N) induces a unique probability measure P_{x,π} on (X × A)^∞ by the formula

$$P_{x,\pi}(x_0, a_0, \ldots, x_n, a_n) = \delta_x(x_0) \prod_{t=0}^{n-1} \Big[ \prod_{i=1}^{N} \pi_i(a_t^i \mid x_t)\, q(x_{t+1} \mid x_t, a_t) \Big] \prod_{i=1}^{N} \pi_i(a_n^i \mid x_n),$$

where δ_x(y) = 1 if y = x and δ_x(y) = 0 otherwise. Denote by E_{x,π} the expectation with respect to the measure P_{x,π}. The expected discounted gain of player i equals

$$J_i(x, \pi) = E_{x,\pi} \sum_{t=0}^{\infty} \beta^t r_i(x_t, a_t).$$

This formalism describes a dynamic game where, at each step t, based on the information on the state x_t, the players independently select random actions a_t^i ∈ Γ_i(x_t) in accordance with the distributions π_i(·|x_t). After that, the system moves to the new state x_{t+1} with probability q(x_{t+1}|x_t, a_t), and the players receive the rewards r_i(x_t, a_t). A tuple π* = (π_1^*, …, π_N^*) is called a Nash equilibrium if

$$J_i(x, (\pi_i, \pi_{-i}^*)) \le J_i(x, \pi^*), \qquad x \in X, \quad i = 1, \ldots, N,$$

for any distribution π_i. We use the standard notation (π_i, π_{-i}^*) for the tuple obtained from π* by replacing π_i^* with π_i. It is known that the described game possesses a Nash equilibrium: see, e.g., [3, Theorem 3.1]. Each component π_i^* of an equilibrium π* is a solution of the optimization problem

$$J_i(x, (\pi_i, \pi_{-i}^*)) \to \max_{\pi_i}. \tag{12.1}$$
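For intuition, the measure P_{x,π} and the discounted gains J_i can be approximated by direct simulation. Below is a minimal Monte Carlo sketch (not from the chapter; the interface, with strategies given as maps from states to action distributions, is an illustrative assumption):

```python
import random

def simulate_gain(x0, pi, q, r_i, beta, horizon=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of J_i(x0, pi) = E sum_t beta^t r_i(x_t, a_t).
    pi[j](x) returns an action distribution {action: prob} for player j;
    q(x, a) returns a next-state distribution {state: prob}."""
    rng = random.Random(seed)

    def draw(dist):
        u, acc = rng.random(), 0.0
        for v, p in dist.items():
            acc += p
            if u <= acc:
                return v
        return v  # guard against floating-point rounding of the probabilities

    total = 0.0
    for _ in range(n_paths):
        x, disc = x0, 1.0
        for _ in range(horizon):
            a = tuple(draw(pi_j(x)) for pi_j in pi)  # independent choices
            total += disc * r_i(x, a)
            disc *= beta
            x = draw(q(x, a))
    return total / n_paths
```

Truncating at a finite horizon introduces an error of order β^horizon/(1−β), which is negligible for the horizons used here.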

Let us rewrite this objective function as follows:

$$J_i(x, (\pi_i, \pi_{-i}^*)) = \sum_{t=0}^{\infty} \beta^t \sum_{y,a} r_i(y,a)\, P_{x,(\pi_i,\pi_{-i}^*)}(x_t = y,\ a_t = a)$$
$$= \sum_{t=0}^{\infty} \beta^t \sum_{y,a} r_i(y,a) \sum_{x_0,a_0} \cdots \sum_{x_{t-1},a_{t-1}} \delta_x(x_0)\,\pi_i(a_0^i|x_0) \prod_{j \ne i} \pi_j^*(a_0^j|x_0)\, q(x_1|x_0,a_0) \times \cdots \times \pi_i(a_{t-1}^i|x_{t-1}) \prod_{j \ne i} \pi_j^*(a_{t-1}^j|x_{t-1})\, q(y|x_{t-1},a_{t-1})\, \pi_i(a^i|y) \prod_{j \ne i} \pi_j^*(a^j|y).$$

Summing over the other players' actions a_0^{-i}, …, a_{t-1}^{-i}, a^{-i}, we obtain

$$J_i(x,(\pi_i,\pi_{-i}^*)) = \sum_{t=0}^{\infty} \beta^t \sum_{y,a^i} \sum_{x_0,a_0^i} \cdots \sum_{x_{t-1},a_{t-1}^i} \delta_x(x_0)\,\pi_i(a_0^i|x_0)\, q_{-i}^*(x_1|x_0,a_0^i) \times \cdots \times \pi_i(a_{t-1}^i|x_{t-1})\, q_{-i}^*(y|x_{t-1},a_{t-1}^i)\, \pi_i(a^i|y)\, r_{-i}^*(y,a^i),$$

where the summation is performed over y ∈ X, a_k^i ∈ Γ_i(x_k), a^i ∈ Γ_i(y), and

$$r_{-i}^*(y,a^i) = \sum_{a^{-i}} \prod_{j \ne i} \pi_j^*(a^j|y)\, r_i(y,a), \qquad q_{-i}^*(y|x,a^i) = \sum_{a^{-i}} \prod_{j \ne i} \pi_j^*(a^j|x)\, q(y|x,a). \tag{12.2}$$

Let us define the probability measure

$$P_{x,\pi_i}^{*,-i}(x_0,a_0^i,\ldots,x_n,a_n^i) = \delta_x(x_0) \prod_{t=0}^{n-1} \pi_i(a_t^i|x_t)\, q_{-i}^*(x_{t+1}|x_t,a_t^i)\; \pi_i(a_n^i|x_n)$$

on the Cartesian product (X × A_i)^∞. It is easy to see that

$$J_i(x,(\pi_i,\pi_{-i}^*)) = \sum_{t=0}^{\infty} \beta^t \sum_{y,a^i} r_{-i}^*(y,a^i)\, P_{x,\pi_i}^{*,-i}(x_t = y,\ a_t^i = a^i) = E_{x,\pi_i}^{*,-i} \sum_{t=0}^{\infty} \beta^t r_{-i}^*(x_t, a_t^i),$$

where E_{x,π_i}^{*,−i} is the expectation with respect to P_{x,π_i}^{*,−i}. Thus, the problem (12.1) corresponds to the Markov decision process with the state space X, the action sets x → Γ_i(x) ⊂ A_i, the transition kernel and rewards (12.2), and the discounting factor β ∈ [0, 1). As is well known, the value function

$$V_i(x, \pi_{-i}^*) = \sup_{\pi_i} J_i(x, (\pi_i, \pi_{-i}^*)),$$


where the supremum is taken over all randomized Markov strategies π_i, is the unique solution of the equation

$$V_i(x, \pi_{-i}^*) = \max_{a^i \in \Gamma_i(x)} \Big\{ r_{-i}^*(x,a^i) + \beta \sum_{y \in X} q_{-i}^*(y|x,a^i)\, V_i(y, \pi_{-i}^*) \Big\}$$
$$= \max_{p \in \Delta(\Gamma_i(x))} \Big\{ \sum_{a^i \in \Gamma_i(x)} r_{-i}^*(x,a^i)\, p(a^i) + \beta \sum_{y \in X} \sum_{a^i \in \Gamma_i(x)} q_{-i}^*(y|x,a^i)\, p(a^i)\, V_i(y, \pi_{-i}^*) \Big\},$$

where Δ(Γ_i(x)) is the set of probability measures on the finite set Γ_i(x). Moreover, for an optimal randomized Markov strategy π̄_i, by the optimality principle, we have

$$V_i(x, \pi_{-i}^*) = \sum_{a^i \in \Gamma_i(x)} r_{-i}^*(x,a^i)\, \bar\pi_i(a^i|x) + \beta \sum_{y \in X} \sum_{a^i \in \Gamma_i(x)} q_{-i}^*(y|x,a^i)\, \bar\pi_i(a^i|x)\, V_i(y, \pi_{-i}^*).$$

Thus, the optimal randomized Markov strategies are precisely described by the relation

$$\bar\pi_i(\cdot|x) \in \arg\max_{p \in \Delta(\Gamma_i(x))} \Big\{ \sum_{a^i \in \Gamma_i(x)} r_{-i}^*(x,a^i)\, p(a^i) + \beta \sum_{y \in X} \sum_{a^i \in \Gamma_i(x)} q_{-i}^*(y|x,a^i)\, p(a^i)\, V_i(y, \pi_{-i}^*) \Big\}.$$
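The reduction (12.2) also admits a direct computational reading: fixing the other players' stationary strategies and averaging out their actions yields the one-player rewards and kernel. A minimal sketch (the function names and interfaces are illustrative assumptions, with each strategy given as a function π_j^*(a, x)):

```python
from itertools import product

def reduce_mdp(action_sets, r_i, q, pi_star, i):
    """Build r*_{-i} and q*_{-i} of Eq. (12.2): average the reward r_i and the
    kernel q over the other players' actions a^{-i}, weighted by their
    stationary strategies pi_star[j](a_j, x)."""
    others = [j for j in range(len(action_sets)) if j != i]

    def weight(x, a):  # prod_{j != i} pi*_j(a^j | x)
        w = 1.0
        for j in others:
            w *= pi_star[j](a[j], x)
        return w

    def r_red(x, a_i):  # sum over a^{-i} with player i's action fixed at a_i
        return sum(weight(x, a) * r_i(x, a)
                   for a in product(*action_sets) if a[i] == a_i)

    def q_red(y, x, a_i):
        return sum(weight(x, a) * q(y, x, a)
                   for a in product(*action_sets) if a[i] == a_i)

    return r_red, q_red
```

For instance, with two players, a uniform π_2^* over {0, 1} and r_1(x, a) = a^1 + a^2, the reduced reward is r*_{-1}(x, a^1) = a^1 + 1/2.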

In this paper we consider the players as producers. Their rewards are of the form r_i = c_i(x, a) − g_i(x, a), where g_i is the cost function of the i-th producer (a follower) and c_i is a non-negative incentive function selected by the manager (the leader). An action a^i is interpreted as the production level of the i-th producer. It is assumed that at zero production level the cost of the i-th producer equals zero, regardless of the production levels of the other players:

$$g_i(x, (0, a^{-i})) = 0. \tag{12.3}$$

For a fixed tuple c = (c_1, …, c_N) the followers, with the objective functions

$$J_i(x, \pi, c) = E_{x,\pi} \sum_{t=0}^{\infty} \beta^t \big( c_i(x_t, a_t) - g_i(x_t, a_t) \big),$$


play the game, resulting in a Nash equilibrium. Denote by T(c) the set of such equilibria. The expected discounted gain of the leader equals

$$J_L(x, \pi, c) = E_{x,\pi} \sum_{t=0}^{\infty} \beta^t \Big( f(x_t, a_t) - \sum_{i=1}^{N} c_i(x_t, a_t) \Big),$$

where f(x, a) is her one-step revenue. The aim of the leader is to maximize the discounted gain corresponding to the "worst" Nash equilibrium over all tuples of incentive functions:

$$G(x, c) = \inf_{\pi \in T(c)} J_L(x, \pi, c) \to \max_c.$$

This problem can be classified as an inverse Stackelberg game [18, 19], since the leader's strategies depend on the strategies of the followers. Let us call

$$V_L(x) = \sup_c G(x, c)$$

the value of the leader. A tuple c^ε is called an ε-Stackelberg solution if

$$V_L(x) - \varepsilon \le G(x, c^{\varepsilon}), \qquad x \in X.$$

A pair (c^ε, π), π ∈ T(c^ε), is called an ε-Stackelberg equilibrium.

12.3 The Main Result

Consider the auxiliary Markov decision process with the state space X, the action sets x → Γ(x) = Γ_1(x) × ··· × Γ_N(x) ⊂ A, the transition kernel q, the rewards

$$r(x, a) = f(x, a) - \sum_{i=1}^{N} g_i(x, a),$$

and the discounting factor β ∈ [0, 1). The objective function

$$J(x, \pi) = E_{x,\pi} \sum_{t=0}^{\infty} \beta^t \Big( f(x_t, a_t) - \sum_{i=1}^{N} g_i(x_t, a_t) \Big) \tag{12.4}$$

of this problem corresponds to the hypothetical leader's gain, which she could receive if she were engaged in the production process herself, without resorting to the services of the producers, and if her costs coincided with their aggregate costs.


By economic intuition, the optimal gain

$$V(x) = \sup_{\pi} J(x, \pi) \tag{12.5}$$

is an upper bound for V_L(x), since an attempt to shift the production onto the shoulders of the producers by means of an economic incentive mechanism cannot lead to greater profits. It turns out, however, that these quantities coincide. As is known, among the optimal solutions of (12.5) there exists a deterministic Markov strategy π̄_i(y|x) = δ_{u^i(x)}(y), where

$$u(x) \in \arg\max_{a \in \Gamma(x)} \Big\{ r(x, a) + \beta \sum_{y \in X} q(y|x, a)\, V(y) \Big\},$$

and V is the unique solution of the dynamic programming equation

$$V(x) = \max_{a \in \Gamma(x)} \Big\{ r(x, a) + \beta \sum_{y \in X} q(y|x, a)\, V(y) \Big\}.$$

Theorem 12.1 For the Stackelberg game described in Sect. 12.2, the following holds true:

(i) The value of the leader coincides with (12.5): V_L(x) = V(x), x ∈ X.

(ii) The functions

$$c_i^{\varepsilon}(x, a) = g_i(x, a) + (1-\beta)\,\frac{\varepsilon}{N}\, I_{\{a^i = u^i(x)\}}, \qquad \varepsilon > 0, \quad i = 1, \ldots, N,$$

form an ε-Stackelberg solution, and the corresponding Nash equilibrium is unique: T(c^ε) = {(δ_{u^1(x)}, …, δ_{u^N(x)})}.

Proof (1) Let π* ∈ T(c), c ≥ 0. Then, by the definition of a Nash equilibrium and assumption (12.3), we get

$$J_i(x, \pi^*, c) \ge J_i(x, (\delta_0^i, \pi_{-i}^*), c) = E_{x,(\delta_0^i, \pi_{-i}^*)} \sum_{t=0}^{\infty} \beta^t \big( c_i(x_t, a_t) - g_i(x_t, a_t) \big) = E_{x,(\delta_0^i, \pi_{-i}^*)} \sum_{t=0}^{\infty} \beta^t c_i(x_t, a_t) \ge 0.$$


It follows that

$$J_L(x, \pi^*, c) \le J_L(x, \pi^*, c) + \sum_{i=1}^{N} J_i(x, \pi^*, c) = J(x, \pi^*) \le V(x).$$

Hence,

$$V_L(x) = \sup_{c \ge 0} \inf_{\pi \in T(c)} J_L(x, \pi, c) \le V(x). \tag{12.6}$$

(2) Clearly,

$$J_i(x, \pi, c^{\varepsilon}) = E_{x,\pi} \sum_{t=0}^{\infty} \beta^t (1-\beta)\,\frac{\varepsilon}{N}\, I_{\{a_t^i = u^i(x_t)\}} \le \frac{\varepsilon}{N}.$$

Furthermore,

$$V_i(x, \pi_{-i}^*, c^{\varepsilon}) = \sup_{\pi_i} J_i(x, (\pi_i, \pi_{-i}^*), c^{\varepsilon})$$

is the unique solution of the dynamic programming equation

$$V_i(x, \pi_{-i}^*, c^{\varepsilon}) = \max_{a^i \in \Gamma_i(x)} \Big\{ (1-\beta)\,\frac{\varepsilon}{N}\, I_{\{a^i = u^i(x)\}} + \beta \sum_{y \in X} q_{-i}^*(y|x, a^i)\, V_i(y, \pi_{-i}^*, c^{\varepsilon}) \Big\}.$$

But the constant V_i(x, π_{-i}^*, c^ε) = ε/N is a solution. Hence,

$$u^i(x) \in \arg\max_{a^i \in \Gamma_i(x)} \Big\{ (1-\beta)\,\frac{\varepsilon}{N}\, I_{\{a^i = u^i(x)\}} \Big\},$$

and π_i^* = δ_{u^i(x)} is the optimal strategy of the i-th player. In particular, we proved that T(c^ε) = {(δ_{u^1(x)}, …, δ_{u^N(x)})}.

(3) For the obtained unique Nash equilibrium the leader's gain equals

$$J_L(x, \pi^*, c^{\varepsilon}) = G(x, c^{\varepsilon}) = E_{x,\pi^*} \sum_{t=0}^{\infty} \beta^t \Big( f(x_t, a_t) - \sum_{i=1}^{N} g_i(x_t, a_t) - \sum_{i=1}^{N} (1-\beta)\,\frac{\varepsilon}{N}\, I_{\{a_t^i = u^i(x_t)\}} \Big)$$
$$= J\big(x, (\delta_{u^1(x)}, \ldots, \delta_{u^N(x)})\big) - \sum_{i=1}^{N} \sum_{t=0}^{\infty} \beta^t (1-\beta)\,\frac{\varepsilon}{N} = V(x) - \varepsilon.$$


Using also (12.6), we conclude that c^ε is an ε-Stackelberg solution, and V_L(x) = V(x). □

Theorem 12.1 shows that an optimal leader's strategy is normative: the leader should
• solve the optimal control problem (12.4);
• communicate a fixed optimal plan u^i(x) to each follower;
• cover the costs of the followers and add an incentive premium ε.

Unfortunately, the implementation of this strategy requires the knowledge of the cost functions g_i of the followers and of the transition kernel q. Although such assumptions are far from reality, the value V_L(x) = V(x) can serve as a useful benchmark. In the next section we consider an illustrative example, where the leader's strategy has a very simple structure.
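Once the auxiliary control problem has been solved, the incentive functions of Theorem 12.1 are trivial to construct: they refund the follower's cost and add the premium only along the plan. A minimal sketch (the function names and interfaces are illustrative, not from the chapter; the formula is the one in the theorem):

```python
def make_incentive(g_i, u_i, i, beta, eps, N):
    """Epsilon-optimal incentive of Theorem 12.1:
    c_i^eps(x, a) = g_i(x, a) + (1 - beta) * (eps / N) * 1{a^i = u^i(x)}."""
    def c_i(x, a):
        bonus = (1.0 - beta) * eps / N if a[i] == u_i(x) else 0.0
        return g_i(x, a) + bonus
    return c_i
```

Here g_i is the follower's cost function and u_i the plan obtained from the dynamic programming equation; the follower is fully compensated, plus the premium (1 − β)ε/N, only when he follows the plan.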

12.4 An Example

Assume that the leader is the manager of a large company, which produces a complex commodity. Orders (a^1, …, a^N) for the components are placed at N suppliers. Each supplier produces components of only one type, and his production cost is determined by a non-decreasing function g_i(a^i). To simplify the notation, assume that the production of a unit of the complex commodity requires one copy of each component. Let the demand be determined by a sequence of independent identically distributed non-negative random variables ξ_t with values in Z_+ = {0, 1, …}. The state process x_t ∈ Z_+ describes the amount of goods in stock:

$$x_{t+1} = \max\{x_t + u_t - \xi_{t+1}, 0\}, \qquad u_t = \min\{a_{1,t}, \ldots, a_{N,t}\}, \qquad u_t \in [0, M - x_t]. \tag{12.7}$$

Here u_t is the amount of the commodity produced at the time moment t. The upper bound in (12.7) reflects the warehouse capacity. Let P be the market price of the unit good. The leader's reward is the difference between the profit from the sale of goods at the time moment t + 1 and the storage cost during the period [t, t + 1):

$$f(x_t, a_t, \xi_{t+1}) = \min\{x_t + u_t, \xi_{t+1}\}\, P - h(x_t + u_t).$$

The storage cost function h is non-decreasing. Clearly, it is not profitable for the manager to order different numbers of components from different suppliers, since the amounts a^i − u_t would be lost. Thus, according to the scheme described in Sect. 12.3, the manager should consider the


stochastic control problem

$$J(x, \pi) = E_{x,\pi} \sum_{t=0}^{\infty} \beta^t \Big( \min\{x_t + u_t, \xi_{t+1}\}\, P - h(x_t + u_t) - \sum_{i=1}^{N} g_i(u_t) \Big)$$

with the state space {0, …, M}. The related value function (12.5) satisfies the dynamic programming equation

$$V(x) = \max_{u \in \{0,\ldots,M-x\}} \big( P\, E \min\{x+u, \xi\} - h(x+u) - g(u) + \beta\, E\, V(\max\{x+u-\xi, 0\}) \big),$$

where g(u) = ∑_{i=1}^{N} g_i(u), and ξ is distributed as ξ_t. Furthermore, let P(ξ = k) = p_k, k ∈ Z_+. Then

$$V(x) = \max_{u \in \{0,\ldots,M-x\}} \bigg( \sum_{k=0}^{x+u-1} p_k \big( kP + \beta V(x+u-k) \big) + \Big( 1 - \sum_{k=0}^{x+u-1} p_k \Big) \big( (x+u)P + \beta V(0) \big) - h(x+u) - g(u) \bigg), \qquad x \in \{0, \ldots, M\}.$$

We solved this equation numerically by the value iteration method (see, e.g., [20]) for the Poisson distribution p_k = λ^k e^{−λ}/k! of the demand and the following cost functions:

$$g(u) = \begin{cases} K + Cu, & u > 0, \\ 0, & u = 0, \end{cases} \qquad h(z) = Az.$$

The values K > 0 correspond to fixed ordering costs. For M = 40, λ = 15, P = 14, C = 2, A = 1 the results are presented in Fig. 12.1. In fact, we obtained two well-known types of optimal strategies [1, 2]: a base stock policy

$$u(x) = \begin{cases} S - x, & x \le S, \\ 0, & x \ge S \end{cases}$$

for K = 0, and an (s, S) policy

$$u(x) = \begin{cases} S - x, & x \le s, \\ 0, & x > s \end{cases}$$

for K > 0. The parameters s, S can be extracted from Fig. 12.1. So, the manager's optimal ordering strategy is determined by one or two parameters: S or (s, S). But an ε-optimal incentive strategy, specified in Theorem 12.1, requires

[Fig. 12.1 Base stock strategies for K = 0 (left panel) and (s, S) strategies for K = 4 (right panel): the optimal order u*(x) as a function of the stock level x, for β = 0 and β = 0.99]

the knowledge of the coefficients K_i, C_i, i = 1, …, N, in the supplier cost functions, which are assumed to be of the form

$$g_i(u) = \begin{cases} K_i + C_i u, & u > 0, \\ 0, & u = 0, \end{cases}$$

where ∑_{i=1}^{N} K_i = K and ∑_{i=1}^{N} C_i = C. Indeed, by Theorem 12.1 we have

$$c_i^{\varepsilon}(x, a) = g_i(a^i) + (1-\beta)\,\frac{\varepsilon}{N}\, I_{\{a^i = u(x)\}}, \qquad \varepsilon > 0.$$
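The value iteration computation described in this section can be sketched as follows. This is a simplified re-implementation, not the authors' code; the discount β = 0.9 is an illustrative assumption (the chapter reports β = 0 and β = 0.99), while M, λ, P, C, A and K = 4 follow the text:

```python
import math

# Inventory model parameters from the chapter (K = 4 gives the (s, S) case)
M, lam, P_price, C, A, K, beta = 40, 15.0, 14.0, 2.0, 1.0, 4.0, 0.9

def g(u):                     # aggregate ordering cost
    return K + C * u if u > 0 else 0.0

def h(z):                     # storage cost
    return A * z

# Poisson demand probabilities p_k, computed recursively to avoid overflow
p = [math.exp(-lam)]
for k in range(1, M):
    p.append(p[-1] * lam / k)

def bellman(V):
    """One application of the dynamic programming operator; returns the
    updated value function and the maximizing order policy u(x)."""
    newV, policy = [0.0] * (M + 1), [0] * (M + 1)
    for x in range(M + 1):
        best, best_u = -float("inf"), 0
        for u in range(M - x + 1):
            z = x + u
            val, tail = -h(z) - g(u), 1.0
            for k in range(z):  # demand k < z: sell k, keep z - k in stock
                val += p[k] * (k * P_price + beta * V[z - k])
                tail -= p[k]
            val += tail * (z * P_price + beta * V[0])  # demand >= z: sell out
            if val > best:
                best, best_u = val, u
        newV[x], policy[x] = best, best_u
    return newV, policy

V, policy = [0.0] * (M + 1), None
for _ in range(200):
    V, policy = bellman(V)

# With K > 0 the computed policy has (s, S) form: order up to S iff x <= s
s = max(x for x in range(M + 1) if policy[x] > 0)
S = s + policy[s]
```

Under a fixed ordering cost K > 0 the resulting policy orders up to a single level S exactly on an initial interval of stock levels {0, …, s}, consistent with the (s, S) structure shown in Fig. 12.1.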

12.5 Conclusion

We considered a dynamic stochastic incentive problem for the case of multiple followers who play a Markov game. It is shown that, to implement an ε-optimal incentive mechanism, the leader should solve an optimal control problem to maximize the difference between her revenue and the aggregate cost of the followers, report a fixed optimal plan to each follower, and cover the costs of the followers together with an incentive premium ε. On the one hand, this result generalizes (in the finite state, finite action case) a similar result of [21]; on the other hand, it generalizes a similar result for multiple followers in the static case [16]. The main practical problem is that the implementation of this ε-optimal strategy requires the knowledge of the cost functions of all followers and of the transition kernel of the game. We also mention technical problems in the case of general state-action


spaces, since in this setup the existence of a Nash equilibrium in stationary Markov strategies is not guaranteed [9, 10]. In the future we plan to address these issues, as well as to consider the problem setup in continuous time. There are also several related variants of a stochastic Stackelberg game which can be adapted to the problem under consideration: the long-run average cost criterion [22], multiple leaders [23], and cooperative games [5].

Acknowledgements The research is supported by the Russian Science Foundation, project 17-19-01038.

References

1. Bensoussan, A.: Dynamic Programming and Inventory Control. IOS Press, Amsterdam (2011)
2. Beyer, D., Cheng, F., Sethi, S.P., Taksar, M.: Markovian Demand Inventory Models. Springer, New York (2009)
3. Dutta, P.K., Sundaram, R.K.: The equilibrium existence problem in general Markovian games. In: Majumdar, M. (ed.) Organizations with Incomplete Information, pp. 159–207. Cambridge University Press, Cambridge (1998)
4. Germeier, Yu.B.: Non-Antagonistic Games. Reidel Publishing Co., Dordrecht (1986)
5. Hou, D., Lardon, A., Driessen, T.S.H.: Stackelberg oligopoly TU-games: characterization and nonemptiness of the core. Int. Game Theory Rev. 19(4), 1750020-1–1750020-16 (2017)
6. Kononenko, A.F.: On multi-step conflicts with information exchange. USSR Comput. Math. Math. Phys. 17, 104–113 (1977)
7. Kononenko, A.F.: The structure of the optimal strategy in controlled dynamic systems. USSR Comput. Math. Math. Phys. 13–24 (1980)
8. Laffont, J.-J., Martimort, D.: The Theory of Incentives: The Principal-Agent Model. Princeton University Press, Princeton (2002)
9. Levy, Y.: Discounted stochastic games with no stationary Nash equilibrium: two examples. Econometrica 81, 1973–2007 (2013)
10. Levy, Y.J., McLennan, A.: Corrigendum to "Discounted stochastic games with no stationary Nash equilibrium: two examples". Econometrica 83, 1237–1252 (2015)
11. Li, T., Sethi, S.P.: A review of dynamic Stackelberg game models. Discrete Cont. Dyn. – B 22, 125–159 (2017)
12. Muthoo, A.: Bargaining Theory with Applications. Cambridge University Press, Cambridge (1999)
13. Myerson, R.: Incentive compatibility and the bargaining problem. Econometrica 47, 61–73 (1979)
14. Myerson, R.: Optimal coordination mechanisms in generalized principal-agent models. J. Math. Econ. 10, 67–81 (1982)
15. Myerson, R.: Mechanism design by an informed principal. Econometrica 51, 1767–1798 (1983)
16. Novikov, D.: Theory of Control in Organizations. Nova Science Publishers, New York (2013)
17. Novikov, D.A., Shokhina, T.E.: Incentive mechanisms in dynamic active systems. Autom. Remote Control 64, 1912–1921 (2003)
18. Olsder, G.J.: Phenomena in inverse Stackelberg games. Part 1: static problems. J. Optim. Theory Appl. 143, 589–600 (2009)
19. Olsder, G.J.: Phenomena in inverse Stackelberg games. Part 2: dynamic problems. J. Optim. Theory Appl. 143, 601–618 (2009)


20. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York (1994)
21. Rokhlin, D.B., Ougolnitsky, G.A.: Stackelberg equilibrium in a dynamic stimulation model with complete information. Autom. Remote Control 79, 701–712 (2018)
22. Saksena, V.R., Cruz, J.B., Jr.: Optimal and near-optimal incentive strategies in the hierarchical control of Markov chains. Automatica 21(2), 181–191 (1985)
23. Trejo, K.K., Clempner, J.B., Poznyak, A.S.: An optimal strong equilibrium solution for cooperative multi-leader-follower Stackelberg Markov chains games. Kybernetika 52(2), 258–279 (2016)

Chapter 13

How Oligopolies May Improve Consumers' Welfare? R&D Is No Longer Required!

Alexander Sidorov

Abstract The paper studies how industry concentration affects social welfare, measured as the consumer's indirect utility. The Schumpeterian hypothesis holds that the harmful effect of oligopolization may be offset by positive externalities of concentration, such as technological innovation, R&D, etc. This contradicts the traditional neoliberal paradigm, which insists that concentration is always harmful to end consumers. We study a general equilibrium model with two types of firms and imperfect price competition. Firms of the first type are monopolistic competitors with negligible impact on market statistics, subject to the usual assumptions, e.g., free entry up to the zero-profit cut-off. In contrast, firms of the second type are assumed to have a non-negligible impact on market statistics, in particular on consumers' income via the distribution of non-zero profits across consumer-shareholders. Moreover, these large firms (oligopolies) account for the dependence of profits on their strategic choices, generating the so-called Ford effect. The first result we present is that in the case of CES utility the concentration effect is generically harmful to consumers' well-being. However, the result may differ for preferences generating demand with variable elasticity of substitution (VES). We find a natural assumption on VES utilities, which holds for most commonly used classes of utility functions, such as quadratic, CARA, HARA, etc., and which allows us to obtain a positive welfare effect, i.e., to justify the Schumpeter hypothesis.

Keywords Bertrand competition · Monopolistic competition · Additive preferences · Ford effect · Schumpeter hypothesis

A. Sidorov ()
Novosibirsk State University, Novosibirsk, Russia
Sobolev Institute of Mathematics, Novosibirsk, Russia

© Springer Nature Switzerland AG 2019
L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_13


13.1 Introduction

More than 40 years ago the monopolistic competitive Dixit–Stiglitz "engine" replaced "the elegant fiction of competitive equilibrium" (cf. [12]). The idea that firms are price-makers even if their number is "very large", e.g., a continuum, is now common wisdom. However, the assumption that monopolistically competing firms are negligible in terms of aggregate market statistics is nothing more than a brand new elegant fiction. When firms are sufficiently large, they face demands that are influenced by the income level, which depends, in turn, on total consumer profits. As a result, firms must anticipate accurately what the total income will be. In addition, firms should be aware that they can manipulate the income level, and hence their "true" demands, through their own strategies, with the aim of maximizing profits [8]. This feedback effect is known as the Ford effect. In the popular literature this idea is usually attributed to Henry Ford, who raised wages at his auto plants to five dollars a day in January 1914. Ford wrote: "our own sales depend on the wages we pay. If we can distribute high wages, then that money is going to be spent and it will serve to make... workers in other lines more prosperous and their prosperity is going to be reflected in our sales", see [7, pp. 124–127]. To make things clear, we have to mention that the term "Ford effect" may be used in various specifications. As specified in [5], the Ford effect may have different scopes with respect to consumers' income, which is the sum of the wage and a share of the distributed profits. The first (extreme) specification is to take the whole income parametrically. This is one of the solutions proposed by Marschak and Selten [12] and used, for instance, by Hart [9]. This case may be referred to as "No Ford effect". Another specification (also proposed by Marschak and Selten [12] and used by d'Aspremont et al.
[5]) is to suppose that firms take into account the effects of their decisions on the total wage bill, but not on the distributed profits, which are still treated parametrically. This case may be referred to as the "Wage Ford effect", and it is exactly what Henry Ford meant in the above citation. One more intermediate specification of the Ford effect is the opposite of the previous one: firms take the wage as given, but take into account the effects of their decisions on distributed profits. This case may be referred to as the "Profit Ford effect". Finally, the second extreme case, the Full Ford effect, assumes that firms take into account the total effect of their decisions, both on wages and on profits. These two cases are studied in the newly published paper [4]. In what follows, we shall assume that the wage is fixed. This includes the setting proposed by Hart [9], in which the workers fix the nominal wage through their union. This assumption implies that only the Profit Ford effect is possible; moreover, firms maximize their profit anyway. Thus, being price-makers but not wage-makers, they have no additional powers at hand in comparison to the No Ford case, except the purely informational advantage: knowledge of the consequences of their decisions. Nevertheless, as we show in [17], this advantage allows firms to gain more market power, which vindicates the wisdom "Knowledge is Power". However, the assumption that an industry consists only of large firms, oligopolies, is another extreme point of view, which hardly fits reality. It is more natural to suggest that an industry consists of firms of both types: "large" oligopolies and "small" monopolistic competitors. This is not a brand new idea; the


mixed models of this type were studied, for example, in [16] and [10]. However, the approach used in the present paper differs substantially. In [16] the oligopolistic and monopolistic competitive sectors differ in nature: large firms compete in quantities (Cournot-type competition), while small firms compete in prices in Dixit–Stiglitz style. As a result, this heterogeneity generates a counterintuitive outcome: social welfare always increases with the number of oligopolies. This result seems all the more unusual once we take into account that under CES utility pure monopolistic competition provides the social optimum. In the working paper [10] consumers' utility is a quasi-linear quadratic function, which completely kills the income effect, and all the more the Ford effect. One of the main conclusions of that paper is that in equilibrium the large firm "mimics" the monopolistic competitors, i.e., sets the same price and output as the small firms, which seems completely unrealistic.

The present paper uses an additively separable utility function of general type. Unlike [16], large firms compete in prices (Bertrand-type competition), just as the small monopolistic competitors do, but the main distinctive feature of oligopolies is that their impact on market statistics is not negligible. As a result, oligopolies charge higher equilibrium prices than monopolistic competitors. The main focus of this research is on welfare aspects, e.g., how many oligopolies are needed to foster social welfare? Put differently, are there any circumstances supporting the Schumpeter hypothesis on the positive effect of oligopolization on social welfare, or is concentration always harmful for consumers? This problem has many aspects; we only use the indirect utility as a measure of social welfare.

J. Schumpeter's arguments justifying the existence of oligopolies [14] are based on positive externalities of oligopolistic activity, e.g., R&D and innovation, which offset its negative effect on social welfare. The capitalistic economy is a dynamic process, in which able leaders operate to develop innovations that allow them to increase their market shares and to enjoy temporary monopolistic profits: such a perspective is the greatest incentive for the development of innovation. The Schumpeterian hypothesis tells us that there is a close relationship between innovation and market structure: only companies that have market power, at best the monopolist, can support the costs related to innovation; indeed, innovation itself creates that monopoly position, the defense of which brings further innovation, a virtuous circle. In fact, once a company, through innovation, achieves a monopoly position, it tends to reinforce this position, controlling and extending the period of benefit due to agreements connected with innovation and patents. Therefore, only large firms are induced to seek innovation to increase and strengthen their market power, which is why monopoly is more rewarding for the purpose of economic growth compared to the competitive market, see [15]. At every stage of the innovation process, the successful innovative entrepreneur exploits the competitive advantage and monopolizes the market, see [1]. In other words, Schumpeter contradicts the position of the classical economists, according to which competition stimulates performance, arguing that the prospect of achieving a monopoly rent induces firms to invest in R&D and thus promotes dynamic efficiency, i.e., the ability of the economic system to generate innovation. Perhaps the best example of such positive impact is the AT&T Bell Laboratories, which


invented and developed many useful things, among them the transistor, the laser, solar cells, the operating system UNIX, the programming languages C and C++, and wireless, wired and optical communications technologies and systems, not to mention fundamental contributions to scientific areas such as radio astronomy and information theory, and that was awarded the Nobel Prize nine(!) times. Nevertheless, by decision of U.S. regulators, the "mother" company AT&T was sentenced to a breakup into several "Baby Bells". The main result of the present paper is in line with this decision: the most effective way to increase social welfare is to keep the size of oligopolies within certain bounds. Moreover, in the case of an "oversized" corporation it is better to divide it into parts of smaller size than to replace it with a myriad of monopolistically competitive firms. However, it is a hard task to compare directly the harmful effect of oligopolization with the indirect spill-over effect of these positive externalities, which is why Schumpeter's arguments may be criticized, even if the positive effect of oligopolies' activity is obvious. Strictly speaking, the present paper does not follow the original Schumpeterian argumentation in defense of oligopolies. It is rather a different aspect of the same problem, which may be combined with Schumpeter's reasoning in the following way. The innovation activity of large firms is just a kind of redistribution of the positive firms' profits across the population. In fact, this redistribution affects only a part of the population, e.g., engineers, industrial scientists, skilled workers, etc., but we may consider it an across-the-board effect due to, for example, the income spill-over. In this connection, the Ford effect, which increases the firm's market power, is just an amplifier of the profit's value.

The main idea of the present paper is as follows: assume that the total firms' profit is redistributed, somehow or other, across all consumers without any additional bonuses; then, under certain natural assumptions, the overall welfare effect will be positive. All the more, any by-product of technological invention can only increase social welfare, which seems obvious (unless you believe Ned Ludd), but it is hard to measure this positive effect directly. We found a sufficiently weak condition, which holds for many popular classes of utility functions, including CARA, HARA (with the exception of pure CES), separable quadratic, etc., and demonstrate the positive welfare effect, supporting the Schumpeter hypothesis, provided that oligopolies keep their size within a certain interval. A natural policy implication of this result is the usual anti-trust procedure whereby a "too large" mono/oligopoly, e.g., the AT&T Corporation, has to be divided into "not so large" parts to increase total social welfare; however, there is no need to "smash it into dust", i.e., into a mass of monopolistically competitive firms.

13.2 Mixed Model of Oligopolistic and Monopolistic Competition

The economy involves two sectors with different competition regimes supplying a horizontally differentiated good, and one production factor, labor. There is a continuum [0, 1] of identical consumers, each endowed with one unit of labor. The labor market is perfectly competitive, and labor is chosen as the numeraire.

13 How Oligopolies May Improve Consumers’ Welfare?


13.2.1 Firms and Consumers

There are two types of firms: a finite number N ≥ 2 of "big" oligopolistic firms and a continuum of mass M of "small" monopolistic competitive firms. Each variety is produced by a single firm, and each firm produces a single variety; thus the consumption bundle of the horizontally differentiated good, x = (x_k ≥ 0 | k ∈ {1, ..., N} ∪ M), consists of two parts: the finite-dimensional oligopolistic part (x_1, ..., x_N) ∈ R_+^N and the MC-produced bundle of varieties (x_j ≥ 0 | j ∈ M). To operate, every monopolistic competitive firm needs a fixed requirement f > 0 and a marginal requirement c > 0 of labor. The same holds for oligopolies, but we shall denote the corresponding labor costs by F and C, assuming that they may (not necessarily will) differ from the monopolistic competitive labor requirements. The wage is also normalized to 1; then the cost of producing q_i units of variety i ∈ {1, ..., N} is equal to F + 1 · C · q_i, while the monopolistic competitive production costs are f + 1 · c · q_j for j ∈ M. Oligopolies and the monopolistic competitive sector share the same labor market, with the total amount of labor equal to 1 due to the normalization condition. Let us denote by s the relative share of labor covered by one oligopolistic firm; then the total amount of oligopolistic employment is equal to N · s, while the monopolistic competitive share of the labor market is 1 − Ns. Later we shall discuss how s may be determined, but in any case it is natural to assume that firms of the MC sector treat both N and s parametrically.

Remark 13.1 It was noticed in [13] that the oligopoly i may be equivalently treated as a "cartel" of the non-atomic firms [i − 1, i], where each non-atomic firm j ∈ [i − 1, i] has a negligible impact, but they act in concert due to the cartel agreement. This means that the mass of all non-atomic firms is actually a concatenation of two intervals on R: the cartel interval [0, N] and the interval M, consisting of the monopolistic competitors.
From this point of view, our model becomes "seamless": a new oligopolistic firm is simply a new cartel agreement over a bunch of monopolistic competitors, and vice versa, instead of a "destroyed" oligopoly i we immediately obtain the continuum [i − 1, i] of monopolistic competitors. The possible changes in production costs, f ↦ F and c ↦ C, after such an operation can be interpreted as the cost effect of "cartelizing", which may be negative (i.e., F > f, C > c), positive (i.e., F < f, C < c) or ambiguous (e.g., F > f, C < c). Moreover, this approach allows us to compare the fixed costs of oligopolies and monopolistic competitors in a correct way, because at first glance the monopolistic competitive labor requirement f is negligible in comparison to the oligopolistic one F. In fact, f is negligible relative to F in the same way as length is negligible relative to area, because f is the value of the cost function at a point j ∈ M; thus the total fixed labor requirement of the monopolistic competitive sector is equal to f · ν(M), where ν is a measure on the real line R, while the fixed labor requirement of one oligopolistic firm is equal to F · ν([i − 1, i]) = F.


A. Sidorov

Fig. 13.1 f > F: how could this be possible?

To make this idea more visual, look at the diagram in Fig. 13.1, which represents the case where the monopolistic competitive labor requirement f is "greater" than the oligopolistic one F.

Consumers share the same additive preferences given by

    U(x) = \sum_{i=1}^{N} u(x_i) + \int_M u(x_j)\,dj,    (13.1)

where u(·) is thrice continuously differentiable, strictly increasing, strictly concave, and such that u(0) = 0. Following [18], we define the relative love for variety (RLV) as

    r_u(x) = -\frac{x u''(x)}{u'(x)},

which is strictly positive for all x > 0. Technically, the RLV coincides with the Arrow–Pratt concept of relative risk aversion, which we avoid using because of a possibly misleading association of terms: in our model there are no uncertainty or risk considerations. Nevertheless, one can find some similarity in the meaning of these concepts, as the RLV measures the intensity of consumers' variety-seeking behavior. Under the CES we have u(x) = x^ρ, where ρ is a constant such that 0 < ρ ≤ 1, implying a constant RLV given by 1 − ρ. Another example of additive preferences is provided by Behrens and Murata in [2], who consider the CARA utility u(x) = 1 − exp(−αx), where the parameter α > 0 is the so-called absolute love for variety (defined much like the absolute risk-aversion measure −u''(x)/u'(x)); the RLV is now given by αx.

A consumer's income is equal to her wage plus her share in total profits. Since we focus on symmetric equilibria, consumers must have the same income, which


means that profits have to be uniformly distributed across consumers. In this case, a consumer's income y is given by

    y = 1 + \sum_{i=1}^{N} \Pi_i + \int_M \pi_j\,dj \ge 1,    (13.2)

where the profit made by the oligopoly selling the amount q_i of variety i ∈ {1, ..., N} at price p_i is given by

    \Pi_i = (p_i - C)\,q_i - F,    (13.3)

while the profit of the monopolistic competitive firm j ∈ M is equal to

    \pi_j = (p_j - c)\,q_j - f.    (13.4)

Evidently, the income level varies with firms' strategies. A consumer's budget constraint is given by

    \sum_{i=1}^{N} p_i x_i + \int_M p_j x_j\,dj = y.    (13.5)

The first-order condition for utility maximization yields

    u'(x_k) = \lambda p_k,    (13.6)

where λ is the Lagrange multiplier

    \lambda = \frac{\sum_{i=1}^{N} u'(x_i)\,x_i + \int_M u'(x_j)\,x_j\,dj}{y} > 0,    (13.7)

which implies the inverse demand

    p_k = \frac{y\,u'(x_k)}{\sum_{i=1}^{N} u'(x_i)\,x_i + \int_M u'(x_j)\,x_j\,dj}    (13.8)

for all varieties k ∈ {1, ..., N} ∪ M.

Let p = (p_k ≥ 0 | k ∈ {1, ..., N} ∪ M) be a price profile. In this case, consumers' demand functions x_k(p) are obtained by solving the system of Eq. (13.8), where consumers' income y is now defined as follows:

    y(p) = 1 - NF - Mf + \sum_{i=1}^{N} (p_i - C)\,x_i(p) + \int_M (p_j - c)\,x_j(p)\,dj.


It follows from (13.7) that the marginal utility of income λ is a market aggregate that depends on the price profile p. Indeed, the budget constraint

    \sum_{i=1}^{N} p_i x_i(p) + \int_M p_j x_j(p)\,dj = y(p)

implies that

    \lambda(p) = \frac{1}{y(p)} \left( \sum_{i=1}^{N} x_i(p)\,u'(x_i(p)) + \int_M x_j(p)\,u'(x_j(p))\,dj \right).    (13.9)

Since u'(x) is strictly decreasing, the demand function for variety k is thus given by

    x_k(p) = \xi(\lambda(p)\,p_k),    (13.10)

where ξ is the inverse function to u'. Moreover, firms' profits can be rewritten as follows:

    \Pi_i(p) = (p_i - C)\,x_i(p) - F = (p_i - C)\,\xi(\lambda(p)\,p_i) - F,    (13.11)

    \pi_j(p) = (p_j - c)\,x_j(p) - f = (p_j - c)\,\xi(\lambda(p)\,p_j) - f.    (13.12)

13.2.2 Market Equilibrium

The market equilibrium is defined by the following conditions:

(1) each consumer maximizes her utility (13.1) subject to her budget constraint (13.5);
(2) each firm k maximizes its profit, (13.3) or (13.4), with respect to p_k;
(3) product market clearing: x_k = q_k for all k ∈ {1, ..., N} ∪ M;
(4) labor market clearing:

    NF + C \sum_{i=1}^{N} q_i = Ns,  \qquad  Mf + c \int_M q_j\,dj = 1 - Ns.

Conditions (3) and (4) imply that

    \bar{x} \equiv \frac{s - F}{C},  \qquad  \hat{x} \equiv \frac{1}{c}\left(\frac{1 - Ns}{M} - f\right)    (13.13)

are the only candidate symmetric equilibrium demands for the "oligopolistic" and "monopolistic competitive" varieties, respectively.

Remark 13.2 In what follows we may refer to s as the size of an oligopoly, which is equivalent to the widely used definition of a firm's size in terms of output, due to (13.13).

13.2.3 Free Entry Condition for Monopolistic Competitors

In equilibrium, profits must be non-negative for firms to operate. Moreover, if profit is strictly positive, this causes the entry of new firms, which stops when profit becomes zero. One of the main differences between "big" and "small" firms in our model is that "small" firms are free to enter the industry, while the formation of oligopolies is typically subject to more sophisticated laws, e.g., to some kind of anti-trust legislation. Consider a symmetric price/quantity profile, i.e.,

    p_i = \bar{p},  \quad  x_i = \bar{x} = \frac{s - F}{C}  \quad  for all i = 1, ..., N;

    p_j = \hat{p},  \quad  x_j = \hat{x} = \frac{1}{c}\left(\frac{1 - Ns}{M} - f\right)  \quad  for all j ∈ M.

Then the zero-profit condition π_j = 0 takes the following form:

    (\hat{p} - c)\,\hat{x} = f \iff \frac{\hat{p} - c}{\hat{p}} = \frac{f}{\hat{p}\hat{x}},    (13.14)

while the symmetric budget constraint N\bar{p}\bar{x} + M\hat{p}\hat{x} = y = 1 - NF + N(\bar{p} - C)\bar{x} + 0 implies

    \hat{p}\hat{x} = \frac{1 - NF - NC\bar{x}}{M} = \frac{1 - NF - Ns + NF}{M} = \frac{1 - Ns}{M},

taking into account (13.13). Substituting this equation into (13.14), we obtain the following form of the zero-profit condition:

    \mu \equiv \frac{\hat{p} - c}{\hat{p}} = \frac{Mf}{1 - Ns} \iff M = \frac{(1 - Ns)\,\mu}{f},    (13.15)

where μ is the monopolistic competitive markup, which implies

    \hat{x} \equiv \frac{1}{c}\left(\frac{1 - Ns}{M} - f\right) = \frac{f}{c}\,\frac{1 - \mu}{\mu} > 0,    (13.16)

provided that μ satisfies 0 < μ < 1.

13.2.4 When Bertrand Meets Ford

As shown by (13.6) and (13.7), the income level influences firms' demands, and hence their profits. As a result, firms must anticipate accurately what the total income will be. In addition, firms should be aware that they can manipulate the income level, and hence their "true" demands, through their own strategies, with the aim of maximizing profits (see, e.g., [8]). This feedback effect is known as the Ford effect (see [5]). Note that these considerations concern only the "big" oligopolistic firms i ∈ {1, ..., N}. The "non-atomic" monopolistic competitive firms j ∈ M have a negligible effect on market statistics, e.g., on consumers' income y(p) or the marginal utility of money λ(p). In other words, we obtain that

    \frac{\partial \lambda}{\partial p_j} = \frac{\partial y}{\partial p_j} = 0,  \quad  j ∈ M,

while for the oligopolistic firms this is not true. The generalized Bertrand equilibrium is a vector p* such that p_i^* maximizes Π_i(p_i, p_{-i}^*) for all i ∈ {1, ..., N}. Applying the first-order condition to (13.11) yields

    \frac{p_i - C}{p_i} = -\frac{\xi(\lambda p_i)}{\lambda p_i\,\xi'(\lambda p_i)\left(1 + \frac{p_i}{\lambda}\frac{\partial \lambda}{\partial p_i}\right)},    (13.17)

which involves ∂λ/∂p_i, because λ depends on p. The monopolistic competitive firms, however, get a less complicated form of the FOC:

    \frac{p_j - c}{p_j} = -\frac{\xi(\lambda p_j)}{\lambda p_j\,\xi'(\lambda p_j)} = -\frac{x_j u''(x_j)}{u'(x_j)} = r_u(x_j).    (13.18)


Indeed, by definition of the function ξ = (u')^{-1}, we obtain ξ(λp_j) = ξ(u'(x_j)) = x_j, where λp_j = u'(x_j) is the first-order condition in the consumer's problem, while

    \xi'(\lambda p_j) = \frac{1}{u''(x_j)}

easily follows from the formula for the derivative of an inverse function. As a result, we obtain that the equilibrium markup in the monopolistic competitive sub-sector satisfies the equation

    \mu = \frac{\hat{p} - c}{\hat{p}} = r_u(\hat{x}) = r_u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right),    (13.19)

which depends only on the function u(x) and the fraction f/c, and does not depend on the number of oligopolies N or on their labor-market share s. It was proved in [18] that (13.19) has a unique solution 0 < μ < 1 provided that the function u(x) satisfies the additional assumption

    r_u(x) = -\frac{x u''(x)}{u'(x)} < 1.

Let

    u'(0) > \frac{\mu C}{f}\, u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right) > u'(\infty)

(e.g., u'(0) = +∞, u'(∞) = 0); then there exists an "optimum size" s* of the oligopolistic firm, which maximizes the indirect utility V(s, N) for arbitrary N.

Proof Indeed, for any given N > 0 the indirect utility function V is strictly concave with respect to s, and the first-order condition

    \frac{\partial V}{\partial s} = 0 \iff \frac{\partial B}{\partial s} = 0 \iff u'\!\left(\frac{s-F}{C}\right) = \frac{\mu C}{f}\, u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right)


implies that the optimum size is given by

    s^* = F + C\,\xi\!\left(\frac{\mu C}{f}\, u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right)\right),    (13.25)

where ξ ≡ (u')^{-1} is the inverse function. □

In the next statement, the notation [x] means the integer part of the number x.

Proposition 13.1 Let the oligopolistic firm size s be given, whether optimal or not; then there are the following options for the optimum number N* of oligopolies:

1. B(s) < 0 ⇒ N* = 0, i.e., the optimum structure of the industry is pure monopolistic competition;
2. B(s) = 0 ⇒ N* may be arbitrary, i.e., consumers' welfare is indifferent to the structure of the industry;
3. B(s) > 0 ⇒ N* = [s^{-1}], i.e., the optimum structure of the industry is achieved under the maximum concentration of production.

Proof All statements of the Proposition follow immediately from the linearity of V with respect to N. □

To fix ideas more visually, consider the diagram in Fig. 13.2, which shows the possible cases of incidence of two functions: the negative term of B(s), namely

    \frac{\mu}{f}\, u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right) s,

which is linear with respect to s, versus the strictly concave function u((s − F)/C), which positively affects B(s). Note that the angular coefficient (μ/f) u((f/c)(1 − μ)/μ) depends only on the monopolistic competitive primitives f, c, because the equilibrium markup μ is determined by Eq. (13.19), which does not depend on F and C. Therefore, an increase in the fixed cost from F to F' shifts the curve to the right, tending to Case 1, and so does an increase in the marginal cost from C to C', which makes the curve more slanted. As a result, for sufficiently large oligopolistic costs F, C we obtain Case 1, while sufficiently small F and/or C tend to Case 3. The intermediate Case 2 seems negligible, but actually it corresponds to a very important specific example.
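Since the markup equation (13.19) is a one-dimensional fixed-point problem in μ, it is easy to solve numerically. The sketch below is not from the paper: the parameter values alpha, f, c are illustrative assumptions, and the CARA utility u(x) = 1 − exp(−αx) is chosen because r_u(x) = αx makes (13.19) also solvable in closed form, which serves as a cross-check for the generic bisection.

```python
import math

# Solving the markup equation (13.19), mu = r_u((f/c) * (1 - mu) / mu),
# by bisection on (0, 1).  CARA utility and parameter values are
# illustrative assumptions.
alpha, f, c = 2.0, 0.05, 1.0
phi = f / c

def r_u(x):
    # relative love for variety of the CARA utility u(x) = 1 - exp(-alpha*x)
    return alpha * x

def residual(mu):
    # mu - r_u(x_hat); negative below the root, positive above it
    return mu - r_u(phi * (1.0 - mu) / mu)

lo, hi = 1e-12, 1.0 - 1e-12
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if residual(mid) < 0.0:
        lo = mid
    else:
        hi = mid
mu = 0.5 * (lo + hi)

# Cross-check: for CARA, (13.19) reduces to mu**2 + alpha*phi*mu - alpha*phi = 0.
mu_closed = (-alpha * phi + math.sqrt((alpha * phi) ** 2 + 4.0 * alpha * phi)) / 2.0
x_hat = (f / c) * (1.0 - mu) / mu        # equilibrium MC output, Eq. (13.16)
print(mu, mu_closed, x_hat)
```

The bisection relies only on the residual changing sign on (0, 1), which holds whenever x ↦ r_u(x) is increasing, as it is for the classes of utilities considered here.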

13.3.1 CES Case

Theorem 13.1 Let u(x) = x^ρ, 0 < ρ < 1. Then:

1. f^{1−ρ} c^ρ < F^{1−ρ} C^ρ ⇒ B(s*) < 0, which implies that for all admissible s the optimum industry structure is pure monopolistic competition.
2. f^{1−ρ} c^ρ = F^{1−ρ} C^ρ ⇒ B(s*) = 0, which implies that for all s ≠ s* the optimum industry structure is pure monopolistic competition, while at s = s* consumers' welfare does not depend on the structure of the industry.


Fig. 13.2 Cases of Proposition 13.1

3. f^{1−ρ} c^ρ > F^{1−ρ} C^ρ ⇒ B(s*) > 0, which implies that there exists an interval (\underline{s}, \overline{s}) such that for all \underline{s} < s < \overline{s} the optimum industry structure is achieved at maximum concentration, while otherwise the optimum industry structure is pure monopolistic competition.

Proof Direct calculation shows that μ = 1 − ρ and

    s^* = F + \left(\frac{c}{C}\right)^{\rho/(1-\rho)} \frac{f\rho}{1-\rho},

which implies

    B(s^*) = u\!\left(\frac{s^* - F}{C}\right) - \frac{\mu}{f}\, u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right) s^* = \left(\frac{f\rho}{c(1-\rho)}\right)^{\rho} (1-\rho) \left(\left(\frac{c}{C}\right)^{\rho/(1-\rho)} - \frac{F}{f}\right).

The rest is simple algebra. □

Remark Let the condition f^{1−ρ} c^ρ = F^{1−ρ} C^ρ hold; then the output of an oligopolistic firm of "optimum" size s* is

    \bar{x} = \frac{s^* - F}{C} = \frac{c^{\rho/(1-\rho)} f\rho}{C^{1/(1-\rho)}(1-\rho)} = \frac{(f^{1-\rho} c^{\rho})^{1/(1-\rho)}\,\rho}{C^{1/(1-\rho)}(1-\rho)} = \frac{F}{C}\,\frac{\rho}{1-\rho}.


Comparing this formula with the corresponding output x̂ = (f/c)(1 − μ)/μ = (f/c) ρ/(1 − ρ) of a monopolistic competitive firm, we obtain that the "optimum" oligopoly produces at the same level as a monopolistic competitor facing the production costs F, C. In particular, if F = f and C = c, the "optimum" oligopoly does not differ, in terms of output, from a monopolistic competitor.
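The CES formulas above are easy to check numerically. In the sketch below the parameter values are illustrative assumptions chosen so that f^{1−ρ}c^ρ = F^{1−ρ}C^ρ (the knife-edge Case 2); the computed s* then reproduces x̄ = (F/C)ρ/(1 − ρ) and B(s*) = 0.

```python
# Numeric check of Theorem 13.1 and the Remark for the CES utility u(x) = x**rho.
# Parameter values are illustrative assumptions satisfying
# f**(1-rho) * c**rho == F**(1-rho) * C**rho.
rho = 0.5
f, c = 0.01, 1.0
F, C = 0.04, 0.25

mu = 1.0 - rho
s_star = F + (c / C) ** (rho / (1.0 - rho)) * f * rho / (1.0 - rho)
x_bar = (s_star - F) / C                        # "optimum" oligopoly output
x_hat = (f / c) * rho / (1.0 - rho)             # MC output, Eq. (13.16)

# Coefficient B(s*) from the proof of Theorem 13.1
B_star = (f * rho / (c * (1.0 - rho))) ** rho * (1.0 - rho) * (
    (c / C) ** (rho / (1.0 - rho)) - F / f
)
print(s_star, x_bar, x_hat, B_star)
```

Perturbing F or C slightly away from the knife-edge condition makes B_star switch sign, reproducing Cases 1 and 3 of the Theorem.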

13.3.2 General Case

For non-CES consumers' preferences, a classification of the three Cases of Theorem 13.1 via the model primitives becomes more complicated. Nevertheless, computer simulation shows that the result is, in general, the same: all three cases of Theorem 13.1 are possible, and the outcome depends on the relations between the oligopolistic and monopolistic competitive labor requirements. As before, larger F, C tend to Case 1, while for F < f, C < c we typically obtain Case 3. Continuity considerations imply that Case 2 is possible for a specific relation between F, C and f, c, which is not as simple as in the case of CES utility.

In what follows we assume that F = f and C = c, which allows us to disregard the obvious cost effects of oligopolization, both positive and negative. In the CES case these conditions imply that concentration of the industry, i.e., the transformation of myriads of monopolistic competitors into a small number of oligopolistic "cartels", generically leads to a decrease of social welfare; the only exception is the very specific case when oligopolies "mimic" the monopolistic competitors. In the non-CES case the outcome changes significantly. Consider the following additional assumption on the elementary utility function u(x).

Assumption There exist finite limits of the first three derivatives: 0 < u'(0) < ∞, −∞ < u''(0) < 0 and |u'''(0)| < ∞.

It is easy to see that the Assumption on the utility function implies that the relative love for variety r_u(x) is a strictly increasing function, at least in some neighborhood of zero, because r_u(0) = 0 and r_u(x) > 0 for all x > 0. Obviously, the CES utility does not meet these requirements, while many other classes of preferences, for example, HARA u(x) = (x + α)^ρ − α^ρ with α > 0, CARA u(x) = 1 − e^{−αx}, and quadratic u(x) = αx − x²/2, fit the Assumption well. Our purpose is to show that under this assumption only Case 3 of Proposition 13.1 takes place, at least when the fixed labor requirement f is sufficiently small with respect to the total labor supply L = 1.

Theorem 13.2 Let the Assumption hold and the fraction f/c be sufficiently small. Then the inequality B(s*) > 0 holds, and there exists an interval (\underline{s}, \overline{s}) such that for all \underline{s} < s < \overline{s} the optimum industry structure is achieved at maximum concentration, while otherwise the optimum industry structure is pure monopolistic competition.


Proof See Appendix. □

The obtained result shows that there are two extreme solutions: the socially optimum structure of the industry is either pure monopolistic competition or the maximum degree of oligopolization. This follows from the linearity of the indirect utility function V(N, s) with respect to N, which, in turn, is a consequence of the additivity of the utility function

    U(x) = \sum_{i=1}^{N} u(x_i) + \int_M u(x_j)\,dj.

Let us consider the augmented additive utility

    U(x, N, M) = N^{-\nu} \sum_{i=1}^{N} u(x_i) + M^{-\nu} \int_M u(x_j)\,dj,    (13.26)

where 0 < ν < 1 is a new parameter. This generalizes the augmented CES utility used in [3] with ν = 1/σ. This choice of ν was argued by the authors as a way to guarantee that "an increase in the number of products does not increase utility directly"; see [3] for discussion. In our case, these multipliers, N^{−ν} and M^{−ν}, scale back consumers' love for variety. Calculations similar to the previous ones show that this modification results in the following indirect utility function:

    V(s, N) = N^{1-\nu} u(\bar{x}) + \hat{M}^{1-\nu} u(\hat{x}),

where

    \bar{x} = \frac{s - F}{C},  \qquad  \hat{x} = \frac{f}{c}\,\frac{1-\mu}{\mu},  \qquad  \hat{M} = \frac{(1 - Ns)\,\mu}{f}

are the equilibrium outputs and the mass of monopolistic competitive firms, while the equilibrium markup μ is determined by Eq. (13.19). Let s be given; then obvious calculations show that the optimum number of oligopolies may be obtained as a maximum of the concave function V(s, N). Solving the first-order condition ∂V/∂N = 0 with respect to N, we obtain the optimum number of oligopolies

    N(s) = \frac{1}{s} \cdot \frac{u\!\left(\frac{s-F}{C}\right)^{1/\nu}}{u\!\left(\frac{s-F}{C}\right)^{1/\nu} + u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right)^{1/\nu} \left(\frac{\mu s}{f}\right)^{(1-\nu)/\nu}},

which is obviously less than 1/s for any given s. This number is almost always non-integer; therefore, we may choose the best integer approximation of N*. In any case, we obtain that for an augmented additive utility with a lowered taste for variety the mixed industry structure is an optimum solution. There is also an optimum size s*


of oligopolies; however, in this case it depends on N and cannot be calculated in closed form.
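The closed-form N(s) above can be verified directly against the first-order condition. In the following sketch the CARA utility and all parameter values (F = f, C = c, ν = 1/2) are illustrative assumptions; the code computes N(s), checks that it is below 1/s, and that ∂V/∂N vanishes at it.

```python
import math

# Checking the closed-form optimum number of oligopolies N(s) for the
# augmented utility (13.26).  CARA utility u(x) = 1 - exp(-x) and all
# parameter values are illustrative assumptions.
f = F = 0.01
c = C = 1.0
nu = 0.5
s = 0.05

def u(x):
    return 1.0 - math.exp(-x)

phi = f / c
mu = (-phi + math.sqrt(phi * phi + 4.0 * phi)) / 2.0   # root of (13.19) for CARA
x_bar = (s - F) / C
x_hat = (f / c) * (1.0 - mu) / mu

N_opt = (1.0 / s) * u(x_bar) ** (1.0 / nu) / (
    u(x_bar) ** (1.0 / nu)
    + u(x_hat) ** (1.0 / nu) * (mu * s / f) ** ((1.0 - nu) / nu)
)

# First-order condition of V(s, N) = N**(1-nu)*u(x_bar) + M_hat**(1-nu)*u(x_hat)
# with M_hat = (1 - N*s)*mu/f, evaluated at N_opt:
M_hat = (1.0 - N_opt * s) * mu / f
dV_dN = (1.0 - nu) * (
    N_opt ** (-nu) * u(x_bar) - M_hat ** (-nu) * (s * mu / f) * u(x_hat)
)
print(N_opt, dV_dN)
```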

13.4 Concluding Remarks

One may draw obvious parallels between this result and Proposition 7 of the paper [16] mentioned in the Introduction. It should be mentioned, however, that these two models are too different to compare directly; in particular, oligopolies in the model of [16] are of Cournot type, i.e., they have a different nature in comparison to the Bertrand-type oligopolies of the present paper. There, large firms seem to extrude the small ones. Instead of extruding, oligopolies in the presented model (especially in the case F = f, C = c) rather substitute for the monopolistic competitors, possessing the same nature and differing only in the administrative setup. The authors' explanation of the positive welfare effect of large-firm entry is "...that the procompetitive effect associated with the presence of large firms dominates the decrease in diversity generated by the exit of several small firms." This sounds plausible, but the main problem with the cited result is the fact that it is obtained under CES utility, which provides the social optimum at the pure monopolistic competitive equilibrium, see [6]. In this case, any oligopolistic "substitution" in place of monopolistic competitors should be harmful for social welfare, which agrees with Theorem 13.1 of the present paper. This means that in the framework of [16] the positive effect of the extrusion of monopolistic competitors by "alien" oligopolies prevails over the expected negative effect of simple substitution. The presented model is free of this "alien effect"; nevertheless, the main result guarantees a positive welfare effect under certain assumptions.

It was mentioned above that the main Assumption on the utility function implies that the relative love for variety r_u(x) is strictly increasing. On the other hand, this condition implies entry excess, which means that the equilibrium mass of monopolistic competitive firms, determined by the zero-profit condition, exceeds the social optimum. Thus, our explanation of the positive effect of oligopolization is based on another intuition: the substitution of monopolistic competitors with oligopolies of the proper size generates the same effect as a reduction of the firms' mass, which results in an increase of social welfare. This mechanism of indirect reduction is more realistic than attempts to stop the entry of myriads of monopolistic competitors before they reach the zero-profit level. A similar idea was used, in a quite different model of the vertical integration of an upstream monopoly with downstream monopolistic competitors, in the paper [11]. Corollary 1 of that paper states that "vertical integration is Pareto improving if there is a sufficient degree of increasing preference of variety at the origin", which is similar to the main result of the present paper.

Acknowledgements I owe special thanks to J. F. Thisse and M. Parenti for long hours of useful discussions at CORE (Louvain-la-Neuve, Belgium). This work was supported by the Russian Foundation for Basic Research under grant No. 18-010-00728 and by the program of fundamental scientific research of the SB RAS No. I.5.1, Project No. 0314-2016-0018.


Appendix

Proof Direct calculations show that the Assumption implies the following equalities:

    r_u(0) = r_{u'}(0) = 0,  \qquad  r_u'(0) = -\frac{u''(0)}{u'(0)},

where the last equality is a simple consequence of the identity

    r_u'(x) = \frac{r_u(x)}{x}\left(1 + r_u(x) - r_{u'}(x)\right),

which is proved in [18]; here r_{u'}(x) = −x u'''(x)/u''(x) denotes the RLV of the function u'. Let us denote the fraction f/c by ϕ; then the optimum size of an oligopoly is equal to

    s^* = f + c\,\xi\!\left(\frac{\mu c}{f}\, u\!\left(\frac{f}{c}\,\frac{1-\mu}{\mu}\right)\right) = c\left(\varphi + \xi\!\left(\frac{\mu}{\varphi}\, u\!\left(\varphi\,\frac{1-\mu}{\mu}\right)\right)\right).

Note that μ is actually an implicit function μ(ϕ) determined by the equation

    \mu = r_u\!\left(\varphi\,\frac{1-\mu}{\mu}\right),    (13.27)

as is the equilibrium output of the monopolistic competitive firms,

    x(\varphi) \equiv \varphi\,\frac{1-\mu(\varphi)}{\mu(\varphi)};

thus Case 3 of Proposition 13.1, B(s*) > 0, is equivalent to the inequality

    \Delta(\varphi) \equiv u\!\left(\xi\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)\right) - \frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\left(\varphi + \xi\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)\right) > 0.    (13.28)

We shall prove that

    \Delta(0) \equiv \lim_{\varphi \to 0} \Delta(\varphi) = 0,  \qquad  \Delta'(0) \equiv \lim_{\varphi \to 0} \Delta'(\varphi) > 0,

which implies that for all sufficiently small ϕ = f/c the maximum value of the coefficient B(s*) = Δ(ϕ) is positive. First we calculate a series of simpler limits; then we obtain (13.28) as a combination of these parts.


The following statements,

    \lim_{\varphi\to 0} \mu(\varphi) = 0,  \qquad  \lim_{\varphi\to 0} \frac{\varphi}{\mu(\varphi)} = 0,  \qquad  \lim_{\varphi\to 0} x(\varphi) = 0,

were proved in [13] in the general case, without the additional Assumption. In what follows, it is crucial that the finite limit values u'(0) and u''(0) exist, as provided by the Assumption; this also guarantees the existence and finiteness of all the following limits. Using L'Hôpital's rule, we obtain

    \lim_{\varphi\to 0} \frac{\mu(\varphi)}{\varphi}\,u(x(\varphi)) = \lim_{\varphi\to 0} (1-\mu(\varphi))\,\frac{u(x(\varphi))}{x(\varphi)} = u'(0).

Taking into account that, by definition, ξ is the inverse function to u', we have the identity

    \lim_{\varphi\to 0} \xi\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right) = \xi(u'(0)) = 0.

Using (13.27) and L'Hôpital's rule, we obtain

    \lim_{\varphi\to 0} \frac{\mu^2(\varphi)}{\varphi} = \lim_{\varphi\to 0} \frac{\mu(\varphi)}{\varphi}\,r_u(x(\varphi)) = \lim_{\varphi\to 0} (1-\mu(\varphi))\,\frac{r_u(x(\varphi))}{x(\varphi)} = r_u'(0) = -\frac{u''(0)}{u'(0)}.

This implies that

    \lim_{\varphi\to 0} \mu'(\varphi)\mu(\varphi) = \lim_{\varphi\to 0} \frac{1}{2}\left(\mu^2(\varphi)\right)' = \frac{1}{2}\lim_{\varphi\to 0} \frac{\mu^2(\varphi)}{\varphi} = -\frac{u''(0)}{2u'(0)},

and

    \lim_{\varphi\to 0} x'(\varphi)\mu(\varphi) = \lim_{\varphi\to 0}\left(1 - \mu(\varphi) - \frac{\varphi}{\mu^2(\varphi)}\,\mu'(\varphi)\mu(\varphi)\right) = 1 - \left(-\frac{u'(0)}{u''(0)}\right)\left(-\frac{u''(0)}{2u'(0)}\right) = \frac{1}{2}.

This allows us to calculate more complicated limits:

    \lim_{\varphi\to 0}\left(\frac{u(x(\varphi))}{x(\varphi)}\right)'\mu(\varphi) = \lim_{\varphi\to 0}\frac{x(\varphi)u'(x(\varphi)) - u(x(\varphi))}{x^2(\varphi)}\,x'(\varphi)\mu(\varphi) = \lim_{x\to 0}\frac{x u'(x) - u(x)}{x^2}\cdot\lim_{\varphi\to 0} x'(\varphi)\mu(\varphi) = \frac{u''(0)}{2}\cdot\frac{1}{2} = \frac{u''(0)}{4},


and

    \lim_{\varphi\to 0}\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)'\mu(\varphi) = \lim_{\varphi\to 0}\left((1-\mu(\varphi))\,\frac{u(x(\varphi))}{x(\varphi)}\right)'\mu(\varphi)
    = \lim_{\varphi\to 0}\left(-\mu'(\varphi)\mu(\varphi)\,\frac{u(x(\varphi))}{x(\varphi)} + (1-\mu(\varphi))\left(\frac{u(x(\varphi))}{x(\varphi)}\right)'\mu(\varphi)\right) = \frac{u''(0)}{2} + \frac{u''(0)}{4} = \frac{3u''(0)}{4}.

Taking into account that ξ = (u')^{-1}, and therefore ξ' = 1/u'', we obtain, by L'Hôpital's rule,

    \lim_{\varphi\to 0}\frac{\xi\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)}{\mu(\varphi)} = \lim_{\varphi\to 0}\frac{\xi'\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)'\mu(\varphi)}{\mu'(\varphi)\mu(\varphi)} = \frac{1}{u''(0)}\cdot\frac{3u''(0)}{4}\cdot\left(-\frac{2u'(0)}{u''(0)}\right) = -\frac{3u'(0)}{2u''(0)}.

These calculations imply

    \lim_{\varphi\to 0} u\!\left(\xi\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)\right) = u(0) = 0

and

    \lim_{\varphi\to 0} \frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\left(\varphi + \xi\!\left(\frac{\mu(\varphi)}{\varphi}\,u(x(\varphi))\right)\right) = u'(0)\cdot 0 = 0,

which means that Δ(0) = 0. Moreover, u'(ξ(z)) = z by definition; therefore

    \Delta'(\varphi) = u'\!\left(\xi\!\left(\tfrac{\mu}{\varphi}u(x)\right)\right)\xi'\!\left(\tfrac{\mu}{\varphi}u(x)\right)\left(\tfrac{\mu}{\varphi}u(x)\right)' - \left(\tfrac{\mu}{\varphi}u(x)\right)'\left(\varphi + \xi\!\left(\tfrac{\mu}{\varphi}u(x)\right)\right) - \tfrac{\mu}{\varphi}u(x)\left(1 + \xi'\!\left(\tfrac{\mu}{\varphi}u(x)\right)\left(\tfrac{\mu}{\varphi}u(x)\right)'\right)
    = -\left(\tfrac{\mu}{\varphi}u(x)\right)'\left(\varphi + \xi\!\left(\tfrac{\mu}{\varphi}u(x)\right)\right) - \tfrac{\mu}{\varphi}u(x)
    = -\left(\tfrac{\mu}{\varphi}u(x)\right)'\mu(\varphi)\cdot\left(\frac{\varphi}{\mu(\varphi)} + \frac{\xi\!\left(\tfrac{\mu}{\varphi}u(x)\right)}{\mu(\varphi)}\right) - \tfrac{\mu}{\varphi}u(x),



which implies

    \lim_{\varphi\to 0}\Delta'(\varphi) = -\frac{3u''(0)}{4}\left(0 - \frac{3u'(0)}{2u''(0)}\right) - u'(0) = \frac{9}{8}u'(0) - u'(0) = \frac{1}{8}\,u'(0) > 0.

This inequality implies that the function Δ(ϕ) is strictly positive in some neighborhood of 0, which completes the proof of the Theorem. □
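The Appendix limits can be corroborated numerically. For the CARA utility with α = 1 (an illustrative choice satisfying the Assumption, so that u'(0) = 1, u''(0) = −1, u'''(0) = 1), Eq. (13.27) reduces to a quadratic in μ and ξ(z) = −ln z inverts u'(x) = e^{−x}, so Δ(ϕ) of (13.28) is computable in closed form; the ratio Δ(ϕ)/ϕ should then approach u'(0)/8 = 0.125 as ϕ → 0.

```python
import math

# Numeric corroboration of the Appendix for u(x) = 1 - e**(-x) (alpha = 1,
# an illustrative choice).  Here r_u(x) = x, so (13.27) is the quadratic
# mu**2 + phi*mu - phi = 0, and xi(z) = -ln(z) inverts u'(x) = e**(-x).

def delta(phi):
    mu = (-phi + math.sqrt(phi * phi + 4.0 * phi)) / 2.0   # root of (13.27)
    x = phi * (1.0 - mu) / mu                              # MC output x(phi)
    h = mu / phi * (1.0 - math.exp(-x))                    # (mu/phi) u(x)
    g = -math.log(h)                                       # xi(h)
    return (1.0 - math.exp(-g)) - h * (phi + g)            # Delta(phi), Eq. (13.28)

for phi in (1e-2, 1e-3, 1e-4):
    print(phi, delta(phi) / phi)    # the ratio tends to u'(0)/8 = 0.125
```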

References

1. Aghion, P., Howitt, P.: Research and development in the growth process. J. Econ. Growth 1, 49–73 (1996)
2. Behrens, K., Murata, Y.: General equilibrium models of monopolistic competition: a new approach. J. Econ. Theory 136, 776–787 (2007)
3. Blanchard, O., Giavazzi, F.: Macroeconomic effects of regulation and deregulation in goods and labor markets. Q. J. Econ. 118, 879–907 (2003)
4. d'Aspremont, C., Dos Santos Ferreira, R.: The Dixit-Stiglitz economy with a 'small group' of firms: a simple and robust equilibrium markup formula. Res. Econ. 71(4), 729–739 (2017)
5. d'Aspremont, C., Dos Santos Ferreira, R., Gerard-Varet, L.: On monopolistic competition and involuntary unemployment. Q. J. Econ. 105(4), 895–919 (1990)
6. Dixit, A.K., Stiglitz, J.E.: Monopolistic competition and optimum product diversity. Am. Econ. Rev. 67, 297–308 (1977)
7. Ford, H.: My Life and Work. Doubleday, Page, Garden City (1922)
8. Gabszewicz, J., Vial, J.: Oligopoly "à la Cournot" in general equilibrium analysis. J. Econ. Theory 4, 381–400 (1972)
9. Hart, O.: Imperfect competition in general equilibrium: an overview of recent work. In: Arrow, K.J., Honkapohja, S. (eds.) Frontiers in Economics. Basil Blackwell, Oxford (1985)
10. Kokovin, S., Parenti, M., Thisse, J.-F., Ushchev, P.: On the Dilution of Market Power. Centre for Economic Policy Research Discussion Paper Series, No. DP12367 (2017)
11. Kuhn, K.-U., Vives, X.: Excess entry, vertical integration, and welfare. RAND J. Econ. 30(4), 575–603 (1999)
12. Marschak, T., Selten, R.: General Equilibrium with Price-Making Firms. Lecture Notes in Economics and Mathematical Systems. Springer, Berlin (1972)
13. Parenti, M., Sidorov, A.V., Thisse, J.-F., Zhelobodko, E.V.: Cournot, Bertrand or Chamberlin: toward a reconciliation. Int. J. Econ. Theory 13(1), 29–45 (2017)
14. Schumpeter, J.A.: The Theory of Economic Development. Oxford University Press, New York (1934)
15. Schumpeter, J.A.: Capitalism, Socialism and Democracy. Harper & Brothers, New York (1942)
16. Shimomura, K.-I., Thisse, J.-F.: Competition among the big and the small. RAND J. Econ. 43, 329–347 (2012)
17. Sidorov, A.V., Parenti, M., Thisse, J.-F.: Bertrand meets Ford: benefits and losses. In: Petrosyan, L., Mazalov, V., Zenkevich, N. (eds.) Static and Dynamic Game Theory: Foundations and Applications, pp. 251–268. Birkhäuser, Basel (2018)
18. Zhelobodko, E., Kokovin, S., Parenti, M., Thisse, J.-F.: Monopolistic competition in general equilibrium: beyond the constant elasticity of substitution. Econometrica 80, 2765–2784 (2012)

Chapter 14

Guaranteed Deterministic Approach to Superhedging: Lipschitz Properties of Solutions of the Bellman–Isaacs Equations

Sergey N. Smirnov

Abstract For the discrete-time superreplication problem, a guaranteed deterministic formulation is proposed: the problem is to ensure the cheapest coverage of the contingent claim on an American option under all admissible scenarios. These scenarios are set by a priori defined compacts depending on the price history; the price increment at each moment of time must lie in the corresponding compact. The market is considered without trading constraints and transaction costs. The problem statement is game-theoretic in nature and leads directly to the Bellman– Isaacs equations of a special form under the assumption of no trading constraints. In the present study, we estimate the modulus of continuity of uniformly continuous solutions, including the Lipschitz case. Keywords Guaranteed estimates · Deterministic price dynamics · Super-replication · Option · Arbitrage · No arbitrage opportunities · Bellman–Isaacs equations · Multivalued mapping · Semicontinuity · Continuity · Modulus of continuity · Lipschitz functions

14.1 Introduction

In the article [9], we introduce the guaranteed deterministic approach, describe the financial market model and trading constraints, pose the problem of superhedging a contingent claim on an American option, and present the corresponding bibliography. Here, we limit ourselves to a brief description of the necessary information concerning the formulation of the problem given in [9].

The main premise of the proposed approach is an "uncertain" price dynamics framework based on the assumption of a priori information regarding price

S. N. Smirnov () Lomonosov Moscow State University, Moscow, Russia © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_14


movements¹ at the time moment t; namely, the increments ΔX_t of discounted prices² lie in a priori given compacts³ K_t(·) ⊆ R^n, t = 1, ..., N. Let us denote by v_t*(·) the infimum of the portfolio value at time t that guarantees, given the price history, the existence of an admissible hedging strategy covering current and future liabilities, i.e., the possible payments on the American option. The corresponding Bellman–Isaacs equations (expressed in discounted prices) are derived directly from the economic meaning: at step t one chooses the "best" admissible hedging strategy h ∈ D_t(·) ⊆ R^n against the "worst" scenario of (discounted) price increments y ∈ K_t(·), for given functions g_t(·) describing the potential payouts on the option. Thus, we obtain the following recurrence relation⁴:

v_N*(x̄_N) = g_N(x̄_N),

v_{t−1}*(x̄_{t−1}) = g_{t−1}(x̄_{t−1}) ∨ inf_{h ∈ D_t(x̄_{t−1})} sup_{y ∈ K_t(x̄_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy],  t = N, ..., 1,    (BA)

where x̄_{t−1} = (x_0, ..., x_{t−1}) represents the price history up to the present moment t. The conditions for the validity of (BA) are formulated in Theorem 3.1 of [9]. It is convenient to assume (formally) that g_0 ≡ −∞ (there are no liabilities to pay at the initial time); for an American option, g_t ≥ 0 for t = 1, ..., N. The set D_t(·) is assumed to be convex, with 0 ∈ D_t(·). The multivalued mappings x ↦ K_t(x) and x ↦ D_t(x), as well as the functions x ↦ g_t(x), are assumed to be defined for all x ∈ (R^n)^t, t = 1, ..., N; consequently, the functions x ↦ v_t*(x) are defined by Eq. (BA) for all x ∈ (R^n)^t. In Eq. (BA), the functions v_t*, as well as the corresponding suprema and infima, take values in the extended real line R ∪ {−∞, +∞} = [−∞, +∞], the two-point compactification⁵ of R. The functions v_t* are bounded above owing to the following assumption: there exist constants C_t ≥ 0 such that, for every t = 1, ..., N and all possible trajectories x̄_t = (x_0, ..., x_t) ∈ B_t, the inequality

g_t(x_0, ..., x_t) ≤ C_t    (B)

holds.

1 The increments are taken "backward", i.e., ΔX_t = X_t − X_{t−1}, where X_t is the discounted price vector at time t.
2 We suppose that the risk-free asset has a fixed price equal to one.
3 The dot indicates the variables representing the price evolution. More precisely, it is the prehistory x̄_{t−1} = (x_0, ..., x_{t−1}) ∈ (R^n)^t for K_t, whereas it is the history x̄_t = (x_0, ..., x_t) ∈ (R^n)^{t+1} for the functions v_t* and g_t introduced below.
4 The sign ∨ denotes the maximum; hy = ⟨h, y⟩ is the dot product of the vectors h and y.
5 Neighborhoods of the points −∞ and +∞ are given by [−∞, a), a ∈ R, and (b, +∞], b ∈ R, respectively.
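As an illustration of how (BA) is computed in practice, the following minimal sketch (not the author's implementation) solves the recursion for made-up data: a single risky asset (n = 1), N = 2 steps, history-independent increments K_t ≡ {−1, 0, 1} (a discretization of a compact), no trading constraints D_t ≡ R (approximated by a finite grid of hedges), and an American call payoff with a hypothetical strike of 10.

```python
from functools import lru_cache

# Toy data (hypothetical, for illustration only).
K = (-1.0, 0.0, 1.0)                  # discretized increments Delta X_t
H = [i / 10 for i in range(-30, 31)]  # hedge grid approximating h in R (D_t = R)
N = 2
STRIKE = 10.0

def g(x):
    """Potential payout g_t of an American call (example payoff)."""
    return max(x - STRIKE, 0.0)

@lru_cache(maxsize=None)
def v(t, x):
    """(BA): v_{t-1} = g_{t-1} v inf_h sup_y [v_t(x + y) - h*y].
    Here the value depends only on the current price, since K and g
    do not depend on the history; for simplicity the payoff comparison
    is also applied at t = 0 (it does not change the value here)."""
    if t == N:
        return g(x)
    rho = min(max(v(t + 1, x + y) - h * y for y in K) for h in H)
    return max(g(x), rho)

v0 = v(0, 10.0)   # guaranteed superhedging price at x_0 = 10
```

For this toy data the recursion yields v(1, 10.0) = 0.5 and v0 = 0.5; since the grid over h only approximates the infimum over R, the computed value is in general an upper estimate of the exact one.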

14 Guaranteed Deterministic Approach to Superhedging: . . .


Suppose that the constants C_t in (B) are minimal, i.e., C_t = sup_{x ∈ B_t} g_t(x), and denote

C = ∨_{t=1}^{N} C_t.    (14.1)

For convenience of notation, an "additive" change of the last variable of the functions v_t* will be used,

w_t(x̄_{t−1}, y) = w_t(x_0, ..., x_{t−1}, y) = v_t*(x_0, ..., x_{t−1}, x_{t−1} + y),    (T)

and the right-hand side of the Bellman–Isaacs equations will be written in terms of the functions w_t, t = N, ..., 1, as follows:

v_{t−1}*(·) = g_{t−1}(·) ∨ inf_{h ∈ D_t(·)} sup_{y ∈ K_t(·)} [w_t(·, y) − hy],  t = N, ..., 1.

Here the dot indicates the "current" variables; in this case, the argument is x̄_{t−1}. A trajectory (x_0, ..., x_t) = x̄_t of asset prices on the time interval [0, t] = {0, ..., t} is considered possible if x_0 ∈ K_0, Δx_1 ∈ K_1(x_0), ..., Δx_t ∈ K_t(x_0, ..., x_{t−1}), t = 0, 1, ..., N. We denote by B_t the set of possible trajectories of asset prices on the time interval [0, t]; thus,

B_t = {(x_0, ..., x_t) : x_0 ∈ K_0, Δx_1 ∈ K_1(x_0), ..., Δx_t ∈ K_t(x_0, ..., x_{t−1})}.    (14.2)

Note that (14.2) is tantamount to the recurrence relation⁶

B_t = {(x̄_{t−1}, x_t) : x̄_{t−1} ∈ B_{t−1}, Δx_t ∈ K_t(x̄_{t−1})} = {(x̄_{t−1}, x_t) : x̄_{t−1} ∈ B_{t−1}, x_t ∈ x_{t−1} + K_t(x̄_{t−1})},  t = 1, ..., N.    (14.3)

Hereafter, we assume that the assumptions listed in Theorem 3.1 of [9], together with those listed in item 1) of Remark 3.1 of [9], are fulfilled. In [10], different notions of "no arbitrage" were studied for the deterministic market model: no guaranteed arbitrage, no arbitrage opportunities, and no guaranteed arbitrage with unlimited profit. The concept of robustness of a "no arbitrage" property (structural stability of the model) was introduced there, and criteria of robustness were obtained.

6 Here, x + A = {z : z − x ∈ A}.
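For finite (discretized) sets K_0 and K_t, the recurrence (14.3) can be enumerated directly; the data below are made up for illustration, with K_t independent of the price history.

```python
# Hypothetical discretized data: K0 is the set of admissible initial prices,
# K plays the role of K_t(.) (here independent of the history).
K0 = (10.0,)
K = (-1.0, 1.0)
N = 3

# (14.3): B_t = {(xbar_{t-1}, x_t) : xbar_{t-1} in B_{t-1},
#                x_t in x_{t-1} + K_t(xbar_{t-1})}
B = [[(x0,) for x0 in K0]]                 # B_0
for t in range(1, N + 1):
    B.append([path + (path[-1] + dy,) for path in B[t - 1] for dy in K])

# For this history-independent example, |B_t| = |K0| * |K|**t.
```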


S. N. Smirnov

For the convenience of the reader, we recall several definitions of the notions introduced in [10].

By a (deterministic) arbitrage opportunity (AO) at time step t we mean the following: (1) there exists a strategy h* ∈ D_t(·) such that h*y ≥ 0 for all y ∈ K_t(·); (2) there exists a price movement y* ∈ K_t(·) such that h*y* > 0.

By a (deterministic) sure arbitrage (SA) at time step t we mean the following: there exists a strategy h* ∈ D_t(·) such that h*y > 0 for all y ∈ K_t(·).

We say that there is a (deterministic) sure arbitrage with unlimited profit (SAUP) at time step t if the function⁷ h ↦ min{hy : y ∈ K_t(·)} takes arbitrarily large values for h ∈ D_t(·).

Using these three notions of "arbitrage", we can define the corresponding "no arbitrage" properties on a time interval; each "no arbitrage" property on a time interval is tantamount to the corresponding "no arbitrage" property at every time step of this interval. We thus obtain the following "no arbitrage" properties:

– no deterministic arbitrage opportunity (NDAO),
– no deterministic sure arbitrage (NDSA),
– no deterministic sure arbitrage with unlimited profit (NDSAUP).

Structural stability in our context means that the qualitative behavior of the price dynamics K_t(·), namely some "no arbitrage" property, is unaffected by sufficiently small (with respect to the Pompeiu–Hausdorff metric) perturbations of K_t(·); such a "no arbitrage" property is called robust. In what follows we need two robust "no arbitrage" properties:

– robust no deterministic arbitrage opportunity (RNDAO),
– robust no deterministic sure arbitrage with unlimited profit (RNDSAUP).

In [11], the semicontinuity and continuity properties of the solutions of the Bellman–Isaacs equations (BA) are investigated. Under a rather weak "no arbitrage" assumption on the market, namely the robust condition of no guaranteed arbitrage with unlimited profit, the main result concerning the "smoothness" of the solutions of (BA) is obtained.
In Proposition 2.1 of [11], a sufficient condition for the compactness of B_t is given: it suffices to assume upper semicontinuity of the compact-valued mappings x ↦ K_t(x). Moreover, a sufficient condition for property (B) is given: in addition to the upper semicontinuity of K_t(·), it suffices to assume upper semicontinuity of g_t(·), t = 1, ..., N. The main results of [11] concern conditions for the upper and lower semicontinuity, as well as the continuity, of the solutions of the Bellman–Isaacs equations arising in the framework of the guaranteed deterministic approach; the most

7 It can be interpreted as the guaranteed profit for a strategy h ∈ D_t(·). Note that min{hy : y ∈ K_t(·)} is attained, due to the compactness of K_t(·), for some (worst-scenario) y_h* ∈ K_t(·); it is positive for the sure arbitrage strategy h* ∈ D_t(·).


important of them is presented in Theorem 3.2 (with reference to Theorem 2.7). For convenience, we formulate this result here in full.

Theorem 14.1 For t = 1, ..., N, let the numeric functions x̄_t ↦ g_t(x̄_t) be continuous, let the compact-valued mappings x̄_{t−1} ↦ K_t(x̄_{t−1}) be continuous, and let the multivalued mappings x̄_{t−1} ↦ D_t(x̄_{t−1}) be lower semicontinuous and closed. If the robust condition of no guaranteed arbitrage with unlimited profit (RNDSAUP) is valid, then

1) the functions x̄_{t−1} ↦ v_{t−1}*(x̄_{t−1}), defined by the relations (BA), are continuous;

2) the multivalued mappings (x̄_{t−1}, h) ↦ M_t(x̄_{t−1}, h), where M_t(x̄_{t−1}, h) is the set of maximizers y ∈ K_t(x̄_{t−1}) at which the supremum

sup_{y ∈ K_t(x̄_{t−1})} [w_t(x̄_{t−1}, y) − hy]

is attained, and the multivalued mappings x̄_{t−1} ↦ N_t(x̄_{t−1}), where N_t(x̄_{t−1}) is the set of minimizers h ∈ D_t(x̄_{t−1}) at which the infimum

ρ_t(x̄_{t−1}) = inf_{h ∈ D_t(x̄_{t−1})} sup_{y ∈ K_t(x̄_{t−1})} [w_t(x̄_{t−1}, y) − hy]

is attained, are upper semicontinuous for t = 1, ..., N.

The purpose of this study is to refine this result for the case of no trading constraints, i.e., when D_t(·) ≡ R^n. Thus, we shall estimate the modulus of continuity of the uniformly continuous functions v_t*(·), including the Lipschitz case. When there are no trading constraints, the condition RNDSAUP is tantamount to the robust condition of no arbitrage opportunities RNDAO, according to Theorem 5.1 of [10]; a geometric criterion for RNDAO has the form

0 ∈ int(conv(K_t(·))),  t = 1, ..., N,    (14.4)

where int(A) denotes the interior of a set A and conv(B) denotes the convex hull of a set B; see Remark 5.1 in [10].

14.2 Auxiliary Results

The following elementary assertion will be used several times; therefore, for convenience, we present it as a separate lemma.

Lemma 14.1 For any functions f and g with numeric values,

| sup_A f − sup_A g | ≤ sup_A |f − g|,    (14.5)

| inf_A f − inf_A g | ≤ sup_A |f − g|.    (14.6)

272

S. N. Smirnov

Proof Denote γ = sup_A |f − g|. Then, for all x ∈ A,

g(x) − γ ≤ f(x) ≤ g(x) + γ,

whence

sup_A g − γ ≤ sup_A f ≤ sup_A g + γ,

i.e., inequality (14.5) holds. Inequality (14.6) can be obtained similarly; alternatively, (14.6) follows from (14.5) by considering both functions with a minus sign. □

In particular, from Lemma 14.1 it follows that

| ∨_{i=1}^{n} y_i − ∨_{i=1}^{n} y'_i | ≤ ∨_{i=1}^{n} |y_i − y'_i| = ‖y − y'‖_∞,  | ∧_{i=1}^{n} y_i − ∧_{i=1}^{n} y'_i | ≤ ∨_{i=1}^{n} |y_i − y'_i| = ‖y − y'‖_∞,    (14.7)
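Lemma 14.1 and its consequence (14.7) are easy to sanity-check numerically; the snippet below does so on random finite data (purely illustrative, with tiny float tolerances).

```python
import random

random.seed(0)
A = [random.uniform(-1, 1) for _ in range(50)]        # grid playing the role of the set A
f = {x: x * x for x in A}                             # an arbitrary test function
g = {x: x * x + random.uniform(-0.1, 0.1) for x in A} # a perturbation of it

gamma = max(abs(f[x] - g[x]) for x in A)              # sup_A |f - g|
assert abs(max(f.values()) - max(g.values())) <= gamma + 1e-12   # (14.5)
assert abs(min(f.values()) - min(g.values())) <= gamma + 1e-12   # (14.6)

# consequence (14.7): |max_i y_i - max_i y'_i| <= ||y - y'||_inf, same for min
y  = [random.gauss(0, 1) for _ in range(10)]
yp = [random.gauss(0, 1) for _ in range(10)]
sup_norm = max(abs(a - b) for a, b in zip(y, yp))
assert abs(max(y) - max(yp)) <= sup_norm + 1e-12
assert abs(min(y) - min(yp)) <= sup_norm + 1e-12
```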

where y, y' ∈ R^n.

Let (X, ρ) and (Y, d) be metric spaces and E ⊆ X, E ≠ ∅. For a function f : X → Y and for δ ∈ [0, ∞), let us denote by

ω_f^E(δ) = sup_{x_1, x_2 ∈ E, ρ(x_1, x_2) ≤ δ} d(f(x_1), f(x_2))

the modulus of continuity of the function f on the set E. If E = X, we omit the superscript, i.e., ω_f^X(δ) = ω_f(δ). Uniform continuity of f on E means that ω_f^E(δ) → 0 as δ → 0. Any modulus of continuity ω_f^E = ω, where ω : [0, ∞) → [0, ∞], satisfies the following apparent properties:

1° ω(0) = 0;
2° ω is a monotone non-decreasing function.

Note that the non-negativity of ω follows from 1° and 2°.

Lemma 14.2 Let T be a topological space, let the function u : T → [0, ∞) be lower semicontinuous, let F(z) = {x ∈ T : u(x) ≤ z} be a multivalued mapping,⁸ F : [0, ∞) → 2^T, such that F(0) ≠ ∅, and let the function ϕ : T → R be upper semicontinuous.

8 Here, 2^T is the class of all subsets of T.


Then,

1) F takes non-empty closed values and the graph of F is closed; if T is compact, the multivalued mapping F is upper semicontinuous;

2) if T is compact, then the function z ↦ sup{ϕ(x) : x ∈ F(z)}, z ∈ A, where A is a closed subset of the set of non-negative numbers, is monotone non-decreasing and right-continuous.

Proof 1) The closedness of the sets F(z), z ∈ A, follows from the lower semicontinuity of the function u. Consider convergent nets z_α → z, x_α ∈ F(z_α), and x_α → x; as x_α ∈ F(z_α) is equivalent to u(x_α) ≤ z_α, the lower semicontinuity of u implies u(x) ≤ lim inf u(x_α) ≤ z, i.e., x ∈ F(z); therefore, the graph of F is closed. If T is compact, then F is upper semicontinuous according to Proposition 2.23 of [5].

2) The numeric function ψ : A → R, where ψ(z) = sup{ϕ(x) : x ∈ F(z)}, is monotone non-decreasing owing to the monotonicity of F, i.e., F(z) ⊆ F(z') when z ≤ z'. According to the Berge theorem (see [11, Theorem 2.2]), ψ is upper semicontinuous. For a monotone non-decreasing numerical function, upper semicontinuity is equivalent to right-continuity. □

Proposition 14.1 Let E be a non-empty compact subset of a metric space (X, ρ) and let the function f : X → Y be continuous, where (Y, d) is a metric space. Then the modulus of continuity ω_f^E = ω satisfies the additional conditions

3° ω takes finite values;
4° ω is right-continuous.

Proof It is sufficient to apply Lemma 14.2, choosing T = E × E, u(x) = u(x_1, x_2) = ρ(x_1, x_2) for x = (x_1, x_2) ∈ E × E, A = [0, +∞), z = δ, and ϕ(x) = ϕ(x_1, x_2) = d(f(x_1), f(x_2)). Then the multivalued mapping F(δ) = {(x_1, x_2) ∈ E × E : ρ(x_1, x_2) ≤ δ} is upper semicontinuous, the function ϕ is continuous and attains a maximum on the compact set F(δ), and therefore the function

ψ(δ) = max_{x ∈ F(δ)} ϕ(x) = max_{x_1, x_2 ∈ E, ρ(x_1, x_2) ≤ δ} d(f(x_1), f(x_2)) = ω_f^E(δ)

is right-continuous. □

Note 14.1 1) A simple example shows that the modulus of continuity under the assumptions of Proposition 14.1 can be discontinuous: it suffices to consider a discrete space


X with the metric

ρ(x, y) = 0 at x = y,  ρ(x, y) = 1 at x ≠ y.

In this case, any function f : X → Y is uniformly continuous, and

ω_f(δ) = 0 at δ ∈ [0, 1),  ω_f(δ) = 1 at δ ∈ [1, ∞).

2) Introducing an additional structure on the space X allows one to establish not only continuity, but also subadditivity of the continuity modulus. Let E be a nonvoid convex subset of a normed space X, with the metric ρ(x, y) = ‖x − y‖. The case where E is a singleton is not interesting,⁹ and hence we assume that E contains at least two points (hence, an infinite number of them). Fix arbitrary non-negative δ_1, δ_2, not both equal to zero, and choose two points x, y ∈ E with ρ(x, y) = ‖x − y‖ ≤ δ_1 + δ_2. Setting p = δ_1/(δ_1 + δ_2) and z_p = (1 − p)x + py ∈ E, we have ‖x − z_p‖ = p‖x − y‖ ≤ δ_1 and ‖z_p − y‖ = (1 − p)‖x − y‖ ≤ δ_2. Therefore, for any x and y such that ‖x − y‖ ≤ δ_1 + δ_2, we obtain the inequality

d(f(x), f(y)) ≤ d(f(x), f(z_p)) + d(f(z_p), f(y)) ≤ ω_f^E(δ_1) + ω_f^E(δ_2).

Thus, taking property 1° into account for the case δ_1 = δ_2 = 0, it follows that the modulus of continuity ω_f^E = ω satisfies the subadditivity property¹⁰:

5° ω(δ_1 + δ_2) ≤ ω(δ_1) + ω(δ_2).

3) A function ϕ : [0, ∞) → [0, ∞) satisfying properties 1°, 2°, 3°, and 5° has a continuity modulus¹¹ coinciding with the function itself,¹² i.e., ω_ϕ = ϕ. Indeed, let |t_2 − t_1| ≤ δ; then, using 2°, 3°, and 5°, we obtain

|ϕ(t_2) − ϕ(t_1)| = ϕ(t_1 ∨ t_2) − ϕ(t_1 ∧ t_2) ≤ ϕ(|t_2 − t_1|) ≤ ϕ(δ),    (14.8)

9 In this case, considering property 1°, subadditivity holds trivially.
10 Here it is possible that ω(δ) = ∞ for all δ > 0. If ω satisfies conditions 1°, 2°, and 5°, then, for 3°, it is sufficient that ω(δ) be finite at some point δ > 0.
11 For such a function ϕ, the conditions of item 2) of Note 14.1 are satisfied.
12 A similar statement can be found in the literature; see, for example, Sect. 6 of the book [2]; however, redundant requirements are imposed on ϕ there.


and hence ω_ϕ ≤ ϕ. From 1° and 2°, the opposite inequality follows: ω_ϕ(δ) ≥ |ϕ(δ) − ϕ(0)| = ϕ(δ) − ϕ(0) = ϕ(δ).

4) If, in addition to the properties 1°, 2°, and 5°, the continuity modulus ω_f = ω of a function f satisfies¹³ the property 4°, then the property 3° is valid, ω coincides with its own modulus of continuity, and f is uniformly continuous (this follows from the inequality (14.8)).

5) An example of a uniformly continuous, but not absolutely continuous, function ϕ : [0, ∞) → [0, ∞) satisfying the properties 1°, 2°, 4°, and 5° is the Cantor staircase function,¹⁴ see [12, Section 3.2.4]. Moreover, this function ϕ satisfies the Hölder condition ϕ(δ) = ω_ϕ(δ) ≤ δ^α, where α = ln 2 / ln 3; see [4].

6) Under the conditions of item 2), if the property 3° holds, i.e., the modulus of continuity satisfies 1°, 2°, 3°, and 5°, then the function ω satisfying these four properties is asymptotically linear.¹⁵ Let us fix an arbitrary t > 0. Recall that ω ≥ 0 owing to 1° and 2°. Denoting by [a] the integer part of a ∈ R, for any x ∈ [0, ∞) we have

ω(x) = ω([x/t] t + r),  r = (x/t − [x/t]) t ∈ [0, t);

from subadditivity (5°), it follows that

ω(x) ≤ [x/t] ω(t) + ω(r).

As ω is monotone non-decreasing,¹⁶ according to 2° we have ω(r) ≤ ω(t); from the previous two inequalities we obtain

ω(x) ≤ ([x/t] + 1) ω(t) ≤ (x/t + 1) ω(t).    (14.9)

13 If, in addition to 1° and 2°, the function is concave, then 4° and 5° are fulfilled.
14 More precisely, it is a continuation of the Cantor function ψ : [0, 1] → [0, 1], setting ϕ(δ) = 1 for δ > 1.
15 This property can be considered a continuous version of the Fekete lemma.
16 As an additive function, i.e., a solution of the Cauchy equation, is subadditive, it is impossible without additional conditions to obtain asymptotic linearity for a subadditive function, because the axiom of choice implies the existence of a (non-measurable) solution of the Cauchy equation whose graph is dense everywhere in the plane [6]. It suffices to require the boundedness of ω in a neighborhood of the point 0; in our case, this follows from the non-negativity of ω (a consequence of the properties 1° and 2°), the monotonicity property 2°, and the finiteness property 3°.


Further, standard arguments based on (14.9) lead to the inequality¹⁷

lim sup_{x→+∞} ω(x)/x ≤ ω(t)/t,  t > 0,

and, by the arbitrariness of t > 0, we obtain

lim sup_{x→+∞} ω(x)/x ≤ inf_{t>0} ω(t)/t ≤ lim inf_{x→+∞} ω(x)/x,

so that we have obtained the classical result

0 ≤ lim_{x→+∞} ω(x)/x = inf_{x>0} ω(x)/x < +∞.
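This asymptotic behavior can be observed on a concrete subadditive modulus; the sketch below uses ω(δ) = δ ∧ 1, which satisfies 1°, 2°, 3°, and 5°, and checks numerically that ω(x)/x is non-increasing toward inf_{t>0} ω(t)/t = 0 (a toy example, not tied to any model of the chapter).

```python
def omega(d):
    """A non-decreasing subadditive modulus with omega(0) = 0."""
    return min(d, 1.0)

# subadditivity 5°: omega(d1 + d2) <= omega(d1) + omega(d2), sampled
pts = [i / 7 for i in range(30)]
assert all(omega(a + b) <= omega(a) + omega(b) + 1e-12 for a in pts for b in pts)

# omega(x)/x decreases toward inf_{t>0} omega(t)/t = 0 for this omega
ratios = [omega(x) / x for x in (1.0, 10.0, 100.0, 1000.0)]
assert ratios == sorted(ratios, reverse=True)
assert ratios[-1] < 1e-2
```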

The Pompeiu–Hausdorff distance¹⁸ is defined as

h_ρ(A, B) = inf{ε > 0 : A ⊆ [B]^ε, B ⊆ [A]^ε},

where [A]^δ = {x ∈ X : ρ(x, A) < δ} and ρ(x, A) = inf{ρ(x, y) : y ∈ A}.

Lemma 14.3 Let (X, ρ) be a metric space, E ⊆ X, let A ⊆ E and B ⊆ E be nonempty sets, and let f : E → R be a numeric function.

1) If h_ρ(A, B) < δ, then | sup_A f − sup_B f | ≤ ω_f^E(δ).

2) Moreover, if f is continuous on E, a compact subset of X, then | sup_A f − sup_B f | ≤ ω_f^E(h_ρ(A, B)).

Proof 1) Let us consider the metric space (E, ρ'), where ρ' is the restriction of ρ to E × E. Note that, for x ∈ E and A ⊆ E, the equality ρ'(x, A) = ρ(x, A) holds. Let us denote [A]_E^δ = {x ∈ E : ρ(x, A) < δ} = [A]^δ ∩ E. For A ⊆ E, the inclusion A ⊆ [B]^ε implies A ⊆ [B]^ε ∩ E = [B]_E^ε, and the inclusion A ⊆ [B]_E^ε

17 If a function ω satisfying 1° and 2° is concave, then the function t ↦ ω(t)/t is non-increasing.
18 It is possible that h_ρ(A, B) = ∞.


implies A ⊆ [B]^ε ∩ E ⊆ [B]^ε; therefore, for A ⊆ E and B ⊆ E, the Pompeiu–Hausdorff distance on E, i.e., h_ρ'(A, B) = inf{ε > 0 : A ⊆ [B]_E^ε, B ⊆ [A]_E^ε}, coincides with h_ρ(A, B). As h_ρ(A, B) < δ, we have A ⊆ [B]_E^δ and B ⊆ [A]_E^δ. For x ∈ [A]_E^δ, there exists y ∈ A such that ρ(x, y) < δ, which implies |f(x) − f(y)| ≤ ω_f^E(δ). Consequently,

sup_{[A]_E^δ} f ≤ sup_A f + ω_f^E(δ);

similarly, we obtain

sup_{[B]_E^δ} f ≤ sup_B f + ω_f^E(δ).

Moreover,

sup_B f ≤ sup_{[A]_E^δ} f ≤ sup_A f + ω_f^E(δ)

and

sup_A f ≤ sup_{[B]_E^δ} f ≤ sup_B f + ω_f^E(δ).

Hence, | sup_A f − sup_B f | ≤ ω_f^E(δ).
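Part 1) above is easy to verify numerically for finite sets; the following sketch (illustrative only, with a hypothetical function f = sin and an arbitrary grid standing in for the compact set E) computes the Pompeiu–Hausdorff distance and the grid modulus of continuity.

```python
import math

def hausdorff(A, B):
    """Pompeiu-Hausdorff distance between finite subsets of R (Euclidean)."""
    d = lambda x, S: min(abs(x - s) for s in S)
    return max(max(d(a, B) for a in A), max(d(b, A) for b in B))

E = [i / 20 for i in range(21)]          # grid E = {0, 0.05, ..., 1}
f = math.sin                             # a uniformly continuous test function

def modulus(delta):                      # omega_f^E(delta) on the grid
    return max(abs(f(x) - f(y)) for x in E for y in E if abs(x - y) <= delta)

A = [0.0, 0.25, 0.5]                     # A, B are subsets of E
B = [0.05, 0.30, 0.55]
h = hausdorff(A, B)                      # equals 0.05 here
assert abs(max(f(a) for a in A) - max(f(b) for b in B)) <= modulus(h + 1e-12)
```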

2) In accordance with Proposition 14.1, the function ω_f^E is right-continuous; hence, 2) follows from 1). □

Let us denote by K(R^n) the class of all nonvoid compact subsets of R^n. Consider N ∈ K(R^n) such that its convex hull contains the coordinate origin, i.e., 0 ∈ conv(N). The support function σ_N(h) = max_{y∈N} hy is then convex, non-negative, and finite for all h ∈ R^n, and therefore continuous (see, for example, Corollary 10.1.1 of [13]). Consequently, it attains a minimum on the compact set S_1(0), the unit sphere in R^n, i.e., S_1(0) = {h ∈ R^n : ‖h‖_2 = 1}, where ‖·‖_2 is the Euclidean norm, ‖h‖_2 = (Σ_{i=1}^{n} h_i^2)^{1/2}.


Let us define an important (for our further consideration) function r by¹⁹

r(N) = min_{h ∈ S_1(0)} σ_N(h) = min_{h ∈ S_1(0)} max_{y ∈ N} hy    (14.10)

for²⁰ N ∈ K(R^n) such that 0 ∈ conv(N). Note that r(conv(N)) = r(N), as the support function of a set N coincides with that of its convex hull conv(N). For a convex set N ∈ K(R^n) containing the point 0, the geometric meaning of r(N) is the radius of the inscribed ball centered at 0, i.e., the maximum radius of a ball with center at the point 0 contained in the convex compact set N, or, equivalently, the distance from the point 0 to the boundary bd(N) of N.

Lemma 14.4 The function N ↦ r(N) is Lipschitz, with Lipschitz constant equal to 1, on the space K(R^n) equipped with the Pompeiu–Hausdorff metric.

Proof Let us denote, for N ⊆ R^n,

N* = conv(N).    (14.11)

Let N_i ∈ K(R^n), i = 1, 2. According to Theorem 14.1 of [7], the Pompeiu–Hausdorff distance between the convex compact sets²¹ N_1* and N_2* can be represented in the form

h_ρ(N_1*, N_2*) = max_{h ∈ S_1(0)} |σ_{N_1*}(h) − σ_{N_2*}(h)|.    (14.12)

Using Lemma 14.1, we have

|r(N_1) − r(N_2)| = |r(N_1*) − r(N_2*)| = | min_{h ∈ S_1(0)} σ_{N_1*}(h) − min_{h ∈ S_1(0)} σ_{N_2*}(h) | ≤ max_{h ∈ S_1(0)} |σ_{N_1*}(h) − σ_{N_2*}(h)| = h_ρ(N_1*, N_2*) ≤ h_ρ(N_1, N_2);    (14.13)

the last inequality can be found in [8], formula (5.12), Proposition 5.2. □

Lemma 14.5 Let K be a compact set and N : K → K(R^n) a lower semicontinuous multivalued mapping. In addition, assume that the sets N(x) satisfy, for all x ∈ K, the condition 0 ∈ int(N*(x)), where N*(x) = conv(N(x)). Then the function x ↦ r(N(x)) is lower semicontinuous and min_{x∈K} r(N(x)) > 0.

19 Recall that hy denotes the inner product.
20 For such arguments, the function r takes finite non-negative values.
21 The convex hull of a compact set is compact; see, for example, [7], Theorem 2.6. Note that we use the notation (14.11) here.
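In the plane, the function r of (14.10) can be approximated by sampling unit directions; this also gives a quick check of both the criterion r(N) > 0 for 0 ∈ int(conv(N)) and the Lipschitz bound of Lemma 14.4. The sets and the sampling size below are arbitrary illustrative choices.

```python
import math

def support(N, h):
    """sigma_N(h) = max_{y in N} <h, y> for a finite N in R^2."""
    return max(h[0] * y[0] + h[1] * y[1] for y in N)

def r(N, m=2000):
    """r(N) = min over sampled unit directions of sigma_N(h); approximation of (14.10)."""
    return min(support(N, (math.cos(2 * math.pi * k / m),
                           math.sin(2 * math.pi * k / m))) for k in range(m))

square = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # conv = unit-inscribed square
shifted = [(x + 0.1, y) for x, y in square]      # Hausdorff distance 0.1 away

assert abs(r(square) - 1.0) < 1e-3               # inscribed radius of the square
assert abs(r(square) - r(shifted)) <= 0.1 + 1e-9 # Lemma 14.4: Lipschitz constant 1
```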


Proof The function (x, h) ↦ σ_{N(x)}(h) = max_{y ∈ N(x)} hy is lower semicontinuous according to the Berge theorem (see [11], Theorem 2.2). Further, the mapping x ↦ S_1(0), which takes a constant compact value, is continuous; hence, according to the Berge theorem (see [11], Theorem 2.3), the function

x ↦ r(N(x)) = min_{h ∈ S_1(0)} σ_{N(x)}(h)

is lower semicontinuous and attains its minimum value at some point x* ∈ K, for which 0 ∈ int(N*(x*)); therefore, r(N(x*)) = r(N*(x*)) > 0. □

Note 14.2 1) If the mapping x ↦ N(x), taking compact convex values, is continuous or, equivalently, h-continuous, then the function x ↦ r(N(x)) is continuous; this follows directly from Lemma 14.4.

2) If N ∈ K(R^n), 0 ∈ N*, and²² dim(aff(N)) < n, then r(N) = 0.

3) Let us define a new function

r̃(N) = min_{h ∈ S_1(0) ∩ aff(N)} σ_N(h),

for N ∈ K(R^n) such that²³ 0 ∈ ri(N*); i.e., r̃(N) is the distance from the origin 0 to the relative boundary rbd(N*) of N*. The function r̃ is no longer continuous with respect to the Pompeiu–Hausdorff distance corresponding to the Euclidean metric in the space R^n. Indeed, if N is convex, 0 ∈ ri(N), and dim(aff(N)) < n, then r̃(N) > 0. However, if we choose an arbitrarily small ε ∈ (0, r̃(N)) and let N^ε = N + B̄_ε(0), then h_ρ(N, N^ε) = ε and r̃(N^ε) = r(N^ε) = ε. If we restrict the space R^n to E = aff(N), with the inherited Euclidean metric and the corresponding Pompeiu–Hausdorff distance, then r̃ becomes continuous in this space.

14.3 Estimation of the Continuity Modulus of Solutions of the Bellman–Isaacs Equations

The following result is a refinement of Theorem 14.1 for the case of the absence of trading constraints; together with the estimation of the modulus of continuity, it provides an alternative (direct) proof of the continuity of the functions v_t*, t = 0, ..., N (under the additional assumption of no trading constraints).

22 Here, aff(N) is the affine hull of N (that is, the smallest affine set containing N; see [13, Section 1]); dim(A) is the dimension of an affine space A.
23 Here, ri(A) denotes the relative interior of a convex set A, that is, the interior of A with respect to the relative topology on aff(A) ⊆ R^n; see [13, Section 6].

280

S. N. Smirnov

It is convenient for us to assume,²⁴ in the initial formulation of the problem, that the functions g_t(·) and the multivalued mappings K_t(·) are defined on the whole space (R^n)^t; accordingly, the functions v_t*(·) in (BA), with D_t(·) ≡ R^n, are defined everywhere. In fact, one only needs to define these functions and mappings on the set of possible trajectories B_t or, possibly, on some convex compact subset of (R^n)^t containing B_t.

Let us define a norm for x̄_t ∈ (R^n)^{t+1} by

‖x̄_t‖ = Σ_{s=0}^{t} ‖x_s‖_1,  where ‖z‖_1 = Σ_{i=1}^{n} |z_i| for z = (z_1, ..., z_n)^T ∈ R^n.

Let us denote

C_t* = ∨_{s=t}^{N} C_s,  C_t = sup_{x̄_t ∈ B_t} g_t(x̄_t);  K_t*(·) = conv(K_t(·));  r_t* = inf_{x̄_{t−1} ∈ B_{t−1}} r(K_t*(x̄_{t−1})),  t = 1, ..., N,    (14.14)

where r is the function defined by formula (14.10).

Theorem 14.2 Suppose that the robust condition of no arbitrage opportunities (RNDAO) holds and that, for s = 1, ..., N, the potential payout functions g_s are continuous and the multivalued mappings x̄_{s−1} ↦ K_s(x̄_{s−1}) are h-continuous (continuous with respect to the Pompeiu–Hausdorff metric²⁵). Then the functions v_s*, defined by (BA) with D_s(·) ≡ R^n, are uniformly continuous and bounded on B_s,

0 ≤ v_s* ≤ C_s* < ∞,  s = 0, ..., N,    (14.15)

24 At least with respect to the potential payout functions g_t, t = 1, ..., N, this is not only convenient but also natural, as this is usually how these functions are defined in practice.
25 For compact-valued mappings, h-continuity is equivalent to continuity; see Theorem 2.68 of [5].


and the following recurrent inequalities provide an estimate of the continuity moduli ω_{v_s*} of the functions v_s*:

ω_{v_{s−1}*}^{B_{s−1}}(δ) ≤ ω_{g_{s−1}}^{B_{s−1}}(δ) ∨ [ ω_{v_s*}^{B_s}(ω_{K_s}^{B_{s−1}}(δ)) + (C_s*/r_s*) ω_{K_s}^{B_{s−1}}(δ) + ω_{v_s*}^{B_s}(δ) ],  s = 1, ..., N;  ω_{v_N*}^{B_N} = ω_{g_N}^{B_N},    (14.16)

where we formally set g_0 ≡ 0 (i.e., ω_{g_0} ≡ 0), ω_{K_s}^{B_{s−1}} is the modulus of continuity of K_s(·) in the Pompeiu–Hausdorff metric on B_{s−1}, ω_{g_s}^{B_s} and ω_{v_s*}^{B_s} are the moduli of continuity on B_s of the functions g_s and v_s*, respectively, and the quantities r_s*, given by (14.14), are positive, s = N, ..., 1.

Proof The continuity of the functions v_s*, s = 0, ..., N, follows from Theorem 14.1, taking into account the coincidence of upper (lower) semicontinuity and upper (lower) h-semicontinuity for compact-valued mappings K_s(·), s = 1, ..., N; see Proposition 2.68 of [5]. According to Proposition 2.1 of [11], the set B_s of possible trajectories is compact; therefore, the functions v_s* are uniformly continuous and bounded on B_s. However, for the considered case of no trading constraints, the uniform continuity of the functions v_s* can also be obtained directly by induction (from the arguments presented below), simultaneously with the estimate (14.16) of the continuity moduli and the inequalities (14.15).

For s = N, this is evident. Let v_s* be uniformly continuous for s = N, ..., t; let us show that v_{t−1}* is uniformly continuous and that (14.15) and (14.16) hold for s = t. In accordance with (BA), for the case of the absence of trading constraints, i.e., when D_t(·) ≡ R^n, the equations for t = N, ..., 1 take the form

v_{t−1}*(x̄_{t−1}) = g_{t−1}(x̄_{t−1}) ∨ ρ_t(x̄_{t−1}),    (14.17)

where

ρ_t(x̄_{t−1}) = inf_{h ∈ R^n} sup_{y ∈ K_t(x̄_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy].    (14.18)

As the functions g_{t−1} are continuous by the assumptions of the theorem, they are uniformly continuous and bounded on the compact set B_{t−1} of possible trajectories; in particular, C_{t−1} < ∞. As v_{t−1}*(·) ≥ g_{t−1}(·), we have v_s*(·) ≥ 0 for s = t − 1, ..., N. By the induction hypothesis, v_t*(·) ≤ C_t*, and the infimum over h in (14.18) does not exceed the value of the supremum over y for the particular choice h = 0. Hence, ρ_t(·) ≤ C_t*, and the inequality v_{t−1}*(·) ≤ C_{t−1} ∨ C_t* = C_{t−1}* holds. Thus, the inequalities (14.15) hold for s = t. Now, let us show the uniform continuity of the function x̄_{t−1} ↦ ρ_t(x̄_{t−1}) and simultaneously estimate its modulus of continuity.


As a continuous mapping of a compact metric space into a metric space is uniformly continuous²⁶ and as the class of all compact subsets of R^n is a metric space when equipped with the Pompeiu–Hausdorff distance,²⁷ one can conclude that the mapping x̄_{t−1} ↦ K_t(x̄_{t−1}), considered as an ordinary (single-valued) mapping, is uniformly continuous on B_{t−1}. Denoting by ω_{K_t}^{B_{t−1}}(·) the continuity modulus of the mapping x̄_{t−1} ↦ K_t(x̄_{t−1}), we have ω_{K_t}^{B_{t−1}}(δ) → 0 as δ → 0. According to the induction hypothesis, the function v_t*(·) is uniformly continuous, and hence its modulus of continuity ω_{v_t*}(δ) → 0 as δ → 0.

Let x̄_{t−1} ∈ B_{t−1}, x̄'_{t−1} ∈ B_{t−1}, ‖x̄_{t−1} − x̄'_{t−1}‖ ≤ δ. The following inequalities hold:

| sup_{y ∈ K_t(x̄_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy] − sup_{y ∈ K_t(x̄'_{t−1})} [v_t*(x̄'_{t−1}, x'_{t−1} + y) − hy] |
≤ | sup_{y ∈ K_t(x̄_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy] − sup_{y ∈ K_t(x̄'_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy] |
+ | sup_{y ∈ K_t(x̄'_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy] − sup_{y ∈ K_t(x̄'_{t−1})} [v_t*(x̄'_{t−1}, x'_{t−1} + y) − hy] |.    (14.19)

To estimate the first term on the right-hand side of the inequality (14.19), note that the modulus of continuity at a point ε ≥ 0 of the function y ↦ v_t*(x̄_{t−1}, x_{t−1} + y) − hy can be estimated from above by the value ω_{v_t*}^{B_t}(ε) + ‖h‖ε. Further, using Lemma 14.3, item 2), we observe that the first term can be estimated by

ω_{v_t*}^{B_t}(h_ρ(K_t(x̄_{t−1}), K_t(x̄'_{t−1}))) + ‖h‖ h_ρ(K_t(x̄_{t−1}), K_t(x̄'_{t−1})) ≤ ω_{v_t*}^{B_t}(ω_{K_t}^{B_{t−1}}(δ)) + ‖h‖ ω_{K_t}^{B_{t−1}}(δ).

The second term on the right-hand side of the inequality (14.19) can be estimated, using Lemma 14.1, by

sup{ |v_t*(x̄_{t−1}, x_{t−1} + y) − v_t*(x̄'_{t−1}, x'_{t−1} + y)| : y ∈ K_t(x̄'_{t−1}) } ≤ ω_{v_t*}^{B_t}(δ).

Consequently,

| sup_{y ∈ K_t(x̄_{t−1})} [v_t*(x̄_{t−1}, x_{t−1} + y) − hy] − sup_{y ∈ K_t(x̄'_{t−1})} [v_t*(x̄'_{t−1}, x'_{t−1} + y) − hy] | ≤ ω_{v_t*}^{B_t}(ω_{K_t}^{B_{t−1}}(δ)) + ‖h‖ ω_{K_t}^{B_{t−1}}(δ) + ω_{v_t*}^{B_t}(δ)    (14.20)

for all x̄_{t−1}, x̄'_{t−1} ∈ B_{t−1} such that ‖x̄_{t−1} − x̄'_{t−1}‖ < δ.

(14.20) for all x¯t −1 , x¯t −1 ∈ Bt −1 , such that &x¯t −1 − x¯t −1 & < δ. 26 See, 27 See,

for example, Theorem 3.16.5 from [3]. for example,Theorem 5.1 and Lemma 5.5 from [8].

14 Guaranteed Deterministic Approach to Superhedging: . . .

283

 The function Bt ' x¯t −1 "→ r Kt (x¯t −1 ) , where the function r is defined by (14.10), is continuous according to Lemma 14.4; according to Lemma 14.5, the continuous function x¯t −1 "→ r(Kt (x¯t −1 )) attains a minimum for some x¯t∗−1 ∈ Bt −1 and r(Kt (x¯t∗−1 )) = rt∗ > 0. The following inequalities hold: sup

y∈Kt (x¯ t−1 )

 ∗  vt (x¯t −1 , xt −1 + y) − hy ≥

max

(−hy) =

y∈Kt (x¯ t−1 )

max

y∈−Kt∗ (x¯ t−1 )

hy =

= σ−Kt∗ (x¯t−1 ) (h) ≥ σBr ∗ (0)(h) = rt∗ &h&2 , t

where & · &2 is the Euclidean norm in Rn . Therefore, for &h&2 > sup

y∈Kt (x¯ t−1 )

Ct∗ rt∗ ,

the inequality

 ∗  vt (x¯t −1 , xt −1 + y) − hy > Ct∗ ,

holds, and hence, the infimum in (14.18) cannot be achieved for such h. Consequently, ρt (x¯t −1 ) = infn



sup

h∈R y∈Kt (x¯ t−1 )

=

inf

sup

h∈BC/r ∗ (0) y∈Kt (x¯ t−1 )

 vt∗ (x¯t −1 , xt −1 + y) − hy =

  ∗ vt (x¯t −1 , xt −1 + y) − hy =

t

inf

h∈BC/r ∗ (0)

ϕt (x¯t −1 , h),

t

(14.21) where ϕt (x¯t −1 , h) =

sup



y∈Kt (x¯ t−1 )

 vt∗ (x¯t −1, xt −1 + y) − hy .

Considering (14.20), the inequalities

|ϕ_t(x̄_{t−1}, h) − ϕ_t(x̄'_{t−1}, h)| ≤ β(δ),    (14.22)

where

β(δ) = ω_{v_t*}^{B_t}(ω_{K_t}^{B_{t−1}}(δ)) + (C_t*/r_t*) ω_{K_t}^{B_{t−1}}(δ) + ω_{v_t*}^{B_t}(δ),    (14.23)

hold for any x̄_{t−1}, x̄'_{t−1} ∈ B_{t−1} such that ‖x̄_{t−1} − x̄'_{t−1}‖ < δ and for all h ∈ B_{C/r_t*}(0). Using (14.21)–(14.23) and Lemma 14.1, we obtain the inequality

|ρ_t(x̄_{t−1}) − ρ_t(x̄'_{t−1})| ≤ β(δ)    (14.24)


for all x̄_{t−1}, x̄'_{t−1} ∈ B_{t−1} such that ‖x̄_{t−1} − x̄'_{t−1}‖ < δ. As v_t* and K_t are uniformly continuous, β(δ) → 0 as δ → 0; thus, the function ρ_t is uniformly continuous, with modulus of continuity

ω_{ρ_t}^{B_{t−1}}(δ) ≤ β(δ).    (14.25)

As v_{t−1}* = g_{t−1} ∨ ρ_t, we can use the consequence (14.7) of Lemma 14.1 and obtain, for δ ≥ 0,

ω_{v_{t−1}*}^{B_{t−1}}(δ) ≤ ω_{g_{t−1}}^{B_{t−1}}(δ) ∨ ω_{ρ_t}^{B_{t−1}}(δ).    (14.26)

Finally, (14.23), (14.25), and (14.26) yield the required inequalities (14.16). □

The following assertion is easily obtained from the theorem just proved.

Theorem 14.3 Assume that the condition RNDAO is satisfied and that the functions g_t and the mappings K_t(·) satisfy the Lipschitz condition with constants L_{g_t} and L_{K_t}, respectively. Then the Bellman functions v_t* also satisfy the Lipschitz condition, with constants L_{v_t*} that can be determined from the following recurrence relation:

L_{v_N*} = L_{g_N},
L_{v_{t−1}*} = L_{g_{t−1}} ∨ [ L_{v_t*}(L_{K_t} + 1) + (C_t*/r_t*) L_{K_t} ],  t = N, ..., 1,

where C_t* = ∨_{s=t}^{N} C_s, C_t = sup_{x̄_t ∈ B_t} g_t(x̄_t), t = N, ..., 1, and r_t* is defined by (14.14).

x¯ t−1 ∈Bt

Proof The equality LvN∗ = LgN is evident. Using the formulas (14.14) and (14.16), in addition to the Lipschitz property of gt −1 and Kt (·), that is28 ωgt (δ) ≤ Lgt δ, ωKt (δ) ≤ LKt δ, we obtain the desired result. Note 14.3 1) To apply in Theorem 3.1 clause 2) of the Remark 14.1, it is sufficient to expand the set Bt , on which the moduli of continuity are defined, to a convex set containing Bt ; the minimal set is the convex hull, conv(Bt ). For some models, Bt

the Lipschitz constants can be considered as minimal, i.e., Lgt−1 = sup{ωgt−1 (δ)/δ, δ > 0}, LKt−1 = sup{ωKt−1 (δ)/δ, δ > 0}.

28 Here,

14 Guaranteed Deterministic Approach to Superhedging: . . .

285

is a convex set, t = 0, . . . , N. In particular, this is the case for the multiplicative market model described below. 2) If E is a nonvoid compact subset of the metric space (X, ρ), then a function f : X "→ Y , where (Y, d) is a metric space, satisfies the Lipschitz property if and only if the following condition for the modulus of continuity ωfE (δ) = ω(δ) holds: a = lim sup δ→0

ω(δ) < ∞, δ

(14.27)

The necessity is apparent. If (14.27) holds, the function f is uniformly continuous on E, as ω(δ) → 0 when δ → 0. Let us fix an arbitrary ε > 0. Subsequently, there is δ ∗ = δ ∗ (ε) > 0, such that ω(δ) ≤ (a + ε)δ when δ ∈ [0, δ ∗ ]. As ω is bounded from above by the maximum of the continuous function (x1 , x2 ) "→ d(f (x1 ), f (x2 )) on the compact E × E, which we denote by m, we have ω(δ) ≤ [m/δ ∗ ∨(α + ε)]δ for all δ ≥ 0. 3) As in item 2) of Remark 14.1, let E be a convex subset of normed space X, equipped with the metric ρ(x, y) = &x − y& containing at least two points (hence, a continuum of them). Subsequently, the condition (14.27) is necessary and sufficient for the Lipschitz property of a function f : E "→ Y , where (Y, d) is a metric space. Moreover, w(δ) ≤ aδ for all δ ∈ [0, ∞), i.e., a is a Lipschitz constant. The necessity is apparent. If (14.27) holds, let us fix an arbitrary ε ∈ (0, 1) and choose δ ∗ = δ ∗ (ε) > 0, such that ω(δ) ≤ (a + ε)δ when δ ∈ [0, δ ∗ ]. The inequality (14.9) implies the following inequality for x ≥ t > 0:   t ω(t) ω(x) ≤ 1+ ; x x t

(14.28)

choosing in (14.28) x > δ ∗ and t < εδ ∗ , we obtain ω(x) ≤ (a + ε)(1 + ε). x>0 x sup

As ε can be considered arbitrarily small, we conclude that a is a Lipschitz constant.
4) The result of Theorem 14.3 can be useful, in particular, to assess the accuracy of the numerical solution of the problem, in the case where there is an additional, stronger-than-continuity, "smoothness" property of functions and multivalued mappings, namely their Lipschitz property. Note that the Lipschitz property of potential payout functions g_t is a realistic assumption that is fulfilled for many types of options. The exception is the binary option.29 The Lipschitz property of multivalued mappings for the multiplicative model is proved in Proposition 14.2 below.

29 Also called the digital option.

S. N. Smirnov

Consider a model that is of a multiplicative-independent type and homogeneous in time,30 where the deterministic price dynamics is defined using the multiplicative representation

    X_t^i = M_t^i X_{t−1}^i,  M_t = (M_t^1, . . . , M_t^n) ∈ C_t(·),  t = 1, . . . , N,    (14.29)

where C_t(·) is a non-empty compact subset of R^n and the M_t^i are the multiplicative factors describing market uncertainty.

30 This terminology was introduced in [9].

For a vector z = (z^1, . . . , z^n) ∈ R^n, we denote by Λ(z) the diagonal matrix of the form

    Λ(z)_{ij} = z^i for i = j,  Λ(z)_{ij} = 0 for i ≠ j.    (14.30)

Thus, (14.29) can be written in matrix form as follows:

    X_t = Λ(M_t) X_{t−1}.    (14.31)

It is evident that the price increments Y_t = ΔX_t can be related to the multiplicative representation by the identity Y_t = [Λ(M_t) − I] X_{t−1}, t = 1, . . . , N, where I is the identity matrix; accordingly,

    K_t(x_0, . . . , x_{t−1}) = {y ∈ R^n : y = [Λ(m) − I] x_{t−1}, m ∈ C_t(x_0, . . . , x_{t−1})}.

Proposition 14.2 Let C_t(·) ≡ Č, where Č is a non-empty compact convex subset of R^n. Then the multivalued mapping taking compact convex values, defined as

    x = (x^1, . . . , x^n) ↦ K(x) = {y = (y^1, . . . , y^n) : y^i = (M^i − 1) x^i, i = 1, . . . , n, M = (M^1, . . . , M^n) ∈ Č},

satisfies the Lipschitz property with respect to the Pompeiu–Hausdorff metric.

Proof Let us denote L^i = M^i − 1; subsequently, considering (14.31), y ∈ K(x) is equivalent to y^i = L^i x^i, i = 1, . . . , n, where L ∈ C = Č − e, e = (1, . . . , 1). The expression for the support function of K(x) is

    σ_{K(x)}(z) = sup_{y∈K(x)} Σ_{i=1}^n z^i y^i = sup_{L∈C} Σ_{i=1}^n z^i L^i x^i = σ_C((z^1 x^1, . . . , z^n x^n)) = σ_C(Λ(z) x),    (14.32)

where Λ(z) is the diagonal matrix defined by (14.30). Let ρ be the Euclidean metric on R^n, i.e., ρ(a, b) = ‖a − b‖_2 = (Σ_{i=1}^n (a^i − b^i)^2)^{1/2}; then the Pompeiu–Hausdorff metric h_ρ corresponding to ρ on the space of convex compact sets can be represented as

    h_ρ(A, B) = max_{z: ‖z‖_2 = 1} |σ_A(z) − σ_B(z)|,    (14.33)

as we have already noted in the proof of Lemma 14.4. Therefore, using (14.32) and (14.33), we have

    h_ρ(K(x), K(x′)) = max_{z: ‖z‖_2 = 1} |σ_{K(x)}(z) − σ_{K(x′)}(z)| = max_{z: ‖z‖_2 = 1} |σ_C(Λ(z) x) − σ_C(Λ(z) x′)| ≤ ‖C‖_2 sup_{z: ‖z‖_2 = 1} ‖Λ(z)‖_2 ‖x − x′‖_2,

where

    ‖C‖_2 = sup_{a∈C} ‖a‖_2 < ∞,  ‖Λ(z)‖_2 = max_{x: ‖x‖_2 = 1} ‖Λ(z) x‖_2 = max_{1≤i≤n} |z^i| = ‖z‖_∞.

Here, we use the fact that the support function σ_A is a Lipschitz one, with the constant ‖A‖_2; see [8, Proposition 9.10]. Thus, considering the relation between the norms ‖z‖_∞ ≤ ‖z‖_2 ≤ ‖z‖_1, we obtain

    h_ρ(K(x), K(x′)) ≤ ‖C‖_2 ‖x − x′‖_2 ≤ ‖C‖_2 ‖x − x′‖_1,

i.e., the multivalued mapping x ↦ K(x) is a Lipschitz one, with the Lipschitz constant ‖C‖_2.

Note 14.4 In the model considered by V.N. Kolokoltsov in Chapter 13 of [1], the multiplicative factors are assumed to lie in some fixed intervals, i.e., the set Č is chosen as a rectangular parallelepiped, Č = ∏_{i=1}^n [α_i, β_i], where α_i < β_i, i = 1, . . . , n. Therefore, C = ∏_{i=1}^n [α_i − 1, β_i − 1] and ‖C‖_2 = max_{y∈V} ‖y‖_2, where V is the set of the 2^n vertices of C; hence, ‖C‖_2 = (Σ_{i=1}^n (|α_i − 1| ∨ |β_i − 1|)^2)^{1/2}.
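The bound of Proposition 14.2 can be checked numerically for the rectangular Č of Note 14.4. The sketch below (all interval bounds and test points are illustrative assumptions, not values from the text) discretizes K(x) on a grid, computes the Pompeiu–Hausdorff distance directly, and compares it with the Lipschitz bound ‖C‖₂ ‖x − x′‖₂.

```python
import numpy as np

# Rectangular C-check = [alpha_1, beta_1] x [alpha_2, beta_2] (assumed values).
alpha = np.array([0.9, 0.8])
beta = np.array([1.1, 1.3])

# C = C-check - e; ||C||_2 is attained at one of the 2^n vertices (Note 14.4).
lows, highs = alpha - 1.0, beta - 1.0
vertices = [(u, v) for u in (lows[0], highs[0]) for v in (lows[1], highs[1])]
C_norm = max(np.linalg.norm(w) for w in vertices)

def K(x, grid=41):
    """Grid discretization of K(x) = {y : y_i = (M_i - 1) x_i, M in C-check}."""
    m1, m2 = np.meshgrid(np.linspace(alpha[0], beta[0], grid),
                         np.linspace(alpha[1], beta[1], grid))
    return np.stack([(m1 - 1) * x[0], (m2 - 1) * x[1]], axis=-1).reshape(-1, 2)

def hausdorff(A, B):
    # Pompeiu-Hausdorff distance between two finite point clouds
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())

x, x2 = np.array([1.0, 2.0]), np.array([1.5, 1.7])
h = hausdorff(K(x), K(x2))
bound = C_norm * np.linalg.norm(x - x2)   # Lipschitz bound of Proposition 14.2
```

For these assumed intervals the discretized distance stays well below the bound, in line with the proof.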

References

1. Bernhard, P., Engwerda, J.C., Roorda, B., Schumacher, J., Kolokoltsov, V., Saint-Pierre, P., Aubin, J.-P.: The Interval Market Model in Mathematical Finance: Game-Theoretic Methods, p. 348. Springer, New York (2013)
2. DeVore, R.A., Lorentz, G.G.: Constructive Approximation, p. 452. Springer, New York (1993)
3. Dieudonné, J.: Foundations of Modern Analysis, p. 361. Academic, New York (1960)
4. Dovgoshey, O., Martio, O., Ryazanov, V., Vuorinen, M.: The Cantor function. Expo. Math. 24(1), 1–37 (2006)
5. Hu, S., Papageorgiou, N.: Handbook of Multivalued Analysis: Theory, vol. I. Mathematics and Its Applications, vol. 419, p. 968. Springer, Berlin (1997)
6. Jones, F.B.: Connected and disconnected plane sets and the functional equation f(x + y) = f(x) + f(y). Bull. Am. Math. Soc. 48, 115–120 (1942)
7. Leichtweiss, K.: Konvexe Mengen, p. 330. Springer, Berlin (1980) (in German)
8. Polovinkin, E.S.: Mnogoznachniy analiz i differencialniye vklucheniya [Set-valued analysis and differential inclusions], p. 524. Nauka, FizMatLit Publication, Moscow (2015) (in Russian)
9. Smirnov, S.N.: A guaranteed deterministic approach to superhedging: financial market model, trading constraints and Bellman–Isaacs equations. Math. Game Theory Appl. 10(4), 59–99 (2018) (in Russian)
10. Smirnov, S.N.: A guaranteed deterministic approach to superhedging: no arbitrage market condition. Math. Game Theory Appl. 11(2), 68–95 (2019) (in Russian)
11. Smirnov, S.N.: A guaranteed deterministic approach to superhedging: the properties of semicontinuity and continuity of the Bellman–Isaacs equations. Math. Game Theory Appl. (2019, in print) (in Russian)
12. Timan, A.F.: Theory of approximation of functions of a real variable. In: International Series of Monographs in Pure and Applied Mathematics, vol. 34, p. 644. Pergamon Press, Oxford (1963)
13. Rockafellar, R.T.: Convex Analysis, p. 451. Princeton University Press, Princeton (1970)

Chapter 15

Evaluation of Portfolio Decision Improvements by Markov Modulated Diffusion Processes: A Shapley Value Approach Benjamin Vallejo-Jimenez and Mario A. Garcia-Meza

Abstract We use the Shapley value to evaluate the introduction of a second stochastic process to an optimization problem. We find that introducing a Time-Inhomogeneous Markov Modulated Diffusion process to an asset portfolio decision problem yields higher returns to the rational decision maker. These increases in returns can be diminished by a high volatility in the state change or a high discount factor. Keywords Shapley value · Portfolio optimization · Markov modulated diffusion processes · HJB equation · Ito's lemma · Feedback optimization

15.1 Introduction

This chapter applies the Shapley value to evaluate the inclusion of a Markov Modulated diffusion process as a factor to consider in an optimal portfolio decision to maximize total discounted logarithmic utility. This is an extension of the work by Vallejo et al. [23], in which the distribution of the gains among different parts of a portfolio is decided via the computation of the Shapley value. In game theory, the Shapley value [21] offers a fair way to distribute a total payoff resulting from cooperation among the players of a game. The Shapley value is not the only way to distribute profits among players in a cooperative game; for instance, in [1], the author explores the results of a weighted voting system, which

B. Vallejo-Jimenez Universidad de Colima, Colima, Mexico e-mail: [email protected] M. A. Garcia-Meza () Facultad de Economía, Contaduría y Administración, Universidad Juarez del Estado de Durango, Durango, Mexico e-mail: [email protected] © Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_15


was extended by Owen [15] to all cooperative games with transferable utilities and further characterized by Feltkamp [8]. Also, the core is a feasible allocation of resources that fulfills many requirements of a cooperative game (cf. [16]). Another approach is the Imputation Distribution Procedure, first proposed by Petrosjan [17] and applied in differential games (cf. [18, 19]); it has several economic applications, from economic growth to marketing (cf. [6, 10–12]). In the light of such a variety of solutions to the fair distribution of the gains from cooperation, the Shapley value is a particularly useful method, since it satisfies the conditions of efficiency, symmetry, linearity, and null player. This value is also widely applied to weigh marginal contributions and supports decision making. For instance, Zhang et al. [25] apply Shapley value methods to define the regional allocation of carbon emission quotas in China. Cao [3], on the other hand, applies the value as a systemic risk measure. Fréchette et al. [9] analyze algorithm portfolios based on their contributions via a measure drawn from coalitional game theory. Colini-Baldeschi et al. [4] propose an allocation criterion for the variance of the sum of n possibly dependent random variables by translating the problem into a cooperative game. The maximization of total discounted logarithmic utility with a risky asset has been a widely studied problem since Merton (cf. [13, 14]) and Samuelson [20], and more recently Cox and Huang [5]. These authors show that shifting between assets can generate opportunities for long-term investors based on expectations in multi-period problems.
Thus, including new processes to generate more realistic scenarios provides an opportunity for attractive propositions. Zariphopoulou [24] works on the maximization of expected utility based on terminal wealth with a deterministic riskless asset and a risky asset whose price depends only on a continuous-time Markov chain. Fei [7] derives the optimal portfolio with inflation following a Markovian switching process. Sotomayor and Cadenillas [22] find explicit solutions for optimal investment decisions for specific HARA (hyperbolic absolute risk aversion) utility functions with Brownian motion and regime switching. Bäuerle and Rieder [2] propose optimal portfolio decisions when the stock return and its volatility are associated with an external (homogeneous and finite) Markov chain. Vallejo et al. [23] extended the previous work by obtaining closed forms for consumption and portfolio decisions with volatility and trends based on an external time-inhomogeneous Markov chain.

15.2 Diffusion Process Without Markov Chain

Consider the following problem of determining the portfolio and consumption decisions that maximize the total expected discounted utility. A rational consumer has access to a bond and a risky asset, as well as her own consumption. Due to the inclusion of a risky asset, the setup of this problem requires a filtered probability space (or stochastic basis) to define the randomness of the processes, that is, (Ω, F, F = {F_t, 0 ≤ t ≤ T}, P), i.e., Ω is a set, F is a σ-algebra on Ω, P is a probability measure on (Ω, F), and F_t is a filtration that we assume contains all market information up to time t. Let b_t be the price of a riskless bond whose dynamic process is defined by the differential equation

    db_t / b_t = r dt,    (15.1)

and the stock price process S_t evolves according to the following stochastic differential equation:

    dS_t / S_t = μ_0 dt + σ_0 dW_t,    (15.2)

where dW_t ∼ N(0, dt) is a Wiener process, or Brownian motion, σ_0 represents the standard deviation of the geometric Brownian motion, and r and μ_0 are the constant trends. In what follows we suppose that μ_0, σ_0 ∈ R+ and μ_0 > r > 0, which allows rational consumers to invest in both assets. The consumer defined in this problem is thus concerned with allocating her wealth in an optimal way among the two different assets and her consumption. This restriction is represented by the following differential equation:

    da_t = a_t (1 − θ_t) db_t / b_t + a_t θ_t dS_t / S_t − dc_t,    (15.3)

where the amount of wealth destined to consumption at any time t follows dc_t = c_t dt and θ_t defines the fraction of wealth not destined for consumption that is invested in the risky asset, say a stock, at any given time t. Note that we can therefore define the portfolio strategy by θ_t, which is called admissible whenever ∫_0^T θ_s^2 ds < ∞ almost surely. Therefore, a_t is the self-financed real wealth process. Thus, the utility maximization problem is stated as

    max E[ ∫_0^∞ e^{−ρt} u(c_t) dt | F_t ]    (15.4)

subject to

    da_t = a_t (r + θ_t (μ_0 − r) − c_t / a_t) dt + a_t θ_t σ_0 dW_t,    (15.5)

where ρ is the discount rate of the agent. Consider a logarithmic utility function, that is, u(c_t) = ln c_t; thus the feedback solution of the maximization problem is given by

    θ_t = (μ_0 − r) / σ_0^2,  c_t = ρ a_t,    (15.6)


as the optimal decision of allocation of wealth between assets and the optimal rate of consumption, respectively. Such resulting decisions yield

    da_t = a_t [ r + ((μ_0 − r) / σ_0)^2 − ρ ] dt + a_t ((μ_0 − r) / σ_0) dW_t    (15.7)

as the wealth dynamics under optimal controls, with an optimal expected track defined by

    S(a_t, t) = r / ρ^2 + (1/ρ) (ln(a_t) e^{−ρt} + ln(ρ) − 1).    (15.8)
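As a quick numerical sanity check of (15.6), note that under a constant-fraction strategy the drift of ln a_t (before consumption) is g(θ) = r + θ(μ_0 − r) − θ²σ_0²/2, which is maximized exactly at θ* = (μ_0 − r)/σ_0². A short sketch, with parameter values that are illustrative assumptions rather than values from the chapter:

```python
import numpy as np

# Sketch: confirm theta* = (mu0 - r)/sigma0**2 from (15.6) by grid search over
# the log-wealth growth rate g(theta) = r + theta*(mu0 - r) - 0.5*theta**2*sigma0**2.
# Parameter values below are illustrative assumptions.
mu0, r, sigma0 = 0.08, 0.03, 0.2

def growth(theta):
    return r + theta * (mu0 - r) - 0.5 * theta**2 * sigma0**2

thetas = np.linspace(0.0, 3.0, 30001)        # grid step 1e-4
theta_grid = thetas[np.argmax(growth(thetas))]
theta_closed = (mu0 - r) / sigma0**2         # closed form, = 1.25 here
```

The grid maximizer agrees with the closed form to within the grid step, mirroring the feedback solution of Sect. 15.2.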

15.3 Time-Inhomogeneous Markov Modulated Diffusion Processes

We present closed solutions for consumption and portfolio decisions when a risky asset is driven by a time-inhomogeneous Markov-modulated diffusion process (MMDP). The solutions that this method yields are then compared and pondered against the Shapley value. Consider a rational agent with the same problem of allocation between consumption and two different assets. The possible events are contained in a similar filtered space, (Ω, F, F = {F_t, 0 ≤ t ≤ T}, P). The dynamics of the bond is driven by (15.1) and the risky asset evolves according to

    dS_t / S_t = μ_i dt + σ_i dW_t.    (15.9)

Note that, in this case, the appreciation rate μ_i and standard deviation σ_i are indexed in a finite state space S and are chosen according to a transition matrix Q = (q_ij(t))_{i,j∈S} with time-dependent transition probabilities under P with respect to F. For the sake of simplicity, from now on we suppose that σ_i : S → R+ and μ_i : S → R. Moreover, suppose that it is convenient in the long run to invest a positive amount in the risky asset, i.e., E[μ_i] > r > 0 for all i in S. As in the previous section, we denote by θ_{t,i} the fraction of wealth not destined to consumption that the agent invests in the stock S_t at time t. The portfolio strategy θ_{t,i} is now dependent on time and on the state given by a Markov chain. This formulation allows us to consider a general situation where there are transition states given by exogenous factors in the economy. Let the real wealth process be defined similarly to Eq. (15.5) by

    da_t = a_t (r + θ_{t,i} (μ_i − r) − c_t / a_t) dt + a_t θ_{t,i} σ_i dW_t.    (15.10)


Therefore, the utility maximization problem is given by (15.4) subject to (15.10). The feedback solution to the stochastic optimal control problem using the Hamilton-Jacobi-Bellman equation, considering a logarithmic utility function, leads to similar optimal controls

    θ_{t,i} = (μ_i − r) / σ_i^2    (15.11)

and

    c_t = ρ a_t.    (15.12)

That is, the proportion of wealth dedicated to the risky asset increases in the difference between the growth trend of the stock and the bond, which is positive, and decreases according to the associated risk in any particular state. This result mirrors the one found in the previous section. Since θ_{t,i} is shown to depend exclusively on i, we denote it onwards as θ_i. With the obtained set of controls, we can derive the corresponding wealth dynamics and optimal expected track by

    da_t = a_t [ r + ((μ_i − r) / σ_i)^2 − ρ ] dt + a_t ((μ_i − r) / σ_i) dW_t    (15.13)

and

    S(a_t, t, i) = (1/ρ) ln(a_t) e^{−ρt} + [ r/ρ + ((μ_i − r)/σ_i)^2 / (2ρ) + ln(ρ) − 1 ] (1/ρ) ∫_t^∞ ρ e^{−ρs} ds + Σ_{j∈S} ∫_t^∞ q_ij(s) [g(s, j) − g(s, i)] e^{−ρs} ds,    (15.14)

respectively.
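A minimal simulation of the controlled dynamics (15.13) may help make the construction concrete. All numbers below (the two-state generator, trends, and volatilities) are illustrative assumptions, and a time-homogeneous chain is used only for brevity, whereas the chapter allows q_ij(t) to depend on time.

```python
import numpy as np

# Euler scheme for the wealth dynamics (15.13) under the feedback controls
# (15.11)-(15.12), with a two-state Markov chain switching (mu_i, sigma_i).
# All parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
r, rho = 0.03, 0.05
mu = np.array([0.10, 0.04])            # state-dependent trends mu_i
sigma = np.array([0.15, 0.30])         # state-dependent volatilities sigma_i
Q = np.array([[-0.5, 0.5],
              [1.0, -1.0]])            # assumed generator of the chain

dt, n = 1.0 / 252, 5 * 252
state, a = 0, 1.0                      # initial regime and wealth a_0 = 1
for _ in range(n):
    lam = (mu[state] - r) / sigma[state]        # = theta_i * sigma_i
    drift = r + lam**2 - rho                    # drift in (15.13), c_t = rho*a_t
    a *= 1.0 + drift * dt + lam * rng.normal(0.0, np.sqrt(dt))
    if rng.random() < -Q[state, state] * dt:    # switch with probability ~ q_ii*dt
        state = 1 - state
```

Note how the optimal fraction θ_i = (μ_i − r)/σ_i² enters only through the market price of risk λ_i = θ_i σ_i = (μ_i − r)/σ_i, which is recomputed whenever the regime switches.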

15.4 Risk Sensitivity Transference

In this section, the portfolio strategy, in terms of the wealth proportion allocated to the risky asset, is compared with the inclusion of a Markov Modulated Diffusion Process, in order to determine the main conditions that affect the risk sensitivity of the decision maker. Let θ_{i+1} be the strategy for the next state, that is, the wealth allocation decision to be implemented when the state i + 1 is in place. The expected risk, and therefore the expected wealth proportion allocated into the risky asset, in the next state is higher than what can be expected in the problem without MMDP. This can be expressed as

    E(θ_{i+1}) > θ_0.    (15.15)


Equation (15.15) implies that, for a complete implementation of the MMDP, we can express the main condition for risk sensitivity transference as

    E[ (μ_{i+1} − r) / σ_{i+1}^2 ] > (μ_0 − r) / σ_0^2.    (15.16)

Nevertheless, it is also possible to make partial implementations. We present three cases, represented with equations similar to (15.16), rearranged for simplicity of interpretation.

• Total implementation: In this case, we assume that both the trend and the standard deviation follow a non-homogeneous Markov modulated process, based on the next-step expectation in closed form. That is,

    E[ (μ_{i+1} − r) (σ_0 / σ_{i+1})^2 ] > μ_0 − r.    (15.17)

• Trend implementation: Here, a partial implementation is done, where only the trend follows a MMDP, that is,

    E[μ_{i+1}] − r > (μ_0 − r) (σ_1 / σ_0)^2.    (15.18)

• Variance implementation: A partial implementation with only the variance following a MMDP, that is,

    (μ_1 − r) E[ (σ_0 / σ_{i+1})^2 ] > μ_0 − r.    (15.19)

The first consideration arises from the fact that if μ_{i+1} is not fixed, the changes in the trend will absorb part of the variation originally attributed to σ_0 almost everywhere (the set of cases where this does not happen is a set of measure zero), for a given process to which we fit both a classical model and a modern one to describe it. Thus, we assume that if μ_{i+1} is not fixed, then σ_0/σ_1, σ_0/σ_{i+1} > 1, as shown in Eqs. (15.17) and (15.18). Nonetheless, if μ_{i+1} = μ_1 is fixed, then we do not have prior knowledge of E[σ_0/σ_{i+1}], as shown in Eq. (15.19). In the cases (15.17) and (15.18), any portfolio risk transference sensitivity will depend on both the trend absorption, measured by σ_0/σ_1 and σ_0/σ_{i+1}, and E[μ_{i+1}], with at least one of them needing to be sufficiently high to determine a more risk-permissive portfolio strategy. Meanwhile, the case stated by Eq. (15.19) depends only on a sufficiently large E[σ_0/σ_{i+1}].
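The comparison E(θ_{i+1}) > θ_0 in (15.15)-(15.16) is straightforward to check numerically for a given next-step distribution. The sketch below uses assumed two-state values for (μ_{i+1}, σ_{i+1}) and assumed transition probabilities; it only illustrates the total-implementation condition and is not data from the chapter.

```python
import numpy as np

# Check the risk-transference condition (15.16): E[theta_{i+1}] > theta_0.
# All parameter values are illustrative assumptions.
r = 0.03
mu0, sigma0 = 0.07, 0.25                  # classical (no-MMDP) model
mu_next = np.array([0.10, 0.05])          # assumed next-state trends
sigma_next = np.array([0.20, 0.30])       # assumed next-state volatilities
p = np.array([0.6, 0.4])                  # assumed one-step transition probabilities

theta0 = (mu0 - r) / sigma0**2            # = 0.64
expected_theta = float(np.sum(p * (mu_next - r) / sigma_next**2))
more_permissive = expected_theta > theta0  # True for these numbers
```

With these assumed numbers the expected next-state fraction exceeds θ_0, i.e., the MMDP portfolio is more risk-permissive in the sense of (15.15).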


15.5 Inclusion of New Stochastic Processes in Portfolio Decisions

In this section we present the main results of the paper. Here, we use the Shapley value to evaluate the inclusion of a stochastic process in a portfolio decision. In a sense, we treat the assets as different players in a game where they should decide on their inclusion in an investment fund. An asset manager decides whether to include an asset in the portfolio given its marginal contribution to the profits of the fund. Thus, we define individual strategies for three players, each seeking to maximize her expected utility, discounted by her marginal propensity to consume, denoted by ρ. That is, each player desires to maximize (15.4), where player 1 has a wealth dynamics described by

    da_{1t} = a_{1t} (r_1 + θ_1 (μ_1 − r_1) − c_t / a_{1t}) dt + a_{1t} θ_1 σ_1 dW_1,    (15.20)

a second wealth dynamics is described by

    da_{2t} = a_{2t} (r_1 + θ_{t,i} (μ_{2,i} − r_1) − c_t / a_{2t}) dt + a_{2t} θ_{t,i} σ_2 dW_{2t},    (15.21)

and the third player's wealth follows

    da_{3t} = a_{3t} (r_1 + θ_{t,i} (μ_{3,i} − r_1) − c_t / a_{3t}) dt + a_{3t} θ_{t,i} σ_{3,i} dW_{3t}.    (15.22)

While the wealth dynamics described in Eq. (15.20) does not include a MMDP, the wealth dynamics in (15.21) increases the complexity of the model by including such a diffusion process in the trend, and (15.22) includes it in both the trend and the magnitude of deviations. The corresponding Hamilton-Jacobi-Bellman (HJB) equation for the above problem yields the following recursive equation:

    0 = max_{c_s | s∈[t,t+dt]} E[ u(c_t) e^{−ρt} dt + ( ∂J(a_t,t)/∂t + ∂J(a_t,t)/∂a_t · a_t (r + θ_{t,i} (μ_i − r) − c_t/a_t) + (1/2) ∂²J(a_t,t)/∂a_t² · a_t² θ_{t,i}² σ_i² ) dt + ( Σ_{j∈S} q_ij(t) [J(a_t,t,j) − J(a_t,t,i)] ) dt ].    (15.23)

The derivation of this equation is shown in Appendix 1. The HJB equation above admits the value function

    J(a_t, t, i) = β_0 + β_1 u(a_t) e^{−ρt} + g(t, i) e^{−ρt},    (15.24)


where β_0 and β_1 are the coefficients of a linear ansatz. The optimal consumption rate and proportion of the wealth invested in the risky asset are

    c_t^* = ρ a_t    (15.25)

and

    θ_i = (μ_i − r_i) / σ_i^2.    (15.26)

The derivation of these values is detailed in Appendix 2. Substituting, each player has the problem of maximizing (15.4), subject to her wealth track, described by

    da_{1t} / a_{1t} = ( r_1 + (μ_1 − r_1)^2 / σ_1^2 − ρ ) dt + ((μ_1 − r_1) / σ_1) dW_1,    (15.27)

    da_{2t} / a_{2t} = ( r_1 + (μ_{2i} − r_1)^2 / σ_2^2 − ρ ) dt + ((μ_{2i} − r_1) / σ_2) dW_{2t},    (15.28)

    da_{3t} / a_{3t} = ( r_1 + (μ_{3i} − r_1)^2 / σ_{3i}^2 − ρ ) dt + ((μ_{3i} − r_1) / σ_{3i}) dW_{3t}.    (15.29)

Note that each player has her particular use of the risky asset. Players 2 and 3 use a Markov modulated diffusion process, so player 1 has instant utility v_1(·) = (μ_1 − r_1)^2 / σ_1^2, while those of players 2 and 3 are described by v_2(·) = (μ_{2i} − r_1)^2 / σ_2^2 and v_3(·) = (μ_{3i} − r_1)^2 / σ_{3i}^2, respectively. For the next section, assume ρ = r_1 and a_1 = a_2 = a_3 = 1 for simplicity of exposition. Changing these assumptions does not alter the results in a relevant way. A cooperative game is then defined between the three players, where the profits are shared through the Shapley value. The Shapley value [21] is a concept from cooperative games that offers a fair allocation of the total payoffs among players under the assumption of transferable utility. Even though this is but one of many methods of allocation, it is the only one that satisfies the conditions of efficiency, symmetry, linearity, and null player. The efficiency property states that exactly the total payoff obtained is distributed. The symmetry property determines that if two players contribute the same amount to the total payoff, then they should receive the same outcome. The linearity property implies that if a single game is divided into two (or more) subgames, the total amount received from the partial contributions is preserved. Finally, the null player property states that if a player does not contribute to any coalition, he or she should receive no payoff at the end.


Recall that a coalitional game is composed of a set of players N, where |N| = n is the number of players, and a function S : 2^N → R such that S(∅) = 0, called the characteristic function. Let Z ⊂ N be a coalition of players; then S(Z) represents the total sum of expected payoffs for its members due to cooperation. To compute the payoff φ_j(S) that player j ∈ N should receive due to her contribution in the game (n, N), we perform the following operation:

    φ_j(S) = Σ_{Z⊆N\{j}} [ |Z|! (n − |Z| − 1)! / n! ] (S(Z ∪ {j}) − S(Z)),    (15.30)

where the difference S(Z ∪ {j}) − S(Z) is the marginal contribution of player j to the subcoalition Z. Note that this contribution is weighted over every subcoalition. In this section we compute the marginal contribution of the inclusion of a Markov Modulated Diffusion Process to the total utility based on the optimal track of the problem. Thus, the Shapley values for the three players are described by

    φ(v) = (1/6) [ 3 1 2 ; 3 2 −1 ; 12 −5 −1 ] (v_1, v_2, v_3)^T,    (15.31)

where φ(v) = (φ_1(v_1), φ_2(v_2), φ_3(v_3)). This implies that the conditions for {φ_3(v) ≥ φ_2(v)} are given by

    (μ_{3i} − r_1)^2 / σ_{3i}^2 ≥ (μ_{2i} − r_1)^2 / σ_2^2  ⇐⇒  σ_2^2 / σ_{3i}^2 ≥ (μ_{2i} − r_1)^2 / (μ_{3i} − r_1)^2,

and the conditions for {φ_2(v) ≥ φ_1(v)} are

    (μ_{2i} − r_1)^2 / σ_2^2 ≥ (μ_1 − r_1)^2 / σ_1^2  ⇐⇒  σ_1^2 / σ_2^2 ≥ (μ_1 − r_1)^2 / (μ_{2i} − r_1)^2.
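Formula (15.30) is easy to implement directly. The sketch below codes the general Shapley value and checks the efficiency property on a small hypothetical three-player characteristic function; the additive-plus-synergy worths are assumptions for illustration, not the chapter's game.

```python
from itertools import combinations
from math import factorial

def shapley(n, v):
    """Shapley value via (15.30); v maps a frozenset coalition to its worth."""
    phi = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for size in range(n):
            for Z in combinations(others, size):
                Z = frozenset(Z)
                w = factorial(len(Z)) * factorial(n - len(Z) - 1) / factorial(n)
                phi[j] += w * (v(Z | {j}) - v(Z))
    return phi

# Hypothetical characteristic function: stand-alone worths plus a flat
# synergy bonus for any coalition of two or more players.
base = {0: 1.0, 1: 2.0, 2: 3.0}
def v(Z):
    return sum(base[i] for i in Z) + (0.5 if len(Z) >= 2 else 0.0)

phi = shapley(3, v)
```

For this game, efficiency holds (sum(phi) equals v(N) = 6.5), and the symmetric synergy is split equally, so each player receives her stand-alone worth plus 1/6.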

15.6 Final Remarks

The introduction of a Markov Modulated Diffusion Process in an optimization model can be seen as the inclusion of more agents in a game, where the classical process yields a certain profit for a rational agent. If she desires to include an asset driven by a MMDP in the portfolio, she should assess the inclusion considering its marginal contribution to her total discounted logarithmic utility. The Shapley value can be a suitable method for evaluating the marginal utility obtained by the inclusion of a MMDP. Here we find that, if the inclusion of such a process yields a higher volatility in the model, then the marginal contribution is diminished.


Thus, this chapter presents evidence that supports the inclusion of Markovian processes in classical optimal decision problems under a few flexible conditions. The problem was approached from two perspectives: an increase of the wealth proportion invested in the risky asset, providing information on a possible risk overestimation in classical models, and an increase of total discounted logarithmic utility, providing conditions under which the consideration of a model based on a risky asset driven by a time-inhomogeneous MMDP actually improves portfolio decisions. The Shapley value can also be used to compare the inclusion of different processes in the portfolio optimization problem with risky assets.

Acknowledgements We would like to thank the National Council of Science and Technology (CONACyT) of Mexico for its support. We are grateful to Universidad de Colima and Universidad Juárez del Estado de Durango.

Appendix 1

In this appendix we derive the results presented in Sect. 15.5. The problem represented by Eq. (15.4), subject to the wealth constraints described by Eq. (15.20), is then reduced by the HJB equation to

    J(a_t, t) = max_{c_s | s∈[t,t+dt]} E[ ∫_t^{t+dt} u(c_s) e^{−ρs} ds + d(J(a_t, t)) + J(a_t, t) + o(dt) ],

which can be reduced to

    0 = max_{c_s | s∈[t,t+dt]} E[ u(c_t) e^{−ρt} dt + o(dt) + d(J(a_t, t)) ].

This, according to the problem at hand, can then be expressed as

    d(J(a_t, t)) = ( ∂J(a_t,t)/∂t + ∂J(a_t,t)/∂a_t · a_t μ_a + (1/2) ∂²J(a_t,t)/∂a_t² · a_t² σ_a² ) dt + ∂J(a_t,t)/∂a_t · a_t σ_a dW_t + ( Σ_{j∈S} q_ij(t) [J(a_t,t,j) − J(a_t,t,i)] ) dt,

where

    da_t = a_t μ_a dt + a_t σ_a dW_t,  μ_a = r + θ_{t,i} (μ_i − r) − c_t / a_t,


and σ_a = θ_{t,i} σ_i. These solutions are substituted in the HJB equation; thus

    0 = max_{c_s | s∈[t,t+dt]} E[ u(c_t) e^{−ρt} dt + ( ∂J(a_t,t)/∂t + ∂J(a_t,t)/∂a_t · a_t (r + θ_{t,i} (μ_i − r) − c_t/a_t) + (1/2) ∂²J(a_t,t)/∂a_t² · a_t² θ_{t,i}² σ_i² ) dt + ( Σ_{j∈S} q_ij(t) [J(a_t,t,j) − J(a_t,t,i)] ) dt ].    (15.32)

Appendix 2

This appendix refers directly to Vallejo et al. [23]; some results from that paper are required in order to understand the construction and use of several elements in Eqs. (15.6)-(15.14). The referenced paper provides closed-form solutions of the utility maximization problem, solving a continuous decision-making problem for an infinitely-lived rational consumer with a logarithmic utility function, when the risky asset is driven by a time-inhomogeneous Markov-modulated chain and a classical geometric Brownian motion. The general solution can be simplified and is consistent with simplified versions of this model, providing solutions for classical problems with fixed variations. Those classical models and their solutions are required for this research. The bond and risky asset are defined as

    db_t / b_t = r dt    (15.33)

and

    dS_t / S_t = μ_i dt + σ_i dW_t,    (15.34)

and the utility maximization problem is given by

    Maximize E[ ∫_0^∞ u(c_t) e^{−ρt} dt | F_t ]    (15.35)

    s.t. da_t = a_t (r + θ_{t,i} (μ_i − r) − c_t / a_t) dt + a_t θ_{t,i} σ_i dW_t.    (15.36)


This requires optimal control tools, such as the Hamilton-Jacobi-Bellman equation, defining a value function as

    J(a_t, t) = max_{c_s | s∈[t,∞)} E[ ∫_t^∞ u(c_s) e^{−ρs} ds ].    (15.37)

Simplifying this and applying the mean value theorem for integral calculus, it follows that

    0 = max_{c_s | s∈[t,t+dt]} E[ u(c_t) e^{−ρt} dt + o(dt) + d(J(a_t, t)) ].    (15.38)

Now, in order to continue, Ito's lemma is applied to expand d(J(a_t, t)), giving

    0 = max_{c_s | s∈[t,t+dt]} E[ u(c_t) e^{−ρt} dt + o(dt) + ∂J(a_t,t)/∂a_t · a_t θ_{t,i} σ_i dW_t + ( ∂J(a_t,t)/∂t + ∂J(a_t,t)/∂a_t · a_t (r + θ_{t,i} (μ_i − r) − c_t/a_t) + (1/2) ∂²J(a_t,t)/∂a_t² · a_t² θ_{t,i}² σ_i² ) dt + Σ_{j∈E} q_ij(t) [J(a_t,t,j) − J(a_t,t,i)] dt ].    (15.39)

Under the assumption of independence between the Brownian motion and the Markov drift, Eq. (15.39) can be easily simplified by elimination of the martingale elements as follows:

    0 = max_{c_s | s∈[t,t+dt]} E[ u(c_t) e^{−ρt} dt + ( ∂J(a_t,t)/∂t + ∂J(a_t,t)/∂a_t · a_t (r + θ_{t,i} (μ_i − r) − c_t/a_t) + (1/2) ∂²J(a_t,t)/∂a_t² · a_t² θ_{t,i}² σ_i² ) dt + Σ_{j∈E} q_ij(t) [J(a_t,t,j) − J(a_t,t,i)] dt ].    (15.40)

Then, under optimality conditions (assuming an optimal consumption), this can be expressed as

    0 = u(c_t^*) e^{−ρt} + ∂J(a_t,t)/∂t + ∂J(a_t,t)/∂a_t · a_t (r + θ_{t,i} (μ_i − r) − c_t^*/a_t) + (1/2) ∂²J(a_t,t)/∂a_t² · a_t² θ_{t,i}² σ_i² + Σ_{j∈E} q_ij(t) [J(a_t,t,j) − J(a_t,t,i)].    (15.41)

This kind of problem requires both an assumption on the utility function and a proposed value function. By selecting a logarithmic utility, an admissible value function could be J(a_t, t, i) = β_0 + β_1 u(a_t) e^{−ρt} + g(t, i) e^{−ρt}.


Simplifying, we have

    0 = u(c_t^*) − ρ (β_0 + β_1 u(a_t)) + ∂g(t,i)/∂t − ρ g(t, i) + β_1 u′(a_t) a_t (r + θ_{t,i} (μ_i − r) − c_t^*/a_t) + (1/2) β_1 u″(a_t) a_t² θ_{t,i}² σ_i² + Σ_{j∈E} q_ij(t) [g(t, j) − g(t, i)].    (15.42)

And therefore, with a logarithmic utility and some partial derivatives, the optimal portfolio and consumption decisions follow as θ_i = (μ_i − r) / σ_i^2 and c_t^* = ρ a_t.
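These first-order conditions can also be verified symbolically. The sketch below checks the single-state part of (15.41) with the log-utility ansatz J = β_1 ln(a) e^{−ρt}, β_1 = 1/ρ (β_0 and the Markov-chain terms drop out of the derivatives in c and θ); it is an independent consistency check, not part of the chapter's derivation.

```python
import sympy as sp

# Verify the optimal controls from the HJB first-order conditions, using the
# log-utility ansatz J(a, t) = (1/rho) * ln(a) * exp(-rho*t).
a, t, c, theta = sp.symbols('a t c theta', positive=True)
r, mu, sigma, rho = sp.symbols('r mu sigma rho', positive=True)

J = sp.log(a) * sp.exp(-rho * t) / rho
Ja, Jaa = sp.diff(J, a), sp.diff(J, a, 2)

# Terms of (15.41) that depend on c or theta
H = (sp.log(c) * sp.exp(-rho * t)
     + Ja * a * (r + theta * (mu - r) - c / a)
     + sp.Rational(1, 2) * Jaa * a**2 * theta**2 * sigma**2)

c_star = sp.solve(sp.diff(H, c), c)[0]              # consumption FOC
theta_star = sp.solve(sp.diff(H, theta), theta)[0]  # portfolio FOC
```

Both solutions reduce symbolically to the stated controls, c* = ρa and θ* = (μ − r)/σ².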

References

1. Banzhaf, J.F.: Weighted voting does not work: a mathematical analysis. Rutgers Law Rev. 19, 317–343 (1965)
2. Bäuerle, N., Rieder, U.: Portfolio optimization with Markov-modulated stock prices and interest rates. IEEE T. Automat. Contr. 49(3), 442–447 (2004)
3. Cao, Z.: Multi-CoVaR and Shapley value: a systemic risk measure. Banque de France Working Paper (2013)
4. Colini-Baldeschi, R., Scarsini, M., Vaccari, S.: Variance allocation and Shapley value. Methodol. Comput. Appl. 20(3), 919–933 (2018)
5. Cox, J.C., Huang, C.-f.: Optimal consumption and portfolio policies when asset prices follow a diffusion process. J. Econ. Theory 49(1), 33–83 (1989)
6. Dockner, E.J., Jorgensen, S., Van Long, N., Sorger, G.: Differential Games in Economics and Management Science. Cambridge University Press, Cambridge (2000)
7. Fei, W.Y.: Optimal consumption and portfolio under inflation and Markovian switching. Stochastics 85(2), 272–285 (2013)
8. Feltkamp, V.: Alternative axiomatic characterization of the Shapley and Banzhaf values. Int. J. Game Theory 24, 179–186 (1995)
9. Fréchette, A., Kotthoff, L., Michalak, T.P., Rahwan, T., Hoos, H.H., Leyton-Brown, K.: Using the Shapley value to analyze algorithm portfolios. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pp. 3397–3403 (2016)
10. Garcia-Meza, M.A., Gromova, E.V., López-Barrientos, J.D.: Stable marketing cooperation in a differential game for an oligopoly. Int. Game Theory Rev. 20(3), 1750028 (2018)
11. Jørgensen, S., Gromova, E.: Sustaining cooperation in a differential game of advertising goodwill accumulation. Eur. J. Oper. Res. 254(1), 294–303 (2016)
12. Jørgensen, S., Zaccour, G.: Differential Games in Marketing. Springer Science & Business Media, Berlin (2012)
13. Merton, R.C.: Lifetime portfolio selection under uncertainty: the continuous time case. Rev. Econ. Stat. 247–257 (1969)
14. Merton, R.C.: Optimum consumption and portfolio rules in a continuous-time model. J. Econ. Theory 3(4), 373–413 (1971)
15. Owen, G.: Multilinear extensions and the Banzhaf value. Nav. Res. Logist. Q. 22, 741–750 (1975)
16. Peleg, B.: Axiomatizations of the core. In: Handbook of Game Theory with Economic Applications, pp. 397–412. Elsevier B. V., Amsterdam (1992)

302

B. Vallejo-Jimenez and M. A. Garcia-Meza

17. Petrosjan, L.A.: Stable solutions of differential games with many participants. Viestnik of Leningrad University 19, 46–52 (1977) 18. Petrosjan, L.A., Danilov N.N.: Cooperative Differential Games and Their Applications. Izd. Tomsk Univesity, Tomsk (1985) 19. Petrosjan, L., Zaccour, G.: Time-consistent Shapley value allocation of pollution cost reduction. J. Econ. Dyn. Control 27(3), 381–398 (2003) 20. Samuelson, P.A.: Lifetime portfolio selection by dynamic stochastic programming. Rev. Econ. Stat. 51(3), 239–246 (1969) 21. Shapley, L.S., Shubik, M.: A method for evaluating the distribution of power in a committee system. Am. Polit. Sci. Rev. 48, 787–792 (1954) 22. Sotomayor, L.R., Cadenillas, A.: Explicit solutions of consumption-investment problems in financial markets with regime switching. Math. Financ. 19(2), 251–279 (2009) 23. Vallejo-Jimenez, B., Venegas-Martinez F., Soriano-Morales Y.V.: Optimal consumption and portfolio decisions when the risky asset is driven by a time-inhomogeneous Markov modulated diffusion process. Int. J. Pure Appl. Math. 104(3), 353–362 (2015) 24. Zariphopoulou, T.: Investment-consumption models with transaction fees and Markov-Chain parameters. SIAM J. Control Optim. 30(3), 613–636 (1992) 25. Zhang, Y.J., Wang, A.D., Da, Y.B.: Regional allocation of carbon emission quotas in China: evidence from the Shapley value method. Energ. Policy 74, 454–464 (2014)

Chapter 16

Conditionally Coordinating Contracts in Supply Chains

Nikolay A. Zenkevich, Irina Berezinets, Natalia Nikolchenko, and Alina Rucheva

Abstract The chapter revisits the supplier-retailer supply chain game and investigates target sales rebate and buyback contracts that motivate the participants toward behavior that is both individually rational and Pareto optimal. The research considers a Stackelberg model of a supply chain with a fixed retail price and stochastic demand. The authors propose an algorithm for solving the conditional-coordination problem for both types of contracts. A general framework is introduced, the condition for achieving the coordinating equilibrium is characterized, and a special case of uniformly distributed demand is analyzed. The models with uniformly distributed demand demonstrate that conditional coordination can be achieved, and examples show that establishing the contract parameters amounts to sharing the expected supply chain profit under a compromise between supplier and retailer. Keywords Supply chain coordination · Coordinating contracts · Nash equilibrium · Pareto-optimal solution · Conditionally coordinating contracts · Sales rebate contract · Buyback contract

16.1 Introduction

In this chapter, games between a supplier and a retailer based on a sales rebate contract and a buyback contract are considered. The most relevant and most cited research on supply chain coordination using the sales rebate contract is the following: [4, 6–8, 11, 12, 14, 17, 18, 21–23, 26, 29]. Taylor [23] showed that the sales rebate contract helps achieve coordination and a win-win situation in a supply chain. Herein, according to the author, supply chain coordination is achieved when the supply chain profit

N. A. Zenkevich · I. Berezinets · N. Nikolchenko · A. Rucheva
St. Petersburg State University, St. Petersburg, Russia
e-mail: [email protected]; [email protected]; [email protected]; [email protected]
© Springer Nature Switzerland AG 2019 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Static & Dynamic Game Theory: Foundations & Applications, https://doi.org/10.1007/978-3-030-23699-1_16


reaches its maximum, and a win-win situation is a decision under which both players obtain a better result by implementing the contract than without it. At the same time, Cachon [4] revealed that the sales rebate contract cannot coordinate the supply chain without additional settings and thus concluded that this type of contract is not coordinating. In particular, a contract coordinates a supply chain if the set of players' strategies is a Nash equilibrium. Besides, in an ideal setting the equilibrium should be unique; otherwise, players may settle on a suboptimal set of actions. The sales rebate contract is a widespread tool in marketing, since it motivates retailers to sell more. Researchers of this type of contract [8, 23] traditionally distinguish two types of rebate: linear and target. A linear rebate is paid per each unit sold by the retailer; under a target rebate, the supplier pays the rebate only if the sales volume exceeds the threshold set by the supplier. Concerning buyback contracts, most researchers [10, 15, 25, 27, 28] state the ability of the buyback to coordinate the chain. The literature on buyback contracts [1, 2, 4, 9, 13, 19, 24] traditionally considers two types of buyback contracts, depending on who sells the stocked products at the end of the season: either the supplier, to whom the retailer returns unsold goods, or the retailer, to whom the supplier pays compensation for the unsold goods while the unsold products remain with the retailer and can be sold by him at the end of the season for the salvage value v. The form of the profit functions, the expected values of the profits, and the coordinating conditions on the contract parameters depend on this contract condition. In this chapter we investigate and propose an algorithm for solving the coordination problem of a two-echelon supply chain, consisting of one supplier and one retailer, interacting on the basis of a sales rebate or buyback contract.

16.2 Sales Rebate Contract Model

Consider a supply chain consisting of two participants: one supplier and one retailer. The supplier vends goods to the retailer, and the retailer sells them in the market. The supply chain partners have a contractual relationship via a sales rebate contract: the supplier offers the retailer a wholesale price per unit and a rebate per unit sold in the market above the threshold set by the supplier. In response to the supplier's offer, the retailer decides on the quantity of goods to buy from the supplier on the basis of information about the market demand. As a base model for supplier-retailer interaction in a supply chain, we use the Stackelberg model with stochastic demand [3, 16, 20], in which the supplier is a leader choosing her strategy first, and the retailer is a follower choosing his strategy after the supplier. This model is one of the most appropriate for describing a supply chain where, under the contract terms, the supplier makes the decision first; the retailer then chooses the purchasing volume based on the suggested conditions. Both players are risk-neutral, try to maximize their profits, and operate in a setting of complete information about the contract parameters; this means


both players have information about the costs, the retail price, and the salvage value. According to the sales rebate contract model, the supplier offers the retailer the following: a wholesale price per unit ω and a rebate r per unit sold by the retailer above the threshold t, wherein t < q. The retailer chooses the best response strategy. It is assumed that both players know their costs: the production cost per unit cs incurred by the supplier and the retailer's marginal cost per unit cr. The retailer sells the products on the market at the fixed retail price p per unit. If the retailer is unable to sell the entire volume of products at the price p, he can sell the remaining products at the salvage value v per unit. Assume that the following conditions are fulfilled: 0 < cs < ω < p. At the same time, the salvage value does not exceed the supplier's cost per unit: v < cs < ω; moreover, v < ω, v < p, v < cs. The list of notations used in the model is presented in Table 16.1. We develop the model under the assumption of stochastic demand. Denote by ξ a random variable of demand for this type of product, and by τ a random variable of the volume of this type of product sold. Assume that τ = g(ξ), where

    τ = g(ξ) = ξ for 0 ≤ ξ < q, and τ = g(ξ) = q for ξ ≥ q.                      (16.1)

Let ξ be a continuous random variable with probability density function fξ(x). Then we find the expectation of the sales volume, i.e. the expected value of the random variable τ:

    E[τ] = E[g(ξ)] = ∫_0^{+∞} g(x) fξ(x) dx = ∫_0^q x fξ(x) dx + ∫_q^{+∞} q fξ(x) dx.

Table 16.1 The list of notations used in the model

S        Supplier
R        Retailer
ω        The wholesale price per unit (c.u.)
r        The value of the rebate paid to the retailer by the supplier for a unit of products sold in excess of the established threshold (c.u.)
t        Sales volume set by the supplier, in excess of which he pays the retailer a rebate per unit
q        The volume of products vended by the supplier to the retailer (pcs.)
p        The retail price per unit (c.u.)
v        The salvage value per unit (c.u.)
cs       The supplier's production costs per unit (c.u.)
cr       The retailer's marginal costs per unit (c.u.)
c        Supply chain total costs, c = cs + cr (c.u.)
Profs    The supplier's profit per transaction (c.u.)
Profr    The retailer's profit per transaction (c.u.)
Profsc   Supply chain profit per transaction, Profsc = Profs + Profr (c.u.)


Therefore,

    E[τ] = ∫_0^q x fξ(x) dx + q ∫_q^{+∞} fξ(x) dx

or

    E[τ] = q (1 − Fξ(q)) + ∫_0^q x fξ(x) dx.

Because

    ∫_0^q x fξ(x) dx = x Fξ(x) |_0^q − ∫_0^q Fξ(x) dx = q Fξ(q) − ∫_0^q Fξ(x) dx,

the last expression can be presented as follows:

    E[τ] = q − ∫_0^q Fξ(x) dx.                                                   (16.2)

For further transformations, find the derivative of E[τ] with respect to the variable q:

    ∂E[τ]/∂q = 1 − Fξ(q).                                                        (16.3)
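As a quick numerical sanity check of (16.2), the sketch below compares the closed form q − ∫_0^q Fξ(x) dx with a Monte Carlo estimate of E[min(ξ, q)]. The exponential demand distribution used here is purely illustrative and is not a case treated in the chapter.

```python
import math
import random

# Check E[tau] = q - integral_0^q F_xi(x) dx  (Eq. 16.2)
# for exponentially distributed demand with rate lam; tau = min(xi, q).

def expected_sales_closed_form(q, lam):
    # For the exponential CDF F(x) = 1 - exp(-lam*x):
    # q - integral_0^q F(x) dx = (1 - exp(-lam*q)) / lam
    return (1.0 - math.exp(-lam * q)) / lam

def expected_sales_monte_carlo(q, lam, n=200_000, seed=1):
    rng = random.Random(seed)
    return sum(min(rng.expovariate(lam), q) for _ in range(n)) / n

q, lam = 80.0, 0.02
print(expected_sales_closed_form(q, lam), expected_sales_monte_carlo(q, lam))
```

The two estimates should agree to within the Monte Carlo error of a few tenths.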

The retailer’s profit will depend on the sales volume or on the value of the random variable τ . There are two possible situations: 0 < τ ≤ t and t < τ ≤ q. Consider the situation, when 0 < τ ≤ t. In this case, the retailer does not fulfill the contract terms, since the volume of sold products is less than the sales volume threshold set by the supplier. Consequentially, the supplier does not pay the rebate to the retailer. The players’ profits P rofr , P rofs , and the supply chain profit P rofsc for this case are as follows: P rofr = (p − v)τ − (ω + cr − v)q,

(16.4)

P rofs = (ω − cs )q,

(16.5)

P rofsc = (p − v)τ − (ω + cr − v)q + (ω − cs )q = (p − v)τ + (v − c)q.

(16.6)

In the case, when t < τ ≤ q, the players’ profits P rofr , P rofs , and the supply chain profit P rofsc have the following forms: P rofr = (p − v + r)τ − (ω + cr − v)q − tr,

(16.7)

P rofs = r(t − τ ) + (ω − cs )q,

(16.8)


    Profsc = (p − v + r)τ − (ω + cr − v)q − tr + r(t − τ) + (ω − cs)q
           = (p − v)τ + (v − c)q.                                                (16.9)

Based on expressions (16.7)–(16.9), we obtain the expressions for the expected values of the profits of the retailer, supplier and supply chain:

    E[Profr] = (p − v + r)(q − ∫_0^q Fξ(x) dx) − (ω + cr − v)q − tr,             (16.10)
    E[Profs] = (ω − cs)q − r(q − ∫_0^q Fξ(x) dx − t),                            (16.11)
    E[Profsc] = (p − v)(q − ∫_0^q Fξ(x) dx) + (v − c)q.                          (16.12)
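Since the wholesale and rebate payments are internal transfers, (16.10)–(16.12) satisfy E[Profr] + E[Profs] = E[Profsc] identically. A sketch of this check with hypothetical parameter values and exponential demand, chosen only because ∫_0^q Fξ(x) dx then has a closed form:

```python
import math

# Verify that (16.10) + (16.11) = (16.12) for arbitrary parameter values.
p, v, w, r, t = 210.0, 58.0, 150.0, 10.0, 250.0
cs, cr = 120.0, 54.0
c = cs + cr
lam = 1.0 / 600.0          # exponential demand with mean 600 (illustrative)

def int_F(q):
    # integral_0^q (1 - exp(-lam*x)) dx
    return q - (1.0 - math.exp(-lam * q)) / lam

def e_prof_r(q):
    return (p - v + r) * (q - int_F(q)) - (w + cr - v) * q - t * r   # (16.10)

def e_prof_s(q):
    return (w - cs) * q - r * (q - int_F(q) - t)                     # (16.11)

def e_prof_sc(q):
    return (p - v) * (q - int_F(q)) + (v - c) * q                    # (16.12)

q = 400.0
print(e_prof_r(q) + e_prof_s(q), e_prof_sc(q))   # the two numbers coincide
```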

In this research, a contract is defined as coordinating if the individual and collective rationality properties are fulfilled. This means that such a contract must be a Nash equilibrium in the considered Stackelberg model and must possess the Pareto-optimality property. We start constructing a coordinating contract on the assumption that the retailer sells more products than the sales volume threshold set by the supplier, t < τ ≤ q, and the supplier pays a rebate per unit sold in excess of this threshold. For the case when the retailer sells less than the specified sales volume (0 < τ ≤ t), the relationship between the supply chain participants is regulated by the wholesale price contract. Consider the algorithm for solving the conditional-coordination problem.

The First Step. Since the retailer knows the supplier's offer, it is necessary to find the optimal volume q of products purchased from the supplier. The optimal volume is the volume q∗ that maximizes the expected value of the retailer's profit. To find the extreme point of the retailer's profit expectation E[Profr], we calculate:

    ∂E[Profr]/∂q = (p − v + r)(1 − Fξ(q)) − (ω + cr − v).

The necessary condition for an extremum of the expected profit function is:

    (p − v + r)(1 − Fξ(q)) − (ω + cr − v) = 0.

From the last equation, we have:

    Fξ(q) = (p − ω − cr + r)/(p − v + r).                                        (16.13)


Hence, to find the stationary point of the function E[Profr], the following condition needs to be fulfilled:

    q⁰_r = Fξ⁻¹((p − ω − cr + r)/(p − v + r)).                                   (16.14)

The second derivative of the function E[Profr] has the following form:

    ∂²E[Profr]/∂q² = (p − v + r)(−fξ(q)).

Because p > v and r > 0, and the probability density function fξ(x) is always non-negative, the second derivative is always non-positive at the stationary point of the function:

    (p − v + r)(−fξ(q⁰)) ≤ 0.                                                    (16.15)

Therefore, the point q⁰, specified by expression (16.14), is the maximum point q∗ of the expectation of the retailer's profit function E[Profr].

The Second Step. At the second step, we find the extreme point of the expected value of the supply chain profit function E[Profsc] and obtain the first coordinating condition. The first derivative of the function E[Profsc] equals:

    ∂E[Profsc]/∂q = ∂/∂q [(p − v)(q − ∫_0^q Fξ(x) dx) + (v − c)q] = (p − v)(1 − Fξ(q)) + v − c.

The necessary condition for an extremum of the function is as follows:

    (p − v)(1 − Fξ(q)) + v − c = 0.

Then

    Fξ(q⁰_sc) = (p − c)/(p − v)

or

    q⁰_sc = Fξ⁻¹((p − c)/(p − v)).                                               (16.16)
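Expression (16.16) is the classic critical-fractile (newsvendor) quantity. A minimal sketch, assuming uniform demand on [0, β] (so that Fξ⁻¹(y) = βy) and hypothetical cost values:

```python
# Chain-optimal order quantity (16.16): q_sc = F^{-1}((p - c)/(p - v)).
p, v, cs, cr, beta = 210.0, 58.0, 120.0, 54.0, 1200.0
c = cs + cr

critical_fractile = (p - c) / (p - v)   # = F_xi(q_sc), must lie in (0, 1)
q_sc = beta * critical_fractile         # inverse CDF of Uniform[0, beta]
print(critical_fractile, q_sc)
```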


Then we can calculate the second derivative of the expected value of the supply chain profit function:

    ∂²E[Profsc]/∂q² = (p − v)(−fξ(q⁰)).                                          (16.17)

It is clear that the stationary point q⁰_sc is the maximum point of the function E[Profsc]. In order for the contract to be coordinating, we find such a value of the wholesale price ω that the following coordinating condition is fulfilled:

    q∗_sc = q∗_r

or

    Fξ⁻¹((p − c)/(p − v)) = Fξ⁻¹((p − ω − cr + r)/(p − v + r)).                  (16.18)

From the previous equation we obtain the following expression for ω∗:

    ω∗ = cs + ((v − c)/(v − p)) r.                                               (16.19)
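Condition (16.19) can be checked directly: substituting ω∗ into the retailer's fractile in (16.13) collapses it to the chain's fractile (p − c)/(p − v) for every rebate r, so q∗_r = q∗_sc whatever the demand distribution. A sketch with hypothetical numbers:

```python
# With w* = cs + r*(v - c)/(v - p), the retailer's critical fractile
# (p - w - cr + r)/(p - v + r) equals the chain's (p - c)/(p - v).
p, v, cs, cr = 210.0, 58.0, 120.0, 54.0
c = cs + cr
chain_fractile = (p - c) / (p - v)

def retailer_fractile(w, r):
    return (p - w - cr + r) / (p - v + r)

for r in (2.1, 10.5, 21.0):
    w_star = cs + (v - c) / (v - p) * r          # (16.19)
    print(r, retailer_fractile(w_star, r) - chain_fractile)  # ~0 for each r
```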

The obtained ω∗ ensures a volume of purchased products q∗ at which the maximum of the expected values of the retailer's and the supply chain profit functions is achieved.

The Third Step. The first two steps of the coordinating contract model development allowed us to achieve the maximum of the expected values of the retailer's and the entire supply chain's profits through the coordinating condition (16.18) for the contract parameter ω. Now we need to find the other parameters of the contract, r and t, such that the expected value of the supplier's profit reaches its maximum. We proceed as follows: we substitute the obtained maximum point q∗ (16.14) of the retailer's expected profit into the expectation of the supplier's profit (16.11), together with the expression for the wholesale price ω∗ obtained from condition (16.18). After that, we find values of the contract parameters r and t such that the function E[Profs] (16.11) achieves its maximum. If this problem has a solution, then the sales rebate contract is coordinating; otherwise, the problem of constructing a coordinating sales rebate contract has no solution. The expressions for the expected profit values of the supply chain participants and the entire supply chain under the sales rebate contract are given in Table 16.2. As already noted [4], in addition to coordinating contracts, there may be sales rebate contracts that coordinate the supply chain under certain conditions. An example of such a contract will be discussed in the next section.

Table 16.2 Expressions for the expected profit values under the sales rebate contract in general form

Without rebate (0 < τ ≤ t):
    E[Profs] = (ω − cs)q
    E[Profr] = (p − v)(q − ∫_0^q Fξ(x) dx) − (ω + cr − v)q
    E[Profsc] = (p − v)(q − ∫_0^q Fξ(x) dx) + (v − c)q

With rebate (t < τ ≤ q):
    E[Profs] = (ω − cs − r)q + r ∫_0^q Fξ(x) dx + rt
    E[Profr] = (p − v + r)(q − ∫_0^q Fξ(x) dx) − (ω + cr − v)q − rt
    E[Profsc] = (p − v)(q − ∫_0^q Fξ(x) dx) + (v − c)q


16.3 A Model of Sales Rebate Contract with a Uniformly Distributed Demand

Consider the problem of constructing a coordinating sales rebate contract for the case when the random variable ξ has a uniform distribution on the interval [0, β]. Under this assumption the cumulative distribution function of the random variable ξ has the following form:

    Fξ(x) = 0 for x ≤ 0;  Fξ(x) = x/β for 0 ≤ x ≤ β;  Fξ(x) = 1 for x > β.       (16.20)

Then

    E[τ] = q − ∫_0^q Fξ(x) dx,

and we have the final expression for the expected value of τ:

    E[τ] = q − ∫_0^q (x/β) dx = q − q²/(2β).                                     (16.21)
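A quick Monte Carlo spot check of (16.21), with illustrative values only:

```python
import random

# For uniform demand on [0, beta], E[tau] = q - q^2/(2*beta)  (Eq. 16.21).
beta, q = 1200.0, 284.0
closed_form = q - q * q / (2.0 * beta)

rng = random.Random(7)
n = 200_000
estimate = sum(min(rng.uniform(0.0, beta), q) for _ in range(n)) / n
print(closed_form, estimate)   # the estimate should be close to the closed form
```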

We will solve the problem of coordinating contract construction using the results obtained in Sect. 16.2. Let us write the expressions for E[Profr], E[Profs], E[Profsc], substituting the expected value of τ (16.21) into (16.10), (16.11) and (16.12) (Table 16.3). Consider the algorithm for solving the conditional-coordination problem in the case of uniformly distributed demand.

The First Step. At the first step, we substitute Fξ(q∗) = q∗/β into (16.13):

    q∗/β = (p − ω − cr + r)/(p − v + r)

or

    q∗ = β (p − ω − cr + r)/(p − v + r).                                         (16.22)

In (16.22) we obtain the final expression for the volume q∗_r, which maximizes the retailer's expected profit.

Table 16.3 Expressions for the expected profit values in the case of uniformly distributed demand

Without rebate (0 < τ ≤ t):
    E[Profs] = (ω − cs)q
    E[Profr] = (p − v)(q − q²/(2β)) − (ω + cr − v)q
    E[Profsc] = (p − v)(q − q²/(2β)) + (v − c)q

With rebate (t < τ ≤ q):
    E[Profs] = (ω − cs − r)q + r q²/(2β) + rt
    E[Profr] = (p − v + r)(q − q²/(2β)) − (ω + cr − v)q − rt
    E[Profsc] = (p − v)(q − q²/(2β)) + (v − c)q


The Second Step. Using expression (16.19), which does not depend on the type of distribution, we have the coordinating condition:

    ω∗ = cs + ((v − c)/(v − p)) r.

The Third Step. At the third step, substitute the expressions (16.22) and (16.19) into the supplier's expected profit

    E[Profs] = (ω∗ − cs − r) q∗ + r q∗²/(2β) + rt,

having carried out the following transformations in advance. First, convert the expression (ω∗ − cs − r) by substituting the expression for ω∗:

    ω∗ − cs − r = cs + ((v − c)/(v − p)) r − cs − r = ((v − c)/(v − p)) r − r.

Now substitute the expressions for ω∗ and q∗_r:

    q∗_s = β (p − ω∗ − cr + r)/(p − v + r) = β (p(v − r + c) − p² − vc + cr) / ((p + r − v)(v − p)).   (16.23)

As a final step we rewrite the supplier's expected profit using the equations obtained above; after the substitutions it takes the compact form

    E[Profs(r, t)] = ((c − v)/(p − v)) r q∗_s + r (t − q∗_s + q∗_s²/(2β)),

with q∗_s given by (16.23).

We consider the function E[Profs(r, t)] as a function of two variables r and t, and try to solve the problem of constructing a coordinating sales rebate contract by finding values of the parameters t and r at which the supplier's expected profit is maximal at q = q∗. Let us show that the function E[Profs(r, t)] is not locally concave (or convex), and thus has no maximum points. To do this, we find all second-order partial derivatives. The first partial derivative is ∂E[Profs(r, t)]/∂t = r, so ∂²E[Profs(r, t)]/∂t² = 0, and the mixed partial derivatives are equal: ∂²E[Profs(r, t)]/∂t∂r = ∂²E[Profs(r, t)]/∂r∂t = 1.


We denote the second-order partial derivative ∂²E[Profs]/∂r² by A and write down the Hessian in the form:

    Δ = | A  1 |
        | 1  0 |

Since the determinant Δ equals A·0 − 1·1 = −1, we conclude that the function E[Profs(r, t)] is not locally concave (or convex), and consequently a maximum point of the function does not exist. It should also be noted that the feasible region for r and t is an open set, so the optimum of the function is not attained. Therefore the considered sales rebate contract is not coordinating. In this section we demonstrated that, under the assumption that demand has a uniform distribution, the sales rebate contract is not coordinating because the condition of individual rationality for the supplier is not fulfilled: the supplier's expected profit has no maximum over the contract parameters r and t. Since the supplier cannot meet the condition of individual rationality, he will not be motivated to execute this contract. The next section shows that, if there is no coordinating solution to the sales rebate contract model (unconditional coordination), it is possible to construct a "conditionally" coordinating solution of the game by introducing additional constraints on the parameters of the contract.
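The saddle-point argument can be illustrated numerically: the mixed partial of E[Profs(r, t)] equals 1 (so the Hessian determinant is A·0 − 1·1 = −1 for any A), and for fixed r > 0 the function grows without bound in t. A sketch based on the simplified uniform-demand form of the supplier's expected profit, with hypothetical parameters:

```python
# E[Prof_s(r, t)] = ((c-v)/(p-v)) q* r + r (t - q* + q*^2/(2 beta)),
# which is bilinear in (r, t): the rt cross term makes every stationary
# point a saddle.
p, v, cs, cr, beta = 210.0, 58.0, 120.0, 54.0, 1200.0
c = cs + cr
q_star = beta * (p - c) / (p - v)   # coordinated order quantity

def e_prof_s(r, t):
    return (c - v) / (p - v) * q_star * r + r * (t - q_star + q_star**2 / (2 * beta))

# Finite-difference mixed partial (exact for a bilinear function):
h = 0.5
mixed = (e_prof_s(1 + h, 250 + h) - e_prof_s(1 + h, 250)
         - e_prof_s(1, 250 + h) + e_prof_s(1, 250)) / h**2
print(mixed)                                   # equals 1, so det(Hessian) = -1

# Strictly increasing in t for fixed r > 0, hence no interior maximum:
print(e_prof_s(10.0, 300.0) > e_prof_s(10.0, 252.0))
```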

16.4 Conditionally Coordinating Solution for the Sales Rebate Contract with Uniformly Distributed Demand

The first two steps of constructing a coordinating solution are identical to those described in Sect. 16.3. Since the problem of finding parameters r and t that maximize the supplier's expected profit has no solution, we look for r and t which, under certain conditions, guarantee that, for the obtained q∗_r and ω∗, the supplier's expected profit is no less than in the case without the sales rebate contract. The expected value of the supplier's profit is:

    E[Profs(r, t)] = (cs + ((c − v)/(p − v)) r − cs − r) q∗ + r (t + q∗²/(2β))
                   = ((c − v)/(p − v)) q∗ r + r (t − q∗ + q∗²/(2β)).             (16.24)

Now let’s express the expected value of the supplier’s profit, assuming that the sales volume of the retailer is less than the threshold t set by the supplier. As mentioned earlier, in this case, for all players in the game there is a wholesale price contract. We substitute the expression (16.19) for the wholesale price ω = ω∗ in expression

16 Conditionally Coordinating Contracts in Supply Chains

315

for E[P rofs ], and receive: ] = (ω∗ − cs )q ∗ = E[P rofs ] = E[P rofs   c−v c−v ∗ r − cs q ∗ = rq . = cs + p−v p−v wholesaleprice

(16.25)

Let’s return to the expression (16.24) for the function E[P rofs (t, r)] and rewrite it in the form:  ∗2  q wholesaleprice rebat e ∗ − q + t r. ] = E[P rofs ]+ E[P rofs ] = E[P rofs 2β (16.26) The analysis of the last expression allows to conclude that if the supplier sets the wholesale price equal to ω = ω∗ , and chooses the threshold of products sold t, based on the condition: t > q ∗ −q ∗2/2β, and, at the same time, the retailer will choose the volume of purchased products equal to q = q ∗ , then due to the sales rebate contract, the supplier will receive the expected value of profit more than in the wholesale price contract with ω∗ , and the expected values of profits of the supply chain and the retailer will reach their maximum. Therefore, under the assumption that demand has a uniform distribution, the contract will be coordinating with: t > q ∗ − q ∗2 /2β. Now consider the profit allocation between the players for the case when the supplier pays the retailer a rebate per unit sold above the sales volume threshold. The retailer at the same time will choose the volume of purchase equal to q = q ∗ .To do this, we first write an expressions for the supplier’s and the expectation of the retailer’s profit in the wholesale price contract with ω = ω∗ , where ω∗ = cs + βr(c − v)/(p − v) and q = q ∗ . We receive: c−v ∗ rq . p−v     c−v q ∗2 wholesaleprice E[P rofs − c+ r − v q ∗. ] = (p − v) q ∗ − 2β p−v (16.27) wholesaleprice

E[P rofs

]=

Now let’s write expressions for the expected values of the supplier’s and retailer’s profits for the case of rebate payment: E[P rofsrebate ] = 

E[P rofrrebate ]

c−v ∗ rq + p−v

q ∗2 = (p − v) q − 2β ∗





 q ∗2 − q ∗ + t r. 2β

(16.28)

   q ∗2 c−v ∗ ∗ − c+ r−v q + q − − t r. p−v 2β 

(16.29)
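Relation (16.26) can be verified numerically: the supplier's gain from the rebate scheme over the plain wholesale-price contract is (q∗²/(2β) − q∗ + t)r, positive exactly when t > q∗ − q∗²/(2β). A sketch with hypothetical parameters and uniform demand:

```python
# Check of (16.26): the supplier prefers the rebate contract over the
# wholesale-price contract if and only if t exceeds q* - q*^2/(2 beta).
p, v, cs, cr, beta = 210.0, 58.0, 120.0, 54.0, 1200.0
c = cs + cr
q = beta * (p - c) / (p - v)                     # coordinated order q*

def e_prof_s_wholesale(r):
    return (c - v) / (p - v) * r * q             # (16.25)

def e_prof_s_rebate(r, t):
    return e_prof_s_wholesale(r) + (q * q / (2 * beta) - q + t) * r   # (16.26)

t_bound = q - q * q / (2 * beta)
r = 10.0
print(e_prof_s_rebate(r, t_bound + 5) > e_prof_s_wholesale(r))   # True
print(e_prof_s_rebate(r, t_bound - 5) < e_prof_s_wholesale(r))   # True
```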


Analysis of the last four expressions allows us to conclude that if the supplier sets the threshold of sales volume t, above which he is willing to pay a rebate r per unit sold, based on the condition t > q∗ − q∗²/(2β), then he will "take" a part of the retailer's expected profit and thereby increase his own.

Example 16.4.1 Consider an example based on empirical data from Hong Kong companies using a sales rebate contract in their business practices [8]. For a supply chain consisting of a supplier and a retailer, we have the following information: cr = 54 (c.u.), cs = 120 (c.u.), c = 174 (c.u.), p = 210 (c.u.), v = 58 (c.u.). Suppose that demand is a random variable ξ with a uniform distribution on the interval [0, 1200]. The example is solved assuming that t > 0. We do not consider the situation t = 0, since that corresponds to a contract with a linear rebate, which is not coordinating [4, 23] and leads to a negative profit of the supplier. Using the results of the second section, we calculate the optimal value of q∗_r for the retailer:

    q∗_r = β (p − ω − cr + r)/(p − v + r) = 1200 (210 − ω − 54 + r)/(210 − 58 + r) = 1200 (156 + r − ω)/(152 + r).

The wholesale price ω = ω∗ is expressed as follows:

    ω∗ = cs + ((v − c)/(v − p)) r = 120 + ((58 − 174)/(58 − 210)) r = 120 + (116/152) r.

Substituting this ω∗ into the expression for q∗_r, we receive:

    q∗_r = (1200/(152 + r)) (156 + r − 120 − (116/152) r) = (1200/(152 + r)) (36 + (36/152) r) = 284.

All calculations were rounded to integers. Thus, when ω∗ = 120 + (116/152) r, the optimal volume q∗_r is 284 (pcs.) for any r, and at this value q∗ the expected profits of the retailer and the entire supply chain reach their maximum. Consider the supplier's expected profit when q∗_r = 284 (pcs.). Using the obtained condition t > q∗ − q∗²/(2β), we find the constraint under which the contract will be coordinating:

    t > 284 − 284²/2400.

Thus, coordination is achieved with t > 250. In other words, the supplier will benefit from a sales rebate contract if the value of the threshold t is greater than 250. Let us analyze how the supplier's and retailer's expected profits change as the parameter r varies from 1% to 10% of the retail price, and t from 252 to 276 (with a step of eight pieces). The obtained expected values


Table 16.4 The expected values of the profits of the retailer and supplier for the parameters t and r

r (c.u.):     2.1   4.2   6.3   8.4  10.5  12.6  14.7  16.8  18.9    21

t = 252
E[Profr]     4658  4201  3743  3285  2828  2370  1913  1455   998   540
E[Profs]      458   915  1373  1830  2288  2746  3203  3661  4118  4576
E[Profsc]    5116  5116  5116  5116  5116  5116  5116  5116  5116  5116

t = 260
E[Profr]     4640  4165  3690  3215  2739  2264  1789  1313   838   363
E[Profs]      475   951  1426  1901  2377  2852  3327  3803  4278  4753
E[Profsc]    5116  5116  5116  5116  5116  5116  5116  5116  5116  5116

t = 268
E[Profr]     4624  4132  3639  3147  2655  2163  1671  1179   687   195
E[Profs]      492   984  1476  1968  2461  2953  3445  3937  4429  4921
E[Profsc]    5116  5116  5116  5116  5116  5116  5116  5116  5116  5116

t = 276
E[Profr]     4607  4098  3589  3080  2571  2062  1553  1044   535    27
E[Profs]      509  1018  1527  2036  2545  3054  3562  4071  4580  5089
E[Profsc]    5116  5116  5116  5116  5116  5116  5116  5116  5116  5116

of the supplier’s and the retailer’s profits, depending on the contract parameters, are presented in Table 16.4. The results presented in Table 16.4 show that the contract parameters r and t split the profits of the supply chain participants. By increasing the rebate value, the supplier, on the one hand, attracts and stimulates the retailer to enhance sales, and on the other hand “takes the rebate back” by the growth of the wholesale price. The example shows that for certain values of the wholesale price and rebate, the expected profits of the supplier and retailer are approximately equal. At sufficiently large values of r, the supplier’s expected profit exceeds the profit of the retailer. Therefore, the supplier and the retailer as a result of negotiations should compromise and establish such r and t to make the contract profitable for both players. Moreover, such a contract should be no worse than a contract without a rebate.

16.5 Buyback Contract Model

For the buyback contract we also use the Stackelberg model for two players. In this model the leading supplier chooses her strategy first by setting the parameters of the contract, the wholesale price ω and the buyback price b; the retailer, as follower, then chooses his strategy, that is, the purchasing volume, depending on the choice of the first player. It is assumed in the model that both players are rational, so they try to maximize their profits, are risk-neutral, and have


complete information about each other's costs, the retail price, and the salvage value of the goods. The model considers the single-transaction case: the retailer places an order with the supplier before the season starts and cannot change it during the season. We use the following notation for the buyback contract model: the supplier offers the retailer a contract at the wholesale price ω per unit, and if the retailer does not sell all the products he has ordered, the supplier pays him the buyback price b for each unsold item. If the retailer accepts the contract, he places an order for q items; otherwise, the game is over. In addition to the conditions on the parameters formulated in the first section, this game adds one more: the buyback price does not exceed the wholesale price but should be higher than the salvage value: v < b < ω. The list of notations used in the modelling was given in Table 16.1. Here it is also assumed that the sales volume is a random variable τ functionally related to the demand ξ by (16.1), where ξ is a continuous random variable with cumulative distribution function Fξ(x) and density function fξ(x). The expected value of the random variable τ and its derivative with respect to the variable q are given by formulas (16.2) and (16.3), respectively. Consider the first type of buyback contract, when the supplier pays the retailer b for each unsold unit, while the unsold product can be sold by the retailer at the end of the season at the price v [2, 4]. Write out the expressions for the profit functions of both players, Profr and Profs, and of the chain, Profsc, for the first-type buyback contract:

    Profr = pτ + (b + v)(q − τ) − (ω + cr)q = (p − b − v)τ − (ω + cr − b − v)q,  (16.30)
    Profs = qω − cs q − b(q − τ) = (ω − cs − b)q + bτ,                           (16.31)
    Profsc = Profr + Profs = (p − v)τ + (v − c)q.                                (16.32)

Based on (16.30)–(16.32), we find the expressions for the expected values of the profits:

    E[Profr] = (p − b − v)(q − ∫_0^q Fξ(x) dx) − (ω + cr − b − v)q,              (16.33)
    E[Profs] = (ω − cs − b)q + b(q − ∫_0^q Fξ(x) dx),                            (16.34)
    E[Profsc] = (p − v)(q − ∫_0^q Fξ(x) dx) + (v − c)q.                          (16.35)
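As with the rebate contract, the buyback payments in (16.33)–(16.35) are internal transfers, so the two expected profits sum to the chain profit for any (ω, b). A numerical sketch with hypothetical parameters and exponential demand:

```python
import math

# Check E[Prof_r] + E[Prof_s] = E[Prof_sc] for the first-type buyback contract.
p, v, w, b = 210.0, 58.0, 150.0, 90.0     # note that p >= b + v holds here
cs, cr = 120.0, 54.0
c = cs + cr
lam = 1.0 / 600.0                          # exponential demand (illustrative)

def int_F(q):
    return q - (1.0 - math.exp(-lam * q)) / lam   # integral_0^q F_xi(x) dx

def e_prof_r(q):
    return (p - b - v) * (q - int_F(q)) - (w + cr - b - v) * q   # (16.33)

def e_prof_s(q):
    return (w - cs - b) * q + b * (q - int_F(q))                 # (16.34)

def e_prof_sc(q):
    return (p - v) * (q - int_F(q)) + (v - c) * q                # (16.35)

q = 400.0
print(e_prof_r(q) + e_prof_s(q), e_prof_sc(q))   # equal
```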


For the second type of buyback contract [9, 19, 27], the supplier recovers v out of the b she paid to the retailer for each unsold unit by selling it at the salvage value. In this case, the players' and chain's profits take the form:

    Profr = pτ + b(q − τ) − (ω + cr)q = (p − b)τ − (ω + cr − b)q,                (16.36)
    Profs = qω − cs q − (b − v)(q − τ) = (b − v)τ + (ω − cs − b + v)q,           (16.37)
    Profsc = Profr + Profs = (p − v)τ + (v − c)q.                                (16.38)

Hence the expressions for the expected values of the profits are:

    E[Profr] = (p − b)(q − ∫_0^q Fξ(x) dx) − (ω + cr − b)q,                      (16.39)
    E[Profs] = (ω − cs − b + v)q + (b − v)(q − ∫_0^q Fξ(x) dx),                  (16.40)
    E[Profsc] = (p − v)(q − ∫_0^q Fξ(x) dx) + (v − c)q.                          (16.41)

If the contract coordinates the chain, its parameters are chosen so that the retailer's order quantity, which maximizes his expected profit, also maximizes the chain's expected profit. It is important to note that the chain profit does not depend on the contract parameters; they only affect how the chain profit is split between the players. Since the supplier knows that for any contract parameters (ω, b) the retailer will select the order quantity q that maximizes his expected profit, she will choose these parameters so as to secure her best possible payoff (i.e. her expected profit). To do this, the supplier must choose the contract parameters so that they motivate the retailer to order the quantity q that also maximizes the chain profit.

To design the coordinating buyback contract we follow an algorithm similar to the one used to design the coordinating sales rebate contract. Consider in general terms the solution of the coordination problem for the first type of buyback contract, when unsold goods are sold by the retailer at the end of the season at the salvage value.

The First Step
Find the extremum qr* of the retailer's expected profit. For that, calculate the first derivative of E[Prof_r] with respect to q and equate it to 0:

∂E[Prof_r]/∂q = (p − b − v)(1 − Fξ(q)) − (ω + cr − b − v) = 0.   (16.42)


N. A. Zenkevich et al.

Hence:

Fξ(qr⁰) = (p − ω − cr)/(p − b − v).   (16.43)

The stationary point of the retailer's expected profit function satisfies:

qr⁰ = Fξ⁻¹((p − ω − cr)/(p − b − v)).   (16.44)

Then consider the second derivative of the retailer's expected profit function:

∂²E[Prof_r]/∂q² = −(p − b − v) fξ(qr⁰).

Since the probability density function fξ(x) is always non-negative, qr⁰ is the maximum point of the retailer's expected profit function provided that p ≥ b + v. Therefore

qr* = Fξ⁻¹((p − ω − cr)/(p − b − v)),   (16.45)

if p ≥ b + v. The last constraint is logical: the retail price should not be less than the sum of the salvage value and the compensation paid to the retailer by the supplier for unsold goods; otherwise the retailer would prefer not to sell products during the season at the price p per unit, but to sell them at the end of the season for v and receive compensation from the supplier in the amount of b per unit. Indeed, if p < b + v then, since ω < p, we would have b + v > p > ω, so the retailer would receive for each unsold unit more than the wholesale price he paid to the supplier, and such a contract would not be interesting for the supplier.

The Second Step
Find the extremum of the expected chain profit E[Prof_sc]. To do this, write out the first derivative of E[Prof_sc] with respect to q and equate it to 0:

∂E[Prof_sc]/∂q = (p − v)(1 − Fξ(q)) + v − c = 0.   (16.46)

Based on (16.46) we have:

Fξ(qsc⁰) = (p − c)/(p − v).   (16.47)


The stationary point of the chain's expected profit function satisfies:

qsc⁰ = Fξ⁻¹((p − c)/(p − v)).   (16.48)

Find the second derivative of the expected chain profit function:

∂²E[Prof_sc]/∂q² = −(p − v) fξ(qsc⁰).

Since the probability density function fξ(x) is always non-negative and v < p, the second derivative above is always non-positive. Therefore the stationary point qsc⁰ is the maximum of the chain's expected profit function:

qsc* = qsc⁰ = Fξ⁻¹((p − c)/(p − v)).   (16.49)

Hence, based on (16.45) and (16.49):

Fξ⁻¹((p − ω − cr)/(p − b − v)) = Fξ⁻¹((p − c)/(p − v)).   (16.50)

Due to the strict monotonicity of the function Fξ, the coordinating condition for ω based on (16.50) takes the following form:

ω* = cs + ((p − c)/(p − v)) b.   (16.51)
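The coordinating condition (16.51) can be verified numerically. The sketch below assumes uniform demand on [0, β] (so Fξ⁻¹(y) = βy) and the Example 16.7.1 parameters; for several values of b it checks that the retailer's optimum (16.45) under ω* coincides with the chain optimum (16.49). The names are illustrative.

```python
# Check of the coordinating condition (16.51) for the first-type contract.
# Assumptions: uniform demand on [0, beta], so Finv(y) = beta * y;
# parameters from Example 16.7.1.
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
c = cs + cr

def Finv(y):
    return beta * y                           # inverse CDF of the uniform demand

def q_retailer(w, b):
    return Finv((p - w - cr) / (p - b - v))   # retailer's optimum (16.45)

q_chain = Finv((p - c) / (p - v))             # chain optimum (16.49)

for b in (0.5, 2.0, 3.7, 5.0):
    w_star = cs + (p - c) / (p - v) * b       # coordinating price (16.51)
    assert abs(q_retailer(w_star, b) - q_chain) < 1e-9
```

The cancellation is exact: substituting ω* into (16.45) removes the dependence on b.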

The Third Step
Substitute the expression for the order quantity qr* from (16.49) and ω* from (16.51) into the supplier's expected profit function (16.34) and find the maximum of this function. Since (16.51) gives the parameter ω as a function of the parameter b, ω = ω(b), we should find the value of b for which the supplier's expected profit is maximal. If this problem can be solved, the contract is coordinating. If there is no solution, an attempt can be made to form a conditionally coordinating contract, or we conclude that it is impossible to build a coordinating contract. The algorithm for designing a coordinating contract is discussed in more detail in the next section using the example of a buyback contract for a product with uniformly distributed demand.

Let us proceed to the construction of a coordinating contract for the second case, when the supplier pays the retailer b at the end of the season for each unsold unit of goods and can then sell them at the salvage value v.


First we identify the extremum of the retailer's expected profit function. We calculate the derivative of E[Prof_r] with respect to q and equate it to 0:

∂E[Prof_r]/∂q = (p − b)(1 − Fξ(q)) − (ω + cr − b) = 0.

Hence:

Fξ(qr⁰) = (p − ω − cr)/(p − b).   (16.52)

The stationary point of the retailer's expected profit function satisfies the following equation:

qr⁰ = Fξ⁻¹((p − ω − cr)/(p − b)).   (16.53)

Find the second derivative of this function:

∂²E[Prof_r]/∂q² = −(p − b) fξ(qr⁰).

Since the probability density function fξ(x) is always non-negative and, by the model conditions, p > b, the second derivative is always non-positive; thus the stationary point qr⁰ is the maximum of the retailer's expected profit function:

qr* = qr⁰ = Fξ⁻¹((p − ω − cr)/(p − b)).   (16.54)

At the second step, we find the extremum of the chain's expected profit function. Since the chain profit does not depend on the contract parameters, we can use the result obtained in (16.49):

qsc* = qsc⁰ = Fξ⁻¹((p − c)/(p − v)).

So the chain coordination condition for the second type of buyback contract, based on (16.54) and (16.49), is:

Fξ⁻¹((p − ω − cr)/(p − b)) = Fξ⁻¹((p − c)/(p − v)).   (16.55)

The expression for the wholesale price ω obtained from (16.55) is the coordinating one. Due to the strict monotonicity of the function Fξ, we can write the following expression for ω* from (16.55):

ω* = p − cr − (p − b)(p − c)/(p − v).   (16.56)
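The analogous check for the second-type contract is equally short; the sketch assumes uniform demand on [0, β] and the Example 16.7.1 parameters, and verifies that under (16.56) the retailer's optimum (16.54) equals the chain optimum (16.49).

```python
# Check of the coordinating condition (16.56) for the second-type contract.
# Assumptions: uniform demand on [0, beta]; parameters from Example 16.7.1.
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
c = cs + cr
q_chain = beta * (p - c) / (p - v)                  # chain optimum (16.49)

for b in (1.5, 3.0, 4.7, 6.0):
    w_star = p - cr - (p - b) * (p - c) / (p - v)   # (16.56)
    q_ret = beta * (p - w_star - cr) / (p - b)      # retailer's optimum (16.54)
    assert abs(q_ret - q_chain) < 1e-9
```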

At the third step, the expression ω = ω(b) from (16.56) is substituted into the supplier's expected profit, and we try to find the maximum of this function of the variable b. If this task has no solution, then the contract is not a coordinating one and we can try to design conditional coordination.

The expected profits of the chain and both players for the two types of buyback contracts are summarized in Table 16.5. Since the assumption of zero salvage value is rather common in the literature [5, 28], the table also shows the expressions for the expected profits for v = 0.

16.6 A Model of Buyback Contract with a Uniformly Distributed Demand

Let us consider the design of a coordinating buyback contract for the case when the random variable ξ is uniformly distributed on the interval [0, β], using the results of Sects. 16.2 and 16.4. We write the expressions for the expected profits of the chain and the players, E[Prof_sc], E[Prof_s], E[Prof_r], for each of the two types of the buyback contract using the expression (16.21) for the expectation of τ (Table 16.6).

Consider the first type of buyback contract, when the retailer receives compensation b from the supplier for each unsold unit and can sell these stocks at the salvage value v per unit at the end of the season.

The First Step
In the first step, use the result obtained in (16.43) and substitute the uniform demand distribution function:

qr*/β = (p − ω − cr)/(p − b − v).

So the order quantity at which the retailer's expected profit is maximal is:

qr* = β (p − ω − cr)/(p − b − v).   (16.57)

The Second Step
Obtain the coordination condition by equating the values of qr* and qsc*:

ω* = cs + ((p − c)/(p − v)) b.   (16.58)

Table 16.5 The expected profits of the chain and the players, buyback contracts

The retailer sells unsold products:
  E[Prof_sc] = −cq + v(q − E[τ]) + pE[τ],  E[Prof_s] = qω − cs q − b(q − E[τ]),  E[Prof_r] = −q(ω + cr) + pE[τ] + (b + v)(q − E[τ])
The supplier sells unsold products:
  E[Prof_sc] = −cq + v(q − E[τ]) + pE[τ],  E[Prof_s] = qω − cs q − b(q − E[τ]) + v(q − E[τ]),  E[Prof_r] = −q(ω + cr) + pE[τ] + b(q − E[τ])
Zero salvage value (v = 0):
  E[Prof_sc] = pE[τ] − cq,  E[Prof_s] = qω − cs q − b(q − E[τ]),  E[Prof_r] = −q(ω + cr) + pE[τ] + b(q − E[τ])

Table 16.6 The expected profits of the chain and the players, uniformly distributed demand

The retailer sells unsold products:
  E[Prof_sc] = (p − v)(q − q²/(2β)) + (v − c)q,  E[Prof_s] = (ω − cs − b)q + b(q − q²/(2β)),  E[Prof_r] = (p − b − v)(q − q²/(2β)) − (ω + cr − b − v)q
The supplier sells unsold products:
  E[Prof_sc] = (p − v)(q − q²/(2β)) + (v − c)q,  E[Prof_s] = (ω − cs − b + v)q + (b − v)(q − q²/(2β)),  E[Prof_r] = (p − b)(q − q²/(2β)) − (ω + cr − b)q


The Third Step
At the third step, we should try to find the maximum of the supplier's expected profit E[Prof_s(b)] as a function of the variable b with known parameters qr*, p, v, c, cr, cs and the coordinating condition (16.58), where E[Prof_s(b)] is:

E[Prof_s(b)] = (ω* − cs − b) qr* + b (qr* − qr*²/(2β)).   (16.59)

Substitute the result (16.58) for ω* into the expression (ω* − cs − b):

ω* − cs − b = cs + ((p − c)/(p − v)) b − cs − b = ((p − c)/(p − v)) b − b.   (16.60)

Based on (16.60), the expression (16.59) can be rewritten as:

E[Prof_s(b)] = (((p − c)/(p − v)) b − b) qr* + b (qr* − qr*²/(2β)) = b qr* ((p − c)/(p − v) − qr*/(2β)).
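The linearity of E[Prof_s(b)] in b under the coordinating condition (16.58) can be illustrated numerically; the sketch below assumes the Example 16.7.1 parameters and uniform demand, with illustrative function names.

```python
# Linearity of the supplier's expected profit in b under coordination (16.58).
# Assumptions: uniform demand on [0, beta]; parameters from Example 16.7.1.
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
c = cs + cr
q = beta * (p - c) / (p - v)              # coordinated order quantity (16.49)

def supplier_profit(b):
    w = cs + (p - c) / (p - v) * b        # coordinating wholesale price (16.58)
    return (w - cs - b) * q + b * (q - q * q / (2 * beta))   # (16.59)

# E[Prof_s(b)] = b * q * ((p - c)/(p - v) - q/(2 beta)): the slope in b is
# constant, hence there is no interior maximum on the interval v < b < p.
slope = (p - c) / (p - v) - q / (2 * beta)
for b in (1.0, 2.5, 4.0, 5.5):
    assert abs(supplier_profit(b) - b * q * slope) < 1e-9
```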

Note that the expression ((p − c)/(p − v) − qr*/(2β)) must be non-negative; otherwise the supplier's expected profit would be negative and the contract would be uninteresting to her. Thus E[Prof_s(b)] is a linear function of the variable b and has no local extremum. Since the supplier's expected profit for the first-type buyback contract has no extremum on the interval v < b < p, we conclude that the first-type buyback contract is not unconditionally coordinating. The construction of a conditionally coordinating solution in the first-type buyback contract model with uniformly distributed demand is considered in the next section.

Let us analyze the second type of buyback contract, when the retailer receives compensation b from the supplier for each returned unsold item at the end of the season, and the supplier can then sell the returned goods at the salvage value v per unit.

The First Step
In the first step, we use the result (16.54) and substitute the uniform demand distribution function:

qr*/β = (p − ω − cr)/(p − b).

Thus we obtain the order quantity at which the retailer's expected profit is maximal:

qr* = β (p − ω − cr)/(p − b).   (16.61)


The Second Step
Equate the values of qr* and qsc* and get an expression for ω*:

ω* = p − cr − (p − b)(p − c)/(p − v).   (16.62)

The Third Step
At the third step, we try to specify a value of the parameter b at which the supplier's expected profit E[Prof_s(b)], at the given qr* = qsc*, is maximal:

E[Prof_s(b)] = (ω* − cs − b + v) qr* + (b − v)(qr* − qr*²/(2β)).   (16.63)

Substituting the result (16.62) for ω* into the expression (ω* − cs − b + v) gives:

ω* − cs − b + v = (b − v)(v − c)/(p − v).

Hence the supplier's expected profit takes the form:

E[Prof_s(b)] = ((b − v)(v − c)/(p − v)) qr* + (b − v)(qr* − qr*²/(2β)) = qr* (b − v) ((p − c)/(p − v) − qr*/(2β)).

As was shown for the first-type buyback contract, the factor ((p − c)/(p − v) − qr*/(2β)) must be non-negative, because otherwise the supplier's expected profit would be negative and such a contract would not be considered by her. The obtained supplier's expected profit E[Prof_s(b)] is a linear function of the variable b on the interval v < b < p and has neither a local maximum nor a minimum. Thus, the second-type buyback contract is also not unconditionally coordinating.

We now proceed to finding the restrictions on the contract parameters that can provide conditional coordination of the chain in the buyback contract model.

16.7 Conditionally Coordinating Solution for the Buyback Contract with Uniformly Distributed Demand

As already noted above, the behavior of the players must satisfy the property of individual rationality. Suppose that the supplier faces a dilemma: which contract to offer to the retailer, a buyback contract or a wholesale price contract with the parameter ω*, where ω* satisfies the coordinating condition (16.58) for the first-type or (16.62) for the second-type buyback contract, and qr* is the order quantity maximizing the retailer's expected profit. The supplier's expected profits for the first-type and second-type buyback contracts, E[Prof_s^bb1(ω*)] and E[Prof_s^bb2(ω*)] respectively, are:

E[Prof_s^bb1(ω*)] = (ω* − cs − b) qr* + b (qr* − qr*²/(2β)),   (16.64)

E[Prof_s^bb2(ω*)] = (ω* − cs − b + v) qr* + (b − v)(qr* − qr*²/(2β)).   (16.65)

If the supplier offers the retailer a wholesale price contract with the same wholesale price ω*, then, assuming that the retailer chooses the order quantity qr* which maximizes both his and the chain's expected profits, the supplier's expected profit is:

E[Prof_s^wholesale] = (ω* − cs) qr*.   (16.66)

Compare the supplier's expected profit under the wholesale price contract and under the buyback contracts. To do this we calculate the following differences:

E[Prof_s^bb1(ω*)] − E[Prof_s^wholesale] = (ω* − cs)q* − b q*²/(2β) − (ω* − cs)q* = −b q*²/(2β),

E[Prof_s^bb2(ω*)] − E[Prof_s^wholesale] = (ω* − cs)q* − (b − v) q*²/(2β) − (ω* − cs)q* = −(b − v) q*²/(2β).

Since v < b, both of these differences are negative; that is, for the same wholesale price ω* and the same retailer's order qr*, the supplier's expected profit is always higher under the wholesale price contract than under the buyback contract. We therefore reject the assumption that the optimal order quantity qr* is the same for the buyback and wholesale price contracts. In fact, under the wholesale price contract the retailer maximizes only his own expected profit, while the chain's expected profit is not maximized. Let us analyze under what constraints on the parameter b the supplier's expected profit for the buyback contract is higher than for the wholesale price contract under these assumptions.
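The two differences computed above can be reproduced numerically; the sketch assumes the Example 16.7.1 parameters and an arbitrary admissible triple (q, ω*, b) chosen for illustration.

```python
# Reproducing the two differences above for a fixed pair (w*, b) at the same q.
# Assumptions: parameters from Example 16.7.1; q, w, b are illustrative values.
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
q, w, b = 134.0, 5.5, 3.7

wholesale = (w - cs) * q                                          # (16.66)
bb1 = (w - cs - b) * q + b * (q - q * q / (2 * beta))             # (16.64)
bb2 = (w - cs - b + v) * q + (b - v) * (q - q * q / (2 * beta))   # (16.65)

assert abs((bb1 - wholesale) + b * q * q / (2 * beta)) < 1e-9
assert abs((bb2 - wholesale) + (b - v) * q * q / (2 * beta)) < 1e-9
assert bb1 < wholesale and bb2 < wholesale   # wholesale wins at the same q
```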


Write down the expression for ω* when the coordinating condition is satisfied for the first-type buyback contract:

ω* = cs + ((p − c)/(p − v)) b.   (16.67)

Express the parameter b from (16.67):

b(ω*) = (ω* − cs)(p − v)/(p − c).   (16.68)

The order quantity qr^bb1* maximizing the expected profits of the chain and the retailer in case of the first-type buyback contract satisfies:

qr^bb1* = β(p − ω* − cr)/(p − b − v) = β(p − c)/(p − v) = qsc*.   (16.69)

Substituting (16.68) and (16.69) into the expression (16.64):

E[Prof_s^bb1(ω*)] = β(ω* − cs)(p − c)/(2(p − v)).   (16.70)
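The collapse of (16.64) to the closed form (16.70) can be checked directly; the sketch assumes the Example 16.7.1 parameters, with illustrative function names.

```python
# Check that (16.64) with b(w*) from (16.68) and q from (16.69) collapses
# to the closed form (16.70). Parameters from Example 16.7.1 are assumed.
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
c = cs + cr
q = beta * (p - c) / (p - v)                       # coordinated order (16.69)

def profit_bb1(w):
    b = (w - cs) * (p - v) / (p - c)               # b(w*) from (16.68)
    return (w - cs - b) * q + b * (q - q * q / (2 * beta))   # (16.64)

def profit_bb1_closed(w):
    return beta * (w - cs) * (p - c) / (2 * (p - v))         # (16.70)

for w in (4.0, 5.5, 7.0):
    assert abs(profit_bb1(w) - profit_bb1_closed(w)) < 1e-9
```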

For the second-type buyback contract, if the coordinating condition is satisfied, then:

ω* = p − cr − (p − b)(p − c)/(p − v).   (16.71)

Express the parameter b from (16.71):

b(ω*) = p + (ω* − p + cr)(p − v)/(p − c).   (16.72)

Similarly, the retailer's order quantity qr^bb2*, which maximizes both the retailer's and the chain's expected profit, satisfies:

qr^bb2* = β(p − ω* − cr)/(p − b) = β(p − c)/(p − v) = qsc*.   (16.73)

Substituting the expressions (16.72) and (16.73) into (16.65):

E[Prof_s^bb2(ω*)] = β(p − c)(ω* − cs)/(2(p − v)).   (16.74)

Now consider the case of the wholesale price contract where the wholesale price is still equal to ω* and the retailer chooses the order quantity only from the condition of maximizing his own expected profit; we denote it qr^wholesale*:

qr^wholesale* = β(p − ω* − cr)/(p − v).   (16.75)

Substituting (16.75) into (16.66):

E[Prof_s^wholesale(ω*)] = β(ω* − cs)(p − ω* − cr)/(p − v).   (16.76)

In order for a buyback contract to conditionally coordinate the chain, the supplier's expected profit must be higher than her expected profit under the wholesale price contract, that is:

E[Prof_s^bb1(ω*)] − E[Prof_s^wholesale(ω*)] > 0,   (16.77)

E[Prof_s^bb2(ω*)] − E[Prof_s^wholesale(ω*)] > 0.   (16.78)

Let us check whether inequality (16.77) holds under the assumptions made. Substituting the expressions (16.70) and (16.76) for the supplier's expected profit into the left part of (16.77):

E[Prof_s^bb1(ω*)] − E[Prof_s^wholesale(ω*)] = β ((ω* − cs)/(p − v)) (ω* + cr − (p + c)/2).   (16.79)

Since we assume ω* − cs > 0 and p − v > 0, expression (16.79) is positive if (ω* + cr − (p + c)/2) is positive. From this condition, we obtain the restriction on ω*:

ω* > (p + c)/2 − cr.   (16.80)

Substituting the expression (16.67) for ω* into inequality (16.80):

cs + ((p − c)/(p − v)) b > (p + c)/2 − cr.

Then we obtain the condition on b:

b > (p − v)/2.   (16.81)

The inequality (16.81) determines the condition on the parameter b under which the supplier's expected profit for the first-type buyback contract with uniformly distributed demand is higher than for the wholesale price contract, so that conditional supply chain coordination is carried out.

Then check the fulfillment of (16.78) by substituting into it the expressions (16.74) and (16.76) for the supplier's expected profit:

E[Prof_s^bb2(ω*)] − E[Prof_s^wholesale(ω*)] = β ((ω* − cs)/(p − v)) (ω* + cr − (p + c)/2).   (16.82)

The expression (16.82) is positive if (ω* + cr − (p + c)/2) is greater than zero, that is:

ω* > (p + c)/2 − cr.   (16.83)

Substituting the expression (16.71) for ω* into inequality (16.83):

p − cr − (p − b)(p − c)/(p − v) > (p + c)/2 − cr.

Hence the restriction on b is the following:

b > (p + v)/2.   (16.84)
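The thresholds (16.81) and (16.84) can be confirmed by comparing the supplier's expected profits on either side of them; the sketch assumes the Example 16.7.1 parameters (for which (p − v)/2 = 3.5 and (p + v)/2 = 4.5), and the function names are illustrative.

```python
# Checking the thresholds (16.81) and (16.84) by direct profit comparison.
# Assumptions: parameters from Example 16.7.1; names are illustrative.
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
c = cs + cr

def profit_buyback(w):                        # (16.70), which equals (16.74)
    return beta * (w - cs) * (p - c) / (2 * (p - v))

def profit_wholesale(w):                      # (16.76)
    return beta * (w - cs) * (p - w - cr) / (p - v)

def w1(b):                                    # coordinating price, type 1 (16.67)
    return cs + (p - c) / (p - v) * b

def w2(b):                                    # coordinating price, type 2 (16.71)
    return p - cr - (p - b) * (p - c) / (p - v)

eps = 1e-6
# Just above the threshold the buyback contract beats the wholesale price one:
assert profit_buyback(w1((p - v) / 2 + eps)) > profit_wholesale(w1((p - v) / 2 + eps))
assert profit_buyback(w2((p + v) / 2 + eps)) > profit_wholesale(w2((p + v) / 2 + eps))
# Just below, it does not:
assert profit_buyback(w1((p - v) / 2 - eps)) < profit_wholesale(w1((p - v) / 2 - eps))
assert profit_buyback(w2((p + v) / 2 - eps)) < profit_wholesale(w2((p + v) / 2 - eps))
```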

If the inequality (16.84) for the parameter b holds, then the supplier's expected profit for the second-type buyback contract with uniformly distributed demand is higher than for the wholesale price contract, that is, conditional supply chain coordination is implemented.

Thus, we have shown that under certain conditions on the contract parameter b, a buyback contract provides the supplier with a higher expected profit than the wholesale price contract, and hence both types of buyback contracts ensure conditional supply chain coordination.

Let us analyze what causes the increase in the supplier's expected profit when buyback contracts are used. To do this, write out the order quantities maximizing the retailer's expected profit at a fixed wholesale price ω* for each contract:

qr^wholesale* = β(p − ω* − cr)/(p − v),   qr^bb1* = β(p − ω* − cr)/(p − b − v),   qr^bb2* = β(p − ω* − cr)/(p − b).

Since p − b − v < p − v and p − b < p − v, we have

qr^wholesale* < qr^bb1*,   qr^wholesale* < qr^bb2*.

Thus, assuming that the demand is uniformly distributed, the retailer's optimal order quantity under the buyback contract is always higher than under the wholesale price contract at the wholesale price ω*. It can therefore be stated that the buyback contract motivates the retailer to place a larger order. If the conditions (16.81) and (16.84) on the parameter b are fulfilled, the increase in the retailer's order leads to an increase in the supplier's expected profit compared to the wholesale price contract, which, in turn, makes it possible to obtain conditional supply chain coordination through the buyback contract.

Consider an example of finding a conditionally coordinating solution through buyback contracts for a chain of two participants, assuming that the demand for the product has a uniform distribution.

Example 16.7.1 The data for this example is taken from [19]. For a chain of two players, a supplier and a retailer, the following information is known: cs = $3, c = $3.3, p = $8, v = $1. The value of the retailer's costs cr = $0.3 was added to this set. The demand is a random variable ξ uniformly distributed on the interval [0, 200] (the original example assumed a normal distribution of demand with parameters m = 200 and σ = 50). Consider the design of the coordinating buyback contract for each of the two possible types (all calculation results in the example are rounded to tenths).

The First-Type Buyback Contract
We use the resulting expression (16.57) and calculate the optimal value of qr* for the retailer:

qr* = β(p − ω − cr)/(p − b − v) = 200(8 − ω − 0.3)/(8 − b − 1) = 200(7.7 − ω)/(7 − b).

To achieve chain coordination, the retailer's optimal order quantity must coincide with the chain's optimal order quantity, that is,

qr* = 200(7.7 − ω)/(7 − b) = 134.

We obtain the coordinating condition for the parameter ω:

ω* = cs + ((p − c)/(p − v)) b = 3 + 0.7b.


The resulting expression for ω* ensures the simultaneous maximization of the expected profit of both the retailer and the chain. For the supplier's expected profit under the buyback contract to exceed that under the wholesale price contract, condition (16.81) must be met: b > (p − v)/2 = 3.5, so ω* > 5.5. Thus, it is beneficial for the supplier to use the first-type buyback contract only at a wholesale price ω* > $5.5.

The Second-Type Buyback Contract
The optimal order quantity qr* for the second-type buyback contract is specified by the expression (16.61):

qr* = β(p − ω − cr)/(p − b) = 200(8 − ω − 0.3)/(8 − b) = 200(7.7 − ω)/(8 − b).

Setting qr* = qsc*:

qr* = 200(7.7 − ω)/(8 − b) = qsc* = 134.

Then we obtain the coordinating condition on the parameter ω:

ω* = p − cr − (p − b)(p − c)/(p − v) = 8 − 0.3 − (8 − b)·4.7/7 = 2.3 + 0.7b.

For the supplier's expected profit under the second-type buyback contract to be higher than under the wholesale price contract, condition (16.84) must be satisfied: b > (p + v)/2 = 4.5, so ω* > 5.5. Again it can be concluded that the buyback contract is beneficial for the supplier only at a wholesale price ω* > $5.5.

Table 16.7 provides examples of the expected profits of the chain participants calculated for different values of the parameters (ω*, b(ω*)) that satisfy the coordination condition. As can be seen from the table, the total supply chain profit does not depend on the contract parameters; the parameters only determine how the total profit is allocated between the players. Each column of the table presents calculations for the same value of ω* for the wholesale price contract (taking into account the retailer's profit maximization) and for the buyback contracts of both types with the chain coordination condition satisfied. As was shown earlier, the supplier's profit under the buyback contract becomes higher (and therefore the contract becomes preferable for her) at a wholesale price above $5.5.


Table 16.7 Dependence of the expected profits on the parameters (ω*, b(ω*))

ω*                    3.1    4      5.5    6      7      7.6

The wholesale price contract:
  qr*                 131    106    63     49     20     3
  E[Prof_r]           302    196    69     41     7      0
  E[Prof_s]           13     106    157    146    80     13
  E[Prof_sc]          315    301    226    187    87     13

The first-type buyback contract, coordination condition satisfied:
  qr* = qsc*          134    134    134    134    134    134
  b                   0.1    1.5    3.7    4.5    6      6.9
  E[Prof_r]           309    248    148    114    47     7
  E[Prof_s]           7      67     168    201    269    309
  E[Prof_sc]          316    316    316    316    316    316

The second-type buyback contract, coordination condition satisfied:
  qr* = qsc*          134    134    134    134    134    134
  b                   1.1    2.5    4.7    5.5    7      7.9
  E[Prof_r]           309    248    148    114    47     7
  E[Prof_s]           7      67     168    201    269    309
  E[Prof_sc]          316    316    316    316    316    316
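The entries of Table 16.7 follow from the formulas of Sects. 16.6 and 16.7; as a sketch (Example 16.7.1 parameters assumed, values rounded to integers), the code below regenerates the ω* = 5.5 column.

```python
# Regenerating the w* = 5.5 column of Table 16.7 (values rounded to integers).
# Assumptions: parameters from Example 16.7.1; uniform demand on [0, beta].
p, v, cs, cr, beta = 8.0, 1.0, 3.0, 0.3, 200.0
c = cs + cr
w = 5.5

def e_tau(q):
    return q - q * q / (2 * beta)          # E[tau] for uniform demand, (16.21)

# Wholesale price contract: the retailer optimizes alone, order (16.75)
q_w = beta * (p - w - cr) / (p - v)
er_w = p * e_tau(q_w) + v * (q_w - e_tau(q_w)) - (w + cr) * q_w
es_w = (w - cs) * q_w
assert round(q_w) == 63 and round(er_w) == 69 and round(es_w) == 157

# Coordinated buyback contracts: order q_sc* (16.49), supplier profit (16.70)
q_sc = beta * (p - c) / (p - v)
es_bb = beta * (w - cs) * (p - c) / (2 * (p - v))
esc = (p - v) * e_tau(q_sc) + (v - c) * q_sc
assert round(q_sc) == 134 and round(es_bb) == 168 and round(esc) == 316
```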

16.8 Conclusion

Supply chain coordination is an important topic in supply chain management and can be carried out through various mechanisms, including the integration of the players' information systems, centralization of planning processes and the use of contracts. All these mechanisms should align the actions of all chain members in order to ensure the maximum chain performance and, as a consequence, provide better results for each of them. The paper considers the problem of two-echelon supply chain coordination through two contract types widely used in practice: sales rebate and buyback. A contract coordinates the supply chain if the individual and collective rationality properties are fulfilled; this means that such a contract must be a Nash equilibrium and possess the Pareto-optimality property. The research considers the Stackelberg model in a supply chain under the conditions of a fixed retail price and stochastic demand.

This study contributes to theory and practice in several ways. First, the authors demonstrate clearly that the sales rebate and buyback contracts do not coordinate the supply chain unconditionally, since they do not provide a set of incentives that could ensure maximum expected profits for the chain and both of its players. However, under certain conditions on the contract parameters, individual rationality can be obtained for both players.


Second, an algorithm for solving the conditional coordination problem is suggested. This algorithm includes finding the retailer's optimal order quantity; finding the maximum point of the expected supply chain profit function and obtaining the coordinating condition of the contract; and, finally, finding the restrictions on the contract parameters under which conditional coordination is achieved. A conditionally coordinating contract provides the maximum of the retailer's and the chain's expected profits and also ensures a higher expected profit for the supplier.

Third, the proposed algorithm was implemented for the case of uniformly distributed demand both for sales rebate and buyback contracts. The examples presented demonstrate that sales rebate and buyback contracts have different mechanisms of profit increase: while the sales rebate motivates the retailer to sell more, the buyback stimulates the retailer to order more products. Since there may be several sets of parameters ensuring conditional supply chain coordination with different splits of the supply chain profit between the players, the choice among them may become a compromise reached during a negotiation process.

Further studies of supply chain coordination through rebate and buyback contracts could be developed in several directions. The coordination problem can be considered and empirically justified for other types of product demand distribution. Also, while this paper deals with the case of a fixed retail price, the case of a changing retail price, when the demand depends on the retail price p, may be of great practical interest. Finally, the use of several conditions in one contract may be another promising area of research; e.g., coordination through a rebate contract with buyback can be considered.

References

1. Becker-Peth, M., Katok, E., Thonemann, U.W.: Designing buyback contracts for irrational but predictable newsvendors. Manag. Sci. 59(8), 1800–1816 (2013)
2. Bernstein, F., Federgruen, A.: Decentralized supply chains with competing retailers under demand uncertainty. Manag. Sci. 51(1), 18–29 (2005)
3. Cachon, G.P., Netessine, S.: Game theory in supply chain analysis. In: Simchi-Levi, D., Wu, S.D., Shen, Z.M. (eds.) Handbook of Quantitative Supply Chain Analysis: Modeling in the E-Business Era. Kluwer Academic Publishers, Norwell (2004)
4. Cachon, G.P.: Supply chain coordination with contracts. In: Kok, A.G., Graves, S.C. (eds.) Handbooks in Operations Research and Management Science. Elsevier B.V., Amsterdam (2003)
5. Cachon, G.P., Lariviere, M.: Supply chain coordination with revenue sharing contracts: strengths and limitations. Manag. Sci. 51(1), 30–44 (2005)
6. Chiu, C.H., Choi, T.M., Li, X.: Supply chain coordination with risk sensitive retailer under target sales rebate. Automatica 47(8), 1617–1625 (2011)
7. Chiu, C.H., Choi, T.M., Tang, C.S.: Price, rebate, and returns supply contracts for coordinating supply chains with price-dependent demands. Prod. Oper. Manag. 20(1), 81–91 (2011)
8. Chiu, C.H., Choi, T.M., Yeung, H.T., Zhao, Y.: Sales rebates in fashion supply chains. Math. Probl. Eng. 2012, 1–19 (2012)
9. Donohue, K.L.: Efficient supply contracts for fashion goods with forecast updating and two production modes. Manag. Sci. 46(11), 1397–1411 (2000)
10. Emmons, H., Gilbert, S.M.: Note. The role of returns policies in pricing and inventory decisions for catalogue goods. Manag. Sci. 44(2), 276–283 (1998)
11. Genc, T.S., Giovanni, P.: Optimal return and rebate mechanism in a closed-loop supply chain game. Eur. J. Oper. Res. 269, 661–681 (2018)
12. Heydari, J., Asl-Najafi, J.: A revised sales rebate contract with effort-dependent demand: a channel coordination approach. Int. Trans. Oper. Res. (2017). https://doi.org/10.1111/itor.12556
13. Hou, J., Zeng, A.Z., Zhao, L.: Coordination with a backup supplier through buyback contract under supply disruption. Transp. Res. E Logist. Transp. Rev. 46(6), 881–895 (2010)
14. Huang, X., Gu, J.W., Ching, W.K., Siu, T.K.: Impact of secondary market on consumer return policies and supply chain coordination. Omega 45, 57–70 (2014)
15. Krishnan, H., Kapuscinski, R., Butz, D.A.: Coordinating contracts for decentralized supply chains with retailer promotional effort. Manag. Sci. 50(1), 48–63 (2004)
16. Li, L., Whang, S.: Game theory models in operations management and information systems. In: Chatterjee, K., Samuelson, W.F. (eds.) Game Theory and Business Applications. International Series in Operations Research and Management Science. Springer, Boston (2002)
17. Liao, C.-N.: A joint demand uncertainty, sales effort, and rebate form in marketing channel. J. Stat. Manag. Syst. 12(1), 155–172 (2013)
18. Muzaffar, A., Deng, S., Malik, M.N.: Contracting mechanism with imperfect information in a two-level supply chain. Oper. Res. Int. J. (2017). https://doi.org/10.1007/s12351-017-0327-4
19. Pasternack, B.A.: Optimal pricing and return policies for perishable commodities. Mark. Sci. 4(2), 166–176 (1985)
20. Petrosyan, L.A., Zenkevich, N.A., Shevkoplyas, E.V.: Game Theory. BKHV-Petersburg, St. Petersburg (2012)
21. Saha, S.: Supply chain coordination through rebate induced contracts. Transp. Res. E Logist. Transp. Rev. 50, 120–137 (2013)
22. Sainathan, A., Groenevelt, M.: Vendor managed inventory contracts - coordinating the supply chain while looking from the vendor's perspective. Eur. J. Oper. Res. 272(1), 249–260 (2018)
23. Taylor, T.: Coordination under channel rebates with sales effort effect. Manag. Sci. 48(8), 992–1007 (2002)
24. Tsay, A.A.: Managing retail channel overstock: markdown money and return policies. J. Retail. 77(4), 457–492 (2001)
25. Wang, C.X.: A general framework of supply chain contract models. Supply Chain Manag. Int. J. 7(5), 302–310 (2002)
26. Wong, W.K., Qi, J., Leung, S.Y.S.: Coordinating supply chains with sales rebate contracts and vendor managed inventory. Int. J. Prod. Econ. 120(1), 151–161 (2009)
27. Xiao, T., Shi, K., Yang, D.: Coordination of a supply chain with consumer return under demand uncertainty. Int. J. Prod. Econ. 124(1), 171–180 (2010)
28. Xiong, H., Chen, B., Xie, J.: A composite contract based on buyback and quantity flexibility contracts. Eur. J. Oper. Res. 210(3), 559–567 (2011)
29. Yang, Y., Cao, E., Lu, K.L., Zhang, G.: Optimal contract design for dual-channel supply chains under information asymmetry. J. Bus. Ind. Mark. 32(8), 1087–1097 (2017)