Threshold Decision-making in Clinical Medicine: With Practical Application to Hematology and Oncology (Cancer Treatment and Research, 189) 3031379926, 9783031379925

This book aims to provide threshold models to help physicians make optimal diagnostic, therapeutic, and predictive decisions.


English · Pages: 161 [156] · Year: 2023


Table of Contents
Preface
Contents
List of Figures
1 Evidence and Decision-Making
1.1 Introduction
1.2 Evidence-Based Medicine (EBM)
1.3 Theories of Decision-Making
1.4 Rational Decision-Making
1.5 Threshold Model of Decision-Making: The Linchpin Between EBM and Decision-Making
References
2 Making Decisions When No Further Diagnostic Testing is Available
2.1 Introduction
2.2 Expected Utility Theory Threshold Model
2.3 Threshold Model When the Diagnosis Is Not Certain, and Outcomes Are Not Certain
2.4 Threshold Models When the Diagnosis Is Certain, but Health Outcomes (Utilities) Are Uncertain
References
3 Making Decisions When no Further Diagnostic Testing is Available (Expected Regret Theory Threshold Model)
3.1 Introduction
3.2 Acceptable Regret
References
4 Decision-Making When Diagnostic Testing is Available
4.1 Introduction
4.2 Threshold Modeling When Diagnostic Testing is Available
References
5 Formulating Management Strategies Using Fast-and-Frugal Trees (A Decision Tool to Transform Clinical Practice Guidelines and Clinical Pathways into Decision Support at the Point of Care)
5.1 Introduction
5.2 Fast-and-Frugal Trees
References
6 Using Decision Curve Analysis to Evaluate Testing and/or Predictive Modeling
6.1 Introduction
6.2 Illustrative Examples
References
7 Hybrid and Dual-Processing Threshold Decision Models
7.1 Hybrid Threshold Model
7.2 The Dual-Processing Threshold Model
References
8 Which Threshold Model?
8.1 Introduction
8.1.1 A Brief Review of the Principles of Medical Decision-Making
8.1.2 Different Theoretical Models Generate Different Recommendations
8.1.3 Contemporary Clinical Practice Represents an Environment Favoring the Overuse of Diagnostic and Treatment Interventions
8.1.4 Simple Versus Complex Models
8.1.5 Adhering to Practical Wisdom
References
9 Medical Decision-Making and Artificial Intelligence
9.1 Introduction
9.2 Machine Learning
9.3 Machine Learning and Clinical Care
9.4 AI Challenges and Limitations
9.5 Statistical and Decision-Theoretical View of Artificial Intelligence Modeling
9.6 Conclusions
References
Appendix
A.1 Expected Utility Theory
A.1.1 A Decision About Treatment (Rx) Versus No Treatment (NoRx): When the Diagnosis (Clinical Event) Is Not Certain and No Further Diagnostic (dx) Test Is Available
A.1.2 Rx Versus NoRx
A.1.3 Rx1 Versus Rx2: When Diagnosis Is Not Certain and No Further dx Test Is Available
A.1.4 Rx1 Versus Rx2: When Diagnosis Is Certain and No Further dx Test Is Available
A.1.5 Rx Versus NoRx or Rx1 Versus Rx2: Number Needed to Treat and Number Needed to Harm
A.1.6 Rx Versus NoRx: When Diagnosis Is Not Certain and a dx Test Is Available
A.2 Expected Regret Theory (ERT)
A.2.1 Rx Versus NoRx: When Diagnosis Is Not Certain and No Further dx Test Is Available
A.2.2 Rx Versus NoRx: When Diagnosis Is Certain and No Further dx Test Is Available
A.2.3 Rx1 Versus Rx2: When Diagnosis Is Not Certain and No Further dx Test Is Available
A.2.4 Rx Versus NoRx: When Diagnosis Is Not Certain and a dx Test Is Available
A.2.5 Acceptable Regret Rx Versus NoRx: When the Diagnosis Is Not Certain, and a Diagnostic Test Is Available
A.3 Fast-And-Frugal Trees (FFT and FFTT): An Example
A.4 Complex Versus Simple Models
Glossary


Cancer Treatment and Research Series Editor: Steven T. Rosen

Benjamin Djulbegovic Iztok Hozo

Threshold Decision-making in Clinical Medicine With Practical Application to Hematology and Oncology

Indexed in PubMed/Medline

Cancer Treatment and Research Volume 189 Series Editor Steven T. Rosen, Duarte, CA, USA

This book series provides detailed updates on the state of the art in the treatment of different forms of cancer and also covers a wide spectrum of topics of current research interest. Clinicians will benefit from expert analysis of both standard treatment options and the latest therapeutic innovations and from provision of clear guidance on the management of clinical challenges in daily practice. The research-oriented volumes focus on aspects ranging from advances in basic science through to new treatment tools and evaluation of treatment safety and efficacy. Each volume is edited and authored by leading authorities in the topic under consideration. In providing cutting-edge information on cancer treatment and research, the series will appeal to a wide and interdisciplinary readership. The series is listed in PubMed/Index Medicus.

Benjamin Djulbegovic · Iztok Hozo

Threshold Decision-making in Clinical Medicine With Practical Application to Hematology and Oncology

With assistance from David Lizárraga

Benjamin Djulbegovic Hematology Stewardship Program Division of Hematology/Oncology Department of Medicine Medical University of South Carolina Charleston, SC, USA

Iztok Hozo Department of Mathematics Indiana University Northwest Gary, IN, USA

Assisted by David Lizárraga Research Assistant, Department of Computational and Quantitative Medicine Beckman Research Institute, City of Hope Duarte, CA, USA

ISSN 0927-3042  ISSN 2509-8497 (electronic)
Cancer Treatment and Research
ISBN 978-3-031-37992-5  ISBN 978-3-031-37993-2 (eBook)
https://doi.org/10.1007/978-3-031-37993-2

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

To Mira and Stela for tolerating hundreds of video calls and time taken away from them in our never-ending passion for using mathematics to improve the practice of medicine

Preface

The American College of Cardiology and American Heart Association (ACC/AHA) recommend that people be treated with statins if their risk of heart disease is ≥ 7.5% over 10 years. Hematologists/oncologists administer one type of chemotherapy to patients whose bone marrow shows ≥ 20% blasts (for a presumed diagnosis of acute leukemia) but another type of treatment if the blast count is < 20% (for a presumed diagnosis of myelodysplastic syndrome). The American Society of Clinical Oncology recommends treatment with growth factors if a patient's risk of developing febrile neutropenia after chemotherapy is ≥ 20%. But why not treat at 7.4% in the case of statins, or at 19% in the case of treatment for leukemia or administration of growth factors?

The problem we have just described relates to an ancient epistemological quandary known as the Sorites paradox,1 also known as the "little-by-little argument". Sorites is Greek for heap; the paradox consists of the difficulty of defining clear boundaries between quantities of interest: at which point does a collection of grains become large enough to be called a heap? The Sorites paradox abounds in medicine, as in everyday life. It is a consequence of the relationship between scientific evidence (which exists on a continuum of credibility) and decision-making (which is a categorical, yes/no exercise, as decisions have to be made). In this book, we espouse the view that the threshold model (the topic of this book) represents a rational, and pragmatic, way to address the Sorites paradox.2 The specificity needed to deal with the Sorites paradox could not be accomplished within classic decision theory, which often (with good theoretical reasons, we might add) promotes subjective utilities. However, during the last quarter century, it has become obvious that medical decision-making needs to be informed by reliable evidence and specific information about the effects of our health interventions. Therefore, to enable the application of the threshold model at the bedside, we linked two important fields in clinical medicine: evidence-based medicine (EBM) and decision-making theories.

In doing so, our main goal is to promote techniques that can be used in real time, at the point of care, to provide the right care to the right patient, in the right setting, at the right time. We kept theory to a minimum and illustrated the use of the threshold models with plenty of examples. To facilitate reading each chapter independently, we repeat pertinent concepts in each chapter as needed. Nevertheless, Chaps. 1–3 and the Appendix provide the most technical detail for those readers wishing to understand all the mathematical minutiae. While most examples are from hematology and oncology (where we have the most expertise), the models apply to all clinical conditions.

This book is the product of more than 25 years of work by Drs. Djulbegovic and Hozo contributing to advances in decision analysis and EBM. They are joined by research assistant Dr. Lizárraga, who helped convert this lifetime effort into a book in a timely fashion. None of this would have been possible without the support, understanding, and sacrifice of our families, who patiently tolerated our hundreds of video calls and the time taken away from them. If what we wrote helps improve the decision-making of clinicians, trainees, and policy-makers, resulting in better patient care, we will consider the effort well worth it.

1 https://en.wikipedia.org/wiki/Sorites_paradox; https://plato.stanford.edu/entries/sorites-paradox/.
2 While a number of advanced theoretical avenues (such as fuzzy theory, three-valued logic, supervaluation, etc.) have been attempted to deal with the Sorites paradox, none of them has penetrated clinical medicine (see Chap. 1).

Reference

Djulbegovic B, Hozo I, Mandrola J. Sorites paradox and persistence in overuse and underuse in healthcare delivery services. J Eval Clin Pract 2023;29(6):877–879

Charleston, USA
Gary, USA
March 2023

Benjamin Djulbegovic Iztok Hozo


List of Figures

Fig. 1.1  Three principles of evidence-based medicine (EBM). Principle 1 states that "not all evidence is created equal", which underpins the creation of a hierarchy of the quality (certainty) of evidence. Figure 1.1A (top row) displays the old hierarchy of evidence by study type (now considered obsolete); Fig. 1.1B (top row) shows the GRADE hierarchy of the quality of evidence, which combines the study design with factors that can affect the quality of evidence (Table 1.1). Principle 2 states that claims should be based on all (the totality of) relevant and available studies, often synthesized using techniques of meta-analysis (middle row). Principle 3 states that evidence alone is necessary but not sufficient for making optimal clinical decisions. Multiple factors affect decisions (Fig. 1.1A, bottom row), but the most important is the integration of the benefits and harms of health interventions with the patient's values and preferences to arrive at the threshold above which one intervention is favoured over another (Fig. 1.1B, bottom row)

Fig. 1.2  Basic outline of a threshold model (A), in which a decision is made (illustrated by a square) to either treat (Rx) or not treat (NoRx). After the decision is made, there exists a probability (p; denoted pD in the graph) that the patient is diseased (D+; pD) or not (D−; 1 − pD). Consequences of the decision can be measured as health outcomes (utilities, U, which may relate to morbidity, mortality, quality-adjusted life years, etc.). The expected utility (EU) of administering the treatment is calculated as p · U(Rx, D+) + (1 − p) · U(Rx, D−), and of withholding treatment as p · U(NoRx, D+) + (1 − p) · U(NoRx, D−). Expected utility theory dictates that the decision-maker choose to treat or not based on whichever option has the higher EU. At the threshold probability pt the expected utilities are equal, so when pD is above pt the decision-maker should administer the treatment, and when pD is below pt the decision-maker should withhold it. The generic net benefits [B = U(Rx, D+) − U(NoRx, D+)] and harms [H = U(NoRx, D−) − U(Rx, D−)] of treatment can be used to estimate pt (see the calculation in A and a graphical illustration in B; red line). For example, in panel B, the high treatment benefits/harms ratio of 36 (blue line) refers to treatment of a patient with suspected tuberculosis. Such a high B/H ratio means that treatment should be administered even if the probability of suspected tuberculosis is very low (in this case, as long as pD > pt = 2.7%, the treatment should be administered because its benefits so heavily outweigh its harms). Ideally, the data informing the B/H ratio should come from high-quality research synthesis using techniques of systematic reviews and meta-analysis (presented as small forest plots in panel 1A). Figure 1.2C shows that when physicians were asked to determine their treatment thresholds using intuition and regret, the resulting decision thresholds differed dramatically from the EUT threshold (see text)
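The threshold arithmetic in the Fig. 1.2 caption can be illustrated with a short sketch (a minimal reconstruction of pt = H/(B + H); the function name and code are ours, not the book's):

```python
def treatment_threshold(benefit, harm):
    """Probability of disease at which EU(treat) = EU(no treat):
    pt = H / (B + H), equivalently 1 / (1 + B/H)."""
    return harm / (benefit + harm)

# Tuberculosis example from the caption: benefits/harms ratio of 36
pt = treatment_threshold(benefit=36, harm=1)
print(round(100 * pt, 1))  # 2.7 -> treat whenever pD exceeds ~2.7%
```

Consistent with the caption, a large benefit/harm ratio pushes the treatment threshold toward zero, so treatment is warranted even at low probabilities of disease.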

Fig. 1.3  An irreducible uncertainty, inevitable error, unavoidable injustice: the inverse relationship between underuse and overuse. False-positive (FP) and false-negative (FN) errors are inevitable and inextricably linked: as FP decreases, FN increases, and vice versa; short of a perfect test (100% sensitivity and specificity), it is impossible to simultaneously decrease FP (regret of commission) and FN (regret of omission) errors. The right threshold is a function of our values. Figure adapted from K. Hammond, Human Judgment and Social Policy, Oxford Press, 1996; Djulbegovic & Ash, JAMA 2011;305:2005–2006 (May 18)

Fig. 2.1  A decision tree modeling a decision to treat or not treat a patient with a probability of disease pD (1 − pD = probability of non-disease). Utilities are enclosed in rectangles, where U1 represents the health outcome (e.g., survival, mortality, morbidity) of a diseased patient who was treated; U2 represents the outcome of a non-diseased patient who was treated; U3 refers to the outcome of a diseased patient who was not treated; and U4 denotes the outcome of a non-diseased patient who was not treated. Panel a) shows probabilities and utilities that are independent of each other; that is, it refers to the setting in which both the disease diagnosis and the health outcomes are not certain, while panel b) refers to the clinical situation in which the diagnosis is certain (pD = 1) but the health outcomes are not. The tree is solved by multiplying each utility by the given probability and summing over the corresponding tree branch. The strategy with the higher EU is preferred. To calculate the threshold probability at which the benefits and harms of the strategies are equal, we solve the tree for either a) pD or b) the corresponding morbidity or mortality used to define the given utilities. Note that the "NoRx" branch may refer to an alternative treatment, in which case the decision dilemma is whether to choose treatment 1 over treatment 2 (see Boxes 2.1 and 2.3)

Fig. 2.2  Threshold probability of disease (pt; dotted line) shown as a) a function of the generic definition of net benefits and net harms (Eq. 2.5) and b) a function of the number needed to treat for one patient to benefit or be harmed (NNTB or NNH, respectively) (Eq. 2.12). Note how pt behaves differently when different EBM metrics are used for its calculation instead of generic net benefits and net harms

Fig. 3.1  Dual visual analog scale for elicitation of regret
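The NNT form of the threshold in Fig. 2.2b can be sketched as follows, assuming the generic benefit and harm are approximated as B = 1/NNTB and H = 1/NNH (our reconstruction; the example numbers are invented):

```python
def threshold_from_nnt(nntb, nnh):
    """Treatment threshold pt = H / (B + H) with B = 1/NNTB and
    H = 1/NNH, which simplifies to NNTB / (NNTB + NNH)."""
    return nntb / (nntb + nnh)

# Hypothetical treatment: 1 in 10 patients benefits, 1 in 50 is harmed
print(round(threshold_from_nnt(nntb=10, nnh=50), 2))  # 0.17
```

Note the behavior the caption points out: as NNTB grows (a weaker treatment), pt rises toward 1, while a large NNH (a safer treatment) pulls pt toward 0.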

Fig. 4.1  A horizontal bar showing the probability of disease (pD) on a horizontal axis for values between 0 and 1, where three decisions can be made: observation only, without testing or treatment (No Test; No Rx); diagnostic testing that will determine whether to administer treatment or not (Perform testing); and administering the treatment without testing (No Test; Rx). The decisions are separated by two thresholds (both indicated by dashed lines): the testing threshold (ptt) and the treatment threshold (prx). Note that the difference between ptt and prx (i.e., the width of the "Perform testing" section) is affected by the accuracy and harms of the test: more accurate or lower-risk tests widen the range of probabilities that results in testing, whereas less accurate or higher-risk tests narrow it. In addition, ordering a diagnostic test depends on the benefits (B) and harms (H) of the treatment under consideration: in general, the higher the B/H ratio, the lower the treatment threshold (see also Chap. 2). Note that a diagnostic test should never be ordered if the harm of treatment is greater than or equal to its benefit. The ptt and prx thresholds refer to the prior (pretest) probability of disease. If pD < ptt, the post-test probability of disease is guaranteed to be below the treatment threshold pt introduced in Chap. 2, even if the test is positive. If pD > prx, the post-test probability of disease is guaranteed to be above the treatment threshold pt (see Chap. 2), even if the test is negative

Fig. 4.2  A decision tree conceptualizing three decisions (treat, perform a diagnostic test, or withhold test/treatment) and all outcomes resulting from a chance event. The small black square represents a decision, and the small black circular nodes represent events due to chance (i.e., test results or a probability of disease, pD). Utilities (U1–8) for each outcome are shown enclosed in rectangles. Each is represented in terms of disutilities by subtracting the effects of morbidity or mortality (M), the treatment effect (RRR), the harms of treatment (Hrx), and the harms of testing (Hte) from perfect health (customarily set at 1; see Glossary). S is the test sensitivity (the frequency of true positives, TP); Sp is the test specificity (the frequency of true negatives, TN); FN is the frequency of false negatives (FN = 1 − S); FP is the frequency of false positives (FP = 1 − Sp); Mrx is the morbidity or mortality on treatment; RRR refers to the relative risk reduction; Mrx = M · (1 − RRR)

Fig. 4.3  A nomogram that can be used to calculate the threshold probability (blue points and line) given a likelihood ratio (LR; gray points and line) and a benefit-to-harm ratio (B/H; black points and line). To calculate the testing threshold (ptt), the positive LR is used; to calculate the treatment threshold (prx), the negative LR is used. To use the nomogram, connect the points for a given LR and B/H and extend this line to the threshold-probability scale (a dashed red line is shown as an example in this figure). In this example, we assumed B/H = 3 and LR− = 0.5, which converts into a treatment threshold probability of 40%; that is, we should treat if the probability of the disease exceeds 40%. When testing is not available (see Chap. 2 for examples), LR± = 1. Note that we encourage calculating B/H using evidence-based measures as above (i.e., calculate B = M · E − Hrx) and using Hrx as the absolute risk of treatment harm
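The nomogram relationship in Fig. 4.3 can be reproduced numerically. The sketch below assumes the standard odds form, threshold odds = (H/B)/LR, which matches the caption's worked example (the function name is ours):

```python
def threshold_probability(bh_ratio, lr):
    """Threshold probability from a benefit/harm ratio and a
    likelihood ratio: odds(pt) = (H/B) / LR = 1 / (B/H * LR).
    Use LR+ for the testing threshold (ptt) and LR- for the
    treatment threshold (prx); LR = 1 when no test is available."""
    odds = 1.0 / (bh_ratio * lr)
    return odds / (1.0 + odds)

# Caption example: B/H = 3 and LR- = 0.5 -> treatment threshold of 40%
print(round(threshold_probability(bh_ratio=3, lr=0.5), 2))  # 0.4
```

With LR = 1 this reduces to pt = 1/(1 + B/H), the no-test threshold of Chap. 2.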

xvi

Fig. 4.4

Fig. 4.5

Fig. 4.6

List of Figures

A horizontal bar showing the probability of disease (pD) on a horizontal axis (not to scale) for probabilities of disease between 0.0 and 1.0, illustrating decision-making whether to administer alloSCT with or without MRD testing: observation only and no treatment without testing (No test, No Rx), MRD diagnostic testing that will determine whether to administer treatment or not (Perform testing), and administering treatment without testing (No test, Rx). The figure shows that if the estimated risk of pD (AML recurrence in this case) is below 13.6%, then the patient should be observed without testing and treatment. If the estimated pD is greater than 69.3%, then alloSCT should be administered without further testing. MRD testing should be done if the estimated probability of AML recurrence is between the testing threshold (ptt = 0.136) and the treatment threshold (prx = 0.693) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nomogram showing the threshold probability for the likelihood ratio and net benefits-to-net harms ratio (B/H ratio) with the intersecting dashed red line for a) ptt and b) prx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A horizontal bar showing the probability of disease (pD) on a horizontal axis (not to scale) for probabilities of disease between 0.0 and 1.0, illustrating decision-making whether to administer DOACs with or without CTPA testing: observation only and no treatment without testing (No test, No Rx), CTPA diagnostic testing that will determine whether to administer treatment or not (Perform testing), and administering treatment without testing (No test, Rx). The figure shows that if the estimated risk of pD (PE, pulmonary embolism) is below 0.23%, then the patient should be observed without testing and treatment. 
If the estimated probability of PE is greater than 60.6%, then DOACs should be administered without further testing. CTPA testing should be done if the estimated probability of VTE recurrence is between the testing threshold (ptt = 0.0023) and the treatment threshold (prx = 0.606) . . . . . . . . . . . . . . . . . . . . .

60

63

64

List of Figures

Fig. 5.1  Signal detection theory and fast-and-frugal trees (FFTs). Note that the decision criterion (xc) represents a binary decision and can result in more noise or signal, generating a correct rejection/miss or a hit/false alarm, respectively. That is, moving the decision criterion (xc) to the left increases the probability of detecting true positives (hits, signal) but at the expense of an increase in false positives (false alarms). Similarly, moving the decision criterion to the right increases the detection of true negatives (correct rejections) but at the expense of an increase in false negatives (misses). A desire to increase true positives is referred to in the literature as a "liberal" strategy, while aiming to minimize false positives is often considered a "conservative" strategy. The figure also shows how different permutations of the FFT relate to better detection of a signal (true positives) or of true negatives (correct rejections). In this case, the FFT consists of three binary cues, resulting in either noise (n) or signal (s) defined by the first two cues, as the last cue has two exits (FFTss, FFTsn, FFTns, or FFTnn). The dashed arrows roughly indicate where each FFT relates to noise and signal detection. Of the four FFTs, FFTss has the most liberal decision criterion (leading to an increase in detection of the signal, i.e., sensitivity, but at the expense of an increase in false positives). FFTnn is the most conservative (i.e., aimed at decreasing false alarms but at the expense of an increase in false negatives), while FFTsn and FFTns are less extreme, with FFTsn being more liberal than FFTns. The figure is based on [6] [with permission]. Note that throughout the text, we interchangeably refer to "s" as "y" (yes) and "n" as "n" (no)

Fig. 5.2  Each cue (with its corresponding threshold) within the FFT is ranked based on weighted accuracy (a); (b) shows different permutations to determine which FFT optimizes sensitivity and specificity. Note that because FFTs utilize the most optimal cues (i.e., EGFR, non-smoker status, and age ≤ 63 years), we obtain four permutations of this tree (FFTyy, FFTyn, FFTnn, FFTny), where the tree with the greatest tradeoff between sensitivity and specificity (FFTyn) was selected


Fig. 5.3  Fast-and-frugal tree (FFT) to determine whether to treat patients with advanced NSCLC with either non-targeted (i.e., chemotherapy and/or immunotherapy) or targeted (i.e., tyrosine kinase inhibitors; TKIs) therapy. The most optimal structure of this FFT (FFTyn) was selected based on the analysis shown in Fig. 5.2b. Note that P(D+|T−) indicates the probability of selecting a non-targeted therapy ("Non-Target"), and P(D+|T+) indicates the probability of selecting a TKI. The total number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) is shown under each decision (i.e., either "yes" or "no") within the tree. [Note that this FFT was derived from individual-patient data, but it is also possible to construct an FFT from aggregate data as long as the prevalence of the condition we want to predict and aggregate data on the cues' sensitivities and specificities are known]

Fig. 6.1  A simplified illustration of a decision curve analysis. Note that Model B (solid red line) has the highest net benefit ("benefit") over the other strategies (i.e., treat none, test, treat all, Model A) for the entire range of preferences (thresholds) and would be universally recommended for all patients without the need to elicit the preferences of each patient. Dotted line = treat none; dashed line = diagnostic test; solid black line = treat all; solid blue line = Model A (i.e., a predictive model comprised of patient demographics, diagnostic testing, biomarker data, or other laboratory results); solid red line = Model B (i.e., an alternative predictive model that may have included different covariates than Model A). [The Y-axis "Benefit" represents the "net benefit" of DCA and should not be confused with other definitions of net benefit given in Chapter 2 and elsewhere.] Adapted from an original figure published by [7] [with permission]


Fig. 6.2  A decision model showing a three-choice dilemma: "treat all", "treat none", or treat or not based on model results (i.e., "model"), with utilities expressed based on (a) expected utility or (b) expected regret. Each branch of the tree can result in either a patient with disease (D+) or not (D−). qi refers to the model-predicted probability of disease, pi is the actual probability of disease, and RRR is the relative risk reduction of the treatment. T, threshold probability for treatment; D−, the disease is absent; RRR, relative risk reduction of treatment. U1 to U4, utilities (outcomes) associated with each management strategy. Regret is computed as the difference in utilities between the action taken and the action that, in retrospect, should have been taken (see Chap. 3). Based on [2] [with permission]

Fig. 6.3  Decision curve showing net benefit across the entire range of threshold probabilities (pt) for the following strategies: (1) "Treat all" (i.e., refer all patients to hospice; purple and green), (2) "Treat none" (i.e., administer second-line chemotherapy; shown as the x-axis), and (3) Prediction model, or "Model" (blue and red). The model and treat-all strategies have been estimated using expected utility theory, EUT (blue and purple, respectively), or expected regret theory, ERT (red and green, respectively). Inverse ERT models (−ERT) are shown for comparability with EUT. Note that this model assumes an RRR of 0.1 and can produce slightly different results for different values

Fig. 7.1  Dual-decision threshold model. Classic expected utility threshold probability as a function of the benefit/harms ratio as derived by system II, EUT (expected utility threshold, solid line). The treatment should be given if the probability of disease is above the threshold; otherwise, it should be withheld. Note that if system I perceives that harms are higher than benefits (BI < HI), the threshold probability is always higher than the classic EUT threshold (dotted line). However, if BI > HI, the threshold probability is always lower than the EUT threshold (dashed line). [See Chaps. 2 and 3 for specific formulations of (net) benefits (B) and (net) harms (H)]. Figure reproduced from Djulbegovic et al. [3] with permission


Fig. A.1  Consider a clinical situation when no further diagnostic test is available to a physician, who can choose between two alternatives only: to treat (Rx) or not to treat (NoRx) a patient who may or may not have a disease. Expected utility (EU) is the average utility of all possible results, weighted by their corresponding probabilities. The decision-maker should select the option with the larger expected utility

Fig. A.2  Consider a clinical situation when no further diagnostic test is available to a physician, who can choose between two alternatives only: to treat using a particular treatment (Rx1) or to treat using an alternative treatment (Rx2) a patient who may or may not have a disease

Fig. A.3  A decision tree conceptualizing three decisions (treat, perform a diagnostic test and act according to the test result, or withhold test/treatment) and all outcomes resulting from a chance event. The small black square represents a decision, and the small black circular nodes represent events due to chance (i.e., test results or a patient's outcomes). Utilities (U1−U8) for each outcome are shown enclosed by a rectangle. Each is represented in terms of disutilities by subtracting evidence-based measures of the effects of morbidity or mortality (M), harms of treatment (Hrx), and harms of testing (Hte) from perfect health (customarily set at 1). S is the test sensitivity (the frequency of true positives, TP), Sp is the test specificity (the frequency of true negatives, TN), FN is the frequency of false negatives (FN = 1 − S), FP is the frequency of false positives (FP = 1 − Sp), and RRR refers to the relative risk reduction

Fig. A.4  A decision tree showing a clinical situation when the diagnosis is not certain and no further diagnostic test is available to a physician, who has to choose between two alternatives only: to treat (Rx) or not to treat (NoRx). The tree is solved using regret theory. We define regret as the difference between the utility of the outcome of the action taken and the utility of the outcome of the action we should, in retrospect, have taken (see text for details)


Fig. A.5  A decision tree conceptualizing three decisions (treat, perform a diagnostic test and act accordingly, or withhold test/treatment) and all outcomes resulting from a chance event. The small black square represents a decision, and the small black circular nodes represent events due to chance (i.e., test results or a patient's outcomes). Utilities for each outcome are shown enclosed by a rectangle. Each is represented in terms of disutilities by subtracting evidence-based measures of the effects of morbidity or mortality (M), harms of treatment (Hrx), and harms of testing (Hte) from perfect health (customarily set at 1). S is the test sensitivity (the frequency of true positives, TP), Sp is the test specificity (the frequency of true negatives, TN), FN is the frequency of false negatives (FN = 1 − S), FP is the frequency of false positives (FP = 1 − Sp), and E = RRR refers to the efficacy or effectiveness of treatment, expressed as relative risk reduction


Fig. A.6  (a) A graphical representation of hypothetical expected regret thresholds ptt and prx, as well as the acceptable regret thresholds Parx, Patt, and Pawh. (b) Further shows the relationship between EUT, ERT (acceptable regret), and the overuse or underuse in the delivery of healthcare interventions. If the probability of the disease is smaller than the acceptable expected regret theory (AERT) testing threshold, Pat, the AERT would lead to test ordering, whereas according to EUT, we should not test. This, in turn, leads to overtesting. If the probability of the disease is larger than the AERT testing threshold, Pat, according to AERT, we would be reluctant to order the test, whereas according to EUT, we should test. As a consequence, this would lead to undertesting. We think that overtesting typically occurs in "rule out worst-case scenarios" in which physicians cannot afford to miss a particular diagnosis. Once a serious diagnostic possibility enters the physician's mind, every patient with chest pain or shortness of breath gets a computed tomography (CT) angiogram to rule out pulmonary embolism (PE), every patient with headache receives a CT of the head to rule out a brain tumor, every patient with an "incidentaloma" (an incidental and unexpected finding of a mass on imaging studies performed for different reasons) gets a biopsy, and so on. We think undertesting typically occurs when ordering a diagnostic test is perceived as not needed (i.e., consciously or subconsciously felt to be risky or associated with an unacceptable level of regret). Hence, patients with atypical chest pain will not get a CT angiogram to rule out PE, patients with headache do not get a CT of the head, patients with an "incidentaloma" will not get a biopsy, and so on. Ptt, testing threshold according to EUT; Prx, treatment threshold according to EUT; Pat, testing threshold according to the acceptable regret (Ro) model; Pawh, threshold probability below which treatment can be withheld without experiencing regret (if the decision was wrong) (see also Chap. 3) (reproduced from Hozo and Djulbegovic with permission)

Fig. A.7  A graphical representation of the FFT model for the management of pulmonary embolism. The computed tomography pulmonary angiography (CTPA or CT) test is administered first; if it is negative, the D-dimer (DD) test is administered. If either test is positive, the FFT recommends treating the patient; if the D-dimer test is negative, we withhold treatment

Fig. A.8  Simple (left) and complex (right) decision models. U1 to U4 refer to utilities (health outcomes) associated with each decision; p denotes the probability of disease or outcome (p1 to p3). Rx, treatment; NoRx, no treatment. The inset in the upper right corner illustrates how complex models differ from simple models by expanding utilities and introducing new probabilities describing clinical events of interest. For illustration purposes, we only show the expansion of U1, but any outcome or probability can be expanded/modified as needed. Importantly, the calculation of the events depicted in the inset can be conducted separately from the analysis in the main tree. As a result, complex models can be reduced to simple decision models (in this case, U1 based on the derivation of expected utility in the box on the right can be used to replace U1 in the simple tree on the left)

1 Evidence and Decision-Making

1.1 Introduction

Today, every country struggles to provide adequate health care to its citizens. Globally, an average of $8.3 trillion, or 10% of gross domestic product (GDP), is spent annually on health services. In 2019, the USA spent nearly 18% ($3.2 trillion) of its GDP on health care, projected to reach $6.2 trillion by 2028. Despite these enormous resources devoted to improving health, modern health care worldwide is characterized by low quality that can be categorized as (a) underuse, (b) overuse, or (c) misuse. Underuse (undertreatment and undertesting) is the failure to provide adequate health care that would have produced favorable health outcomes. Overuse (overtreatment and overtesting) is defined as the provision of healthcare services whose risks of harm exceed the potential benefits. On average, only 55% (range, 11–79%) of recommended care is delivered to adults in the USA, while, on average, more than 30% of care is considered inappropriate or wasteful, accounting for approximately 25% of total healthcare spending. Finally, misuse relates to medical errors or the delivery of incorrect or erroneous care, which results in 70,000–250,000 deaths annually in the USA alone.

While many reasons have been identified for suboptimal care, in the final analysis they reduce to: (1) the failure to apply, or the lack of, high-quality evidence related to the effects of most healthcare interventions and (2) suboptimal decision-making. It has been estimated that inadequate adherence to evidence-based guidelines represents the third leading cause of preventable patient deaths and accounts for one-third of unnecessary healthcare spending. Similarly, it has been estimated that personal decisions are the leading cause of death. At the same time, 80% of all healthcare expenditures are affected by physicians' decisions.
Therefore, it stands to reason that improving evidence-based decision-making represents one of the most promising avenues to improve the inferior healthcare that dominates the delivery of health services worldwide. This is what this book is about: drawing on the synthesis of decades of research in the fields of evidence-based medicine (EBM) and decision-making, we provide theoretically robust yet practical approaches to optimal clinical management strategies. Along the way, we provide examples of how applications of the tools and techniques described in the book can improve clinical decision-making and patients' health outcomes. Because we primarily work in the cancer field, most examples illustrate decision-making in cancer medicine. However, the methods and techniques discussed in this book apply to any area of medicine. We now provide a short description of EBM and theories of decision-making, concluding by identifying the link between both fields: the threshold decision-making model.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_1

1.2 Evidence-Based Medicine (EBM)

EBM can be defined as a set of principles and methods to ensure that population-based policies and individual decisions are consistent with the totality of the most credible evidence. These decisions often require weighing the tradeoffs between alternative courses of action while relying on a spectrum of cognitive techniques, from analytical calculus to intuition- and affect-based processes. That is, EBM posits that our actions and beliefs are justifiable (or reasonable/rational; see below) in proportion to the trustworthiness of the evidence (evidentialism) and the extent to which we believe that evidence is determined by credible processes (reliabilism). From these theoretical precepts, three key principles of EBM have emerged (Fig. 1.1):

1. Not all evidence is created equal; the practice of medicine should be based on the best available evidence. Understanding that evidence differs in quality, i.e., that some evidence is more trustworthy than other evidence (Fig. 1.1A, top row), has given rise to research on identifying factors (biases and random errors) that distinguish more trustworthy from less trustworthy evidence. This has paralleled the work on critical appraisal (typically implemented via checklists) as a conditio sine qua non for the practice of EBM. At the time of this writing, the assessment of the quality (also known as certainty) of evidence has been best operationalized within the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) evidentiary classification system (Fig. 1.1B, top row).

2. The pursuit of truth is best accomplished by evaluating the totality of the evidence and not selecting evidence that favors a particular claim. This principle has promoted the methods and techniques of systematic reviews and meta-analyses as the optimal way to summarize the totality of evidence on a topic (e.g., collated in electronic databases such as the Cochrane Database of Systematic Reviews (Fig. 1.1, middle row)).

3.
Evidence is necessary but not sufficient for effective decision-making. Many factors influence decision-making, which ultimately affects the health outcomes (consequences of decisions) of importance to the decision-makers. The latter also requires consideration of the decision-makers' (e.g., the patient's) values and preferences within the given environment and context (Fig. 1.1, bottom row).

Fig. 1.1 Three principles of evidence-based medicine (EBM). Principle 1 states that "not all evidence is created equal", which underpins the creation of a hierarchy of the quality (certainty) of evidence. Figure 1.1A (top row) displays the old hierarchy of evidence by study type (now considered obsolete); Fig. 1.1B (top row) shows the GRADE hierarchy of the quality of evidence, which combines the study design with factors that can affect the quality of evidence (Table 1.1). Principle 2 states that claims should be based on all (the totality of) relevant and available studies, often synthesized using techniques of meta-analysis (middle row). Principle 3 states that evidence alone is necessary but not sufficient for making optimal clinical decisions. Multiple factors affect decisions (Fig. 1.1A, bottom row), but the most important is the integration of the benefits and harms of health interventions and consideration of the patient's values and preferences to arrive at the threshold above which one intervention is favoured over another (Fig. 1.1B, bottom row)

In our recent review of progress in EBM, we concluded that EBM has made tremendous progress in advancing the first two principles in the research and practice of medicine, but it "has yet to generate a coherent theory of healthcare decision-making and will continue to partner with other disciplines, such as cognitive and decision sciences, toward this goal". On a more practical level, significant challenges remain in providing clinicians with tools that make shared decision-making at the point of care feasible and efficient, resulting in a positive experience for both patients and clinicians. This book attempts to address this unresolved challenge within the field of EBM and to provide a robust and coherent theoretical framework and practical tools for bedside decision-making. As argued throughout the book, this coherent synthesis is generated via the threshold model of decision-making. Before we outline the threshold model's theoretical and epistemological basis, we first provide a brief overview of decision-making theories.

In this book, we do not attempt to teach techniques of critical appraisal and systematic reviews; we refer the reader to many excellent books and papers on these topics. Table 1.1 lists factors identified in EBM that are known to affect the quality of evidence. We refer to EBM concepts to the extent that they are essential to decision-making and the application of the threshold model. When it comes to intervention research, it is important to distinguish evidence related to questions of efficacy ("Can an intervention work in the ideal study setting?") from effectiveness ("Does an intervention work, and is it generalizable to real-world settings and applicable to individual patients?") and efficiency or cost-effectiveness ("Is the intervention worth it, and should it be paid for?") (see Glossary).
For the most part, we are concerned with interventions we intend to administer to individuals or groups of individuals as a part of public health measures (e.g., the decision to administer a vaccine). Still, we will interchangeably rely on evidence from efficacy and effectiveness studies to populate our decision models.

1.3 Theories of Decision-Making

There is no universal theory of decision-making: there are hundreds of theories and models that may be applicable in some settings but not in others. It is customary to classify decision-making theories as normative, descriptive, or prescriptive. Normative theories are based on mathematical and statistical axioms and address the question of what people "should or ought to do". The most popular normative theory, which dominates health economics and clinical decision-making modeling, is expected utility theory (EUT). EUT is the only theory of choice that satisfies all mathematical axioms of rational decision-making, ensuring that choices are consistent with the decision-makers' values and preferences (see Table 1.2).
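The expected-utility calculus at the heart of EUT can be sketched in a few lines of code. The following is a minimal illustration with hypothetical probabilities and utilities (not an example from this book): each option's expected utility is the probability-weighted average of its outcome utilities, and the rational choice is the option with the larger value.

```python
# Minimal sketch of expected utility theory (EUT). The treatments,
# probabilities, and utilities below are hypothetical placeholders.

def expected_utility(outcomes):
    """Expected utility = sum over outcomes of probability * utility."""
    return sum(p * u for p, u in outcomes)

# Hypothetical Rx A: 70% chance of cure (utility 1.0),
# 30% chance of failure with side effects (utility 0.4).
rx_a = [(0.7, 1.0), (0.3, 0.4)]

# Hypothetical Rx B: 90% chance of partial response (utility 0.8),
# 10% chance of no response (utility 0.5).
rx_b = [(0.9, 0.8), (0.1, 0.5)]

eu_a = expected_utility(rx_a)  # 0.7*1.0 + 0.3*0.4 = 0.82
eu_b = expected_utility(rx_b)  # 0.9*0.8 + 0.1*0.5 = 0.77

# EUT prescribes the option with the higher expected utility.
best = "Rx A" if eu_a > eu_b else "Rx B"
print(f"EU(A) = {eu_a:.2f}, EU(B) = {eu_b:.2f} -> choose {best}")
```

Note that the numbers encode the decision-maker's values: if a patient assigned a much lower utility to "failure with side effects", the ranking of the two hypothetical treatments could flip.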

Table 1.1 Factors that can affect (decrease or increase) the quality (certainty) of evidence

Factors that decrease the quality of evidence:
- Reporting bias: Publication and outcome reporting bias
- Inconsistency: Variability or heterogeneity in results due to true differences in treatment effect (due to differences in PICO, a framework for investigating clinical questions defined as Patient, Intervention, Comparison (or Comparator), Outcome). Measures — statistical: large I2 (e.g., > 50%); clinical: PICO-related
- Imprecision: Few relevant clinical events (< 200–300), small studies (N < 400), wide confidence intervals consistent with important differences in both directions or no effect at all
- Indirectness: Generalizability or applicability of evidence
- Study limitations (risk of bias): Method of intervention assignment (e.g., adequacy of generation of the random sequence); inadequacy of allocation concealment (selection bias); bias due to contamination or the lack of masking of personnel to co-interventions (performance bias); bias due to loss to follow-up or incomplete outcome data (attrition bias); bias when the ascertainers of outcomes were not masked (detection bias); and reporting biases, when either outcomes or studies were not published

Factors that increase the quality of evidence:
- Large magnitude of effect: A statistically significant relative risk of > 5 (or < 0.2)
- Plausible confounding: Which would reduce a demonstrated effect or would suggest a spurious effect when the results show no effect
- Dose–response gradient: A change in the outcome of interest proportional to the treatment dosage or some other quantifiable amount of treatment


According to EUT, rational decision-making is associated with selecting the alternative with the higher expected utility, such as, for example, treatments that result in more quality-adjusted life years. Expected utility is the average of all possible results weighted by their corresponding probabilities. It is typically based on Bayesian probability calculus. Normative theories such as EUT typically ignore the rich context in which most people operate but may be useful in context-poor situations, such as governmental public health decisions, where limited resources and time constraints are not rate-limiting factors (see Chap. 8).

However, research over the last three decades has convincingly shown that what people "should" do is often different from what they actually do. This is known as the "normative-descriptive gap". Descriptive theories attempt to describe and explain how people actually make their decisions, addressing this "is" versus "ought to" phenomenon. Abundant research has systematically documented differences between normative and descriptive decision-making. Many descriptive theories have been proposed, but at this time most authors accept that our cognitive apparatus is best explained by so-called dual-processing theories (DPT). DPT postulate that cognition is governed by Type 1 processes (which are intuitive, automatic, fast, resource-frugal, narrative, experiential, and affect-based) and Type 2 processes (which are analytical, slow, verbal, and deliberative and support formal logical and probabilistic analyses). This is the position we take in this book. Nevertheless, some authors have proposed that intuitive and deliberate judgments are based on common principles.

Regardless of whether one or two minds affect our reasoning and decision-making, most (but not all) "normative-descriptive" discrepancies can be related to psychological biases (see Table 1.3) due to two features that characterize human reasoning. First, humans are "cognitive misers" with a tendency to use the least possible effort when engaging in problem-solving and decision-making. Second, humans poorly calibrate their responses to risks and uncertainties, primarily due to the effects of emotions and feelings, which respond much faster to the problem at hand than our analytical, deliberative cognitive apparatus. Importantly, some processes, such as the cognitive emotion of regret, can serve as a link between Type 1 and Type 2 processes. That is, regret taps into the affect-based aspect of cognition and, by imagining counterfactuals in the process, guides our deliberative and analytical reasoning (see Chaps. 3, 7, and 8).

This brings us to the importance of risk and uncertainties in decision-making and how we respond to them. It is customary to distinguish between decisions under risk (when the probabilities of events are known, as when we employ well-validated predictive models to guide our decisions) and decisions under uncertainty (when the probabilities of events are either not known or not available to a decision-maker, in which case we typically rely on holistic, experientially and affect-driven decision-making). Importantly, probabilities and uncertainties are measures of a "degree of belief" rather than a property of the real world. The statistician Bruno de Finetti remarked, "The only relevant thing is uncertainty—the extent of our knowledge and ignorance. The actual fact of whether or not the events

Table 1.2 Axioms of rational decision-making

- Ordering of alternatives (commensurability): A rational decision-maker should either prefer one alternative to the other (Rx A over Rx B), or they should be indifferent between them (Rx A = Rx B)
- Dominance: A rational decision-maker should never accept a strategy that is "dominated" by other strategies. A strategy is strongly dominant if, compared to another strategy, it yields an outcome superior in every evaluative aspect. For example, if Rx A is associated with higher benefits and lower harms and is cheaper, then it dominates Rx B
- Transitivity: If we prefer Rx A > Rx B and Rx B > Rx C, we should also prefer Rx A > Rx C
- Cancellation (independence from irrelevant alternatives): A choice between two alternatives should depend only on those outcomes that differ, not on those outcomes that are the same for both options. Common factors cancel out, e.g., (Rx A·Rx B)/(Rx C·Rx B) = Rx A/Rx C. For example, Rx A and Rx C have the same costs but differ on health outcomes such as survival rate
- Invariance (independence): A rational decision-maker should not be affected by the way alternatives are presented. If one prefers Rx A to Rx B, then they should also prefer (Rx A + Rx C) to (Rx B + Rx C) (adding Rx C to both options should not affect our choices). Description invariance ("framing effects") and procedure invariance ("elicitation effects") should not affect our choices. For example, presenting information in terms of mortality versus its complement (survival = 1 − mortality) should not affect our decisions
- Consistency: If we prefer Rx A > Rx B, then we should also prefer a gamble between Rx A and Rx B to the certainty of settling for guaranteed Rx B
- Continuity (interchangeability; desirability and probability tradeoff): For any outcome S, which is preferable to L (low) but not as good as H (high), there is some probability p at which a decision-maker is indifferent between S (for sure) and the gamble with a chance p of getting H and (1 − p) of getting L
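The continuity axiom is what makes utility elicitation via the "standard gamble" possible. As a minimal sketch (with hypothetical numbers, not taken from this book): anchoring the best outcome at U(H) = 1 and the worst at U(L) = 0, the indifference probability p itself becomes the utility of the intermediate state S.

```python
# Illustrative sketch of standard-gamble utility elicitation, which
# rests on the continuity axiom. All numbers are hypothetical.

def gamble_eu(p, u_high=1.0, u_low=0.0):
    """Expected utility of the gamble: chance p of H, chance (1 - p) of L."""
    return p * u_high + (1 - p) * u_low

# Suppose a patient is indifferent between living in a chronic health
# state S for sure and a gamble with a 0.85 chance of full health (H)
# and a 0.15 chance of death (L). At indifference,
#   U(S) = p * U(H) + (1 - p) * U(L) = p,
# so the elicited utility of S is simply the indifference probability.
u_s = gamble_eu(0.85)
print(f"Elicited utility of state S: {u_s:.2f}")
```

The design choice of anchoring U(H) = 1 and U(L) = 0 is conventional; with different anchors, the elicited utility would be the expected utility of the gamble at indifference rather than p itself.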


Table 1.3 Common cognitive biases in decision-making [a]

- Aggregate bias: An anticipated difference between the group-level and the individual-level effect
- Anchoring: Valuing an initial piece of information or a single reference point more than subsequent pieces of information; often leads to inadequate adjustment of estimates of clinical events of interest
- Ascertainment bias (sampling bias): Data collection that disproportionately targets one or more groups that incorrectly represent the population
- Attribution error: Errors made during self-evaluation or the evaluation of the behaviors of someone else
- Availability: Reaching conclusions about a population based only on readily available data; judging the probability of an event by how easily it is remembered
- Base rate neglect: Favoring (sometimes single) individual observations or experiences without incorporating the base rate or the background probability of an event (a key reason why relying on the representativeness heuristic may be wrong)
- Commission bias: Favoring action over inaction
- Confirmation bias: Valuing evidence that confirms the investigator's hypothesis or prior beliefs
- Diagnostic momentum: Accepting a diagnosis without additional testing or a second opinion
- Gambler's fallacy (Monte Carlo fallacy): Erroneously predicting the outcome of an event or a diagnosis based on the frequency of prior outcomes that do not affect future outcomes (i.e., failure to realize that the outcomes are statistically independent)
- Gender bias: Assumption(s) made by overconsidering gender
- Hindsight bias: Overconfidence in the predictability of an event or outcome
- Multiple alternatives: Accommodating multiple alternative hypotheses or diagnoses can reduce confidence and create conflict in making a hypothesis or diagnosis
- Omission bias: Favoring inaction over action
- Order effects: A bias introduced when a participant provides different answers depending on the order of the questions being asked
- Outcome bias: A tendency to seek more beneficial outcomes over poor outcomes, which could result in serious diagnoses being missed
- Overconfidence: Incorrectly thinking that one knows more than one does, potentially causing conjecture to outweigh the evidence
- Playing the odds: An incorrect outcome or diagnosis due to the rarity of the correct outcome or diagnosis
- Premature closure: Arriving at an early conclusion when investigating an outcome or diagnosis, potentially missing critical outcomes or serious diagnoses
- Psych-out errors: Falsely attributing symptoms to psychiatric or mental health conditions
- Representativeness: A process of categorizing something by how closely it resembles the sample population (e.g., by comparing the features of a case history to a textbook description or our own clinical experience); it is affected by many factors, such as base rate neglect (prevalence), regression toward the mean, the accuracy of cues, etc. When people rely on representativeness to make judgments (instead of objective data), they are likely to judge wrongly, because the fact that something is more similar (representative) does not actually make it more likely
- Search satisficing: Concluding an investigation or testing once abnormalities are found that provide a good-enough answer (instead of the best possible answer)
- Sutton's slip (Sutton's law): Relying heavily on Occam's razor or the simplest explanation (or diagnosis) and ignoring less probable outcomes or rare diseases
- Triage cueing: The bias associated with triage, where a misdiagnosis is associated with a patient
- Unpacking principle: Failing to investigate or ask questions that would otherwise yield critical information that would affect a diagnosis or outcome
- Visceral bias: Emotional feelings or opinions about a patient that affect the outcome or diagnosis

The most common biases in clinical practice are shown in italics. [a] Note that many of these biases represent mental shortcuts: useful effort-reduction and decision-making simplification strategies. They are quick and often correct, but at the cost of occasionally sending us off course

considered are in some sense determined or known by other people is of no consequence”. A decision-maker’s state of mind is typically influenced by many factors, including emotions, which often violate analytical rules of probability calculus in a rather predictable manner (sometimes referred to as “predictably irrational” or “risk as feelings” phenomena). Two phenomena frequently affect decision-making in medicine, particularly when stakes are high: (1) the possibility effect and (2) the certainty effect. The possibility effect refers to our difficulty in distinguishing cause from coincidence and our tendency to overweight small and underweight moderate and large probabilities. As a result, a change from impossible to slightly possible has a far more substantial impact than an equal increase from a mere possibility to a slightly higher possibility of the occurrence of the event under consideration. The certainty effect describes people’s inclination to value a change from a possible to a certain effect much more than an equal change from a merely possible to a more likely one. Importantly, in affect-rich situations, people tend to exhibit probability neglect (i.e., sensitivity only to the presence or absence of stimuli, recognizing outcomes merely as possible or not). However, in affect-poor contexts, probabilities tend to be evaluated without such distortions, which is one of the reasons why EUT still seems preferable for public policy decisions. The EUT may need to be reformulated for bedside encounters, where emotions and decision stakes often drive decisions. Many such decisions rely on regret, a powerful cognitive emotion that employs counterfactual reasoning to tap into both the analytical aspect of our cognitive architecture and affect-based decision-making. Extensive research has demonstrated that the regulation of regret is of crucial importance for decision-making. Indeed, medical decision-making is often associated with regret-averse decision processes. These psychological mechanisms underlie our quest for therapeutic and diagnostic certainty and can explain the practice of overtesting. These are important insights that provide the basis for the further development of prescriptive theories of decision-making. Because humans can be poor decision-makers, prescriptive approaches are concerned with improving decision-making. This is typically accomplished by using ideas from descriptive theories to modify normative decision theories. Most real-life decisions are context-rich (varying as a function of clinical setting, time pressure, cognitive load, framing effects, social context, conflicts of interest, etc.) and depend on characteristics of the decision itself (e.g., high-stakes vs. low-stakes situations) as well as individual characteristics of the decision-maker (e.g., cultural background, professional background, cognitive ability, decision-making styles) in an almost infinite number of combinations.
Clinical visits occur within a limited time and in the context of an ongoing information explosion. A typical clinical encounter is approximately 11 min long, with less than 2 min available to search for reliable information and interruptions occurring, on average, every 15 min. At the same time, the scientific explosion remains unabated: more than 1.8 million articles are published in more than 10,000 biomedical journals every year, with MEDLINE alone containing over 28 million indexed citations from more than 5200 journals. In addition, 75 randomized clinical trials and 11 systematic reviews are published every day. This information explosion must be contrasted with the human brain’s limited capacity for information processing, memory limitations, and relatively low storage capability. Ultimately, this means that real-life decision-making requires adapting to our environment, which we now briefly discuss. Given the real-life complexity of clinical decision-making and human information-processing limitations, the Theory of Bounded Rationality posits that rational behavior relies on a satisficing process (i.e., finding a good enough solution) instead of a maximizing process (i.e., finding the best possible solution). These are adaptive evolutionary mechanisms that have evolved to deal effectively with the ambiguities and uncertainties that surround us both in daily life and in clinical medicine. Satisficing can take several forms. It can be implemented via heuristics, which represent mechanisms for implementing bounded rationality. Heuristics refers


to problem-solving and decision-making strategies that ignore part of the information, aiming to make decisions more quickly, frugally, and/or accurately than more complex methods. These simple strategies can often outperform complex statistical models, a phenomenon known as “less-is-more”. They are widely used in medical education as popular “mental shortcuts” and “rules of thumb,” and in clinical pathways (flowcharts or algorithms) and fast-and-frugal decision trees (FFTs). Effective heuristics assume that a point must exist at which obtaining more information or computation becomes detrimental and costly. That is, by relying on satisficing, a decision-maker selects the first available option that exceeds a certain aspiration level and is deemed “good enough”, rather than waiting to choose the best possible option. This aspiration level can take the form of so-called robust satisficing, a concept similar to “acceptable regret”—referring to circumstances in which wrong decisions are still regretted but tolerated. For example, after accounting for (acceptable) regret, the “stubborn quest for diagnostic certainty” (long lamented as a driver of inappropriate diagnostic testing) may not be so irrational, as physicians rarely regret ordering tests but may regret not ordering them (as when they are sued after missing a diagnosis, resulting in failure to prescribe adequate treatment or provide needed prognostic information). In this book, we employ the threshold concept as both a theoretically sound and a practically feasible prescriptive model for the wide implementation of rational decision-making in clinical medicine. As we will show, the threshold model can be derived from various perspectives (EUT, regret, FFT, dual-processing theories).
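The contrast between satisficing and maximizing can be sketched in a few lines of Python. This is an illustrative toy, not from the book: the strategy names, utility values, and the aspiration level of 0.80 are all hypothetical.

```python
# Satisficing picks the first option whose value clears an aspiration level;
# maximizing evaluates every option before choosing.

def satisfice(options, aspiration):
    """Return the first option whose value meets the aspiration level."""
    for name, value in options:
        if value >= aspiration:
            return name
    return None  # no option is "good enough"

def maximize(options):
    """Return the single best option after evaluating all of them."""
    return max(options, key=lambda o: o[1])[0]

# Hypothetical utilities for three management strategies
options = [("watchful waiting", 0.70), ("drug A", 0.85), ("drug B", 0.90)]

print(satisfice(options, aspiration=0.80))  # "drug A": first good-enough option
print(maximize(options))                    # "drug B": best, but needs a full search
```

The satisficer stops as soon as "drug A" clears the aspiration level, illustrating how a good-enough rule trades a small loss in utility for a large saving in search effort.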
We must consider multiple approaches to decision-making because absolute certainty of scientific inference is theoretically impossible, and different problems will require different approaches to finding the best, most rational solution under uncertainty. Sometimes we should rely on EUT, sometimes on cognitive emotions such as regret, and sometimes we should employ heuristics or other problem-solving strategies, depending on the problem. As noted by Kahneman, a “theory of choice that completely ignores feelings such as the pain of losses and the regret of mistakes is not only descriptively unrealistic but also might lead to prescriptions that do not maximize the utility of outcomes as they are actually experienced”. Before describing a general outline of the threshold model, let us first address the notion of rational decision-making.

1.4 Rational Decision-Making

Throughout this book, we refer to rationality and rational decision-making. So, what do we mean by “rationality”? Table 1.4 lists the core ingredients (“Principles”, P) of rationality commonly identified across theoretical models of rationality. Rationality refers to a property of human reasoning but is often defined as acting in a way that helps us achieve our goals, which in clinical medicine typically means the desire to improve our health. The “rationality debate” often revolves around the optimal procedures needed to achieve our goals and sometimes distinguishes normative from pragmatic rationality. In this


book, we will start with the normative model (the threshold EUT model), which we will extend to encompass other aspects of human reasoning (the regret-based model, dual-processing model, and FFT-based threshold model) toward pragmatic applications at the bedside. It is essential to realize that rationality does not guarantee error-free decisions. Normative theories of choice indicate that rationality of choice is a matter of the procedure of choice and not of what is chosen: a good decision can result in bad outcomes, and a bad decision can result in good outcomes. However, in the long run, adhering to procedures that result in better decisions should, on average, result in better outcomes. Rationality Principles 1 and 2 (Table 1.4) indicate that, in order to achieve our goals, a rational choice requires taking into account both the benefits (gains) and harms (losses) of alternative courses of action, which in clinical medicine typically occur under conditions of uncertainty. As alluded to earlier, EBM posits a link between rationality and believing what is true (at the inferential level) while requiring that “rational thinkers respect their evidence” (at the decision-making level). That is, our actions, decisions, and recommendations should be consistent with the underlying evidence (Principle 2). All things being equal, it is more rational to act based on more reliable evidence (such as that generated in well-conducted randomized trials) than on observational or anecdotal evidence at high risk of bias. Not surprisingly, empirical research has demonstrated that the quality (certainty) of evidence and an intervention’s benefits and harms are critical determinants of clinical guideline panels’ practice recommendations. We noted earlier that normative EUT principles are often violated without any indication that such (irrational) violations are detrimental to the decision-maker.
In fact, abundant research has demonstrated that, depending on the context, both intuitive, affect-based type 1 and analytical, deliberative type 2 processes can generate biases or produce normatively correct answers. Hence, Principle 3 indicates that rational thinking should be informed by human cognitive architecture; i.e., rational action should be coherent with both formal principles of rationality and human intuitions about good decisions. That is, the most optimal decisions are those that achieve coherence at both the normative and intuitive levels. To achieve this, Stanovich suggests that “the trick may be to value formal principles of rationality, but not to take them too seriously, asking reflectively about the appropriateness of our emotional reactions to a decision”. This can be achieved by using the threshold model reformulated within regret or other theoretical frameworks; if a decision under consideration remains robust and unchanged under different frameworks, such a decision could qualify as “rational”. Our discussion indicates that it is impossible to define a “one size fits all” rationality model that fits all clinical circumstances and decision-makers’ cognitive styles and characteristics. Principle 4 states that rationality depends on the context and should respect the epistemological, environmental, and computational constraints of human brains. This principle is the basis for commonly used heuristic strategies in medicine, including guidelines, pathways, and FFT decision trees based on a satisficing process (finding a good enough solution) instead

Table 1.4 Core ingredients (“principles”) of rationality commonly identified across published theoretical models

Principle 1: Most major theories of choice agree that rational decision-making requires the integration of benefits (gains) and harms (losses) to fulfill our goals (e.g., better health)

Principle 2: Decision-making in medicine typically occurs in situations fraught with uncertainties. The rational approach requires reliable evidence to deal with the inherent uncertainties; it also relies on cognitive processes that allow the integration of probabilities/uncertainties consistent with the underlying quality of evidence

Principle 3: Rational thinking should be informed by human cognitive architecture, which is composed of type 1a reasoning processes characterizing the “old mind” (affect-based, intuitive, fast, resource-frugal) and type 2a processes (analytic, deliberative, consequence-driven, and effortful) of the “new mind”

Principle 4: Rationality depends on the context and should respect the epistemological, environmental, and computational constraints of human brains

Principle 5: Rationality (in medicine) is closely linked to the ethics and morality of our actions, requiring full consideration of utilitarian (society-oriented), duty-bound (individual-oriented), and rights-based (autonomy, “no decision about me, without me”) ethics

a Some authors question this terminology, but no one questions that human cognition involves intuitive, affect-based processes and deliberative, analytical processes

of the EUT maximizing approach, which aims to find the normatively best possible solution. Finally, Principle 5 states that rationality (in medicine) is closely linked to the ethics and morality of our actions, requiring consideration of utilitarian (society-oriented), duty-bound (individual-oriented), and rights-based (autonomy, “no decision about me, without me”) ethics. Unfortunately, achieving a coherent solution across the various actors and dimensions that often conflict is technically very difficult, if not impossible. For example, physicians may wish to fulfill their deontic duties and prescribe the therapy with the best benefit/harm ratio for their patients. Such a treatment may, unfortunately, be too costly for a society or an individual to afford, preventing what would otherwise be the most rational approach to improving the patient’s health. Under these conditions, the principles of rational decision-making may require the application of what the philosopher John Rawls calls “deliberative and considered judgment” to link precepts of moral philosophy (Principle 5) with the theory of rational choice (Principles 1 and 2), arriving at a so-called reflective equilibrium using systematic methods with the least likelihood of distortion (Principles 2, 3, and 4). Pinker suggested that the alignment of conflicting moral goals can be achieved by combining self-interest with social interests via impartiality—the interchangeability of perspectives. While complete technical solutions for achieving the full coherence underpinning moral reasoning remain challenging, we believe that even under these conditions quantitative modeling, including threshold models, may aid in formulating the most rational and just decisions.

1.5 Threshold Model of Decision-Making: The Linchpin Between EBM and Decision-Making

Can we operationalize the general rationality principles we just outlined and effectively deal with the enormous complexities of clinical decision-making? In this book, we answer this question affirmatively, proposing the use of the threshold model at the bedside in everyday clinical decision-making and for public health decisions. Initially proposed in 1975, the threshold model represents one of the most important advances in the field of medical decision-making. As argued throughout the text, the threshold concept is closely related to the question of rational decision-making. When should the physician act (e.g., order a diagnostic test) or prescribe treatment? The threshold model represents a linchpin between evidence (which exists on a continuum of credibility, from impossibility to virtual certainty) and decision-making (which is a categorical exercise—we decide to act or not act). That is, decisions are always made at a threshold, invariably invoking some sort of “if (some condition or conditions are met), then (make decision/act)” conditional rule, regardless of the complexity of the mathematical apparatus that may determine whether the condition is met (i.e., whether this is based on classic logic, fuzzy logic, or machine-learning algorithms) or who makes this determination (human or machine1). Fundamentally, the threshold bridges the inferential world of EBM, which focuses on correct “conclusions” related to the truthfulness of empirical findings, with theories of decision-making, which are concerned with the most rational actions “here and now”. Epistemologically, the threshold model can be seen as a solution to the ancient Sorites paradox—a still unresolved theoretical problem caused by borderline cases, in which single-unit or minor differences in the quantities of interest result in different classifications.
For example, the current diagnosis of acute leukemia requires identifying at least 20% blasts in the bone marrow, whereas a finding of 19% blasts is categorized as myelodysplastic syndrome. Should treatment really differ based on a 1% difference in blast count? While many advanced theoretical avenues (such as fuzzy theory, three-valued logic, and supervaluation) have been attempted to deal with the Sorites paradox (also epistemologically known as the numerical vagueness problem), none of them has so far made inroads into clinical medicine. At this time, classic statistical decision theory provides the best solution for dealing with small changes in information presented on a continuum. Here, we should also mention that within decision-analytical modeling, the magnitude of differences is irrelevant—classical rules of statistical inference do not apply. Decisions should be based only on the mean net benefits and harms, irrespective of whether differences are statistically significant. The previous sections highlighted the existence of a wide range of decision theories, which give rise to many models of rationality. In fact, it has been

1 In this book, we will assume that machine-driven algorithms employ expected utility theory and that, when tradeoffs are involved, optimization processes not based on invoking human emotions are used.


demonstrated that what is rational behavior under one theory may be irrational under another. As alluded to earlier, context is of paramount importance to rationality, and no single model of rationality can possibly fit all decision-making circumstances. However, regardless of the theoretical framework, the general requirement still holds: to exercise rational choices, we need a model that links evidence existing on a continuum with categorical decision-making. The threshold model is the only model that conceptually meets this requirement. Nevertheless, as this discussion implies, specific solutions will depend on the theoretical framework under which we choose to operate to determine both policy and individual decision-making. This means that the specific action threshold will differ between threshold models based on EUT versus regret versus DPT versus FFT, for example. The main task of a decision-maker is to assess the setting and context, which, in turn, will determine the choice of the model (see Chap. 8). As indicated earlier, in resource-rich, time-unlimited, context-poor situations, as is often the case in policy decision-making, EUT-based threshold models may provide the optimal approach to medical decision-making. On the other hand, in context-rich circumstances dominated by emotions, where the aim is to minimize regret, the application of a regret-based threshold model may represent the best solution. The dual-processing threshold model may be particularly applicable in circumstances dominated by high uncertainty, where we need to rely on intuition. In this book, we will often consider single-point decisions for both didactic and pragmatic reasons. However, as pointed out by Kahneman et al., “[a] singular decision is a recurrent decision that is made only once”.
Many single-point decisions can be reformulated as repeated decisions over time, but given that decisions in the healthcare setting almost always occur at single-point encounters, i.e., during outpatient or inpatient visits, the simple threshold models are applicable to most bedside decisions. Nevertheless, sometimes we need to take into account a series of decisions to arrive at the optimal management strategy. In this case, the FFT-based threshold may represent the best solution to a given decision problem. Nonetheless, the appropriateness of the model’s prescription should be judged and re-assessed within the framework of the given context, as discussed throughout this chapter and the book. To further illustrate what we have in mind, let’s outline a simple but powerful threshold model, which will serve as the basis for almost all the models we discuss in this book (Fig. 1.2). We start by employing the EUT framework. Figure 1.2A shows that, at the decision node (depicted by a square), a decision-maker chooses what to do—in this case, to treat versus not to treat a patient who may or may not have a disease that requires treatment. Once the decision is made, the decision-maker does not control events (denoted by circles at each branch in Fig. 1.2A)—they occur as a function of probabilities (in this case, the probability that the patient has the disease, pD, and the probability that they do not have the disease under consideration, 1 − pD). Each of the decision branches depicted in Fig. 1.2A is also associated with the consequences of the chosen actions, typically expressed as health outcomes (or utilities and disutilities; see Glossary).

16

(A)

1 Evidence and Decision-Making

(B)

(C)

Fig. 1.2 Basic outline of a threshold model (A) is where a decision is made (illustrated by a square) to either treat (Rx) or not treat (NoRx). After a decision is made, there exists a probability (p) (denoted as pD in the graph) where the patient is either diseased (D+;) or not (D−; 1 − pD). Consequences of the decision can be measured as health outcomes (utilities; U, which may relate to morbidity, mortality, quality-adjusted life years, etc.). The expected utility (EU) of administering the treatment is calculated as (p * U(Rx, D+) + (1 − p) * (U(Rx, D−)) and withholding treatment as (p * U(NoRx, D+) + (1 − p) * (U(NoRx, D−)). Expected utility theory dictates that the decision-maker should choose to either treat or not based on whichever has a higher EU. The threshold at which the expected utilities are the same (pt ) means that when pD is above pt , a decision-maker should administer the treatment, and when pD is below pt , a decision-maker should withhold treatment. The generic net benefits [B = U(Rx, D+) − U(NoRx, D+)] and harms [H = U (NoRx, D−) − U(Rx, D−)] of treatment can be used to estimate pt (see calculation in A and a graphical illustration in B; red line). For example, in panel B, a high treatment benefits/harms ratio of 36 (blue line) refers to treatment for a patient for suspected tuberculosis. Such a high B/H ratio means that treatment should be administered even if the probability of suspected tuberculosis is very low (i.e., in this case, as long as pD > pt > 2.7% the treatment should be administered because benefits so heavily outweigh its harms). Ideally, data informing the B/H ratio should come from high-quality research synthesis using techniques of systematic reviews and meta-analysis (presented as small forest plots in panel 1A). Figure 1.2C shows that when physicians were asked to determine their treatment thresholds using intuition and regret, the decision threshold dramatically differed from the EUT threshold (see text)

A representative question of the clinical encounter, of interest to both a physician and their patient, is which of the alternative management strategies (i.e., Rx vs. NoRx in this case) the physician should prescribe (and the patient accept). According to EUT, we should compare EU(Rx) versus EU(NoRx) and recommend whichever strategy has the higher EU. The appendix illustrates the details of the mathematical solutions related to the various decision trees used in this book, but we remind the reader that the calculation of EU represents the average of all possible results weighted by their corresponding probabilities. That is, we simply multiply the outcome (utility) values by each corresponding probability in the decision tree


to arrive at the value of EU for a given decision [EU(Rx) versus EU(NoRx) in this example]. Knowing whether the EU of one management option is superior to another provides a valuable answer. Still, it does not help with individualizing the choice of treatment, which is a common goal for physicians and patients. This can be accomplished by using a threshold model. The threshold model stipulates that the most rational decision in medicine is to initiate an intervention when the expected benefits outweigh its expected harms at a given probability (pt) of disease or clinical outcome. Individualization of treatment is the eternal goal of clinical medicine, which is increasingly being made possible by advances in precision medicine. This is accomplished by contrasting an individual patient’s probability of disease (pD; or outcome) against the threshold probability (pt) by applying the aforementioned “if…then” rule. That is, if pD > pt, we should give treatment, but if pD < pt, we should refrain from administering treatment (Fig. 1.2B). This is the general threshold principle (which can take different forms, depending on a given formulation of the threshold, as illustrated throughout this book). Thus, the operational achievement of healthcare goals can be realized by linking evidence with decisions via the threshold model at both the individual and population levels. As we advance the notion of precision and individualized medicine, words of caution are necessary. Our assessment of risks (probabilities) will always rely on (sub)group data—risk in any individual patient remains ultimately unknowable. That is, risk is a group phenomenon and is knowable only as a population-based measure; although risk assessment can be quite accurate at the group level, we can never say with perfect certainty which individual patient will have the event of interest. Any individual patient ultimately either has (100%) or does not have (0%) a given disease or outcome.
Therefore, decision-making at the bedside aimed at individuals will always have to be made under uncertainty (risk), using evidence generated from the appropriate (sub)groups of individuals (Fig. 1.2A). It also follows that the more applicable and reliable the group evidence is, the better our decisions will be for individual patients. This is a key reason why we should insist on “trustworthy” evidence to populate our decision models. It is the EBM apparatus that provides the ingredients for the threshold model. Ideally, data on probabilities and outcomes should be derived from systematic reviews/meta-analyses of high-quality data. Unfortunately, only 1–2% of oncology treatment recommendations are supported by high-quality published evidence. As a result, we should be ready to assess the robustness of our decisions by performing a series of sensitivity analyses in which we vary assumptions about the quality of the data and their effects on our choices. Table 1.3 lists biases that can affect the production of evidence, our inferences, and, ultimately, our decisions. These cognitive biases impact our assessment of the probability (of disease), health outcomes (disutility), and the choices we make. In addition, it is essential to note that the various modeling approaches and metrics we use may affect decisions differently. That is, we may decide to use direct evidence on pD (often by employing published predictive models), or model the effect of treatment on the probability of disease or on health outcomes. For example, we may express treatment effects via relative risk


reduction (RRR) and express its effect on pD as pD · (1 − RRR); this means that if treatment is 100% effective, the disease is completely eradicated or prevented (as pD · (1 − 1) = 0). On the other hand, we may decide to model the effect of treatment on disutility, which may take the form 1 − M · (1 − RRR), where M represents the outcome in a patient without treatment. The Glossary lists different EBM metrics that can be used to define the threshold model parameters. The point is that the different theoretical frameworks and metrics used to populate threshold models will affect the actual calculation of the thresholds differently. Let’s further illustrate some of the key insights from the threshold model by using it in generic terms (Fig. 1.2A). We use generic definitions of net benefits (B) and harms (H) to solve the model. As already defined above, B = U(Rx, D+) − U(NoRx, D+), i.e., the difference in the utility of the outcomes if patients with the disease were treated versus not treated; H = U(NoRx, D−) − U(Rx, D−), i.e., the difference in the utility of the outcomes if a patient without the disease were not treated versus treated. If we use these definitions of net B and net H, we can solve the tree shown in Fig. 1.2A for the threshold probability (pt) as: pt = 1 / (1 + B/H).
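As a check on this algebra, here is a minimal Python sketch of the expected-utility threshold. The utility values on the 0–1 scale are hypothetical (ours, not the book's); the point is only that at pt = 1/(1 + B/H) the expected utilities of treating and not treating coincide.

```python
# EUT threshold sketch: pt = 1 / (1 + B/H) is the disease probability
# at which EU(Rx) = EU(NoRx).

def expected_utility(p, u_dplus, u_dminus):
    """EU of an action given P(disease) = p and utilities with/without disease."""
    return p * u_dplus + (1 - p) * u_dminus

# Hypothetical utilities U(action, disease state) on a 0-1 scale
u = {("Rx", "D+"): 0.80, ("Rx", "D-"): 0.90,
     ("NoRx", "D+"): 0.40, ("NoRx", "D-"): 1.00}

B = u[("Rx", "D+")] - u[("NoRx", "D+")]   # net benefit of treating the diseased
H = u[("NoRx", "D-")] - u[("Rx", "D-")]   # net harm of treating the healthy
pt = 1 / (1 + B / H)                      # here B/H = 4, so pt = 0.20

# At pD = pt the two expected utilities coincide; above pt, treatment wins.
eu_rx = expected_utility(pt, u[("Rx", "D+")], u[("Rx", "D-")])
eu_norx = expected_utility(pt, u[("NoRx", "D+")], u[("NoRx", "D-")])
print(round(pt, 3), round(eu_rx, 3), round(eu_norx, 3))
```

With these numbers B = 0.40 and H = 0.10, so B/H = 4 and pt = 0.20; plugging pt back into both expected utilities returns the same value, confirming the indifference point.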

The graph (Fig. 1.2B) shows a very important finding derived from the threshold model: as the net benefits of treatment increasingly outweigh its harms, the certainty in diagnosis above which we should treat drops dramatically. Conversely, if a treatment’s benefit/harm ratio is smaller, the threshold probability for therapeutic action increases. Note that net B and net H are here expressed as global (often subjective) utilities that can be further decomposed or expressed as specific (dis)utilities (health outcomes): morbidity or mortality, life expectancy, absence of pain, cost, disability-adjusted life years, etc. (see Chap. 2). For example, when considering outcomes related to morbidity and mortality, Basinga and colleagues estimated that the net benefit/harm ratio of administering antituberculosis therapy to a patient with suspected tuberculosis (TB) is about 36. This converts into a calculated treatment threshold probability of 2.7% [1/(1 + 36)]. This also means that, according to EUT, rational physicians should prescribe drugs against TB if they estimate that the probability of TB in a patient suspected of having TB exceeds 2.7%! However, note that at a probability of 2.7%, the vast majority (> 97%) of patients suspected of having TB will not have tuberculosis. Thus, acting according to EUT will predictably lead to unnecessary treatment (overuse) in many patients in order to help the few patients who require treatment. This means that overtesting and overtreatment are built into the EUT model—the normative theory widely accepted as the gold standard of rationality and the only model that satisfies all the mathematical and statistical axioms of rationality shown in Table 1.2 (see Chap. 8 for further details on this important point and for discussion of when EUT can help minimize underuse).
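The tuberculosis numbers above can be reproduced in a few lines. The `decide` helper and the example probabilities (5% and 1%) are our own illustrative additions; only the benefit/harm ratio of 36 comes from the text.

```python
# The chapter's tuberculosis example: with a benefit/harm ratio of about 36,
# the EUT treatment threshold is pt = 1/(1 + 36), roughly 2.7%.

b_over_h = 36
pt = 1 / (1 + b_over_h)
print(f"treatment threshold: {pt:.1%}")

# Decision rule: treat when the patient's disease probability exceeds pt
def decide(p_disease, threshold):
    return "treat" if p_disease > threshold else "withhold"

print(decide(0.05, pt))   # 5% suspicion of TB -> treat
print(decide(0.01, pt))   # 1% suspicion of TB -> withhold

# The overuse built into EUT: treating everyone at pD = 3% means the
# expected share of treated patients who do not actually have TB is 97%.
p_d = 0.03
print(f"treated without disease: {1 - p_d:.0%}")
```

The last line makes the overtreatment point concrete: just above the threshold, nearly all treated patients are disease-free, yet EUT still prescribes treatment because the benefit to the few diseased patients dominates.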
When treating physicians are asked to consult their intuition and emotions, such as regret, to estimate the threshold probability of TB above which they would commit a patient to 6–8 months of anti-TB treatment, they assess it, on average, at approximately 25–50%. However, this non-EUT-derived threshold also means that we would not treat almost a quarter to half of the patients who would otherwise have been treated according to the EUT model. This example illustrates a tension between


two theoretical approaches to rational decision-making. The tension fundamentally revolves around tradeoffs: the extent of underuse (false negatives) we are willing to tolerate in order to avoid overuse (false positives). These tensions exist both at evidentiary, inferential levels (because obtaining absolute "truth" in research is impossible, the research "payback" always represents a mixture of false and true findings) and at decision-making levels. That is, our judgments are always made under uncertainty; we strive to minimize uncertainties, but we cannot completely eliminate them, leaving us with conditions of irreducible uncertainty. Because of irreducible uncertainty, it is inevitable that we will make both decision and inferential errors, which can be either false positives or false negatives. In turn, the consequences of our errors will affect different people in different ways, which means that our decisions may ultimately result in unavoidable injustice imparted by inevitable tradeoffs between false-positive and false-negative errors (see Fig. 1.3). An important point to realize here is that false-positive and false-negative errors are inextricably linked: for most tests, as false positives decrease, false negatives increase, and vice versa; it is impossible to decrease both simultaneously. The right threshold is not only a function of net B and net H; it also depends on our values and preferences, which are largely shaped by our cognitive apparatus, emotions, and the circumstances in which we find ourselves. For example, we have evolved to ascertain the signals related to harms and benefits differently. Because harms, particularly traumatic or vivid events, are typically associated with intense emotions, they are more easily encoded in individual and collective memory. Under these circumstances, we are more willing to act even if the signal is likely to result in false positives.
This can be one reason that explains the hesitancy in receiving the COVID-19 vaccine (the development of which was one of the most remarkable achievements in the history of medicine) in the wake of rare, coincidental blood clots reported with COVID-19 vaccines during the pandemic. These responses are driven by type 1 processes, which typically interpret probabilities as "yes/no" categorical events instead of calibrating the risk analytically to appreciate that the benefits of receiving the COVID-19 vaccine outweigh its risks by a large margin. However, we are more conservative regarding benefits: we require more assurance before we act, even if the signal is possibly a false negative. Historically, for example, the practice of medical research has typically been willing to miss a true signal at a rate 4 times higher than accepting one false positive finding, i.e., it is customary to set the false-negative error rate (β) at 20% and the false-positive error rate (α) at 5%. However, this ratio is not "set in stone": the context and circumstances, as repeatedly highlighted, modify our tolerance toward the magnitude of false-positive versus false-negative errors depending on the consequences of potentially wrong actions. In addition, cognitive emotions (such as regret) play a significant role and often affect decisions differently when we make policy or management recommendations for individuals versus populations. For example, to avoid the regret of omission (false negatives, underuse) for individuals, we would have to tolerate more false positives (overuse) that will invariably affect the population of patients. On the other hand, if we want to reduce false positives, the false negatives will increase, and individuals will suffer. Thus, the impossibility of error-free decision-making will unavoidably lead to some injustice, either to individuals or to society. Figure 1.3 shows this relationship between underuse and overuse, a relationship which Hammond memorably called "irreducible uncertainty, inevitable errors, unavoidable injustice".

Fig. 1.3 An irreducible uncertainty, inevitable error, unavoidable injustice: the inverse relationship between underuse and overuse. False-positive (FP) and false-negative (FN) errors are inevitable and inextricably linked: as FP decreases, FN increases, and vice versa; outside of a perfect test (100% sensitivity and specificity), it is impossible to simultaneously decrease FP (regret of commission) and FN (regret of omission) errors. The right threshold is a function of our values. Figure adapted from K. Hammond, Human Judgment and Social Policy, Oxford University Press, 1996; Djulbegovic & Ash, JAMA 2011;305:2005–2006

The tradeoffs between false negatives and false positives (and the extent of the injustice toward individuals vs. society) depend on the calculated thresholds. The latter, as explained, depend on the decision theory we select to derive the threshold for action to address the decision problem at hand. This may include defining situations when we can tolerate errors without feeling regret (under the concept of acceptable regret or robust satisficing). Figure 1.2B (and Chaps. 3, 7 and 8) shows how the threshold derived under EUT differs from the threshold derived using regret or dual-processing theory. Thus, as highlighted above, what is "rational" behavior under one rationality theory may be irrational under another theory. We have repeatedly pointed out that context in decision-making is of paramount importance to rationality and that no one model of rationality can possibly satisfy all rationality principles under all contexts. All threshold models satisfy Principles 1 and 2 (the benefits of action should outweigh its harms under given risk or uncertainty; see Table 1.4); regret and dual-processing models aim to satisfy not only Principles 1 and 2 but also Principle 3 (by tapping into both type 1 and type 2 processes). Because type 1 processes are typically invoked in context-rich circumstances, as are invariably all bedside decisions, these threshold models may be particularly applicable in such settings. On the other hand, the EUT threshold model seems more applicable to context-poor circumstances (Chap. 8). Heuristic-driven FFT(T) threshold models often satisfy Principle 4 (decisions should take into account the computational constraints of human brains). Still, no model (threshold or otherwise) can normatively fulfill Principle 5


Table 1.5 Principles of pragmatic rationality for medical practice(a)

• Distinguishing between reducible versus irreducible uncertainty: including whether the diagnosis is certain or not; separate probabilities from outcomes; ask whether elements of the criteria for making a diagnosis are also used in the definition of outcomes. This, in turn, can help with model selection (see Chaps. 2 and 4, and Chaps. 3 and 8)

• A rational decision-maker respects his/her evidence: practice EBM; the strength of recommendations should reflect the quality of the underlying evidence using consistent judgments (i.e., the higher the quality of evidence, the stronger the recommendations, and vice versa). Use all EBM tools at your disposal: summaries of evidence on the benefits and harms of diagnostic and treatment interventions; decision aids; predictive models

• Knowing when to stop: maximizing versus satisficing; EUT versus regret versus dual-processing versus FFT(T) threshold models

• Handling false-positive and false-negative errors: in explicit and transparent ways; knowing how to specify what values are placed on these errors; learning under which circumstances we can tolerate inevitable errors (the acceptable regret model)

• Practical wisdom: tailoring thinking style to the problem at hand ("A good doctor knows how to treat/order a diagnostic test, a better one knows when to treat/order a test, but the best one knows when not to do it…"); awareness of the potential for unavoidable injustice, because the consequences of our actions may affect different individuals in different ways

(a) Note: in this book, we focus on the threshold decision model as alluded to in this section on EBM; we relate the given EBM tools in the context of a given model but refer the reader to many excellent texts on the principles and practice of EBM

(to provide a unique solution to the conflicting goals espoused by the consideration of utilitarian, deontic, and rights-based ethics). Importantly, however, Principle 5 cannot be met without involving in the decision all parties who are affected by it, which may be possible using Pinker's "formula": consult self and social interests via impartiality and the interchangeability of perspectives to get at the core of moral reasoning. Thus, eliciting the patient's values and preferences in shared decision-making at the bedside, or consulting all relevant stakeholders for public health decisions, provides a foundation for ethical and moral reasoning exercising Rawlsian deliberative and considered judgment. In this book, we show


that focusing on the methods for calculating thresholds can further aid these complex decisions. Along the way, we provide some guidance on the choice of the appropriate model (see Chap. 8). The general principles are outlined in Table 1.5. Finally, even though we stressed earlier that the most optimal decisions are those that achieve coherence at both the normative and intuitive levels, research also suggests that people with better decision-making competence (defined as the ability to follow normative principles when making decisions) do better in life. Further empirical research is needed to identify the situations that can best be matched to the appropriate rational decision-making strategies discussed in this chapter and in this book. This remains a $64 million research question.

References

1. Ariely D (2008) Predictably irrational. HarperCollins Publishers, New York
2. Basinga P, Moreira J, Bisoffi Z, Bisig B, Van den Ende J (2007) Why are clinicians reluctant to treat smear-negative tuberculosis? An inquiry about treatment thresholds in Rwanda. Med Decis Making 27(1):53–60
3. Berwick DM, Hackbarth AD (2012) Eliminating waste in U.S. health care. JAMA 307(14):1513–1516
4. Berwick DM (2017) Avoiding overuse—the next quality frontier. The Lancet 390(10090):102–104
5. Casalino LP, Gans D, Weber R (2016) U.S. physician practices spend more than $15.4 billion annually to report quality measures. Health Aff 35(3):401–406
6. Centers for Medicare & Medicaid Services (2021) Quality measurement and quality improvement. CMS.gov, Washington, D.C.
7. Centers for Medicare & Medicaid Services. National Health Expenditure (NHE) fact sheet. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/NHE-Fact-Sheet. Published 2019. Accessed 19 July 2021
8. Croskerry P (2010) To err is human—and let's not forget it. CMAJ 182(5):524
9. Croskerry P, Nimmo GR (2011) Better clinical decision making and reducing diagnostic error. J R Coll Physicians Edinb 41(2):155–162
10. De Finetti B (1974) The theory of probability. Wiley, New York
11. Djulbegovic B (2011) Uncertainty and equipoise: at interplay between epistemology, decision making and ethics. Am J Med Sci 342(4):282–289
12. Djulbegovic B (2014) A framework to bridge the gaps between evidence-based medicine, health outcomes, and improvement and implementation science. J Oncol Pract 10(3):200–202
13. Djulbegovic B (2021) Ethics of uncertainty. Patient Educ Couns 104(11):2628–2634
14. Djulbegovic B, Beckstead J, Nash DB (2014) Human judgment and health care policy. Popul Health Manag 17(3):139–140
15. Djulbegovic B, Beckstead JW, Elqayam S et al (2014) Evaluation of physicians' cognitive styles. Med Decis Making 34(5):627–637
16. Djulbegovic B, Elqayam S, Reljic T et al (2014) How do physicians decide to treat: an empirical evaluation of the threshold model. BMC Med Inform Decis Mak 14(1):47
17. Djulbegovic B, Elqayam S (2017) Many faces of rationality: implications of the great rationality debate for clinical decision-making. J Eval Clin Pract 23(5):915–922
18. Djulbegovic B, Guyatt GH, Ashcroft RE (2009) Epistemologic inquiries in evidence-based medicine. Cancer Control 16(2):158–168
19. Djulbegovic B, Hamm RM, Mayrhofer T, Hozo I, Van den Ende J (2015) Rationality, practice variation and person-centred health policy: a threshold hypothesis. J Eval Clin Pract 21(6):1121–1124


20. Djulbegovic B, Hozo I (2007) When should potentially false research findings be considered acceptable? PLoS Med 4(2):e26
21. Djulbegovic B, Hozo I, Greenland S (2011) Uncertainty in clinical medicine. In: Gifford F (ed) Philosophy of medicine (Handbook of the philosophy of science). Elsevier, London, pp 299–356
22. Djulbegovic B, Hozo I, Ioannidis JP (2015) Modern health care as a game theory problem. Eur J Clin Invest 45(1):1–12
23. Djulbegovic B, Hozo I, Dale W (2018) Transforming clinical practice guidelines and clinical pathways into fast-and-frugal decision trees to improve clinical care strategies. J Eval Clin Pract 24:1247–1254
24. Djulbegovic B, Hozo I, Li S-A, Razavi M, Cuker A, Guyatt G (2021) Certainty of evidence and intervention's benefits and harms are key determinants of guidelines' recommendations. J Clin Epidemiol 136:1–9
25. Djulbegovic B, Ioannidis JPA (2019) Precision medicine for individual patients should use population group averages and larger, not smaller, groups. Eur J Clin Invest 49(1):e13031
26. Djulbegovic B, Loughran TP Jr, Hornung CA et al (1999) The quality of medical evidence in hematology-oncology. Am J Med 106(2):198–205
27. Djulbegovic B, Paul A (2011) From efficacy to effectiveness in the face of uncertainty: indication creep and prevention creep. JAMA 305(19):2005–2006
28. Djulbegovic B, van den Ende J, Hamm RM et al (2015) When is rational to order a diagnostic test, or prescribe treatment: the threshold model as an explanation of practice variation. Eur J Clin Invest 45(5):485–493
29. Djulbegovic M, Djulbegovic B (2011) Implications of the principle of question propagation for comparative-effectiveness and "data mining" research. JAMA 305(3):298–299
30. Donabedian A (1978) The quality of medical care. Science 200:856–864
31. Hammond KR (1996) Human judgment and social policy: irreducible uncertainty, inevitable error, unavoidable injustice. Oxford University Press, Oxford
32. He L, Zhao WJ, Bhatia S (2020) An ontology of decision models. Psychol Rev 129(1):49–72
33. Hozo I, Djulbegovic B (2008) When is diagnostic testing inappropriate or irrational? Acceptable regret approach. Med Decis Making 28(4):540–553
34. Hozo I, Djulbegovic B (2009) Clarification and corrections of acceptable regret model. Med Decis Making 29:323–324
35. Hozo I, Schell MJ, Djulbegovic B (2008) Decision-making when data and inferences are not conclusive: risk-benefit and acceptable regret approach. Semin Hematol 45(3):150–159
36. Institute of Medicine (2001) Crossing the quality chasm: a new health system for the 21st century. National Academy of Sciences, Washington, D.C.
37. Institute of Medicine (2001) Crossing the quality chasm: a new health system for the 21st century. The National Academies Press, Washington, D.C.
38. Kahan DM, Wittlin M, Peters E, Slovic P, Ouellette LL, Braman D, Mandel G (2011) The tragedy of the risk-perception commons: culture conflict, rationality conflict, and climate change. Temple University Legal Studies Research Paper No. 2011-26, Cultural Cognition Project Working Paper No. 89, Yale Law & Economics Research Paper No. 435, Yale Law School Public Law Working Paper No. 230. Available at SSRN: https://ssrn.com/abstract=1871503 or https://doi.org/10.2139/ssrn.1871503
39. Kahneman D, Sibony O, Sunstein CR (2021) Noise. Little, Brown Spark, New York
40. Kahneman D (2012) Thinking, fast and slow, UK edn. Penguin, London
41. Kohn LT, Corrigan JM, Donaldson MS (eds) (2000) To err is human: building a safer health system. The National Academies Press, Washington, D.C.
42. Kruglanski AW, Gigerenzer G (2011) Intuitive and deliberate judgments are based on common principles. Psychol Rev 118:97–109
43. Makary MA, Daniel M (2016) Medical error—the third leading cause of death in the U.S. BMJ 353


44. Manchikanti L, Falco FJ, Boswell MV, Hirsch JA (2010) Facts, fallacies, and politics of comparative effectiveness research: part 2—implications for interventional pain management. Pain Physician 13(1):E55–E79
45. Manchikanti L, Falco FJ, Boswell MV, Hirsch JA (2010) Facts, fallacies, and politics of comparative effectiveness research: part I—basic considerations. Pain Physician 13(1):E23–E54
46. McGlynn EA, Asch SM, Adams J (2003) The quality of health care delivered to adults in the United States. N Engl J Med 348:2635–2645
47. McGlynn EA, Schneider EC, Kerr EA (2014) Reimagining quality measurement. N Engl J Med 371(23):2150–2153
48. McGlynn EA (2020) Improving the quality of U.S. health care—what will it take? N Engl J Med 383(9):801–803
49. Parker AM, Bruine de Bruin W, Fischhoff B, Weller J (2018) Robustness of decision-making competence: evidence from two measures and an 11-year longitudinal study. J Behav Decis Mak 31(3):380–391
50. Pinker S (2021) Rationality: what it is, why it seems scarce, why it matters. Random House, New York
51. Rawls J (1999) A theory of justice, revised edn. Harvard University Press, Cambridge
52. Slovic P, Finucane ML, Peters E, MacGregor DG (2004) Risk as analysis and risk as feelings: some thoughts about affect, reason, risk, and rationality. Risk Anal 24(2):311–322
53. Stanovich KE (1999) Who is rational? Studies of individual differences in reasoning. Lawrence Erlbaum Associates, Mahwah
54. Stanovich KE (2018) How to think rationally about world problems. J Intell 6(2):25
55. Tikkanen R, Abrams MK (2020) U.S. health care from a global perspective, 2019: higher spending, worse outcomes? https://doi.org/10.26099/7avy-fc29. Accessed 19 July 2021
56. Tsalatsanis A, Hozo I, Djulbegovic B (2014) Empirical evaluation of regret and acceptable regret model. In: 36th annual meeting of the Society for Medical Decision Making, 19–22 Oct 2014, Miami
57. Tukey J (1960) Conclusions vs decisions. Technometrics 2(4):423–433
58. Vincent S, Djulbegovic B (2005) Oncology treatment recommendations can be supported only by 1–2% of published high-quality evidence. Cancer Treat Rev 31:319–322
59. Wennberg JE (2010) Tracking medicine: a researcher's quest to understand health care. Oxford University Press, New York
60. WHO (2020) Global spending on health: weathering the storm. https://www.who.int/publications/i/item/9789240017788

2 Making Decisions When No Further Diagnostic Testing is Available

2.1 Introduction

In this chapter, we illustrate how evidence about treatments’ benefits and harms can be integrated to enable rational decision-making even under considerable clinical uncertainty. As discussed in Chaps. 1 and 4, we can often reduce clinical uncertainty by collecting more information, typically through the time-honored practice of history and physical exams and ordering further diagnostic tests. However, what should a clinician do after gathering all pertinent information when no further testing is available? This chapter will discuss how to answer this question when a decision-maker (i.e., a patient or a physician) has to make a clinical decision without resorting to further diagnostic testing. We illustrate the approach by considering the administration of anticoagulants in a patient presenting with venous thromboembolism (VTE), or whether to recommend an allogeneic stem cell transplant in a patient with acute myeloid leukemia (AML).

2.2 Expected Utility Theory Threshold Model

Chapter 1 provides a brief synopsis of decision theories (see Glossary). In this chapter, we will employ expected utility theory (EUT) to develop the threshold model for optimal decision-making when gathering further information (diagnostic testing) is not possible for a decision-maker. EUT is the only theory of choice that satisfies all mathematical axioms of rational decision-making and is widely used in economics and medical decision analyses. In the clinical setting, which is typically fraught with uncertainties about diagnosis and clinical outcomes, EUT allows a decision-maker to make his/her decision based on the estimated probability of an event (e.g., disease, recurrence) and the consequences of our actions in terms of their effects on health outcomes. In decision-theory parlance, health outcomes are typically referred to as utilities or disutilities (see Glossary). Utilities can be expressed in different units such as (but not limited to): health outcomes describing morbidity or mortality, life expectancy, absence of pain, cost, or disability-adjusted life years. To apply EUT, we require an assessment of the relevant probabilities and utilities. Expected utility is calculated by summing all utilities arising from a beneficial and harmful decision, weighted by their corresponding probabilities. According to EUT, rational decision-making consists of selecting the alternative with the higher expected utility. For example, treatments that result in higher quality-adjusted life years or lower mortality would be preferred over those resulting in lower quality-adjusted life years or higher mortality. We will now illustrate how clinicians can use the EUT threshold model to arrive at the most optimal decision. We restrict consideration to single-point decisions when (a) both diagnosis and health outcomes are uncertain, and (b) diagnosis is certain, but health outcomes (utilities) are not. As explained in Chap. 1, the results of a decision model are significantly affected by the quality of evidence of all the ingredients that go into the model. We assume that decision-makers will strive to use the highest quality of evidence to populate their model. When only low-quality evidence is available, the decision-maker will perform the necessary sensitivity analyses to assess the robustness of decisions to changes in the model assumptions.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_2

2.3 Threshold Model When the Diagnosis Is Not Certain, and Outcomes Are Not Certain

This is the case when the probability of disease and the utilities are independent of each other (see Fig. 2.1). The square in Fig. 2.1 denotes the decision point: in this case, we have two alternatives, to treat or not to treat. Once we choose one treatment alternative over another, we can no longer control the process; the event occurs as a function of chance (depicted as small black circles in Fig. 2.1), and each decision branch is associated with a utility (health outcome) of the decision we made. Within this model, each expected utility (either to treat or not to treat) is calculated as:

Expected Utility(treatment) = pD · U1 + (1 − pD) · U2    (2.1)

Expected Utility(no treatment) = pD · U3 + (1 − pD) · U4    (2.2)

where U 1 represents outcomes (utilities) of a diseased patient who was treated; U 2 refers to health outcomes of a non-diseased patient who was treated; U 3 represents health outcomes of a diseased patient who was not treated, and U 4 denotes outcomes of a non-diseased patient who was not treated (see Fig. 2.1a); and pD refers to the probability of disease. A key to applying the threshold model is in the definition of treatments’ net benefits and net harms. The net benefit of a treatment represents the difference in the utility of the outcomes if the diseased patient were either treated (U 1 ) or not treated (U 3 ); net harms is defined as the difference in the utility of the outcomes


Fig. 2.1 A decision tree modeling a decision to treat or not to treat a patient with a probability of disease (pD); 1 − pD = probability of non-disease. Utilities are enclosed in rectangles, where U1 represents the health outcome (e.g., survival, mortality, morbidity) of a diseased patient who was treated; U2 represents the outcome of a non-diseased patient who was treated; U3 refers to the outcome of a diseased patient who was not treated; and U4 denotes the outcome of a non-diseased patient who was not treated. Panel a) shows probabilities and utilities that are independent of each other; that is, it refers to the setting when both the disease diagnosis and health outcomes are not certain, while panel b) refers to a clinical situation when the diagnosis is certain (pD = 1), but health outcomes are not. The tree is solved by multiplying each utility by the given probability and summing over the corresponding tree branch. The strategy with the higher EU is preferred. To calculate the threshold probability at which the benefits and harms of each strategy are equal, we solve the tree for either a) pD or b) the corresponding morbidity or mortality used to define the given utilities. Note that the "No Rx" branch may refer to an alternative treatment, in which case the decision dilemma is whether to choose treatment 1 over treatment 2 (see Boxes 2.1 and 2.3)

if a patient without the disease were either not treated (U4) or treated (U2). Note that net benefits and net harms are each positive. Using the notation shown in Fig. 2.1, this can be expressed as:

Net Benefits = U1 − U3    (2.3)

Net Harms = U4 − U2    (2.4)

Solving the decision tree depicted in Fig. 2.1a (see Appendix for details) for pD, we obtain the threshold (pt) as the probability of disease at which the EU of giving treatment is equivalent to the EU if treatment were not administered. That is, at this threshold probability of disease, the benefits of treatment are equal to its harms. This is expressed as:

pt = Net Harm / (Net Benefit + Net Harm) = 1 / (1 + Net Benefit/Net Harm)    (2.5)
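Eq. 2.5 itself reduces to a one-line computation. Here is a minimal sketch with hypothetical net benefit and net harm values (placeholders, not from the book):

```python
def threshold_probability(net_benefit, net_harm):
    """Eq. 2.5: p_t = Net Harm / (Net Benefit + Net Harm)."""
    return net_harm / (net_benefit + net_harm)

# Hypothetical values in utility units: net benefit 0.45, net harm 0.05
p_t = threshold_probability(0.45, 0.05)  # 0.05 / 0.50 = 0.10
```

A patient whose probability of disease exceeds this 10% threshold would be treated; below it, withholding treatment has the higher expected utility.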

This means that if the probability of disease, pD, is above the threshold probability of disease (pt ), a decision-maker should choose to treat the patient; otherwise, when pD is below pt , the decision-maker should decide not to treat the patient. Note that in this chapter, we are only concerned with the calculation


of the threshold probability (pt), but we do not discuss how physicians can assess the probability of disease, pD. In practice, clinicians use various approaches, ranging from intuitive, gestalt-based experiential techniques to formal prognostic and predictive models, to help individualize the forecast about the probability of disease or health outcomes. In this sense, the threshold model is a quintessential technique for individualizing decision-making. We discuss the evaluation of predictive models separately in Chaps. 5 and 6. Therefore, we will refer here only to predictive models in the context of the specific examples (see Boxes 2.1–2.3). However, as outlined in Chap. 1, these generic definitions of net benefits and net harms need to be translated into specific evidence-based medicine (EBM) clinical metrics to be useful at the bedside. Thus, we can express (dis)utilities in terms of the mortality or morbidity of a disease either without treatment (M) or with treatment (Mrx). It is common practice to assign the utility of the best possible health state a value of 1, while assigning the utility of the worst imaginable health state, typically death, a value of 0. Thus, for example, the (dis)utility associated with morbidity M will be equal to (1 − M). As shown below, we often combine the effects of treatment and the harms of treatment to calculate the (dis)utility of each branch in a decision tree. A common EBM metric to express treatment effects is the relative risk reduction (RRR = 1 − relative risk; see Glossary), which is typically constant over the range of predicted absolute risks. In addition to the efficacy (or effectiveness) of treatment, disutilities need to incorporate the absolute harms of treatment (Hrx), as well as the relative value that a decision-maker places on avoiding disease burden versus tolerating harms of treatment (RVH).
Note that if RVH < 1, decision-makers prefer avoiding the impact of disease on their health more than the harms of the treatment; on the other hand, if RVH > 1, a decision-maker would prefer avoiding treatment-related harms over experiencing the disease burden. If RVH = 1, a decision-maker values avoiding the disease impact as much as avoiding treatment harms. Using these metrics, we can now express the (dis)utilities in Fig. 2.1 in the following way:

U1 = 1 − M · (1 − RRR) − RVH · Hrx    (2.6)

U2 = 1 − RVH · Hrx    (2.7)

U3 = 1 − M    (2.8)

U4 = 1    (2.9)

Net benefits and net harms can be obtained by substituting these (dis)utilities into each equation and simplifying them to Net Benefits = U1 − U3 = M · RRR − RVH · Hrx, and Net Harms = U4 − U2 = RVH · Hrx. Solving for the threshold probability of disease (pt), the following is obtained:

pt = RVH · Hrx / (M · RRR − RVH · Hrx + RVH · Hrx)

which can be reduced to

pt = (RVH · Hrx) / (M · RRR)    (2.10)

Thus, we now have an actionable threshold equation defined by readily available clinical measures. However, as discussed in Chap. 1 (and illustrated below; see Fig. 2.2), the derivation of the action threshold (pt) at which we should prescribe treatment is a function of the metrics we choose to represent our data. For example, using two other popular EBM metrics, the number needed to treat for one patient to benefit (NNTB) and the number needed to harm (NNH; see Glossary), Eq. 2.10 can be redefined as:

pt = RVH · (NNTB / NNH)    (2.11)
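A minimal sketch of the EBM-metric thresholds, Eq. 2.10 and its NNTB/NNH form (Eq. 2.11), follows. The clinical numbers are arbitrary placeholders, and the clamping reflects the requirement that a calculated threshold cannot fall outside the [0, 1] probability range:

```python
def treatment_threshold(M, RRR, Hrx, RVH=1.0):
    """Eq. 2.10: p_t = (RV_H * H_rx) / (M * RRR), clamped to [0, 1].

    M   -- morbidity/mortality of the untreated disease (0-1)
    RRR -- relative risk reduction of treatment (0-1)
    Hrx -- absolute harms of treatment (0-1)
    RVH -- relative value placed on treatment harms vs. disease burden
    """
    pt = (RVH * Hrx) / (M * RRR)
    return min(1.0, max(0.0, pt))  # probabilities are bounded by 0 and 1

def treatment_threshold_nnt(NNTB, NNH, RVH=1.0):
    """Eq. 2.11: p_t = RV_H * NNTB / NNH, clamped to [0, 1]."""
    return min(1.0, max(0.0, RVH * NNTB / NNH))

# Placeholder inputs: 30% untreated mortality, 50% RRR, 3% treatment harm
pt = treatment_threshold(M=0.30, RRR=0.50, Hrx=0.03)  # 0.03 / 0.15 = 0.20
```

Because NNTB = 1/(M · RRR) and NNH = 1/Hrx, both functions return the same threshold for consistent inputs; the clamping only matters when the harms of treatment approach or exceed its benefits.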

It is important to note that while Eq. 2.5 bounds the threshold probability within the range from 0 to 1, this is not the case when EBM metrics are used to calculate threshold probabilities (Eqs. 2.11 and 2.12). Because probabilities cannot exceed 1 or be smaller than 0, it is important to restrict pt to 1 or 0 when the calculation indicates that pt > 1 or pt < 0, respectively. Similar equations for the threshold can be obtained if, instead of comparing treatment versus no treatment, we evaluate treatment 1 versus treatment 2, as illustrated in Boxes 2.1 and 2.3 (see also Appendix).

Box 2.1. Clinical example 1 (treatment 1 versus treatment 2 when diagnosis and outcomes are not certain): should allogeneic stem cell transplant (alloSCT) be given over standard chemotherapy (chemoRx) to a patient with acute myeloid leukemia (AML) with an intermediate-risk disease?

Fig. 2.2 Threshold probability of disease (pt; dotted line) shown as a) a function of the generic definition of net benefits and net harms (Eq. 2.5) and b) a function of the number needed to treat for one patient to benefit or be harmed (NNTB or NNH, respectively) (Eq. 2.12). Note how pt displays a different behavior when different EBM metrics are used for its calculation instead of generic net benefits and net harms


2 Making Decisions When No Further Diagnostic Testing is Available

A 45-year-old man was diagnosed with acute myeloid leukemia. His cytogenetics indicate translocation t(9;11), consistent with intermediate-risk disease. The patient has no other comorbidities. After induction chemotherapy, his disease achieved a complete response by standard morphological criteria. However, morphological evaluation, even when supplemented with immunological and molecular techniques, cannot completely rule out the presence of disease. Therefore, you are not sure if the patient still has AML; if you were certain that the patient no longer has AML and that it will never recur, you would, obviously, refrain from administering unnecessary and potentially deadly treatment. Because the consequences of not treating a patient with AML are usually catastrophic, physicians have historically provided some postremission treatment. In contemporary practice, the most common decision dilemma is whether to recommend alloSCT versus consolidation chemotherapy with high-dose ara-C (cytarabine), hoping to achieve long-term remission and possibly a cure. The patient has a perfect HLA match with his brother (10 of 10). We use data from one of the more reliable studies assessing clinical outcomes of alloSCT compared with other consolidation treatments [1].

The dilemma here is that we need to decide whether to recommend a more beneficial but more harmful treatment (alloSCT) over a less beneficial but less harmful treatment (chemoRx). From a modeling perspective, this means that we are uncertain whether disease exists and about the mortality outcomes associated with the presence or absence of AML. The question for a decision-maker is: what is the threshold probability of AML recurrence at which we are indifferent between administering alloSCT versus a standard chemotherapy regimen? The threshold formula for the comparison of treatment 1 versus treatment 2 is (see also Appendix):

$$p_t = \frac{RV_H \cdot (H_{rx1} - H_{rx2})}{M_{rx2} - M_{rx1}}$$

Note that the equation assumes tradeoffs between the harms and benefits of one treatment versus another. If one treatment is less harmful and more efficacious, i.e., associated with less morbidity or mortality, then obviously such a treatment should be given. However, if one treatment is more harmful but more efficacious (i.e., has less morbidity or mortality), then the equation above shows that such a treatment should be given as long as the probability of AML recurrence is above the threshold but less than 1. If the threshold is greater than 1, the less harmful (and less beneficial) treatment should be given.

Evidence:
Treatment 1, alloSCT: mortality of the disease with Rx (M_rx1) = 41%; treatment-related mortality [harms due to Rx (H_rx1)] = 19%


Treatment 2, chemoRx: mortality of the disease with Rx (M_rx2) = 53%; treatment-related mortality [harms due to Rx (H_rx2)] = 3%

Therefore, and assuming that RV_H = 1:

$$p_t = \frac{RV_H \cdot (H_{rx1} - H_{rx2})}{M_{rx2} - M_{rx1}} = \frac{1 \cdot (19 - 3)}{53 - 41} = 1.33$$

Note that we get the same result if we express the equation using the NNTB and NNH metrics:

NNTB = 1/ARD (1 over the absolute risk difference) = 100/(53 − 41) = 8.3
NNH = 1/AHD (1 over the absolute harms difference) = 100/(19 − 3) = 6.25

$$p_t = \frac{NNTB}{NNH} = \frac{8.3}{6.25} = 1.33 > 1$$

That is, the threshold probability is pt > 1, and, therefore, alloSCT should not be offered (or should be offered only to those patients whose disease is virtually certain to recur). Note, however, that this calculation applies to a typical, "average" patient with intermediate-risk AML. This is, however, a heterogeneous category, and some patients may have a higher risk of recurrence than others. The decision can be further individualized by using formal predictive models to better assess whether our patient falls in the category of patients at high risk for disease recurrence. Recently, Hu and colleagues [5] developed a risk prediction model to stratify the probability of intermediate-risk AML recurrence after complete remission is achieved. They developed a nomogram consisting of three predictors (white blood cell count at diagnosis, the presence of mutated DNMT3A, and involvement of signaling pathway genes) to further separate intermediate-risk AML into subgroups with high versus low risk of recurrence. They showed that alloSCT is the superior treatment in the high- but not in the low-risk subgroup, which is consistent with our threshold model. Unfortunately, the model's performance was relatively modest, with a c-index of 0.703 for survival, although similar in accuracy to other published predictive models described in the literature. Nevertheless, this model and similar assessments can be used as a starting point to be contrasted against the threshold probability. If the values of pD and pt are clearly separated, the decision may be easier to implement than if these assessments are close to each other. In the latter case, further sensitivity analyses can be undertaken. One way to do this is to note that the equation above points to a simple linear relation between the differences in treatments' harms and benefits (i.e., mortality or morbidity while on treatment): we can


see that if the difference in harms is half the difference in mortality, the threshold probability above which we should administer alloSCT drops to only 50%; if it is a quarter of the difference in mortality, the threshold further falls to 25%, etc. Similarly, a proportional increase in the difference in mortality for fixed harms equally cuts the threshold probability at which we should administer alloSCT. These decisions are not easy, and it takes time to derive them optimally. However, when it comes to life-and-death decisions like these, we owe our patients a careful calibration of the range of assumptions to be confident that our recommendations are as accurate as possible. In this example, we assumed RV_H = 1, i.e., the patient values avoiding recurrence of AML equally to avoiding death from treatment. In Chaps. 3 and 7, we will explore the effect of eliciting the decision-maker's values and preferences using regret theory and dual-processing theory, respectively.
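The Box 2.1 arithmetic, including the sensitivity analysis just described, can be checked with a short sketch (our own illustration; the function and variable names are ours, the percentages come from the box):

```python
def threshold_two_treatments(rv_h, h_rx1, h_rx2, m_rx1, m_rx2):
    """p_t = RV_H * (H_rx1 - H_rx2) / (M_rx2 - M_rx1); p_t > 1 favors Rx2."""
    return rv_h * (h_rx1 - h_rx2) / (m_rx2 - m_rx1)

# alloSCT (Rx1): M_rx1 = 41%, H_rx1 = 19%; chemoRx (Rx2): M_rx2 = 53%, H_rx2 = 3%
pt = threshold_two_treatments(1.0, 0.19, 0.03, 0.41, 0.53)
print(round(pt, 2))  # 1.33 -> alloSCT should not be offered

# Sensitivity: if the harms difference were half the mortality difference
# (e.g., H_rx1 = 9%), the threshold would drop to 0.5, as discussed above.
print(round(threshold_two_treatments(1.0, 0.09, 0.03, 0.41, 0.53), 2))  # 0.5
```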

2.4 Threshold Models When the Diagnosis Is Certain, but Health Outcomes (Utilities) Are Uncertain

Often in clinical medicine, the diagnosis of the disease is ascertained by detecting what one might also refer to as "outcomes". This is true for the entire field of thrombosis, in which the diagnosis of venous thromboembolism (VTE) is based on the ascertainment of outcomes [deep venous thrombosis or pulmonary embolism (PE)] consistent with VTE. For instance, the widely popular Wells predictive model for the diagnosis of PE relies on clinical signs and symptoms to establish the pretest probability of PE, which, in turn, is defined by the result of imaging studies (i.e., the diagnosis of PE on the imaging test can be conceptualized as the "outcome", as the same components, or predictors, are used both for the definition of disease and for health outcomes). Similarly, in the cardiovascular field, myocardial infarction is typically defined by composite outcomes in which diagnostic predictors are included in the definition of outcomes. In cancer medicine, clinicians use imaging, molecular, and myriad other tests to detect cancer relapse, metastasis, etc., which, in turn, are often an integral part of the definition of the disease (and of disease-free outcomes).

In these instances, such as when mortality or morbidity is used, a problem arises in calculating the threshold because these outcomes (mortality or morbidity) affect both the probabilities of disease and the utilities. From the perspective of deriving a threshold model, this results in double counting; that is, the model does not adhere to the classical threshold requirement of independence between probabilities and outcomes. Because in these cases we conceptualize diagnosis via outcomes (and, from a modeling perspective, to avoid double counting), we simply stipulate that the diagnosis is certain while the outcomes are not, and solve the tree shown in Fig. 2.1b for the threshold for the morbidity or mortality (i.e., the outcome or event) of interest. Thus, we set the probability of disease equal to 1 (i.e., the disease is certain). This results in a separate threshold model, more appropriate for this clinical situation


(see Fig. 2.1b). When pD = 1, Eq. 2.10 can be rewritten to calculate the threshold mortality or morbidity (i.e., health outcome) at which treating and not treating a patient are equivalent:

$$M_t = \frac{RV_H \cdot H_{rx}}{RRR} \quad (2.12)$$

As with the generic threshold model (see Eq. 2.5), a decision-maker would choose treatment only when the probability of morbidity or mortality (without treatment in this case) is above M_t (Eq. 2.12).¹

Box 2.2. Clinical example 2 (treatment vs. no treatment when the disease is certain but outcomes are not): should a direct oral anticoagulant (e.g., rivaroxaban) be used to prevent VTE recurrence in a patient with lung cancer?

A 55-year-old male patient with metastatic lung cancer asked you whether he should take medication (anticoagulants) to prevent VTE, as he was told that patients with cancer can die from PE. In this instance, you must make a decision without further testing to address the following question: should direct oral anticoagulants (DOACs; e.g., rivaroxaban, apixaban) be used to prevent VTE recurrence, given the risk of major bleeding events associated with these drugs?

We extracted data on net benefits and harms from a systematic review of patients with cancer who were given DOACs (compared to placebo), focusing on PE recurrence [6]. Cochrane systematic reviews/meta-analyses are widely considered to be of high quality, as they typically synthesize data from multiple randomized clinical trials (three RCTs in this example) using rigorous systematic methodology. In this case, the evidence, as appraised with the GRADE system (see Chap. 1), was also estimated to be at low risk of bias; the authors rated the certainty of evidence as moderate with respect to DOAC effects for the prevention of VTE [6]. The risk ratio of pulmonary embolism in patients with cancer given DOACs compared with those given placebo was estimated at 0.48. Thus, the relative risk reduction, RRR = 1 − 0.48 = 0.52, or 52%. The Cochrane review estimated that the increased risk of major bleeding with DOACs was approximately 23 in 1000 patients, or 2.3% (H_rx). In this example, we assume that the decision-maker equally values avoiding treatment harms and avoiding disease burden (i.e., RV_H = 1).

¹ See also Appendix, Section B2. In the case when the diagnosis (clinical event; see Glossary) is not certain (i.e., 0 < p < 1), outcomes are expressed in binary (yes/no) terms, and no further diagnostic test is available, it can be shown that a formula equivalent to Eq. 2.12 is obtained if we are interested in determining the threshold probability of the event (p):

$$p_t = \frac{RV_H \cdot H_{rx}}{RRR}$$


To summarize, the following assumptions are made (evidence):

• RRR—the relative risk reduction, based on the recurrence of VTE in patients given direct oral anticoagulants compared with the placebo group, was 52%, making RRR = 0.52
• H_rx—the harm associated with direct oral anticoagulants, namely an increased risk of bleeding, is 2.3%
• RV_H—the decision-maker equally values avoiding treatment harms and the burden of disease, making RV_H = 1

To answer the question your patient asked, you need the threshold equation based on Eq. 2.10:

$$p_t = \frac{RV_H \cdot H_{rx}}{M \cdot RRR}$$

As the probability of disease is certain (i.e., equal to 1), we reformulate the equation above into a threshold model for the outcome of interest (Eq. 2.12). In this case, we obtain:

$$M_t = \frac{RV_H \cdot H_{rx}}{RRR} = \frac{1 \cdot 0.023}{0.52} = 0.044, \text{ or } 4.4\%$$

Therefore, if the patient's risk of PE is above 4.4%, you should prescribe a DOAC. As in example 1, predictive models can be consulted to estimate the probability of VTE (PE) in a patient with cancer to individualize decision-making. A number of predictive models have been published in the literature, but they have rarely been validated in prospective cohorts. Nevertheless, some of the models are used in practice. For example, one such predictive model is the Khorana score [7], which includes five variables: (1) site of cancer, (2) platelet count, (3) hemoglobin, (4) leukocyte count, and (5) body mass index. It should be noted that while this predictive model is widely popular, there are currently no highly accurate models for predicting malignancy-associated VTE. Another model, published by Pabinger et al. [9], used only one clinical variable (tumor-site category) and one biomarker (D-dimer) to estimate the risk of VTE recurrence; the model was externally validated but had modest discriminatory properties, with a c-statistic of 0.68. However, it should also be noted that the model outperformed both the "treat no one" and "treat all" strategies, and it is therefore reasonable to use it to assess the risk of VTE recurrence and compare it against M_t (see also Chap. 6). Ultimately, if no satisfactory model exists, physicians have to rely on their judgment and experience to contrast the probability of the event of interest with the thresholds.
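The Box 2.2 calculation (Eq. 2.12) can be sketched in a few lines (our own illustration; the function name is ours, the numbers come from the Cochrane review cited above):

```python
def outcome_threshold(rv_h, h_rx, rrr):
    """Eq. 2.12: M_t = RV_H * H_rx / RRR (threshold risk of the outcome)."""
    return rv_h * h_rx / rrr

# From Box 2.2: H_rx = 2.3% major bleeding, RRR = 0.52, RV_H = 1
m_t = outcome_threshold(rv_h=1.0, h_rx=0.023, rrr=0.52)
print(f"{m_t:.3f}")  # 0.044 -> prescribe a DOAC if the PE risk exceeds 4.4%
```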


Box 2.3. Clinical example 3 (treatment 1 vs. treatment 2 when the disease is certain but outcomes are not): should a direct oral anticoagulant (DOAC) be used over low-molecular-weight heparin (LMWH) to treat VTE and prevent its recurrence?

A 65-year-old female patient with breast cancer developed VTE, manifested as right femoral deep venous thrombosis. You are trying to decide between low-molecular-weight heparin (LMWH) and a direct oral anticoagulant (DOAC; e.g., rivaroxaban): which anticoagulant should be used to treat VTE and prevent its recurrence, given their efficacy and risk of bleeding? We extracted data on net benefits and harms from a systematic review and meta-analysis of two randomized clinical trials of patients with cancer who were given either DOACs or LMWH [8]. When comparing two treatments, it is important to recognize that the logic and approach are similar to the previous example of whether to give treatment or not (see Box 2.2). That is, the decision is determined by comparing the net benefits and net harms of DOAC versus LMWH. Here is a summary of the benefits and harms of the treatments under consideration (evidence):

• Efficacy of treatment—M_rx1 (VTE on DOAC: 42/725 = 5.8%) versus M_rx2 (VTE on LMWH: 64/727 = 8.8%), resulting in an absolute decrease in VTE of 3% in favor of DOAC over LMWH
• H_rx1—the harm associated with DOACs (40/725 = 5.5%); H_rx2—harms due to LMWH (23/727 = 3.1%), resulting in an absolute increase in the risk of major bleeding with DOAC over LMWH of 2.4%
• RV_H—the decision-maker equally values avoiding treatment harms and disease (VTE recurrence), making RV_H = 1

When choosing one treatment over another, by assuming that pD = 1 we can modify the equation shown in Box 2.1 to arrive at the following relationship: we use treatment 1 (Rx1) if M_rx2 > M_rx1 + RV_H · (H_rx1 − H_rx2). Because M_rx refers to "bad" (unfavorable) health outcomes under Rx, if this inequality holds, more "bad" events would on balance occur under Rx2, and hence Rx1 should be used, and vice versa.

In the case of DOAC versus LMWH, we have:

VTE_LMWH > VTE_DOAC + RV_H · (Bleed_DOAC − Bleed_LMWH)
8.8% > 5.8% + 1 · (5.5% − 3.1%)
8.8% > 8.2%

Hence, a DOAC is preferred, as 8.8% of "bad" outcomes will, on average, occur when LMWH is used versus 8.2% when a DOAC (rivaroxaban) is prescribed. Note, however, that in this example we have not calculated M_t (i.e., the outcome threshold when treatment is not given). To do this, we would need


outcome data on treatment versus no treatment (i.e., DOAC versus no treatment and LMWH versus no treatment). This can be indirectly derived (see Appendix). In that case, the following threshold formula applies:

$$M_t = \frac{RV_H \cdot (H_{rx1} - H_{rx2})}{RRR_1 - RRR_2}$$

The equation tells us that we can use treatment 1 over treatment 2 if the probability of the outcome of interest (M) exceeds the threshold, M_t.

Although EUT models are mathematically most appealing, they are often violated in practice because human beings bring affect and emotions, their values and preferences, and many other contextual factors, such as life circumstances including costs, logistics of administration, and compliance with treatment, into the consideration of the decision at hand. In the next chapter, we will show the application of a threshold model from the perspective of regret theory.
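The Box 2.3 comparison can be restated as comparing each option's expected total of "bad" outcomes (disease events on treatment plus value-weighted treatment harms). A sketch with our own variable names; the totals below include each arm's full bleeding risk, so the absolute numbers differ from the box (which nets out LMWH's bleeding risk from both sides), but the margin and the preference are identical:

```python
def expected_bad_outcomes(m_rx, h_rx, rv_h=1.0):
    """Disease events on treatment plus value-weighted treatment harms."""
    return m_rx + rv_h * h_rx

doac = expected_bad_outcomes(m_rx=0.058, h_rx=0.055)  # VTE 5.8%, bleeding 5.5%
lmwh = expected_bad_outcomes(m_rx=0.088, h_rx=0.031)  # VTE 8.8%, bleeding 3.1%

# Prefer the option with the lower expected total of unfavorable events
print("DOAC" if doac < lmwh else "LMWH")  # DOAC
```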

References

1. Cornelissen JJ, Van Putten WL, Verdonck LF, Theobald M, Jacky E, Daenen SM et al (2007) Results of a HOVON/SAKK donor versus no-donor analysis of myeloablative HLA-identical sibling stem cell transplantation in first remission acute myeloid leukemia in young and middle-aged adults: benefits for whom? Blood 109(9):3658–3666
2. Djulbegovic B, Hozo I, Lyman GH (2000) Linking evidence-based medicine therapeutic summary measures to clinical decision analysis. MedGen: Medicine 2(1):E6
3. Djulbegovic B, van den Ende J, Hamm RM, Mayrhofer T, Hozo I, Pauker SG (2015) When is rational to order a diagnostic test, or prescribe treatment: the threshold model as an explanation of practice variation. Eur J Clin Invest 45(5):485–493
4. Djulbegovic B, Hozo I, Mayrhofer T, van den Ende J, Guyatt G (2019) The threshold model revisited. J Eval Clin Pract 25(2):186–195
5. Hu X, Wang B, Chen Q, Huang A, Fu W, Liu L et al (2021) A clinical prediction model identifies a subgroup with inferior survival within intermediate risk acute myeloid leukemia. J Cancer 12(16):4912–4923
6. Kahale LA, Matar CF, Tsolakian I, Hakoum MB, Barba M, Yosuico VE et al (2021) Oral anticoagulation in people with cancer who have no therapeutic or prophylactic indication for anticoagulation. Cochrane Database Syst Rev (10)
7. Khorana AA, Kuderer NM, Culakova E, Lyman GH, Francis CW (2008) Development and validation of a predictive model for chemotherapy-associated thrombosis. Blood 111(10):4902–4907
8. Li A, Garcia DA, Lyman GH, Carrier M (2019) Direct oral anticoagulant (DOAC) versus low-molecular-weight heparin (LMWH) for treatment of cancer associated thrombosis (CAT): a systematic review and meta-analysis. Thromb Res 173:158–163
9. Pabinger I, van Es N, Heinze G, Posch F, Riedl J, Reitter E-M et al (2018) A clinical prediction model for cancer-associated venous thromboembolism: a development and validation study in two independent prospective cohorts. Lancet Haematol 5(7):e289–e298


10. Pauker SG, Kassirer JP (1975) Therapeutic decision making: a cost-benefit analysis. N Engl J Med 293(5):229–234
11. Wells PS, Anderson DR, Rodger M et al (2001) Excluding pulmonary embolism at the bedside without diagnostic imaging: management of patients with suspected pulmonary embolism presenting to the emergency department by using a simple clinical model and D-dimer. Ann Intern Med 135(2):98–107

3 Making Decisions When no Further Diagnostic Testing is Available (Expected Regret Theory Threshold Model)

3.1 Introduction

In Chap. 2, we illustrated the application of the expected utility theory (EUT) to rational decision-making when no further diagnostic testing is available. In this chapter, we apply regret theory to the decision problems discussed in Chap. 2. As discussed in Chap. 1, there are many theories of decision-making, most of which differ in the extent to which decision-makers rely on affect versus the analytical apparatus to arrive at the optimal decision. Because regret—defined as a "cognitive emotion" that we are motivated to regulate to achieve our desired goals—taps into both the emotional and deliberative aspects of decision-making, regret theory is particularly suitable for application to bedside clinical decisions.

Regret can be about the past ("retrospective regret") or the future ("anticipated or prospective regret"). Mechanisms of regret regulation aim to prevent future regret and manage current regret. People are regret-averse. Regret can occur because of a decision to act ("regret of commission") or a decision not to act ("regret of omission"). According to regret theory, optimal medical decisions are associated with regret-averse decision processes and outcomes—we are motivated to make decisions with the minimal possible regret. Given that many decisions in clinical medicine cannot be reversed, the regulation of regret related to medical decisions mainly focuses on minimizing future regret. This is particularly the case in high-stakes situations such as the decision to undergo toxic but potentially curative treatment (e.g., stem cell transplant for life-threatening acute myeloid leukemia, surgery for cancer of the esophagus, etc.), or when, in the terminal phase of life, patients face decisions whether to forgo potentially life-prolonging treatment versus accepting a peaceful death with hospice. Under these circumstances, people mostly rely on anticipatory regret to guide their choices. Anticipation of regret leads to more vigilant decisions, which, in turn, is expected to decrease post-decisional regret. When facing high-stakes decisions, people typically consider whether to "play safe" or

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_3


be regret-averse, according to the heuristic "better safe than sorry" versus "better risky than regretful". To keep regret at a minimum, decisions based on anticipated regret may promote both risk-avoiding ("safe", typically default) and risk-seeking ("risky") choices. The threshold model can be reformulated using regret theory to help us rationally decide when choosing one therapeutic option over another will result in minimal regret.

We should note at the outset that the results of calculations using regret theory are often¹ mathematically identical to derivations based on the EUT. This is because the model derivations typically employ assumptions that reflect an inversion of the results of EUT. Remember, according to EUT, rational decision-making is associated with selecting the alternative with the highest expected utility. In contrast, according to the expected regret theory (ERT), rational decision-making is associated with selecting the alternative with the lowest expected regret. However, the psychological processes underlying these two theories are remarkably different: EUT employs the high-level analytical cognitive apparatus, while ERT relies on both the affective and analytical aspects of human cognition. This can explain differences in actual decisions even though the final mathematical derivations are often identical.

Let's illustrate the critical similarities and differences in the assumptions. Remember, the definition of a treatment's net benefits and net harms is crucial to applying the threshold model. As defined in Chap. 2, the net benefit of treatment represents the difference in the utility of the outcomes if diseased patients were either treated (U1) or not treated (U3); net harms are defined as the difference in the utility of the outcomes if a patient without the disease were either not treated (U4) or treated (U2; see Fig. 2.1).
Mathematically, we define regret (Rg) as the loss in utility between the action taken and the best possible action we could have taken, in retrospect. That is, regret is best thought of in terms of counterfactual, "what-if" scenarios, i.e., what could have happened had we decided differently. For example, let's consider what would be the best possible action when treating (Rx) someone with disease (D+) [Rg(Rx, D+)], or treating someone without the disease [Rg(Rx, D−)]. Using the definition stated above, the regret of treating someone with the disease can be expressed as:

$$Rg(Rx, D+) = \text{Utility of best action in retrospect} - \text{Utility of action taken} = \max[U(Rx, D+), U(NoRx, D+)] - U(Rx, D+) = 0 \quad (3.1)$$

Because treating someone with a disease with effective therapy is obviously more rational than not treating them, the best possible decision would result in zero regret.
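The counterfactual definition above can be sketched directly in code (our own illustration; the utilities below are hypothetical, chosen only so that treating a diseased patient is the best action in retrospect):

```python
def regret(action, state, utility):
    """Regret = utility of the best action in retrospect - utility of action taken."""
    best = max(utility[(a, state)] for a in ("Rx", "NoRx"))
    return best - utility[(action, state)]

# Hypothetical utilities U(action, state) for the four outcomes of Fig. 2.1
u = {("Rx", "D+"): 0.8, ("NoRx", "D+"): 0.5,
     ("Rx", "D-"): 0.9, ("NoRx", "D-"): 1.0}

print(regret("Rx", "D+", u))    # 0.0 -> best action, zero regret (Eq. 3.1)
print(regret("NoRx", "D+", u))  # ~0.3 -> forgone net benefit
print(regret("Rx", "D-", u))    # ~0.1 -> net harms of unnecessary treatment
```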

¹ This is not always the case; for example, if we assume that the simultaneous occurrence of the effect of disease and the harms of treatment is clinically not negligible (as was assumed in the derivation of the threshold equation in Chap. 2), then EUT and ERT will generate mathematically different results (see Appendix for technical details).


In the case of treating someone without the disease, we obtain:

$$Rg(Rx, D-) = \text{Utility of best action in retrospect} - \text{Utility of action taken} = \max[U(NoRx, D-), U(Rx, D-)] - U(Rx, D-) = \text{Net Harms } (H) \text{ of treatment} \quad (3.2)$$

Because we unnecessarily treated the patient without the disease (instead of avoiding treatment), our regret, in this case, would amount to the harms of treatment. In the same way, we can define the regret associated with no treatment:

$$Rg(NoRx, D+) = \text{Utility of best action in retrospect} - \text{Utility of action taken} = \max[U(Rx, D+), U(NoRx, D+)] - U(NoRx, D+) = \text{Net Benefits } (B) \text{ of treatment} \quad (3.3)$$

$$Rg(NoRx, D-) = \text{Utility of best action in retrospect} - \text{Utility of action taken} = \max[U(Rx, D-), U(NoRx, D-)] - U(NoRx, D-) = 0$$

As in Chap. 2, we need to derive the decision thresholds for the case when diagnosis and outcomes are not certain (A) versus when the diagnosis is certain but health outcomes (utilities) are uncertain (B). We illustrate how regret can be used holistically and how we can integrate it with objective evidence on morbidity or mortality. Finally, we stress the importance of using regret in the elicitation of patients' preferences about the therapeutic options they may face.

(A) Decision threshold when the diagnosis and outcomes are not certain, and no further diagnostic tests are available

(1) Holistic approach to using regret

As repeatedly argued, most treatments are associated with multiple dimensions, some good (benefits) and some bad (harms). Often, deciding what to do necessitates making tradeoffs. We can employ the threshold model to make these tradeoffs rationally, as argued throughout this book. The threshold model requires integration of each key component of benefits and harms for a given health intervention and consideration of the patient's values and preferences (V&P) regarding the inherent tradeoffs related to the treatment effects.
The latter is considered a normative and ethical conditio sine qua non for the practice of medicine, often captured in the saying "no decision about me, without me"—the elicitation of patients' V&P is essential to the practice of medicine. The problem, however, is that there are many methods for the elicitation of patients' V&P. To date, no universally accepted method for measuring V&P has been developed. Popular research methods such as the standard gamble, time-tradeoff, and discrete-choice experiments are time-consuming and difficult to understand. In contrast, those easily comprehensible and efficient techniques


Fig. 3.1 Dual visual analog scale for elicitation of regret

such as visual analog scales (VAS) do not capture the tradeoffs that characterize many, if not most, clinical decisions. In addition, it is exceedingly difficult, if not impossible, to accurately determine the tradeoffs across multiple outcomes that can be permuted in many ways. A solution to this problem is to exploit the feature of regret that taps into both its analytic and affect-based cognitive dimensions to capture a patient's (decision-maker's) global or "holistic" perception of treatment effects. By asking questions about tradeoffs in this way, we directly address both cognitive mechanisms, intuitive and deliberative, of the decision process. This, in turn, can lead to a more accurate assessment of a patient's (or clinician's) preferences. By eliciting the regret of omission (e.g., failure to provide benefits by not administering necessary treatment) versus the regret of commission (e.g., giving unnecessary/harmful treatment), we can calculate the threshold using the expression:

$$p_t = \frac{1}{1 + \dfrac{\text{regret of omission}}{\text{regret of commission}}} \quad (3.4)$$

where the threshold pt represents the probability of disease at which the benefits of treatment are equal to its harms (as holistically elicited via regret assessment). Thus, as discussed in Chap. 2, if the probability of disease, pD, is above the threshold probability of disease (pt), the benefit of treatment outweighs its harms, and we should choose such a treatment. That is, we will regret less if we treat the patient when pD > pt. If pD is below pt, the opposite is the case, and we should choose (and would regret less) not to treat the patient. Figure 3.1 shows the Dual Visual Analog Scale (DVAS)² technique for the holistic elicitation of regret, which can be used to obtain the measures of regret of omission and commission in Eq. 3.4.
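Eq. 3.4 turns a regret elicitation into an action threshold in one line. A minimal sketch (our own illustration; the DVAS ratings below are hypothetical):

```python
def regret_threshold(regret_omission, regret_commission):
    """Eq. 3.4: p_t = 1 / (1 + regret_of_omission / regret_of_commission)."""
    return 1.0 / (1.0 + regret_omission / regret_commission)

# Hypothetical DVAS ratings: failing to treat (80) felt 4x worse than overtreating (20)
print(regret_threshold(80, 20))  # 0.2 -> treat whenever p(disease) > 20%
```

Note that only the ratio of the two regrets matters, which is why the more convenient elicitation described in Box 3.1 (asking directly "how many more times would you regret...") yields the same threshold.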

² We wish to thank Dr. Alessandro Cucchetti for his help with the DVAS illustration.


Box 3.1. (Fig. 3.1) Dual Visual Analog Scale (DVAS) for the elicitation of regret. The elicitation consists of two questions:

(a) Elicitation of regret of omission*
"On a scale of 0–100, where 0 indicates no regret and 100 indicates the maximum regret you could feel, how would you rate the level of your regret if you failed to provide the necessary treatment to your patient (i.e., you did not give treatment that, in retrospect, you should have given)?" (That is, what is the intensity of your regret at failing to benefit your patient?)

(b) Elicitation of regret of commission*
"On a scale of 0–100, where 0 indicates no regret and 100 indicates the maximum regret you could feel, how would you rate the level of your regret if you had administered unnecessary treatment to your patient (i.e., administered treatment that, in retrospect, should not have been given)?" (That is, what is the intensity of your regret at harming your patient?)

Alternatively, a more convenient approach consists of determining the ratio of regret of omission to regret of commission by asking: "How many more times would you regret failing to provide beneficial treatment over providing unnecessary/potentially harmful treatment?"**

Other alternative phrasings that practitioners or patients may find useful include: "How many times worse is not providing the benefit (of treatment) you should have recommended compared with giving unnecessary (harmful) treatment?" "How many more times would you regret not getting a treatment that could have benefited you over getting an unnecessary (harmful) treatment?" Note that the DVAS is presented on a log scale to avoid ceiling and floor effects.

* A decision-maker can decide to assess all relevant outcomes holistically (typically up to 7, due to the limitations of the human brain's processing power) or to focus narrowly on specific health outcomes (e.g., survival/mortality, heart attack).
** Note that because the regret of actions (commission) and inactions (omission) often co-occur, it may be better to talk about regretting "choices" and to refer to regretting failing to choose treatment 1 over treatment 2. Also, note that the threshold equation in the format shown in Eq. 3.4 is equivalent regardless of whether the choice is between active treatment versus no treatment or treatment 1 versus treatment 2.


3 Making Decisions When no Further Diagnostic Testing is Available …

(B) integrating regret with evidence on morbidity or mortality

In Chap. 2, we also showed that net benefits and harms can be expressed in terms of mortality or morbidity. In the case of a clinical problem where we face a decision whether to treat or not (see Fig. 2.1), we can contrast the consequences of decisions under EUT and ERT in terms of the net benefits and net harms as:

Expected Utility Theory Utilities          Expected Regret Utilities
U(Rx, D+)   = 1 − Mrx − RV·Hrx             Rg(Rx, D+)   = 0
U(Rx, D−)   = 1 − RV·Hrx                   Rg(Rx, D−)   = RV·Hrx
U(NoRx, D+) = 1 − M                        Rg(NoRx, D+) = M·RRR − RV·Hrx
U(NoRx, D−) = 1                            Rg(NoRx, D−) = 0

As before, Mrx = morbidity or mortality on treatment; M = disease morbidity or mortality without treatment; Hrx = morbidity or mortality due to treatment. Note that Mrx = M·(1 − RRR), where RRR is the relative risk reduction of the given treatment (see Glossary). The latter indicates that if RRR = 0, then M is not affected by treatment, but if treatment is 100% effective, then M reduces to zero. RV = RV_H refers to the patient's preferences (see below).

In the case of the decision between Treatment 1 versus Treatment 2, the net benefits and net harms can be defined as follows (by convention, we refer to the treatment with the better treatment effects (lower mortality or morbidity) and the higher harms as "Treatment 1", and the treatment with the worse treatment effects (higher mortality or morbidity) and the lower harms as "Treatment 2"; i.e., Hrx1 ≥ Hrx2 and 1 − Mrx1 ≥ 1 − Mrx2):

Expected Utility Theory Utilities          Expected Regret Utilities
U(Rx1, D+) = 1 − Mrx1 − RV_H·Hrx1          Rg(Rx1, D+) = 0
U(Rx1, D−) = 1 − RV_H·Hrx1                 Rg(Rx1, D−) = RV_H·(Hrx1 − Hrx2)
U(Rx2, D+) = 1 − Mrx2 − RV_H·Hrx2          Rg(Rx2, D+) = (Mrx2 − Mrx1) − RV_H·(Hrx1 − Hrx2)
U(Rx2, D−) = 1 − RV_H·Hrx2                 Rg(Rx2, D−) = 0

Solving the tree shown in Fig. 2.1a (see also Appendix for mathematical details), we obtain the following threshold equation when facing a decision dilemma of treating versus not treating:

pt = (RV_H · Hrx)/(M · RRR)    (3.5)

and

pt = (RV_H · (Hrx1 − Hrx2))/(Mrx2 − Mrx1)    (3.6)


when facing a decision dilemma of choosing treatment 1 versus treatment 2. When selecting treatment over no treatment (or treatment 1 vs. treatment 2), Eqs. 3.5 and 3.6 indicate that we should treat if pD > pt; otherwise, not. Note that these are the same equations as those derived in Chap. 2, i.e., both EUT and ERT generate identical expressions.

The main difference lies in the way we elicit values for the parameter RV_H. As explained, RV_H refers to a decision-maker's (typically a patient's) preferences; it is expressed as the relative value of the harms of treatment with respect to the consequences of the disease outcome M (when outcomes are equally valued, RV_H = 1). For RV_H < 1, we value avoiding disease outcomes more than the consequences (outcomes) of treatment harms. Conversely, if RV_H > 1, we value avoiding the consequences (outcomes) of treatment harms more than disease outcomes.

Importantly, to employ Eqs. 3.5 and 3.6, we need to use the same units for the outcome of interest, typically mortality or some measure of morbidity. It is possible to combine mortality and morbidity in the same units, but this requires elicitation of patient values related to their tradeoffs between morbidity and mortality for each outcome (see Chap. 7). This can be done via different methods of elicitation of patients' preferences (e.g., standard gamble, time-tradeoff, visual analog scale), but, as discussed earlier, these methods are challenging to implement (and understand) in typical clinical practice (see also Chap. 7). An alternative is to elicit RV via regret by defining RV = regret of commission (giving unnecessary, harmful treatment)/regret of omission (failing to provide effective treatment). Note that the regret-based threshold can be simplified to:

pt(ERg) = RV · pt(EUT)

Thus, we can preserve the evidentiary and analytical apparatus of EUT while adding important affect-based components of regret to decision-making.

Box 3.2. An example of application of the regret threshold model when diagnosis and outcomes are not certain: should allogeneic stem cell transplant (alloSCT) be given over standard chemotherapy (chemoRx) as a consolidation treatment to a patient with acute myeloid leukemia (AML) whose disease achieved complete remission after induction Rx?

In Chap. 2 (Box 3.1), we presented a decision dilemma of selecting alloSCT versus chemoRx for a 45-year-old patient with AML whose disease achieved complete remission after induction therapy. We (somewhat unrealistically) assumed RV = 1 and concluded that alloSCT should not be offered to this patient. However, many patients place more value on avoiding AML (which will cause certain death once it recurs) than on the harms of alloSCT. That is, RV = regret of commission (harms due to alloSCT)/regret of omission (failing to provide potentially curative treatment, alloSCT) < 1. In our experience, patients typically regret missing out on the cure for their AML about twice as much as dying from alloSCT, which amounts to RV = 0.5. Replacing the values in the threshold equation (Eq. 2.3):

pt = (RV_H · (Hrx1 − Hrx2))/(Mrx2 − Mrx1) = (0.5 · (19 − 3))/(55 − 40) = 0.53

This means that alloSCT should be offered for any estimated risk of AML relapse > 53%*, a dramatically different result than when the patient's V&P are ignored or RV is assumed to equal 1.

*See Chap. 4 for integration of a diagnostic test for predicting the probability of relapse of AML.
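To make these thresholds concrete, Eqs. 3.5 and 3.6 can be sketched in a few lines of Python (a minimal illustration; the function names are ours, and RV defaults to 1, i.e., equal valuation of treatment harms and disease outcomes). The usage line reproduces the alloSCT calculation from Box 3.2:

```python
def pt_treat_vs_no_treat(M, RRR, Hrx, RV=1.0):
    """Eq. 3.5: threshold probability of disease for treating vs. not treating.
    Treat if the estimated probability of disease pD exceeds this value."""
    return (RV * Hrx) / (M * RRR)

def pt_rx1_vs_rx2(Mrx1, Mrx2, Hrx1, Hrx2, RV=1.0):
    """Eq. 3.6: threshold probability for choosing treatment 1 over treatment 2
    (convention: Hrx1 >= Hrx2 and Mrx1 <= Mrx2)."""
    return (RV * (Hrx1 - Hrx2)) / (Mrx2 - Mrx1)

# Box 3.2 (alloSCT vs. chemoRx; values in percentage points, RV = 0.5)
print(round(pt_rx1_vs_rx2(Mrx1=40, Mrx2=55, Hrx1=19, Hrx2=3, RV=0.5), 2))  # 0.53
```

Because the threshold is a ratio, the morbidity and harm inputs may be entered in percentage points or as proportions, as long as the same units are used throughout.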

(B) Decision threshold when the diagnosis is certain, but health outcomes (utilities) are uncertain

Often in clinical medicine, as described in Chap. 2, the diagnosis of the disease can be understood through the detection of what one might also refer to as "health outcomes". This typically occurs when a predictor/criterion of diagnosis is included in the definition of health outcomes. In cancer medicine, this is common: researchers and clinicians use imaging, molecular, and other tests to detect the relapse of cancer or metastasis (i.e., diagnose the disease), which, in turn, is often an integral part of the definition of progression or disease-free outcomes. To derive the threshold equation under these circumstances, we conceptualize diagnosis via outcomes; as explained in Chap. 2, to avoid double counting, we simply stipulate that the diagnosis is certain (setting pD = 1) while outcomes are not, and we solve the tree shown in Fig. 2.1b (Chap. 2) for the threshold for the morbidity or mortality (Mt) (i.e., outcome or event) of interest. This results in two new threshold equations (see also Appendix for technical details). When facing a decision dilemma of treating versus not treating:

Mt = (RV_H · Hrx)/RRR    (3.7)

Equation 3.7 indicates that the benefit of treatment over no treatment outweighs its harms when the probability of morbidity or mortality (without therapy in this case) is above Mt. Note that Hrx is expressed as the absolute risk (or the absolute risk difference, if it was based on comparison with no treatment or placebo; see also Glossary). When facing a decision dilemma of choosing treatment 1 versus treatment 2, we obtain:

Mt = (RV · (Hrx1 − Hrx2))/(RRR1 − RRR2)    (3.8)


Equation 3.8 indicates that the net benefit of treatment 1 (Rx1) over treatment 2 (Rx2) outweighs its harms when the probability of morbidity or mortality (without treatment in this case) is above Mt. However, if we want to directly compare the disease outcome of one treatment versus another, then we can use this formula:

Use Rx1 if Mrx2 > Mrx1 + RV · (Hrx1 − Hrx2)    (3.9)

The equation assumes that Mrx refers to bad, undesirable outcomes such as mortality or morbidity; so, when mortality or morbidity with Rx2 > Rx1, we should use Rx1; otherwise, we should use Rx2 (if outcomes are good and desirable, the relationship reverses). We determine the values for RV using regret elicitation as described above.

Box 3.3. An example of application of the regret threshold model (treatment versus no treatment when the disease is certain but outcomes are not): should a direct oral anticoagulant (e.g., rivaroxaban) be used to prevent VTE (venous thromboembolism) in a patient with lung cancer?

In Chap. 2, we presented a case of a 55-year-old male patient with metastatic lung cancer who asked if he should be prescribed the direct oral anticoagulant (DOAC) rivaroxaban to prevent VTE (deep vein thrombosis or pulmonary embolism). We assumed that RV = 1 and found that

Mt = (RV_H · Hrx)/RRR = (1 · 0.023)/0.52 = 0.044, or 4.4%

That is, if the estimated risk of VTE for the patient is above 4.4%, you should prescribe rivaroxaban. However, we also noted that at this risk, the existing risk assessment models have poor performance characteristics, making recommendations somewhat difficult. The recommendations can be further firmed up if we elicit the patient's values regarding their regret of avoiding a clot versus incurring major bleeding. Given that most patients value avoiding a clot about 1.3 times more than avoiding bleeding, we can set RV = 0.75 (~1/1.3). This results in a new threshold:

Mt = (RV_H · Hrx)/RRR = (0.75 · 0.023)/0.52 = 0.033, or 3.3%

which provides further confidence that the DOAC's benefits outweigh its harms even at a lower probability of VTE.
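The outcome-based thresholds in Eqs. 3.7–3.9 can be sketched as follows (our naming; the usage lines reproduce the rivaroxaban calculation from Box 3.3):

```python
def Mt_treat_vs_no_treat(Hrx, RRR, RV=1.0):
    """Eq. 3.7: treat if the probability of the bad outcome without therapy exceeds Mt."""
    return (RV * Hrx) / RRR

def Mt_rx1_vs_rx2(Hrx1, Hrx2, RRR1, RRR2, RV=1.0):
    """Eq. 3.8: threshold on the outcome probability for choosing Rx1 over Rx2."""
    return (RV * (Hrx1 - Hrx2)) / (RRR1 - RRR2)

def prefer_rx1(Mrx1, Mrx2, Hrx1, Hrx2, RV=1.0):
    """Eq. 3.9: True if Rx1's lower morbidity/mortality outweighs its extra harms."""
    return Mrx2 > Mrx1 + RV * (Hrx1 - Hrx2)

# Box 3.3: rivaroxaban for VTE prevention (Hrx = 0.023, RRR = 0.52)
print(round(Mt_treat_vs_no_treat(0.023, 0.52, RV=1.0), 3))   # 0.044 -> 4.4%
print(round(Mt_treat_vs_no_treat(0.023, 0.52, RV=0.75), 3))  # 0.033 -> 3.3%
```

As in the boxes, lowering RV below 1 (valuing avoidance of the disease outcome more than avoidance of treatment harms) lowers the outcome threshold and so favors treating at lower risks.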


Box 3.4. Should a 60-year-old man be treated for Smoldering Multiple Myeloma (SMM)?

SMM is an asymptomatic precursor condition to multiple myeloma (MM), an incurable disease that eventually results in death. Your patient was recently diagnosed with SMM, is completely asymptomatic, but would like to avoid the progression of SMM to MM. The work-up shows 15% plasma cells in bone marrow, IgG kappa serum M-protein of 1.9 g/dl, and a kappa-lambda FLC ratio of 15. He asks if he should be given treatment to prevent the evolution of SMM to MM. You are aware of two randomized controlled trials (RCTs) that used somewhat different criteria for the diagnosis of SMM. The first RCT, published in 2013, used lenalidomide with dexamethasone, while the second RCT, published in 2020, used lenalidomide only. Thorsteinsdottir et al. recently summarized the estimates of the benefits and harms of the treatments reported in these RCTs:

RCT#1: hazard ratio (HR) for progression-free survival (PFS) = 0.18, resulting in an approximate RRR = 1 − 0.18 = 0.82; serious harms (treatment-related deaths and secondary malignancies) occurred in about 10% on treatment versus 2% in the control arm; a further 7% (4/57) discontinued treatment due to treatment-related adverse events (converting into an absolute harm difference, Hrx = 0.17 − 0.02 = 0.15; see Glossary for the comment on HR versus relative risk).

RCT#2: hazard ratio (HR) for PFS = 0.28, which converts into an approximate RRR = 1 − 0.28 = 0.72; serious harms (treatment-related deaths and secondary malignancies) occurred in about 6% on treatment versus 3% in the controls; 20% (18/90) of patients discontinued treatment because of toxicity (resulting in Hrx = 0.26 − 0.03 = 0.23).

Assuming that the patient values avoiding MM equally to avoiding the serious harms of treatment, we obtain the following thresholds:

Mt_RCT1 = (RV_H · Hrx)/RRR = (1 · 0.15)/0.82 = 18%

Mt_RCT2 = (RV_H · Hrx)/RRR = (1 · 0.23)/0.72 = 32%

However, the results are highly sensitive to assumptions about the magnitude of harm. In the examples above, we did not count all harms but only the most serious ones. If we take a more realistic stand and use all grade 3 and 4 harms reported in these trials (16/57 = 28% in the first, and 36/90 = 40% in the second trial), then the thresholds above which we should commit to treatment become Mt_RCT1 = 0.28/0.82 = 34% and Mt_RCT2 = 0.4/0.72 = 55%.


According to the Mayo 2–20–20 SMM risk model, the patient's disease is classified as low risk, with the risk of progression to MM exceeding the calculated threshold only after about 4–5 years. Therefore, no immediate treatment would be justified. If the patient's SMM were classified as high-risk disease (defined as M-protein ≥ 2 g/dl, ≥ 20% plasma cells in bone marrow, and an abnormal FLC ratio ≥ 20), then the risk of progression to MM would be > 20% within the first year of diagnosis, pointing in favor of treatment. However, the sensitivity of the decision to assumptions about harms indicates that we should be very cautious in administering treatment for SMM. The analysis demonstrates that under the most realistic assumptions, we should not commit to immediate treatment of patients with SMM. Of course, as repeatedly discussed, the final recommendation depends on the elicitation of patients' V&P.
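The sensitivity of Box 3.4's thresholds to the harm assumptions can be reproduced with a short script (our sketch, reusing Eq. 3.7 with the trial estimates summarized in the box; the last value rounds to 0.56, reported as 55% in the box):

```python
def Mt(Hrx, RRR, RV=1.0):
    # Eq. 3.7 applied to the SMM trials summarized in Box 3.4
    return RV * Hrx / RRR

# Counting only the most serious harms
print(round(Mt(0.15, 0.82), 2))   # RCT#1 -> 0.18
print(round(Mt(0.23, 0.72), 2))   # RCT#2 -> 0.32

# Counting all grade 3 and 4 harms
print(round(Mt(16 / 57, 0.82), 2))  # RCT#1 -> 0.34
print(round(Mt(36 / 90, 0.72), 2))  # RCT#2 -> 0.56
```

Varying the harm input in this way is exactly the kind of robustness check the box recommends before committing to treatment.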

3.2 Acceptable Regret

In this chapter, we introduced regret, an essential concept in everyday bedside decision-making. It is key to the elicitation of patients' V&P, often captured as a reflection on the probability of a clinical event versus the consequences of patient management: "what is the most likely diagnosis/event versus what I cannot afford/would regret to miss". As shown, simple formulas can further formalize the use of regret at both the intuitive, holistic level and the analytical level. Chapters 7 and 8 further illustrate the application of regret theory in different clinical contexts.

However, everyday experience and empirical data indicate that sometimes making wrong decisions is tolerable and does not generate negative emotions for decision-makers. This led us to develop the concept of "acceptable regret", which describes the circumstances in which wrong decisions can be tolerated. Acceptable regret represents a form of decision-making satisficing and is similar to "robust satisficing". "Satisficing", or finding a "good enough" solution (in contrast to optimizing or maximizing, which requires finding a "perfect" solution), was initially introduced by the Nobelist H. Simon in the context of the theory of bounded rationality. Both acceptable regret and robust satisficing postulate that humans can rationally accept some losses without feeling a high emotional burden of regret.

Acceptable regret is widely documented in clinical medicine, where, for example, ordering numerous tests is often accepted practice even if it deviates from normative standards. We showed how acceptable regret directly relates to underuse and overuse. It can explain much variation in contemporary clinical practice, including in end-of-life care, which involves some of the most consequential decisions patients can make. However, acceptable regret can also be used prescriptively to help improve decision-making.


Using acceptable regret, we can identify the probability below which the physician could comfortably withhold treatment and the probability above which the physician would not regret giving treatment, even if these decisions turned out, in retrospect, to be wrong. Based on the acceptable regret (ARg) concept, when considering whether to treat, we can show (see Appendix for technical details) that the acceptable regret for giving treatment is equal to Rgh = rh · H (i.e., a proportion of the net harms of the treatment). The acceptable threshold probability for administering treatment is then

Prx = 1 − Rgh/H = 1 − rh

If the probability of the disease is greater than or equal to Prx, we can recommend the treatment knowing that our acceptable regret will be no more than a fraction of net harms, H · rh. As we would intuitively expect, Prx depends on the magnitude of the harms associated with the therapy: the greater the harms, the higher the probability of disease we require to initiate treatment without feeling regret. Note that rh represents a fraction of net harms (ranging from zero to one). This means that if rh = 0, we are not prepared to tolerate any treatment-related risks; in turn, this means we require absolute diagnostic certainty to act, clearly an unattainable position in practice. On the other hand, if rh = 1, we are ready to act as soon as the probability of disease is greater than zero (i.e., the moment we imagine that it is not impossible for the patient to have a condition that may require treatment).

When considering withholding treatment, we can express the acceptable regret as Rgb = rb · B (i.e., a proportion of the net benefits of the treatment). Therefore, the acceptable threshold probability for withholding treatment is

Pwh = Rgb/B = (rb · B)/B = rb

As intuitively expected and formally shown in the Appendix, the threshold probability below which we should withhold treatment (Pwh) depends solely on the magnitude of the net benefit of treatment. Here, too, rb represents a fraction of net benefits (ranging from zero to one). This means that if rb = 0, we are not prepared to tolerate withdrawal of treatment as long as its net benefits > 0, another unattainable position in practice. On the other hand, if rb = 1, we are always ready to withdraw treatment. In most clinical situations, people's rb is low. When low Rgb for withholding treatment is combined with overweighting of small probabilities of disease (Chap. 1), people rarely want to withdraw potentially beneficial therapies. This explains why it is an exception rather than a rule to withhold treatment even if the probability of success is minimal (as, for example, in hospice settings).
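The acceptable regret thresholds reduce to strikingly simple expressions, which the following sketch (our naming) makes explicit:

```python
def p_rx(r_h):
    """Acceptable-regret threshold for giving treatment: Prx = 1 - rh,
    where r_h is the tolerated fraction of net harms (0..1)."""
    return 1.0 - r_h

def p_wh(r_b):
    """Acceptable-regret threshold for withholding treatment: Pwh = rb,
    where r_b is the tolerated fraction of net benefits (0..1)."""
    return r_b

# If we tolerate regret up to 10% of net harms, we can comfortably treat
# once the probability of disease reaches 0.9:
print(round(p_rx(0.1), 2))  # 0.9
```

Note how the two limiting cases discussed in the text fall out directly: p_rx(0) = 1 (absolute certainty required) and p_rx(1) = 0 (act on any non-zero probability).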


Finally, it can be shown (see Appendix) that it is always acceptable to order a diagnostic test when ARg > false negative rate (FN) of a test · net benefits (B) > false positive rate (FP) of a test · net harms (H). Given that, historically, clinical research has tolerated 20% FNs and 5% FPs, the condition reduces to 0.2 · B > 0.05 · H. This condition is satisfied for the vast majority of health interventions in today's practice, which explains the contemporary practice of overtesting. Similarly, we should never order a diagnostic test if ARg < FP · H < FN · B, i.e., if 0.05 · H < 0.2 · B.

Note that decision-making under acceptable regret can violate EUT. This is because, unlike the threshold probability, the maximal level of expected regret depends not only on the net benefit/harms ratio but also on the absolute magnitude of the treatment's net benefit and net harm. Thus, the maximum expected utility principle may be violated when regret is sufficiently small. The expected utility threshold probability will be the same for any (net) benefit/harms ratio regardless of the absolute magnitude of the net benefits and harms. For example, for B/H = 10, the same results will be obtained for absolute net benefits of 10%, 1%, or 0.01%, as long as the absolute harms are 1%, 0.1%, or 0.001%, respectively.

According to the threshold model, we should administer treatment when the probability of disease pD is greater than pt. However, when the absolute chance of benefiting our patient is minimal, the treatment may be considered futile: the expected regret may be acceptably small to allow withholding of treatment even if the estimated pD is much higher than the threshold probability, because expected regret depends not only on the benefit/harms ratio but also on the absolute magnitude of treatment benefits. Chapter 8 further places regret models in perspective with other threshold models.

References

1. Djulbegovic B, Elqayam S, Reljic T, Hozo I, Miladinovic B, Tsalatsanis A et al (2014) How do physicians decide to treat: an empirical evaluation of the threshold model. BMC Med Inform Decis Mak 14(1):47
2. Djulbegovic B, Hamm RM, Mayrhofer T, Hozo I, Van den Ende J (2015) Rationality, practice variation and person-centred health policy: a threshold hypothesis. J Eval Clin Pract 21(6):1121–1124
3. Djulbegovic B, van den Ende J, Hamm RM, Mayrhofer T, Hozo I, Pauker SG et al (2015) When is rational to order a diagnostic test, or prescribe treatment: the threshold model as an explanation of practice variation. Eur J Clin Invest 45(5):485–493
4. Djulbegovic B, Elqayam S (2017) Many faces of rationality: implications of the great rationality debate for clinical decision-making. J Eval Clin Pract 23(5):915–922
5. Djulbegovic B, Elqayam S, Dale W (2018) Rational decision making in medicine: implications for overuse and underuse. J Eval Clin Pract 24(3):655–665
6. Djulbegovic B, Frohlich A, Bennett CL (2005) Acting on imperfect evidence: how much regret are we ready to accept? J Clin Oncol 23(28):6822–6825
7. Djulbegovic B, Hozo I, Schwartz A, McMasters K (1999) Acceptable regret in medical decision making. Med Hypotheses 53:253–259
8. Djulbegovic B, Lyman G (2006) Screening mammography at 40–49 years: regret or no regret? Lancet 368:2035–2037


9. Djulbegovic B, Hozo I (2007) When should potentially false research findings be considered acceptable? PLoS Med 4(2):e26
10. Djulbegovic B, Tsalatsanis A, Mhaskar R, Hozo I, Miladinovic B, Tuch H (2016) Eliciting regret improves decision making at the end of life. Eur J Cancer 68:27–37
11. Djulbegovic B, Beckstead JW, Elqayam S, Reljic T, Hozo I, Kumar A et al (2014) Evaluation of physicians' cognitive styles. Med Decis Making 34(5):627–637
12. Djulbegovic M, Beckstead J, Elqayam S, Reljic T, Kumar A, Paidas C et al (2015) Thinking styles and regret in physicians. PLoS ONE 10(8):e0134038
13. Hozo I, Djulbegovic B (2008) When is diagnostic testing inappropriate or irrational? Acceptable regret approach. Med Decis Making 28(4):540–553
14. Hozo I, Djulbegovic B (2009) Will insistence on practicing medicine according to expected utility theory lead to an increase in diagnostic testing? Med Decis Making 29:320–322
15. Hozo I, Djulbegovic B (2009) Clarification and corrections of acceptable regret model. Med Decis Making 29:323–324
16. Hozo I, Schell MJ, Djulbegovic B (2008) Decision-making when data and inferences are not conclusive: risk-benefit and acceptable regret approach. Semin Hematol 45(3):150–159
17. Hozo I, Tsalatsanis A, Djulbegovic B (2018) Expected utility versus expected regret theory versions of decision curve analysis do generate different results when treatment effects are taken into account. J Eval Clin Pract 24(1):65–71
18. Mhaskar RS, Reljic T, Wao H, Kumar A, Miladinovic B, Djulbegovic B (2014) Treatment-related harms: what was planned and what was reported? National Cancer Institute's Cooperative group phase III randomized controlled trials: a systematic review. J Clin Epidemiol 67(3):354–356
19. Schneiderman LJ, Jecker NS, Jonsen AR (1996) Medical futility: response to critiques. Ann Intern Med 125:669–674
20. Thorsteinsdottir S, Kristinsson SY (2022) The consultant's guide to smoldering multiple myeloma. Hematology Am Soc Hematol Educ Program 2022(1):551–559
21. Tsalatsanis A, Barnes LE, Hozo I, Djulbegovic B (2011) Extensions to regret-based decision curve analysis: an application to hospice referral for terminal patients. BMC Med Inform Decis Mak 11:77
22. Tsalatsanis A, Hozo I, Vickers A, Djulbegovic B (2010) A regret theory approach to decision curve analysis: a novel method for eliciting decision makers' preferences and decision-making. BMC Med Inform Decis Mak 10(1):51
23. Tsalatsanis A, Hozo I, Djulbegovic B (2017) Acceptable regret model in the end-of-life setting: patients require high level of certainty before forgoing management recommendations. Eur J Cancer 75:159–166
24. Zeelenberg M, Pieters R (2007) A theory of regret regulation 1.0. J Consumer Psychol 17:3–18
25. Zeelenberg M, Pieters R (2007) A theory of regret regulation 1.1. J Consumer Psychol 17:29–35

4 Decision-Making When Diagnostic Testing is Available

4.1 Introduction

When a decision-maker has the option of diagnostic testing, they face a typical dilemma: (1) do not administer treatment and do not test, (2) test and decide whether to administer treatment based on the test result, or (3) administer treatment without testing. In this chapter, we will discuss the theory behind threshold modeling when diagnostic testing is available; we will illustrate the approach by presenting a case vignette. As in Chap. 2, we will use expected utility theory (EUT) to calculate the thresholds that determine the most appropriate management option.

4.2 Threshold Modeling When Diagnostic Testing is Available

When diagnostic testing is available, a decision-maker must choose among three options: (1) withhold treatment without testing, (2) perform the diagnostic test and let its result decide whether to treat or not, or (3) administer treatment without testing (see Figs. 4.1 and 4.2). If a diagnostic test is performed, the test result (either positive or negative) will guide the decision: if the test result is positive, we should administer treatment; if the test result is negative, we should withhold treatment. The intuition behind this is shown in Fig. 4.1. As discussed throughout this book, assessment of the threshold probability, against which we contrast the probability of disease or outcomes, guides our testing and treatment decisions. For example, if the probability of disease is very small (say, close to zero), it makes little sense to order a diagnostic test. Similarly, when the estimated probability of disease is very high (say, close to 1), it may make sense to administer treatment without confirming the diagnosis with a diagnostic test.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_4


Fig. 4.1 A horizontal bar showing the probability of disease (pD) on a horizontal axis for values between 0 and 1, where three decisions could be made: observation only, without testing or treatment (No Test; NoRx); diagnostic testing that will determine whether to administer treatment or not (Perform testing); and administering the treatment without testing (No Test; Rx). The decisions are separated by two thresholds (both indicated by dashed lines): the testing threshold (ptt) and the treatment threshold (prx). Note that the difference between ptt and prx (i.e., the width of the "Perform testing" section) is affected by the accuracy and harms of the testing: more accurate or lower-risk tests increase the range of probabilities that would result in testing (i.e., widen the "Perform testing" section), while less accurate or higher-risk tests decrease that range (i.e., narrow the "Perform testing" section). In addition, ordering a diagnostic test depends on the benefits (B) and harms (H) of the treatment under consideration: in general, the higher the B/H ratio, the lower the treatment threshold (see also Chap. 2). Note that a diagnostic test should never be ordered if the harm of treatment is greater than or equal to its benefit. The ptt and prx thresholds refer to the prior (pretest) probability of disease. If pD < ptt, the post-test probability of disease is guaranteed to lie below the treatment threshold pt introduced in Chap. 2, even if the test is positive. If pD > prx, the post-test probability of disease is guaranteed to lie above the treatment threshold pt (see Chap. 2), even if the test is negative

The threshold model can help us improve this intuition. To calculate the exact values of the thresholds, we need information on the diagnostic test's characteristics (sensitivity and specificity), the benefits and harms of treatment, and, if we are considering an invasive test, the risk of the test itself. Ideally, each of these variables would be informed by the highest quality evidence (as discussed in Chap. 1 and other chapters). However, the evidence available to populate the threshold model is often of low quality. In that case, we should use the best available data (including our best "guess") but should examine the robustness of our decision by varying our assumptions. When empirical evidence is unavailable, the threshold model can still be used by employing clinical experience and intuition.

Our goal here is to calculate two critical thresholds (see Fig. 4.1): the threshold for testing (ptt) and the threshold for administering treatment (prx). We will then contrast the prior probability of disease (pD) against these thresholds by applying the rule we introduced in Chap. 1: if pD < ptt, we should observe the patient and withhold treatment without testing; if ptt < pD < prx, we should perform testing and administer treatment only if the test result is positive (or observe and withhold treatment if the test result is negative); and if pD > prx, we should administer treatment without testing. Figure 4.1 also explains the critical relation between pretest and post-test probabilities with regard to the decision thresholds (see also Chap. 5, where we discussed FFT decision trees). The crucial point here is to


Fig. 4.2 A decision tree conceptualizing three decisions (treat, perform a diagnostic test, or withhold test/treatment) and all outcomes resulting from a chance event. The small black square represents a decision, and the small black circular nodes represent events due to chance (i.e., test results or a probability of disease, pD). Utilities (U1–U8) for each outcome are shown enclosed in rectangles. Each is represented in terms of disutilities by subtracting the effects of morbidity or mortality (M), treatment effect (RRR), harms of treatment (Hrx), and harms of testing (Hte) from perfect health (customarily set at 1; see Glossary). S is the test sensitivity (the frequency of true positives, TP); Sp is the test specificity (the frequency of true negatives, TN); FN is the frequency of false negatives (FN = 1 − S); FP is the frequency of false positives (FP = 1 − Sp); Mrx = morbidity or mortality on treatment; RRR refers to the relative risk reduction; Mrx = M · (1 − RRR)

realize that, ultimately, management decisions depend on treatment benefits and harms only.

Let us now see how we can determine the testing (ptt) and treatment (prx) thresholds shown in Fig. 4.1 (the Appendix shows derivations of the threshold formulas). Figure 4.2 shows a tree representing the decision dilemma we just described. Using the same approach described in Chap. 2, we define net treatment benefits (B) as the difference in utilities (outcomes) between diseased patients who were treated (U1) versus not treated (U3). As before, we define net treatment harms (H) as the difference in the utility of the outcomes between patients without the disease who were not treated (U4) versus those who were treated (U2):

B = U1 − U3 = M − Mrx − Hrx = M − M·(1 − RRR) − Hrx = M·RRR − Hrx    (4.1)

H = U4 − U2 = Hrx    (4.2)

where RRR is the relative risk reduction (also called the efficacy); note that a completely ineffective treatment has an RRR = 0, which means it does not affect M.


M refers to the morbidity or mortality without treatment. On the other hand, when treatment is 100% effective, it completely obliterates the impact of disease, as M·(1 − 1) = 0. Hrx refers to the harm of treatment (which can be expressed as the absolute risk of a single treatment or the absolute risk difference between the harms of two treatments). Note that M − Mrx is equal to the absolute risk difference (ARD) in the efficacy of the treatments under consideration (see Glossary). Further, note that different patients may weigh the consequences of treatment versus the impact of disease differently. Explicit modeling of the patient's (or a decision-maker's) preferences may be expressed as the relative value (RV_H) of avoiding the harms of treatment, Hrx, with respect to avoiding the disease outcome, M. Then, the net benefits defined in Eq. 4.1 can be written as:

B = M·RRR − RV_H·Hrx    (4.3)

If the patient (a decision-maker) values avoiding the consequences of treatment and disease equally, then we set RVH = 1. If the patient values avoiding the impact of disease more than the harms of treatment, then RVH < 1; if the patient places more value on avoiding the harms of treatment than on the burden of disease, then RVH > 1. In Chap. 3 we illustrated using regret to elicit the patient's preferences in a simple and easy-to-understand way. Lastly, diagnostic testing can itself be associated with harms (Hte), further decreasing the utilities of a particular management action, as shown in Fig. 4.2. The accuracy of diagnostic testing is critical to calculating the thresholds ptt and prx. Specifically, the proportion of positive test results among individuals who have the disease is the sensitivity, or true positive rate (TP), while a positive result in a patient without the disease is a false positive (FP). Similarly, a patient with the disease can have a negative test result, a false negative (FN); finally, a patient free of the disease can have a negative test result, captured by the specificity, or true negative rate (TN). Note that test specificity and sensitivity can be used to obtain the proportion of false positives (FP = 1 − specificity) and false negatives (FN = 1 − sensitivity), respectively (see Glossary). As noted, both the accuracy and the risk of the diagnostic test, considered together with treatment benefits and harms, determine the range of probabilities over which diagnostic testing is the preferred action. Once we have identified all the necessary ingredients, using the general definitions of benefits and harms, we can derive the testing and treatment thresholds as:

ptt (testing threshold) = [(FP · Hrx) + Hte] / [(FP · Hrx) + (S · B)] (4.4)

prx (treatment threshold) = [(Sp · Hrx) − Hte] / [(Sp · Hrx) + (FN · B)] (4.5)

Using evidence-based metrics (see Glossary), these thresholds can be rewritten as:

ptt (testing threshold) = [(1 − Sp) · Hrx + Hte] / [(1 − Sp) · Hrx + S · (RRR · M − Hrx)] (4.6)

4.2 Threshold Modeling When Diagnostic Testing is Available




prx (treatment threshold) = [Sp · Hrx − Hte] / [Sp · Hrx + (1 − S) · (RRR · M − Hrx)] (4.7)

where Sp is the test specificity and S is the test sensitivity. In these expressions, the net benefit of treatment has been substituted by RRR · M − Hrx. A decision-maker should never consider treatment (and the associated diagnostic testing) if the net harm of the treatment is equal to or greater than its net benefit, because when Hrx ≥ RRR · M − Hrx, the testing threshold (ptt) becomes undefined. This provides an intuitive corollary: a diagnostic test should never be ordered if the harm of treatment is greater than or equal to its benefit. Interestingly, if a perfect, harmless diagnostic test did exist, identifying diseased patients every time (i.e., both TP and TN = 1; both FP and FN = 0; both Sp and S = 1; Hte = 0), the testing threshold would become zero and the treatment threshold would become one. In such a hypothetical situation, testing would be preferred over the entire range of disease probabilities, and patients would always be perfectly classified, receiving treatment only when the test establishes the disease with 100% certainty. Note that most diagnostic tests are non-invasive and, for all practical purposes, are considered harmless. When Hte = 0, the threshold probabilities reduce to:

pt = 1 / (1 + LR± · (B/H)) (4.8)

where LR± is the likelihood ratio for a positive or negative test result, and B/H is the treatment's benefit-to-harm ratio. Specifically, LR+ = S/FP is commonly referred to as the "positive" LR, while LR− = FN/Sp is referred to as the "negative" LR. LR+ is used to calculate ptt, and LR− is used to calculate prx. It should be noted that if LR = 1, then pt equals the treatment threshold when no further testing is available (see Chap. 2). A nomogram can be used to easily calculate each of the three values, pt, LR, and B/H (see Fig. 4.3). Let us now illustrate the application of the threshold model with real clinical examples (see Boxes 4.1 and 4.2). Box 4.1. Clinical example 1: should a test for minimal residual disease (MRD) be performed before deciding to administer an allogeneic stem cell transplant (alloSCT) to a patient with acute myeloid leukemia (AML) at intermediate risk?

In Chap. 2, we presented the analysis when we had to make a decision about alloSCT without the help of further testing. In this example, we consider using minimal residual disease (MRD) testing to help us decide whether to administer alloSCT. A 65-year-old man has been diagnosed with acute myeloid leukemia (AML). His cytogenetics show translocation t(9;11) and an NPM1 mutation, and he is considered at intermediate risk for disease recurrence. The patient has no other comorbidities. After induction chemotherapy,


Fig. 4.3 A nomogram that can be used to calculate the threshold probability (blue points and line) given a likelihood ratio (LR; gray points and line) and a benefit-to-harm ratio (B/H; black points and line). To calculate a testing threshold (ptt), a positive LR is used; to calculate a treatment threshold (prx), a negative LR is used. To use the nomogram, connect the points for a given LR and B/H and continue this line to the threshold probability scale (a dashed red line is shown as an example in this figure). In this example, we assumed B/H = 3 and LR− = 0.5, which converts into a treatment threshold probability of 40%. That is, we should treat if the probability of the disease exceeds 40%. When testing is not available (see Chap. 2 for examples), LR± = 1. Note that we encourage calculating B/H using evidence-based measures as above (i.e., calculate B = M · RRR − Hrx) and using Hrx as the absolute risk of treatment harm

his disease has achieved a complete response by standard morphological criteria, and he completed consolidation treatment with four courses of high-dose ara-C. Multiparameter flow cytometry (MFC) testing at a positivity threshold of 0.01% can be used to detect minimal residual disease (MRD) and predict whether the patient's AML will relapse. This can guide our decision whether to administer alloSCT. In this case, the patient has a perfect HLA match from his brother (10 of 10). This diagnostic test, however, is not perfect and has some degree of error; alternatively, we could administer alloSCT without MRD testing. In both cases, we recognize that alloSCT carries its own procedural risks and we


would not like to expose the patient to unnecessary harm. The dilemma we face is to decide between not administering treatment, using MRD testing to guide further treatment, or treating the patient without testing at all. From a modeling perspective, this means testing is another decision in addition to either treating or not treating the patient. The result of the test determines whether the patient will receive alloSCT (positive result) or not (negative result). A decision-maker is now presented with the following question: how high (or low) should the probability of AML recurrence be before deciding among no further treatment (i.e., no alloSCT), testing to decide whether to administer alloSCT, or administering alloSCT without MRD testing? We rely on data published in [6, 7] for the diagnostic accuracy of MRD testing and a recent review by Loke et al. [11] for the benefits and harms of alloSCT. The detection of MRD by MFC testing is an important predictor of relapse and is appropriate for this patient's cytogenetics (translocation t(9;11)). alloSCT carries serious treatment-related harm, including a treatment-related mortality rate of about 15%. We estimated that alloSCT yields a 49% relative reduction of relapse-related mortality, and we assume the same treatment efficacy for overall survival. The absolute mortality for AML patients who did not receive alloSCT was 69% within 1 year. In a cohort of patients with AML (and a mutation of the NPM1 gene), approximately 70% of patients who relapsed had tested positive by MFC (i.e., an MRD+ result; true positive, TP). The true negative rate (i.e., the proportion of patients with an MRD− result who did not relapse) is approximately 86%. Accordingly, the LR− and LR+ were 0.35 and 5.00, respectively.
Finally, the risks of MRD testing are low even if the sample was obtained using bone marrow aspirate/biopsy, and even lower for peripheral blood collection. MRD testing is considered to be practically harmless, thus Hte = 0, and hence we use Eq. 4.8:

pt = 1 / (LR± · (B/H) + 1)

where the likelihood ratio (LR) is either positive (LR+ = S/FP), used for ptt, or negative (LR− = FN/Sp), used for prx.

Evidence: B = (M · RRR) − Hrx = (0.69 · 0.49) − 0.15 ≈ 0.19, or 19%; Hrx = 15%

Testing accuracy:
TP = 70%; FN = 30%


FP = 14%; TN = 86%

Testing likelihood ratios:
LR− = 0.30/0.86 = 0.35
LR+ = 0.70/0.14 = 5.00

ptt = 1 / (LR+ · (B/H) + 1) = 1 / (5.00 · (0.19/0.15) + 1) = 0.136, or 13.6%

prx = 1 / (LR− · (B/H) + 1) = 1 / (0.35 · (0.19/0.15) + 1) = 0.693, or 69.3%

In this instance, the threshold for testing, ptt, is 13.6%, and the threshold for treatment, prx, is 69.3% (see Fig. 4.4). This indicates that we should forgo MRD testing only when the probability of disease is below 13.6%. On the other hand, we can recommend alloSCT without MRD testing if the estimated probability of disease is higher than 69.3%.
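As a check on the arithmetic above, Eq. 4.8 can be evaluated directly; the following is an illustrative sketch (not from the book) using the box's rounded inputs:

```python
# Illustrative check of Eq. 4.8 (not from the book): thresholds from
# likelihood ratios and the benefit-to-harm ratio, assuming Hte = 0.
def threshold(lr: float, benefit: float, harm: float) -> float:
    """pt = 1 / (LR * (B/H) + 1)."""
    return 1.0 / (lr * (benefit / harm) + 1.0)

B, H = 0.19, 0.15              # net benefit and net harm of alloSCT (Box 4.1)
p_tt = threshold(5.00, B, H)   # testing threshold uses LR+
p_rx = threshold(0.35, B, H)   # treatment threshold uses LR-
print(round(p_tt, 3), round(p_rx, 3))  # 0.136 0.693
```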

Fig. 4.4 A horizontal bar showing the probability of disease (pD) on a horizontal axis (not to scale) for probabilities of disease between 0.0 and 1.0, illustrating decision-making whether to administer alloSCT with or without MRD testing: observation only and no treatment without testing (No test, NoRx), MRD diagnostic testing that will determine whether to administer treatment or not (Perform testing), and administering treatment without testing (No test, Rx). The figure shows that if the estimated risk of pD (AML recurrence in this case) is below 13.6%, then the patient should be observed without testing and treatment. If the estimated pD is greater than 69.3%, then alloSCT should be administered without further testing. MRD testing should be done if the estimated probability of AML recurrence is between the testing threshold (ptt = 0.136) and the treatment threshold (prx = 0.693)

Let us now assess the robustness of our decisions by varying our assumptions (see Table 4.1). One of our assumptions was that the TP rate for MFC testing was 70%. If we use a TP rate of 60% (representing the use of a positivity threshold of 0.1% rather than 0.01%) and we set the TN rate at 90% (which is conservative, as there is evidence that the specificity of MRD testing is nearly 100% at the 0.1% positivity threshold), the LR− and LR+ become 0.44 and 6, respectively. The threshold for testing, ptt, would only slightly decrease to 11.6%, and the threshold for treatment without testing, prx, would slightly decrease to 64.0%. It is important to note


that the use of the NPM1 gene to predict MRD appears to be more predictive in younger patients (< 60 years) than in older patients (> 60 years). This may lower the accuracy (i.e., TP may be < 60%) of this test in older patients. Fit adults with excellent matched donors have a lower than 15% risk of alloSCT treatment-related mortality. Thus, if we assume that the net harms were halved while the net benefits of alloSCT remained the same (i.e., H = 7.5%, B = 19%), and the accuracy of MRD testing remained the same (i.e., TP = 70% and TN = 85.7%), then the thresholds for testing and for treatment without testing become smaller (ptt = 7.3% and prx = 53.0%). Furthermore, if we assume the benefits of alloSCT were only slightly higher than the harms (i.e., H = 15%, B = 17%), then we would obtain slightly higher testing and treatment thresholds (ptt = 15% and prx = 71.6%). Thus, the analysis indicates that under a wide range of assumptions, obtaining MRD testing is the best strategy over a larger range of pD than no testing at all (with or without treatment).

Table 4.1 A summary table showing the original data extracted from the literature, new assumptions (see Box 4.1 text), and the ptt and prx thresholds

Extracted from the literature                                  | New assumption                         | ptt   | prx
B = 19%; Hrx = 15%; TP = 70%; TN = 85.7%; FN = 30%; FP = 14.3% | –                                      | 0.136 | 0.693
B = 19%; Hrx = 15%                                             | TP = 60%; TN = 90%; FN = 40%; FP = 10% | 0.116 | 0.640
B = 19%; TP = 70%; TN = 85.7%; FN = 30%; FP = 14.3%            | Hrx = 7.5%                             | 0.073 | 0.530
Hrx = 15%; TP = 70%; TN = 85.7%; FN = 30%; FP = 14.3%          | B = 17%                                | 0.15  | 0.716

– indicates that no new assumption was made and the original values were used to calculate ptt and prx
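The sensitivity analysis in Table 4.1 can be reproduced from the full threshold formulas (Eqs. 4.4 and 4.5, with Hte = 0 for MRD testing); this is a sketch using the book's rounded inputs, so computed values may differ from the table by rounding (< 0.005):

```python
# Sketch reproducing Table 4.1 from Eqs. 4.4 and 4.5 (Hte = 0 for MRD).
def p_tt(S, Sp, B, H, Hte=0.0):
    """Testing threshold: (FP*Hrx + Hte) / (FP*Hrx + S*B)."""
    FP = 1 - Sp
    return (FP * H + Hte) / (FP * H + S * B)

def p_rx(S, Sp, B, H, Hte=0.0):
    """Treatment threshold: (Sp*Hrx - Hte) / (Sp*Hrx + FN*B)."""
    FN = 1 - S
    return (Sp * H - Hte) / (Sp * H + FN * B)

rows = [  # (S, Sp, B, H) -> expected (ptt, prx) from Table 4.1
    ((0.70, 0.857, 0.19, 0.15),  (0.136, 0.693)),
    ((0.60, 0.90,  0.19, 0.15),  (0.116, 0.640)),
    ((0.70, 0.857, 0.19, 0.075), (0.073, 0.530)),
    ((0.70, 0.857, 0.17, 0.15),  (0.150, 0.716)),
]
for (S, Sp, B, H), (tt, rx) in rows:
    assert abs(p_tt(S, Sp, B, H) - tt) < 0.005
    assert abs(p_rx(S, Sp, B, H) - rx) < 0.005

# Corollary from the text: a perfect, harmless test gives ptt = 0, prx = 1.
assert p_tt(1, 1, 0.19, 0.15) == 0.0 and p_rx(1, 1, 0.19, 0.15) == 1.0
```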

However, if we have reason to assume the harm of a treatment outweighs its benefit (i.e., H > B), then we should never consider testing or treatment for our patient.


Box 4.2. Clinical example 2: should a computed tomography pulmonary angiography (CTPA) be used to test a patient with a low clinical risk of having a pulmonary embolism (PE) before deciding to administer anticoagulants?

A 59-year-old male was suspected of having a pulmonary embolism (PE). Clinically, we assess that this patient is at low risk for PE. As a decision-maker, you must decide on one of three options: (1) treat the patient with anticoagulants such as direct oral anticoagulants (DOACs; e.g., rivaroxaban, apixaban), (2) use CTPA testing to determine whether the patient has PE (treating only if the result is positive), or (3) continue to observe the patient without treatment. We are interested in determining the optimal management strategy, i.e., the one associated with the lowest mortality. Note that the threshold model presented here is appropriate only if the diagnosis is separate from (independent of) the consequences (disutility) of treatment. If the predictors of disease (such as CT findings) are also used as outcome predictors, then the model shown in Chap. 2 should be used for decision-making. Data were extracted from a recent meta-analysis of randomized controlled trials comparing the effects of DOACs with placebo in treating VTE. The risk ratio for VTE mortality in patients given anticoagulants versus placebo was 0.39; thus, the relative risk reduction is 0.61 (RRR = 1 − 0.39 = 0.61). The mortality due to VTE is estimated at 5%. The treatment harm associated with DOACs, fatal bleeding events, is 0.3%. Finally, the sensitivity and specificity of CTPA for the diagnosis of PE are 0.93 and 0.98, respectively. If we assume that you (as a decision-maker) equally value avoiding treatment harms and avoiding the impact of disease, we set RVH = 1.

Evidence: RRR = 0.61

Using the equations for net benefits and net harms from Chap. 2, we get:

Net benefits = M · RRR − RVH · Hrx = (0.05 · 0.61) − (1 · 0.003) = 0.0275
Net harms = RVH · Hrx = 1 · 0.003 = 0.003

Testing accuracy:
TP = 93%; FN = 7%
FP = 2%; TN = 98%

Testing likelihood ratios:
LR− = 0.07/0.98 = 0.071
LR+ = 0.93/0.02 = 46.5


ptt = 1 / (LR+ · (B/H) + 1) = 1 / (46.5 · (0.0275/0.003) + 1) = 0.0023, or 0.23%

prx = 1 / (LR− · (B/H) + 1) = 1 / (0.071 · (0.0275/0.003) + 1) = 0.606, or 60.6%

Thus, a patient should be observed without DOAC treatment only if their probability of PE is below 0.23% (see Figs. 4.5a and 4.6). A patient should receive testing if their probability of PE is between 0.23% and 60.6%; finally, the patient should receive DOAC treatment without testing if their probability of PE exceeds 60.6% (see Figs. 4.5b and 4.6). [Due to rounding, some of the calculations may slightly differ from those reported elsewhere in the book.]
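Box 4.2's thresholds follow the same arithmetic as Box 4.1; a quick illustrative check (not from the book), assuming Hte = 0 for CTPA:

```python
# Check of Box 4.2 thresholds via Eq. 4.8 (CTPA harm assumed negligible).
def threshold(lr: float, benefit: float, harm: float) -> float:
    """pt = 1 / (LR * (B/H) + 1)."""
    return 1.0 / (lr * (benefit / harm) + 1.0)

B, H = 0.0275, 0.003            # net benefit/harm of DOACs for PE (Box 4.2)
p_tt = threshold(46.5, B, H)    # LR+ of CTPA
p_rx = threshold(0.071, B, H)   # LR- of CTPA
assert round(p_tt, 4) == 0.0023  # 0.23%
assert round(p_rx, 3) == 0.606   # 60.6%
```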

Fig. 4.5 Nomogram showing the threshold probability for the likelihood ratio and net benefits-to-net harms ratio (B/H ratio) with the intersecting dashed red line for a) ptt and b) prx


Fig. 4.6 A horizontal bar showing the probability of disease (pD) on a horizontal axis (not to scale) for probabilities of disease between 0.0 and 1.0, illustrating decision-making whether to administer DOACs with or without CTPA testing: observation only and no treatment without testing (No test, NoRx), CTPA diagnostic testing that will determine whether to administer treatment or not (Perform testing), and administering treatment without testing (No test, Rx). The figure shows that if the estimated risk of pD (PE, pulmonary embolism) is below 0.23%, then the patient should be observed without testing and treatment. If the estimated probability of PE is greater than 60.6%, then DOACs should be administered without further testing. CTPA testing should be done if the estimated probability of PE is between the testing threshold (ptt = 0.0023) and the treatment threshold (prx = 0.606)

As noted, the approach outlined in this chapter does not incorporate a decision-maker's values and preferences, or emotions such as regret. In Chaps. 3 and 5, we cover decision-making when considering the patients' preferences and regret.

References

1. Bain B (2005) Bone marrow biopsy morbidity: review of 2003. J Clin Pathol 58(4):406–408
2. Cornelissen JJ, Van Putten WL, Verdonck LF, Theobald M, Jacky E, Daenen SM et al (2007) Results of a HOVON/SAKK donor versus no-donor analysis of myeloablative HLA-identical sibling stem cell transplantation in first remission acute myeloid leukemia in young and middle-aged adults: benefits for whom? Blood 109(9):3658–3666
3. Djulbegovic B, Desoky AH (1996) Equation and nomogram for calculation of testing and treatment thresholds. Med Decis Making 16(2):198–199
4. Djulbegovic B, Hozo I, Lyman GH (2000) Linking evidence-based medicine therapeutic summary measures to clinical decision analysis. MedGenMed 2(1):E6
5. Hu X, Wang B, Chen Q, Huang A, Fu W, Liu L et al (2021) A clinical prediction model identifies a subgroup with inferior survival within intermediate risk acute myeloid leukemia. J Cancer 12(16):4912–4923
6. Ivey A, Hills RK, Simpson MA, Jovanovic JV, Gilkes A, Grech A et al (2016) Assessment of minimal residual disease in standard-risk AML. N Engl J Med 374(5):422–433
7. Jongen-Lavrencic M, Grob T, Hanekamp D, Kavelaars FG, Al Hinai A, Zeilemaker A et al (2018) Molecular minimal residual disease in acute myeloid leukemia. N Engl J Med 378(13):1189–1199
8. Klemen ND, Feingold PL, Hashimoto B, Wang M, Kleyman S, Brackett A et al (2020) Mortality risk associated with venous thromboembolism: a systematic review and Bayesian meta-analysis. Lancet Haematol 7(8):e583–e593
9. Koreth J, Schlenk R, Kopecky KJ, Honda S, Sierra J, Djulbegovic BJ et al (2009) Allogeneic stem cell transplantation for acute myeloid leukemia in first complete remission: systematic review and meta-analysis of prospective clinical trials. JAMA 301(22):2349–2361


10. Lim W, Le Gal G, Bates SM, Righini M, Haramati LB, Lang E et al (2018) American Society of Hematology 2018 guidelines for management of venous thromboembolism: diagnosis of venous thromboembolism. Blood Adv 2(22):3226–3256
11. Loke L, Buka R, Craddock C (2021) Allogeneic stem cell transplantation for acute myeloid leukemia: who, when, and how? Front Immunol 12:659595. https://doi.org/10.3389/fimmu.2021.659595
12. Mai V, Guay C-A, Perreault L, Bonnet S, Bertoletti L, Lacasse Y et al (2019) Extended anticoagulation for VTE: a systematic review and meta-analysis. Chest 155(6):1199–1216
13. Marti C, John G, Konstantinides S, Combescure C, Sanchez O, Lankeit M et al (2015) Systemic thrombolytic therapy for acute pulmonary embolism: a systematic review and meta-analysis. Eur Heart J 36(10):605–614
14. Ortel TL, Neumann I, Ageno W, Beyth R, Clark NP, Cuker A et al (2020) American Society of Hematology 2020 guidelines for management of venous thromboembolism: treatment of deep vein thrombosis and pulmonary embolism. Blood Adv 4(19):4693–4738
15. Pauker SG, Kassirer JP (1980) The threshold approach to clinical decision making. N Engl J Med 302(20):1109–1117
16. Penack O, Peczynski C, Mohty M, Yakoub-Agha I, Styczynski J, Montoto S et al (2020) How much has allogeneic stem cell transplant-related mortality improved since the 1980s? A retrospective analysis from the EBMT. Blood Adv 4(24):6283–6290
17. Vasanthamohan L, Boonyawat K, Chai-Adisaksopha C, Crowther M (2018) Reduced-dose direct oral anticoagulants in the extended treatment of venous thromboembolism: a systematic review and meta-analysis. J Thromb Haemost 16(7):1288–1295

5 Formulating Management Strategies Using Fast-and-Frugal Trees (A Decision Tool to Transform Clinical Practice Guidelines and Clinical Pathways into Decision Support at the Point of Care)

5.1 Introduction

Clinical management is rarely based on the collection of a single data item. Instead, it is typically characterized by the continuous collection and evaluation of clinical data (symptoms, signs, laboratory results, imaging tests, etc.) to establish a platform for further management decisions.1 In other words, clinical decision-making often consists of formulating management strategies. In this chapter, we discuss an efficient method for making medical decisions based on a series of collected clinical information items (i.e., "cues") and show how, by selecting the most relevant cues, we can generate an efficient binary decision aid called the fast-and-frugal tree (FFT). FFTs are based on the heuristic theory of decision-making (Chap. 1), allowing quick evaluation and selection of the most relevant cues to optimize patient care for binary tasks such as whether or not to treat a patient. A decision-maker considers each cue (predictor, test) within the series of FFT cues, resulting either in (1) responding to the next cue or (2) exiting the tree. To illustrate these concepts, we present an evidence-based FFT for predicting whether to treat patients with advanced lung cancer using tyrosine kinase inhibitors (TKIs) versus non-targeted therapy.

1 Throughout the book, we refer to clinical management interchangeably with “treatment” to indicate a commitment to a course of action that may include management consisting of treatment or diagnostic testing.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_5


5.2 Fast-and-Frugal Trees

Fast-and-frugal trees are decision aids that allow decision-makers to reach decisions quickly and accurately based on limited information. FFTs can be constructed from both continuous and binary data to create a series of sequential predictors, or "cues," that are most relevant to categorizing each patient (or group of patients) into binary outcomes. Except for the last cue, every cue throughout the tree has a single exit. The last cue of the FFT has two exits, which prevents infinite trees and guarantees that a decision is ultimately made. An FFT is formulated via a series of if-then statements, e.g., if a woman with breast cancer has a positive family history of breast cancer, then obtain genetic testing for breast cancer. If the condition is met, the decision can be made, and the FFT is exited. If the condition is not met, the FFT considers the remaining cues, one after another, until the exit condition of a cue is met. Interestingly, due to their parsimonious nature, FFTs are less likely to suffer from overfitting and can be more accurate than complex multivariate regression or machine learning models ("less is more"; see Chap. 9).

FFTs draw their theoretical robustness from signal detection theory, evidence accumulation theory, and the threshold model, all of which help improve decision-making. Signal detection theory is widely used in science and medicine. It is rooted in the fundamental assumption that the two possible events (signal, e.g., presence of disease, and noise, e.g., absence of disease) have overlapping distributions, such that if we set the criterion to maximize true positives, we will increase false positives. Similarly, if we want to maximize specificity (true negatives), this will come at the expense of an increased rate of false negatives (see Fig. 5.1). Importantly, it is the exit structure of FFTs that determines the ratio between false negatives and false positives.
For example, FFTyy has the highest true positive rate at the expense of the highest false positive rate. In contrast, FFTnn has the highest true negative rate at the expense of the highest false negative rate (see Fig. 5.1). It is important not to confuse the inverse relationship between false positive and false negative classification, which is based on two distributions (signal and noise), with the complementary values of test accuracy based on a single population. For example, if we only examine the signal distribution, we can recall from Chap. 4 that the probability of false negatives = 1 − the probability of true positives, and the probability of false positives = 1 − the probability of true negatives. Note that it is impossible to decrease false negatives and false positives simultaneously: these classification errors are linked, so a decrease in false negatives will lead to an increase in false positives, and vice versa.

FFTs provide a useful framework for converting clinical practice guidelines and clinical pathways into decision support that can be integrated into an electronic health record at the point of care. Clinical practice characteristically consists of a series of decisions. In contrast, practice guidelines usually relate to single or multiple recommendations that are typically not formulated into a coherent decision-management strategy. Clinical pathways can help logically organize the


Fig. 5.1 Signal detection theory and fast-and-frugal trees (FFTs). Note that the decision criterion (x c ) represents a binary decision and can result in more noise or signal to generate a correct rejection/miss or a hit/false alarm, respectively. That is, moving the decision criterion (x c ) to the left increases the probability of detecting true positives (hit, signal) but at the expense of an increase in false positives (false alarms). Similarly, moving the decision criterion to the right increases the detection of true negatives (correct rejection) but at the expense of an increase in false negatives (misses). A desire to increase true positives is referred to in the literature as a "liberal" strategy, while aiming to minimize false positives is often considered a "conservative" strategy. The figure also shows how different permutations of the FFT relate to better detection of a signal (true positives) or true negatives (correct rejections). In this case, the FFT consists of three binary cues, resulting in either noise (n) or signal (s) defined by the first two cues, as the last cue has two exits (FFTss , FFTsn , FFTns , or FFTnn ). The dashed arrows roughly indicate where each FFT relates to noise and signal detection. Of the four FFTs, FFTss has the most liberal decision criterion (leading to an increase in detection of the signal, i.e., sensitivity, but at the expense of an increase in false positives). FFTnn is the most conservative (i.e., aimed at decreasing false alarms but at the expense of an increase in false negatives), while FFTsn and FFTns are less extreme, with FFTsn more liberal than FFTns . The figure is based on [6] [with permission]. Note that throughout the text, we interchangeably refer to "s" as "y" (yes) and to "n" as "no"

sequence of clinical decisions. That is, practice guidelines can be thought of as addressing one recommendation at a time. At the same time, the pathways are often used to represent healthcare plans (referred to as protocols, clinical algorithms, or flowcharts) that provide detailed steps for managing a particular clinical problem or formulating the entire spectrum of care. However, there is no standardized method to develop clinical pathways. Pathways are typically developed ad hoc, in various, non-standardized ways without a theoretical framework to guide their development. FFTs provide a robust theoretical basis for converting practice


guidelines and pathways into logically coherent decision strategies through their relation to signal detection theory, evidence accumulation theory, and the threshold model. Figure 5.1 shows the relation of FFTs to signal detection theory. Their connection to evidence accumulation theory (EAT) assumes that decisions are made by accumulating evidence via a sequential sampling process. The standard FFT deals with (the accuracy of) classifying outcomes but does not consider the consequences of decisions. The latter can be achieved by linking the FFT to the threshold model, which stipulates that the most rational action is to prescribe treatment or order a diagnostic test when the expected treatment benefit outweighs its expected harms at a given probability of disease or clinical outcome. FFTs have been linked to the threshold model both within expected utility theory (Chap. 2) and regret theory (Chap. 3). We denote the threshold-based FFT as FFTT. Importantly, the FFTT model may produce different results than the standard FFT. This is because the benefits and harms of treatment define the classification accuracy of FFTT. Unlike the standard FFT, where we classify each subject based on the type of exit (Yes or No; see Fig. 5.1), the FFTT classification is determined by comparing the (post-cue) probability of the outcome with the threshold demarcated by benefits and harms (see Box 5.1). For example, as discussed in Chaps. 2 and 3, when benefits considerably outweigh harms, we may rationally act even if we are not certain about the diagnosis. Given the widespread and increasing use of guidelines and pathways in medicine, we expect that FFT and FFTT will increasingly become vital tools in developing decision support at the point of care. Note that an FFT can be derived from established pathways or guidelines. In this case, we typically generate one, ideally evidence-based, FFT. However, theoretically, we can permutate cues in any order we want.
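To make the distinction concrete, the following sketch (with hypothetical cue names and numbers, not the book's fitted tree) contrasts a standard FFT exit with an FFTT exit, where the decision compares a post-cue outcome probability against the action threshold H/(B + H) from Chap. 2:

```python
# Hypothetical sketch contrasting FFT and FFTT exits (cue names and
# probabilities invented for illustration; not the book's fitted tree).
def fft_yn(patient: dict) -> str:
    """A three-cue FFT with a y/n exit structure: cue 1 exits on 'yes',
    cue 2 exits on 'no', and the last cue has both exits."""
    if patient["egfr_mutation"]:
        return "treat"                      # first exit (signal)
    if patient["smoker"]:
        return "do not treat"               # second exit (noise)
    return "treat" if patient["age"] <= 63 else "do not treat"

def fftt_exit(p_post_cue: float, benefit: float, harm: float) -> str:
    """FFTT: exit by comparing the post-cue probability of the outcome
    with the action threshold pt = H / (B + H)."""
    return "treat" if p_post_cue > harm / (benefit + harm) else "do not treat"

print(fft_yn({"egfr_mutation": False, "smoker": False, "age": 70}))  # do not treat
print(fftt_exit(0.50, benefit=0.19, harm=0.15))  # treat (0.50 > 0.44)
```

Note how the same patient can be classified differently by the two approaches: the FFTT exit shifts with the benefit-to-harm ratio, while the standard FFT exit is fixed by the cue structure.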
For example, the FFT shown in Box 5.1 starts with a cue about EGFR mutation, followed by smoking habits and age. Because the sensitivity and specificity of these cues are not independent, changing the order of cues may modify the accuracy of the entire FFT. Indeed, one of the advantages of FFT analysis is that it allows us to permutate the cues to find their most accurate combination. In turn, finding the correct cue combination identifies the optimal FFT for practice. In general, if there are n cues, we can construct n! · 2^(n−1) possible FFTs. Thus, if we have three cues, there are 3! · 2^2 = 6 · 4 = 24 FFTs; if we have four cues, we can construct 192 possible FFTs; for five cues, it is possible to develop 1920 FFTs, etc. One way to determine the optimal combination of the cues for an FFT is to first rank each cue individually according to some (accuracy) criterion. For example, cues can be automatically ranked according to the weighted accuracy derived from calculating their maximal sensitivity and specificity. Once the ranking of cues is created, the most optimal sequence of all permutations can be generated (see Fig. 5.2b in Box 5.1 below). However, automatic ordering of cues can generate clinically meaningless combinations; therefore, it is crucial to be mindful of the clinical relevance and logical feasibility of the cue ordering. Box 5.1 illustrates the application of FFT and FFTT in the management of advanced-stage lung cancer.
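The count of possible FFTs, n! · 2^(n−1), can be verified directly (an illustrative sketch, not from the book):

```python
# Number of possible FFTs for n cues: n! cue orderings times 2^(n-1)
# exit structures (the last cue always has both exits).
from math import factorial

def n_ffts(n: int) -> int:
    return factorial(n) * 2 ** (n - 1)

# Matches the counts given in the text for 3, 4, and 5 cues.
assert [n_ffts(n) for n in (3, 4, 5)] == [24, 192, 1920]
```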

5.2 Fast-and-Frugal Trees


Fig. 5.2 Each cue (with its corresponding threshold) within the FFT is ranked based on weighted accuracy (a); (b) shows the different permutations tested to determine which FFT optimizes sensitivity and specificity. Note that because FFTs utilize the most informative cues (i.e., EGFR, non-smoker status, and age ≤ 63 years), we obtain four permutations of this tree (FFTyy, FFTyn, FFTnn, FFTny), of which the tree with the best tradeoff between sensitivity and specificity (FFTyn) was selected
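The weighted-accuracy ranking illustrated in Fig. 5.2a can be sketched as follows. We assume, as in the FFTrees package's default, that weighted accuracy is a weighted average of sensitivity and specificity; the cue statistics below are hypothetical, not the values from the lung-cancer data:

```python
def weighted_accuracy(sens: float, spec: float, w: float = 0.5) -> float:
    """Weighted accuracy: w * sensitivity + (1 - w) * specificity.
    With w = 0.5 this is balanced accuracy."""
    return w * sens + (1 - w) * spec

# Hypothetical (sensitivity, specificity) pairs for three cues,
# not the actual estimates from the lung-cancer data set.
cues = {"EGFR": (0.90, 0.70), "non_smoker": (0.75, 0.65), "age<=63": (0.60, 0.55)}
ranked = sorted(cues, key=lambda c: weighted_accuracy(*cues[c]), reverse=True)
print(ranked)  # ['EGFR', 'non_smoker', 'age<=63']
```

Permutations of the top-ranked cues (and of their exit directions) are then evaluated to select the tree with the best sensitivity-specificity tradeoff.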

Box 5.1. Should TKI targeted therapy be given over non-targeted therapy to a patient with advanced non-small-cell lung cancer?

A 62-year-old woman who smokes daily was diagnosed with advanced non-small-cell lung cancer (NSCLC). She has metastatic disease in the lungs, liver, and bones, and next-generation sequencing (NGS) gene analysis revealed an EGFR mutation. Should you treat the patient with targeted therapy rather than with historically used non-targeted treatment such as chemotherapy? Different clinical guidelines provide unclear and conflicting management strategies. As a decision-maker, you want to proceed with a rational, evidence-based management strategy. Additionally, we want to know which, if any, of the collected data might best be used to create a management strategy for other patients with NSCLC. We use data from a published study by Salgia et al. [8, 9]2 that included 798 patients with advanced NSCLC. All patients had stage IV disease and confirmed metastatic disease. The collected data (> 20 covariates) include demographics (age, sex, smoking status), brain metastasis, number of metastatic sites, treatment information (whether radiation therapy was used, whether the patient is being treated in the hospital), and molecular NGS results, specifically the presence of the following oncogenes: EGFR, ALK, BRAF, MET, ROS, RET, ERBB2, or other molecular mutations. The data also included information on the actual decision by lung cancer experts, who considered the

2 We thank Dr. Ravi Salgia for sharing data with us.


5 Formulating Management Strategies Using Fast-and-Frugal Trees …

totality of all data when they recommended treatment. You want to find out whether we can simplify the decision-making process by generating an FFT from these data, which subsequently can be used to guide treatment in your patient. From a modeling perspective, there exists an optimal number of variables that could be used to create a heuristic tree predicting a management strategy: treating the patient with targeted or with non-targeted therapy. Fast-and-frugal trees (FFTs) incorporate the most relevant cues with binary pathing to manage patients parsimoniously (i.e., frugally) by excluding less relevant variables from the tree. Thus, the question for a decision-maker is: do we administer non-targeted or targeted treatment based on our patient's demographics, the advanced nature of the disease, and the results of NGS testing? We want to know whether a simple FFT can predict the decision made by expert clinicians who had extensive clinical, laboratory, and radiologic information at their disposal. There exist multiple software options to automatically create FFTs. For this example, we used the R statistical software package "FFTrees". As with building any predictive model, making an FFT requires randomly splitting the data to first train or fit a tree (70% of the data) and then test the model against the trained tree (the remaining 30% of the data, the validation sample).3 Each variable (i.e., individual cue) in the data is ranked based

3 Bootstrapping and cross-validation are also widely used in the development of predictive models.


on weighted accuracy (Fig. 5.2a), and all permutations of the FFT are tested before arriving at the FFT that maximizes sensitivity and specificity (see Figs. 5.2b and 5.3). Using the data we just referred to, we obtain the following optimal FFT:

Fig. 5.3 Fast-and-frugal tree (FFT) to determine whether to treat patients with advanced NSCLC with non-targeted (i.e., chemotherapy and/or immunotherapy) versus targeted (i.e., tyrosine kinase inhibitors; TKIs) therapy. The optimal structure of this FFT (FFTyn) was selected based on the analysis shown in Fig. 5.2b. Note that P(D+ |T −) indicates the probability of selecting a non-targeted therapy ("Non-Target"), and P(D+ |T +) indicates the probability of selecting a TKI. The total number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) is shown under each decision (i.e., either "yes" or "no") within the tree. [Note that this FFT was derived from individual-patient data, but it is also possible to construct an FFT from aggregate data as long as the prevalence of the condition we want to predict and aggregate data on the cues' sensitivities and specificities are known]




Using this tree, we can select targeted over non-targeted therapy with an overall accuracy of 84% (positive predictive value, PPV = 85.2%; negative predictive value, NPV = 82.8%). This simple, easy-to-understand, transparent three-cue FFT is frugal (i.e., it used only 42% of all available information) and fast, with the decision reached in fewer than two steps on average. It also allows us to tailor our decision to the individual characteristics of our patient. In this case, the NGS results revealed an EGFR mutation, she is a smoker, and she is 62 years old, indicating that you should treat the patient with a TKI (such as erlotinib).

Threshold-Derived FFT (FFTT)

We will use mortality as the outcome to determine the benefit/harms ratio of TKI versus non-targeted therapy. NSCLC has a mortality rate of approximately 70% after 2 years. Data from the systematic review and meta-analysis by Chen et al. [1] indicate that the TKI erlotinib decreases overall mortality to about 60%. Based on the package insert for erlotinib, the incidence of serious pulmonary toxicity (interstitial lung disease-like events) in patients who received erlotinib was 0.7%. Thus, we obtain:

Benefit/harms ratio = (0.7 − 0.6 − 0.007)/0.007 = 13.29

We can estimate an action threshold (AT), or the probability at which we decide to treat or not (above or below the threshold, respectively), using the equation presented in Chap. 2:

Action threshold (AT) = 1/(1 + benefit/harms ratio) = 1/(1 + 13.29) = 7.00%

We contrast the AT with the post-cue probability (PPV) at each exit of the FFT; in this case, the post-cue probability at each exit is always greater than AT = 7%. This means that if we use the FFTT, we will treat all patients, whereas if we employed the standard FFT, we would treat only 52% of patients.
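The benefit/harms ratio and action threshold above can be reproduced in a short sketch (the numbers are those quoted in the text; the function name is ours):

```python
def action_threshold(benefit: float, harm: float) -> float:
    """AT = 1 / (1 + benefit/harms ratio), as in Chap. 2."""
    return 1.0 / (1.0 + benefit / harm)

# Figures quoted in the text: 2-year mortality 0.70 untreated vs. 0.60
# on erlotinib; serious pulmonary toxicity 0.007.
net_benefit = 0.70 - 0.60 - 0.007       # ~0.093
harm = 0.007
ratio = net_benefit / harm              # ~13.29
at = action_threshold(net_benefit, harm)
print(round(ratio, 2), round(at, 2))    # 13.29 0.07
```

Any FFT exit whose post-cue probability exceeds this 7% threshold mandates treatment under the FFTT.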

References
1. Chen X, Liu Y, Røe OD, Qian Y, Guo R, Zhu L et al (2013) Gefitinib or erlotinib as maintenance therapy in patients with advanced stage non-small cell lung cancer: a systematic review. PLoS ONE 8(3):e59314
2. Djulbegovic B, Hozo I, Dale W (2018) Transforming clinical practice guidelines and clinical pathways into fast-and-frugal decision trees to improve clinical care strategies. J Eval Clin Pract 24(5):1247–1254



3. Djulbegovic B, Hozo I, Lizarraga D, Thomas J, Barbee M, Shah N et al (2023) Evaluation of a fast-and-frugal clinical decision algorithm ("pathways") on clinical outcomes in hospitalized patients with COVID19 treated with anticoagulants. J Eval Clin Pract 29(1):3–12. https://doi.org/10.1111/jep.13780. Epub 2022 Oct 13
4. Djulbegovic B, Hozo I, Lizarraga D, Guyatt G (2023) Decomposing clinical practice guidelines panels' deliberation into decision theoretical constructs. J Eval Clin Pract. Accepted Jan 2023
5. Hozo I, Djulbegovic B, Luan S, Tsalatsanis A, Gigerenzer G (2017) Towards theory integration: threshold model as a link between signal detection theory, fast-and-frugal trees and evidence accumulation theory. J Eval Clin Pract 23(1):49–65
6. Luan S, Schooler LJ, Gigerenzer G (2011) A signal-detection analysis of fast-and-frugal trees. Psychol Rev 118(2):316
7. Phillips ND, Neth H, Woike JK, Gaissmaier W (2017) FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgm Decis Mak 12(4):344–368
8. Salgia R, Mambetsariev I, Tan T, Schwer A, Pearlstein DP, Chehabi H et al (2020) Complex oncological decision-making utilizing fast-and-frugal trees in a community setting—role of academic and hybrid modeling. J Clin Med 9(6):1884
9. Salgia R, Mambetsariev I, Pharaon R, Fricke J, Baroz AR, Hozo I et al (2020) Evaluation of omics-based strategies for the management of advanced lung cancer. JCO Oncol Pract 17(2):e257–e265

6 Using Decision Curve Analysis to Evaluate Testing and/or Predictive Modeling

6.1 Introduction

In this chapter, we extend the threshold model to evaluate the value of diagnostic tests or predictive models over the range of all possible thresholds using decision curve analysis (DCA). DCA has been developed within both the expected utility theory (EUT) and expected regret theory (ERT) frameworks. DCA assesses the value of acting on predictive models or diagnostic tests in comparison with alternative management strategies, such as "treat all" versus "treat none" of the patients with the condition of interest. Recall from Chap. 2 that the threshold refers to the probability (risk) of disease or a health outcome at which we are indifferent between the benefits and harms of competing treatment strategies. This also means that assessing the threshold probability at which a decision-maker is indifferent between failing to administer a beneficial treatment and administering a potentially harmful health intervention allows us to capture patient preferences related to the given management choices. However, we do not need to elicit the threshold from each patient; instead, we can model decisions about treatment over a range of thresholds without knowing the specific (dis)utilities that determine a given threshold. Thus, DCA effectively incorporates all key ingredients of clinical decisions: the predictive model's accuracy, the consequences of a decision, and a patient's preferences, to assess the best course of action.

6.2 Illustrative Examples

We illustrate both EUT and ERT applications of DCA. The performance of predictive models is traditionally evaluated using accuracy statistics. These typically refer to discrimination (such as sensitivity, specificity, the area under a curve, etc.) and calibration statistics (such as assessing agreement between the model's predicted versus observed risks).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_6
However, the model accuracy does




not tell us anything about the consequences of using or acting upon the results of the predictive test, which depend on the benefits and harms of the health interventions employed after learning the results of the predictive model. Nevertheless, a more accurate and better-calibrated model will allow us to better estimate the consequences of treatment when we use DCA to compare its net benefits against the alternative management strategies [i.e., whether to use the model (and treat according to the model's results) versus treat all patients versus treat none]. According to the EUT DCA model, the optimal management strategy is the one that yields the highest net benefit (NB), calculated as

net benefit (NB) = (true positives/N) − (false positives/N) · (pt/(1 − pt))    (6.1)

where N is the sample size and pt is the threshold probability of disease or health outcome; true positives and false positives refer to the consequences of correct or incorrect classification, or the relative seriousness of the consequences of the treatment decision. The expression pt/(1 − pt) has been referred to by some as a "weighting factor", indicating how a decision-maker values undertreatment versus overtreatment. This weighting factor is best understood as weighing all possible outcomes related to the benefits and harms of treatment in a global, holistic way. However, from the point of view of psychological mechanisms, holistic assessment is not a characteristic of EUT. As further highlighted below and discussed in Chaps. 3, 7, and 8, the elicitation of a patient's (a decision-maker's) values and preferences (V&P) depends not only on the probabilistic, logical analysis that characterizes EUT but also on the emotions that color most decisions in health care. This is why DCA based on regret theory (which taps into both the analytic and affective aspects of our cognitive architecture) reflects more realistic psychological mechanisms than DCA based on EUT. However, because of EUT's gold-standard rationality status (Chap. 1), it is probably best to run DCA using both EUT and ERT to ensure that our actions are coherent with the formal principles of EUT rationality as well as with human intuitions about good decisions. In addition to the holistic assessment of the consequences of our actions, we also believe it is helpful to refer to treatment effects more explicitly. In the original EUT DCA, the impact of treatment is implicitly incorporated into the threshold probability; treatment effects on patient outcomes are not modeled explicitly. This makes it difficult to separate the predictive model's classification accuracy from the treatment's impact.
Thus, in some published EUT versions of DCAs, it is possible to act according to the predictive model or prefer another management strategy even if treatment is worthless. A regret version of DCA explicitly integrates the effects of treatment into the decision model and should be used to supplement the application of the standard EUT DCA model. Note that in the EUT DCA formulation, the “treatment” can refer to various interventions, such as a diagnostic test, referral to a specialist, surgery, or medical treatments.
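Equation 6.1, together with the threshold elicitation pt = 1/(1 + X) described below, can be sketched as follows (the counts are hypothetical and serve only to illustrate the arithmetic):

```python
def elicited_threshold(x: float) -> float:
    """pt from 'how many times worse is failing to treat disease than
    unnecessary treatment' (X = NotRx/Rx): pt = 1 / (1 + X)."""
    return 1.0 / (1.0 + x)

def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    """Eq. 6.1: NB = TP/N - (FP/N) * pt / (1 - pt)."""
    return tp / n - (fp / n) * pt / (1 - pt)

# Hypothetical classification counts for a model evaluated at pt = 0.10.
pt = elicited_threshold(9.0)          # X = 9 -> pt = 0.10
nb_model = net_benefit(tp=80, fp=150, n=1000, pt=pt)
print(round(pt, 2), round(nb_model, 4))  # 0.1 0.0633
```

The same computation, repeated over the whole range of pt, traces out a decision curve.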



Figure 6.1 shows a simplified, hypothetical decision curve for a range of net benefits of treatment or diagnostic testing ("Benefit") versus the harms of treatment. The threshold probability ranges from 0 to 1; at a threshold of 0.5, a decision-maker values the benefit of treatment and the avoidance of its harms equally. In general, if a decision-maker values benefits more than harms, the threshold will lie closer to the left (near the y-axis); if the decision-maker values avoiding harms more, the threshold will lie farther to the right (away from the y-axis). In practice, the threshold (pt) at which a decision-maker feels indifferent between treatment alternatives can be elicited by asking a simple question: "How many times worse would it be to fail to treat someone with the disease (NotRx, failure to provide benefit) than to treat someone without the disease (Rx, administering unnecessary treatment)?" (X = NotRx/Rx). From this, we get pt = 1/(1 + X). The approach extends the elicitation of preferences based on the holistic assessment of all critical clinical outcomes. The global assessment of benefits and harms is also rooted in the regret and dual-processing theoretical frameworks (see Chaps. 3 and 7), pointing toward the value of a holistic evaluation of the harms and benefits of each treatment alternative. However, some DCA modelers advocate using expert physicians to narrow the range of thresholds over which the analysis should be run. We find this controversial because experts are often wrong in assessing empirical evidence and patients' values and preferences. According to DCA, we should use the strategy with the highest net benefit at the given pt. Note, however, that in Fig. 6.1, Model B (solid red line) has the highest net benefit ("benefit") over the other strategies (i.e., treat none, test, treat all, Model A) for the entire range of preferences and would be used for all patients without the need to elicit the preferences of each patient. Because there is no widely accepted method for eliciting patients' V&P, the ability to determine theoretically whether a decision is sensitive to patient V&P is a major strength of DCA. If decision curves cross, we recommend eliciting the individual patient's V&P to guide management. This is because there is no such thing as a "right or wrong" risk attitude; V&Ps uniquely belong to each individual. Thus, there is no "average" V&P; people have distinct sets of V&Ps. For example, during the COVID-19 pandemic,1 most people readily accepted vaccination (valuing avoidance of the disease's impact much more than the rare adverse events related to the COVID-19 vaccine). Still, some individuals refused the vaccine (i.e., they appeared to place a higher value on avoiding rare adverse events associated with the COVID-19 vaccine than on falling ill with COVID-19) even though empirical, high-quality, randomized evidence shows that the benefits of the COVID-19 vaccine by far outweigh its risks.2 Nevertheless, as stated earlier, many experts in DCA suggest defining a clinically relevant range of thresholds (e.g., by asking expert clinicians) before conducting DCA.

1 COVID-19, the infection caused by SARS-CoV-2, was first identified in December 2019. The WHO declared an end to COVID-19 as a global health emergency on May 5, 2023. For a COVID-19 timeline, see https://www.cdc.gov/museum/timeline/covid19.html.
2 https://www.cdc.gov/coronavirus/2019-ncov/vaccines/.



Fig. 6.1 A simplified illustration of a decision curve analysis. Note that Model B (solid red line) has the highest net benefit ("benefit") over the other strategies (i.e., treat none, test, treat all, Model A) for the entire range of preferences (thresholds) and would be universally recommended for all patients without the need to elicit the preferences of each patient. Dotted line = treat none; dashed line = diagnostic test; solid black line = treat all; solid blue line = Model A (i.e., a predictive model comprising patient demographics, diagnostic testing, biomarker data, or other laboratory results); solid red line = Model B (i.e., an alternative predictive model that may have included different covariates than Model A). [The y-axis "Benefit" represents the "net benefit" of DCA and should not be confused with other definitions of net benefit given in Chap. 2 and elsewhere.] Adapted from an original figure published in [7] [with permission]

Figure 6.2a shows a variation of the DCA illustrated in Fig. 6.1 in which the specific effects of treatment are considered by assuming that treatment affects the probability of disease or of another clinical event of interest. Figure 6.2b shows the same model based on ERG theory. According to the regret model, the best strategy is the one with the lowest net expected regret difference (NERD):

NERD = (false positives/N) · (pERG_T/(1 − pERG_T)) − (true positives/N)    (6.2)

where

pERG_T = H/(B + H − RRR · H) = (U4 − U2)/[(U1 − U3) + (U4 − U2) − RRR · (U4 − U2)]    (6.3)
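These two relationships can be sketched as follows (B, H, and RRR denote the net benefits, net harms, and relative risk reduction, as defined in the text; the numerical inputs below are purely hypothetical):

```python
def p_erg_t(b: float, h: float, rrr: float) -> float:
    """Eq. 6.3: regret-based threshold pERG_T = H / (B + H - RRR * H)."""
    return h / (b + h - rrr * h)

def nerd(tp: int, fp: int, n: int, p: float) -> float:
    """Eq. 6.2: net expected regret difference (lower is better)."""
    return (fp / n) * p / (1 - p) - tp / n

# Hypothetical inputs: net benefit B = 0.09, net harm H = 0.01, RRR = 0.3.
p = p_erg_t(0.09, 0.01, 0.3)
print(round(p, 4))                          # ~0.1031
# With RRR = 0, pERG_T collapses to the usual threshold H / (B + H).
print(round(p_erg_t(0.09, 0.01, 0.0), 4))   # 0.1
print(round(nerd(80, 150, 1000, p), 4))
```

Note that when RRR = 0 the regret threshold equals the standard EUT threshold, which is the sense in which Eq. 6.2 reduces to Eq. 6.1.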

B refers to the net benefits (not DCA “net benefits”, NB), H to the net harms, and RRR is the relative risk reduction (see also Chaps. 2 and 3, Appendix, and Glossary). Alternatively, it is possible to develop DCA by assuming the effects of treatment on outcomes (utilities) as we have done in most of the chapters. At



Fig. 6.2 A decision model showing a three-choice dilemma: "treat all", "treat none", or treat or not based on model results (i.e., "model"), with utilities expressed based on (a) expected utility or (b) expected regret. Each branch of the tree can result in either a patient with disease (D+) or without (D−). qi refers to the model-predicted probability of disease, pi is the actual probability of disease, and RRR is the relative risk reduction of the treatment. T, threshold probability for treatment; D−, the disease is absent. U1 to U4, utilities (outcomes) associated with each management strategy. Regret is computed as the difference in utilities between the action taken and the action that, in retrospect, should have been taken (see Chap. 3). Based on [2] [with permission]

this time, there is no consensus on whether treatment effects should be modeled explicitly at the level of utilities or whether the implicit treatment effects in published DCAs provide sufficient information to the end-user about the consequences of treatment. However, if RRR = 0, Eq. 6.2 reduces to Eq. 6.1, indicating that explicit modeling of treatment effects (either at the probability or the utility level) could provide helpful, specific information to end-users. The box below illustrates the use of DCA to decide whether to refer a patient with metastatic colon cancer, whose disease progressed to the liver, lung, and peritoneum after first-line treatment with FOLFOX (oxaliplatin, leucovorin, fluorouracil), to hospice. As shown, the results under EUT DCA differ somewhat from those obtained under ERG. This is because solving the EUT decision tree shown in Fig. 6.2a requires the inclusion of the difference between the utilities U1 and U2, which is not necessary under the ERG formulation of DCA (Fig. 6.2b). When RRR = 0, the EUT and ERG



DCA models generate the same results because the difference between the utilities U1 and U2 vanishes.3

Box 6.1 Should a 61-year-old man with advanced colon cancer and an ECOG status of 3 be referred to hospice or not?

A 61-year-old man was diagnosed with advanced colon cancer, which progressed to the liver, lung, and peritoneum after first-line treatment with FOLFOX. His ECOG performance status is 3; he has diabetes and hypertension, which are relatively well controlled. Should you refer the patient to hospice, use a model to guide your referral decision, or consider second-line treatment? In this case, "treatment/intervention" consists of referral to hospice, and "treat none" of alternative, second-line chemotherapy. From a modeling perspective, there are three strategies to consider: (1) referring every critically ill patient similar to the one we just described to hospice, (2) providing second-line treatment (here denoted as "treat none"), and (3) referring ("treating") based on the probability of death estimated by a predictive model. The thresholds at which a patient (or a decision-maker) is indifferent between alternative treatment actions can be estimated using expected utility theory (EUT) or expected regret theory (ERG) DCA. Here we create a decision curve for each of these strategies over the entire range of patient preferences. To begin, we utilize patient-level data from the SUPPORT study, which included 4301 critically ill adult patients (269 of whom had a colon cancer diagnosis), to create a prognostic model that took into account patient age and key clinical and laboratory data. This model is used to predict patient survival at 180 days. There exist multiple software options for creating and illustrating decision curves, including packages in R and Stata; alternatively, DCAs can be created simply by calculating the net benefit at each threshold and plotting the decision curves using a graphing program such as Excel. Here, we used a DCA programmed in Excel, which is available as open access [Supplement by Hozo et al. 8]. Using the SUPPORT study data and the decision curve calculator from Hozo et al. [8], we obtain the following decision curve figure: From Fig. 6.3, we can immediately see that "Treat none" (i.e., administer second-line chemotherapy; the x-axis) has the lowest net benefit over most of the range of threshold probabilities and should not be considered. The

3 Most recently, we have developed a generalized version of DCA (gDCA), which can explicitly take into account the effects of treatment at the level of the probabilities of branches and the utilities of nodes by decomposing the global, holistic utilities into specific evidence-based metrics, including RRR, RV, and other EBM measures shown throughout the book. For details, see "Hozo I & Djulbegovic B. Generalised decision curve analysis for explicit comparison of treatment effects. https://onlinelibrary.wiley.com/doi/10.1111/jep.13915". For most purposes, assuming treatment effects in the utilities will suffice (see also Appendix).


“Treat all” (i.e., refer all patients to hospice) and the prediction model decision curves based on EUT have similar net benefits up to a threshold of about 55%. For thresholds > 55%, the model decision curve remains higher than the "Treat all" and "Treat none" strategies and thus should be used when patients' values and preferences are consistent with these thresholds. Similarly, the "Treat all" and prediction model decision curves based on ERT have relatively high net benefits at thresholds of 50% and above; the model decision curve remains high until thresholds of ~ 85%, at which point "Treat none" (i.e., administer second-line therapy) becomes the dominant strategy. Thus, for a large range of threshold probabilities (i.e., preferences), the decision-maker should rely on the prediction model to determine whether to refer the patient to hospice, indicating an almost uniform approach to making treatment decisions for all patients with colon cancer like the one described in the vignette. Only when a patient (decision-maker) values avoiding the harms of referral to hospice 6 to 20 times more than its potential benefits (corresponding to thresholds of 85% and 95%, respectively) does second-line chemotherapy become the preferred option.
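The three strategies compared in Fig. 6.3 can be sketched generically under EUT (Eq. 6.1). The toy cohort below is hypothetical and only illustrates how the curves are computed; it is not the SUPPORT data:

```python
def nb_treat_all(prev: float, pt: float) -> float:
    """Net benefit of 'treat all': everyone with disease is a true
    positive and everyone without disease is a false positive."""
    return prev - (1 - prev) * pt / (1 - pt)

def nb_model(risks, diseased, pt: float) -> float:
    """Net benefit of acting on a model: treat whenever the predicted
    risk is at or above the threshold pt."""
    n = len(risks)
    tp = sum(1 for r, d in zip(risks, diseased) if r >= pt and d)
    fp = sum(1 for r, d in zip(risks, diseased) if r >= pt and not d)
    return tp / n - (fp / n) * pt / (1 - pt)

# Tiny hypothetical cohort: model-predicted risks and true outcomes.
risks    = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
diseased = [1,   1,   0,   1,   0,   0,   0,   0]
prev = sum(diseased) / len(diseased)   # 0.375
for pt in (0.1, 0.3, 0.5, 0.7):
    print(pt, round(nb_treat_all(prev, pt), 3),
          round(nb_model(risks, diseased, pt), 3))
# 'Treat none' is the horizontal line NB = 0 (the x-axis).
```

Plotting these net benefits against pt yields decision curves analogous to those in Fig. 6.3; the regret version substitutes NERD (Eq. 6.2) for NB.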




Fig. 6.3 Decision curves showing net benefit across the entire range of threshold probabilities (pt) for the following strategies: (1) "Treat all" (i.e., refer all patients to hospice; purple and green), (2) "Treat none" (i.e., administer second-line chemotherapy; shown as the x-axis), and (3) the prediction model, or "Model" (blue and red). The model and treat-all strategies have been estimated using expected utility theory, EUT (blue and purple, respectively) or expected regret theory, ERT (red and green, respectively). Inverse ERT models (-ERT) are shown for comparability with EUT. Note that this model assumes an RRR of 0.1 and can produce slightly different results for different values of RRR

References
1. Fitzgerald M, Saville BR, Lewis RJ (2015) Decision curve analysis. JAMA 313(4):409–410
2. Hozo I, Tsalatsanis A, Djulbegovic B (2018) Expected utility versus expected regret theory versions of decision curve analysis do generate different results when treatment effects are taken into account. J Eval Clin Pract 24(1):65–71
3. Knaus WA, Harrell FE, Lynn J, Goldman L, Phillips RS, Connors AF et al (1995) The SUPPORT prognostic model: objective estimates of survival for seriously ill hospitalized adults. Ann Intern Med 122(3):191–203
4. Salgia R, Mambetsariev I, Tan T, Schwer A, Pearlstein DP, Chehabi H et al (2020) Complex oncological decision-making utilizing fast-and-frugal trees in a community setting—role of academic and hybrid modeling. J Clin Med 9(6):1884
5. Tsalatsanis A, Hozo I, Vickers A, Djulbegovic B (2010) A regret theory approach to decision curve analysis: a novel method for eliciting decision makers' preferences and decision-making. BMC Med Inform Decis Mak 10(1):1–14
6. Vickers AJ, Elkin EB (2006) Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 26(6):565–574
7. Vickers AJ, van Calster B, Steyerberg EW (2019) A simple, step-by-step guide to interpreting decision curve analysis. Diagn Prognostic Res 3(1):1–8
8. https://onlinelibrary.wiley.com/doi/full/10.1111/jep.12676

7 Hybrid and Dual-Processing Threshold Decision Models

7.1 Hybrid Threshold Model

In the previous chapters, we presented various derivations of the threshold model based on a single type of disease outcome: we assumed that a decision-maker would calculate the threshold based on either mortality or morbidity. Basinga and van den Ende derived the threshold by combining mortality and morbidity outcomes. They retained the original EUT formulation but asked the decision-maker to weigh the health outcomes of morbidity relative to mortality. This is similar to the relative value (RV) expressing how a decision-maker weighs avoiding disease outcomes against adverse events, which we introduced in Chap. 3. In practice, the weighting of morbidity versus mortality can be done by directly asking decision-makers to assign the value weight of morbidity relative to mortality using their holistic, intuitive assessments.1 This so-called hybrid threshold (pt) is equal to:

pt = (HMort_rx + HMorb_rx · RV_Morb)/[(Mort − Mort_rx) + (Morb − Morb_rx) · RV_Morb]

where HMort_rx refers to treatment-related mortality, HMorb_rx refers to treatment-related morbidity, RV_Morb is the value weight of morbidity relative to mortality (if the weight

1 In a research setting, many sophisticated methods have been developed to assign values to a given health state. These include standard gamble, time trade-off, swing weighting, discrete choice experiments, etc., used to derive the quality of a health state, quality of life, and quality-adjusted life expectancy. In general, as mentioned in earlier chapters, most patients do not find these methods easy to understand, and they have not penetrated clinical practice. As a result, these methods and techniques are not discussed in detail in this book. See Chap. 3 for the method we advocate. The reader is also referred to general textbooks on the measurement of utilities for further details.





= 1, this means that the person values living with the disabling effects of disease equally to death), Mort is the mortality of the disease without treatment, Mort_rx the mortality of the disease on treatment, Morb the morbidity of the disease without treatment, and Morb_rx the morbidity of the disease on treatment. Because the published literature on the hybrid model relies on an intuitive assessment weighing the harms of morbidity against mortality, it can be considered a version of the dual-processing model discussed further below. Let us illustrate how this hybrid threshold model can be used in practice (Box 7.1).

Box 7.1 Application of the Hybrid Threshold Model

An 81-year-old woman came to see you for a second opinion. She recently had surgery for an infiltrating ductal, grade 3, HER2-positive, estrogen- and progesterone-receptor-negative, stage II breast cancer. She saw an oncologist who told her that she had "high-risk" breast cancer and recommended adjuvant treatment with paclitaxel + trastuzumab. She is otherwise in good health with no major comorbidities, enjoys playing bingo with her friends twice a week, and dreads the adverse events of the treatment she was recommended. She said she would rather die than become incapacitated or spend a long time in the hospital because of treatment toxicities. She asks you if she can avoid adjuvant treatment and whether "surgery was enough" to cure her cancer. You decide to populate the hybrid model to provide the advice sought. At the time of your consult, it is uncertain whether the patient has residual disease that may eventually progress to loco-regional or metastatic disease. So, from the threshold model perspective, this is an application of the model when the diagnosis is not certain. Above which threshold of the probability of disease recurrence would you recommend treatment to this patient? You also know that further adjuvant treatment can reduce the risk of recurrence, but you are not sure about data for octogenarians. A quick internet search reveals a study by Mamtani et al. that seems applicable to your patient. It is a small, retrospective study with a median follow-up of 67 months that included only 40 high-risk patients similar to yours. However, you found no other studies and decide to use the data from this paper as a first step. Only 4 out of 40 (10%) patients received treatment similar to that recommended to your patient, none of whom had any toxicities, leading the authors to recommend adjuvant therapy for elderly women without comorbidities.
This is likely due to serious selection bias, as you are aware of other studies that report both treatment-related mortality and morbidity from adjuvant treatment for breast cancer. For example, in the study by Muss et al., 1.5% (95% CI 0.6 to 3.1%) of patients older than 65 died as a result of adjuvant chemotherapy. This approximates the treatment-related death rates due to adjuvant chemotherapy reported by Rosenstock et al., although both studies included few octogenarians. Because it would be unusual for octogenarians
to have fewer adverse outcomes than patients in their sixties, you decide to assume a higher treatment-related mortality of 3% for your patient. Muss et al. also reported that 7% of patients older than 65 discontinued their treatment because of adverse events. No data are reported for patients older than 80, but the paper states that discontinuation of treatment increased linearly with age. Hence, we assume that 14% of patients older than 80 would discontinue adjuvant treatment. Finally, because of the lack of data on treatment efficacy or effectiveness in octogenarians, we can extrapolate the results of the meta-analysis of the Early Breast Cancer Trialists' Collaborative Group (EBCTCG) to the case of our patient. This is acceptable because it is usually assumed that relative treatment effects remain constant across baseline disease risks. The EBCTCG shows that trastuzumab added to chemotherapy reduced death due to breast cancer by 33% (RRR = 1 − relative risk = 1 − 0.67) and reduced the breast cancer recurrence rate by 34% (RRR = 1 − 0.66). Your patient also told you that dying is about 20 times worse than living with the adverse effects of treatment or with serious morbidity. Using actuarial tables [Actuarial Life Table (ssa.gov)], you estimate that the life expectancy of an 81-year-old woman (without breast cancer) is 9.23 years. Based on the considerations above, we assume the following to populate the hybrid threshold equation:

HMort_rx (treatment-related mortality) = 3% (0.03)
HMorb_rx (treatment-related morbidity) = 14% (0.14)
RV_Morb (value weight of morbidity with respect to mortality) = 0.05
Mort (mortality without treatment) = 5% (0.05)
Mort_rx (mortality on treatment) = Mort × RRR = 0.05 × 0.33 = 0.0165
Morb (morbidity without treatment) = 5% (0.05)
Morb_rx (morbidity on treatment) = 0.05 × 0.34 = 0.017
Plugging these values into the threshold formula, we obtain:

p_t = [HMort_rx + HMorb_rx · RV_Morb] / [(Mort − Mort_rx) + (Morb − Morb_rx) · RV_Morb]
    = [0.03 + 0.14 · 0.05] / [(0.05 − 0.0165) + (0.05 − 0.017) · 0.05] ≈ 1.05
Without adjuvant treatment, the estimated breast cancer recurrence rate in the EBCTCG analysis is about 25% over five years. Thus, the threshold probability would have to be lower than this value to justify recommending adjuvant treatment. Because p_t > 1, the harms of adjuvant chemotherapy outweigh its benefits at any probability of recurrence, and the patient can be advised that not receiving adjuvant therapy would be most consistent with her wishes for how to live her life. Note that varying RV_Morb would not make much difference in the recommendations. However, assuming that morbidity and mortality without treatment exceed 20%


would reduce the threshold below 25%. It behooves the treating physician to explore multiple scenarios (in concert with the patient) before making final recommendations consistent with the evidence and the patient's values and preferences.
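The arithmetic in Box 7.1 is easy to script so that alternative scenarios can be explored quickly with the patient. The following Python sketch (function and variable names are ours, not from the text) reproduces the calculation above using the Box 7.1 inputs.

```python
def hybrid_threshold(h_mort_rx, h_morb_rx, rv_morb, mort, mort_rx, morb, morb_rx):
    """Hybrid threshold probability p_t: treatment is favored only if the
    probability of disease (here, recurrence) exceeds this value."""
    numerator = h_mort_rx + h_morb_rx * rv_morb
    denominator = (mort - mort_rx) + (morb - morb_rx) * rv_morb
    return numerator / denominator

# Values assumed in Box 7.1 for the 81-year-old patient
pt = hybrid_threshold(
    h_mort_rx=0.03,            # treatment-related mortality
    h_morb_rx=0.14,            # treatment-related morbidity (discontinuation)
    rv_morb=0.05,              # relative value of morbidity vs. mortality (1/20)
    mort=0.05,                 # disease mortality without treatment
    mort_rx=0.05 * 0.33,       # mortality on treatment, as computed in Box 7.1
    morb=0.05,                 # disease morbidity without treatment
    morb_rx=0.05 * 0.34,       # morbidity on treatment
)
print(round(pt, 2))  # 1.05 -> harms outweigh benefits at any probability
```

Because p_t exceeds 1, no attainable recurrence probability justifies treatment under these inputs; re-running the function with the alternative assumptions mentioned above (morbidity and mortality without treatment above 20%) shows how quickly the conclusion can flip.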

7.2 The Dual-Processing Threshold Model

The threshold models described in the earlier chapters assumed a single system of human thought. For example, we applied EUT or regret theory to derive decision thresholds. We tacitly assumed that a decision-maker would define the situations in which one or the other model would be more appropriate. However, there are settings in which both analytical processes and cognitive emotions, such as regret, may need to be combined in the same model. Indeed, so-called dual-processing theories (DPT) have increasingly offered a viable explanation of the cognitive processes that characterize human decision-making. DPT assumes that so-called type I (popularly known as system I) and type II (system II) cognitive processes govern human cognition. Type (system) I relies on intuitive, automatic, fast, narrative, experiential, and affect-based processes. Type (system) II refers to analytical, slow, verbal, deliberative, and logical cognitive processes. To consider all these processes, we developed a dual-processing threshold model (DPM) that integrates analytical type II functioning, involving the rational calculus of EUT, with type I mechanisms employing emotions such as regret. It is essential to integrate EUT with regret because, as noted by Kahneman, "theory of choice that completely ignores feeling such as the pain of losses and the regret of mistakes is not only descriptively unrealistic but also might lead to prescriptions that do not maximize the utility of outcomes as they are actually experienced" (see Chap. 1). Although regret as a cognitive emotion shares characteristics of both system I and system II, it does not fully capture all features of system I. The DPM model includes a parameter γ, which ranges from 0 to 1, to represent the extent of involvement of system I in the decision-making process. Empirically, it is measured as the relative distance between the analytically derived, EUT-based threshold (T_EUT) and the affect-oriented, regret-based threshold (T_regret):
γ = |T_EUT − T_regret| / max(T_EUT, T_regret)    (7.1)
Note that when γ = 0, the DPT model becomes the EUT model. When γ = 1, the DPT model reduces to a pure system I (affect-oriented, regret-based) model, in which the threshold is undefined. Under these conditions, the decision-maker will experience "probability neglect," guided only by how system I perceives the utilities of benefits or harms, without regard to the probability of disease (see also Chap. 1). γ can also be conceptualized in different ways, for example, as an automatic, type I response.
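Eq. 7.1 can be sketched directly in Python; the example thresholds below are illustrative values of our own, not taken from the text.

```python
def gamma(t_eut, t_regret):
    """Relative distance between the EUT-based and regret-based thresholds
    (Eq. 7.1): 0 means pure system II, values near 1 mean system I dominates."""
    return abs(t_eut - t_regret) / max(t_eut, t_regret)

# When the two thresholds coincide, system I adds nothing (gamma = 0);
# the farther apart they are, the larger the system I involvement.
print(gamma(0.25, 0.25))  # 0.0
print(gamma(0.5, 0.25))   # 0.5
```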


The calculations will generate different results for the DP threshold depending on the assumptions. Note that about 40–50% of human decisions are determined by our habits (i.e., system I). The DPT threshold probability, which takes into account both system I processes (consisting of regret, weight γ, net benefits B_I, and net harms H_I) and system II processes (consisting of weight 1 − γ, net benefits B_II, net harms H_II, and the EUT threshold p_t(EUT) = 1/(1 + B_II/H_II); see Chap. 2), can be defined as:

p_t = p_t(EUT) · [1 + (γ / (2(1 − γ))) · (H_I/H_II) · (1 − B_I/H_I)]    (7.2)

where B_I and H_I are affect-oriented, regret-based, holistic assessments of benefits and harms, and B_II and H_II are objective (empirical) data. The calculation of p_t(EUT) is based on the best evidence in the literature (B_II and H_II). (See Chaps. 2 and 3 for specific definitions of B and H using empirical data.) Figure 7.1 shows the key characteristics of the DPM threshold model. We can consider the EUT threshold the best-calibrated action threshold, which may need further modification depending on the context. That is, the DPT threshold model reflects the view that the "rationality of action" encompasses both the formal principles of probability theory and human intuitions about good decisions. The latter means that if system I perceives that harms are higher than benefits (B_I < H_I), the threshold probability is always higher than the classic EUT threshold (dotted line). However, if B_I > H_I, the threshold probability is always lower than the EUT threshold (dashed line). Vignette-based empirical testing demonstrated the superiority of the DPM threshold model over the EUT and regret models. However, because precise and accurate measurement of type I processes is difficult during the routine clinical encounter, we recommend using the DPT model in a semi-quantitative fashion, as a reality check on the evidence-driven, system II-derived EUT threshold equation. As noted in Chap. 1, rationally acting according to the EUT threshold often means prescribing treatment or ordering diagnostic tests at a very low probability of disease, which intuitively seems an inappropriate and wasteful use of healthcare resources. If so, the EUT action threshold can be adjusted up or down, as shown in Fig. 7.1. Box 7.2 provides an example of how the DPM model can modify the EUT threshold. However, remember from Chap. 1 that intuitive, affect-based reasoning often violates normative axioms of rational thought.
Given that in most (but not all; see Box 7.2) clinical situations EUT-based reasoning leads to overtreatment and overtesting, the role of gate-keeper against inappropriate care based on the EUT (the gold standard of rational decision-making) falls, paradoxically, on the imperfect system I processes. Chapter 8 places this crucial point in perspective: how using different theoretical perspectives and threshold models results in dramatically different decisions and action recommendations.


Fig. 7.1 Dual-processing threshold model. Classic expected utility threshold probability as a function of the benefit/harm ratio, as derived by system II (EUT, expected utility threshold; solid line). Treatment should be given if the probability of disease is above the threshold, and withheld otherwise. Note that if system I perceives that harms are higher than benefits (B_I < H_I), the threshold probability is always higher than the classic EUT threshold (dotted line). However, if B_I > H_I, the threshold probability is always lower than the EUT threshold (dashed line). [See comment below and Chaps. 2 and 3 for specific formulations of (net) benefits (B) and (net) harms (H).] Figure reproduced from Djulbegovic et al. [3] with permission

Box 7.2 Application of the DPM Threshold Model: Should Allogeneic Stem Cell Transplant (AlloSCT) Be Given Instead of Standard Chemotherapy (ChemoRx) to a 45-Year-Old Man with Acute Myeloid Leukemia (AML) with Intermediate-Risk Disease?

Recall from Chap. 2 that the question for a decision-maker is: what is the threshold probability of AML recurrence at which we are indifferent between administering alloSCT and the standard chemotherapy regimen? The best evidence from the literature reports the following:

Treatment 1, AlloSCT: mortality of the disease with Rx (M_rx1) = 41% and harms due to Rx (H_rx1) = 19%
Treatment 2, ChemoRx: mortality of the disease with Rx (M_rx2) = 53% and harms due to Rx (H_rx2) = 3%

Therefore, assuming that RV_H = 1, the EUT model gives:

p_t = RV_H · (H_rx1 − H_rx2) / (M_rx2 − M_rx1) = 1 · (19 − 3) / (53 − 41) ≈ 1.33

Because p_t > 1, we should not recommend alloSCT.


Is the threshold based on the DPT model similar to the EUT threshold?

p_t = p_t(EUT) · [1 + (γ / (2(1 − γ))) · (H_I/H_II) · (1 − B_I/H_I)]

The cited literature (see below, Djulbegovic et al.) included data on the elicitation of the other values in the DPT equation as follows: γ = 0.33; H_I = 50 (on a scale of 0 to 100%); B_I/H_I = 100/50 = 2 (in our experience, most patients in high-stakes, life-and-death situations have maximum regret about missing out on potentially life-saving therapy); H_II = H_rx1 − H_rx2 = 19 − 3 = 16:

p_t = 1.33 · [1 + (0.33 / (2(1 − 0.33))) · (50/16) · (1 − 2)] ≈ 0.3
Plugging these data into the equation above, adding our regret-based and intuitive assessments to the DPM model, we see that the DPT model generates results that differ dramatically from those of the EUT model: alloSCT should be administered to our patient if the probability of AML recurrence exceeds 30%. Note that whenever B_I/H_I > 1, the DPT threshold falls below the EUT threshold. Note: We cannot stress enough that a careful definition of net benefits and net harms is crucial for accurate calculation of the thresholds. In the papers discussing the DPM model (BMC Med Inform Decis Mak. 2012;12(1):94; Eur J Clin Invest. 2015;45(5):485–93), we inadvertently failed to include treatment-related harms as part of the definition of net benefits (see Appendix), which resulted in a somewhat different threshold than reported here. Although the description of the model and the rest of the calculations remain accurate, we use this opportunity to make this correction. It is all too easy to make this mistake during the cut-and-paste process, and the reader is advised to spend extra time making sure that net benefits and net harms are correctly defined.
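The Box 7.2 calculations can be checked with a short Python sketch of Eq. 7.2; the function and parameter names are ours, while the input values are those elicited above.

```python
def eut_threshold(rv_h, h_rx1, h_rx2, m_rx1, m_rx2):
    """EUT threshold for treatment 1 (alloSCT) vs. treatment 2 (ChemoRx);
    mortality and harm figures are in percent, as in Box 7.2."""
    return rv_h * (h_rx1 - h_rx2) / (m_rx2 - m_rx1)

def dpt_threshold(pt_eut, g, h1, b1, h2):
    """Dual-processing threshold (Eq. 7.2): adjusts the EUT threshold by the
    system I assessments of benefits (b1) and harms (h1), with weight g (gamma)."""
    return pt_eut * (1 + (g / (2 * (1 - g))) * (h1 / h2) * (1 - b1 / h1))

# Box 7.2 inputs: empirical (system II) data, then elicited system I values
pt_eut = eut_threshold(rv_h=1, h_rx1=19, h_rx2=3, m_rx1=41, m_rx2=53)
pt_dpt = dpt_threshold(pt_eut, g=0.33, h1=50, b1=100, h2=19 - 3)
print(round(pt_eut, 2), round(pt_dpt, 1))  # 1.33 0.3
```

Because B_I/H_I = 2 > 1, the system I term is negative and pulls the threshold from an unattainable 1.33 down to about 0.3, reproducing the recommendation in the text.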

References

1. Basinga P, Moreira J, Bisoffi Z, Bisig B, Van den Ende J (2007) Why are clinicians reluctant to treat smear-negative tuberculosis? An inquiry about treatment thresholds in Rwanda. Med Decis Mak 27(1):53–60
2. Djulbegovic B, van den Ende J, Hamm RM, Mayrhofer T, Hozo I, Pauker SG et al (2015) When is rational to order a diagnostic test, or prescribe treatment: the threshold model as an explanation of practice variation. Eur J Clin Invest 45(5):485–493
3. Djulbegovic B, Hozo I, Beckstead J, Tsalatsanis A, Pauker SG (2012) Dual processing model of medical decision-making. BMC Med Inform Decis Mak 12(1):94


4. Djulbegovic B, Elqayam S, Reljic T, Hozo I, Miladinovic B, Tsalatsanis A et al (2014) How do physicians decide to treat: an empirical evaluation of the threshold model. BMC Med Inform Decis Mak 14(1):47
5. Djulbegovic B, Beckstead JW, Elqayam S, Reljic T, Hozo I, Kumar A et al (2014) Evaluation of physicians' cognitive styles. Med Decis Mak 34(5):627–637
6. Early Breast Cancer Trialists' Collaborative Group (EBCTCG) (2021) Trastuzumab for early-stage, HER2-positive breast cancer: a meta-analysis of 13,864 women in seven randomised trials. Lancet Oncol 22(8):1139–1150
7. Kahneman D (2003) Maps of bounded rationality: psychology for behavioral economics. Am Econ Rev 93:1449–1475
8. Kahneman D (2012) Thinking fast and slow (UK edition). Penguin, London
9. Mamtani A, Gonzalez JJ, Neo DT, Friedman RS, Recht A, Hacker MR et al (2018) Treatment strategies in octogenarians with early-stage, high-risk breast cancer. Ann Surg Oncol 25(6):1495–1501
10. Muss HB, Berry DA, Cirrincione C, Budman DR, Henderson IC, Citron ML et al (2007) Toxicity of older and younger patients treated with adjuvant chemotherapy for node-positive breast cancer: the Cancer and Leukemia Group B experience. J Clin Oncol 25(24):3699–3704
11. Rosenstock AS, Lei X, Tripathy D, Hortobagyi GN, Giordano SH, Chavez-MacGregor M (2016) Short-term mortality in older patients treated with adjuvant chemotherapy for early-stage breast cancer. Breast Cancer Res Treat 157(2):339–350
12. Stanovich KE (2011) Rationality and the reflective mind. Oxford University Press, Oxford
13. Sreeramareddy C, Rahman M, Harsha Kumar H, Shah M, Hossain A, Sayem M et al (2014) Intuitive weights of harm for therapeutic decision making in smear-negative pulmonary tuberculosis: an interview study of physicians in India, Pakistan and Bangladesh. BMC Med Inform Decis Mak 14(1):67
14. Tsalatsanis A, Hozo I, Kumar A, Djulbegovic B (2015) Dual processing model for medical decision-making: an extension to diagnostic testing. PLoS ONE 10(8):e0134800

8 Which Threshold Model?

8.1 Introduction

As outlined in the Preface (and in Chap. 1 and other chapters), this book espouses two fundamental views. The first is the proposal that the threshold model represents a method to address the Sorites paradox, which arises from the relationship between scientific evidence (which exists on a continuum of credibility) and decision-making (which is a categorical, yes/no exercise). In clinical medicine, the typical categorical yes/no decisions revolve around administering treatment or ordering a diagnostic test with the goal of improving patients' health. Sometimes, a decision-maker may decide to postpone a decision. Delaying the decision, however, is akin to making a "no" decision at the time the decision was considered. Similarly, as explained in Chap. 1, we take the position that a "single decision is a recurrent decision that is made only once" and that many single-point decisions can be reformulated as repeated decisions over time.

8.1.1 A Brief Review of the Principles of Medical Decision-Making

One of the key principles of rational decision-making (see Chap. 1) is that a decision to select one option over another requires integration of the benefits (gains) and harms (losses) of all options under consideration. It follows that contrasting the benefits with the harms1 is intuitively an obvious first step in medical

1 Here, as elsewhere in the book, we continue to stress the importance of the definition of benefits and harms. Most often we refer to the requirement to calculate the net benefits and the net harms, as illustrated throughout the book. Depending on the definition, the calculated threshold typically varies, sometimes dramatically so. This point is a major rationale for linking decision analysis with EBM, as outlined further in the text.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_8


decision-making. The threshold model was developed to help determine the balance (threshold) between benefits and harms at which we are indifferent between the management options under consideration. When the benefits of a treatment option outweigh its harms compared to an alternative option, we should recommend it to our patients. The principles of evidence-based medicine (EBM; see Chap. 1) indicate that "not all evidence is created equal" and that information on the benefits and harms of treatments, diagnostic tests, etc., should be based on the totality of the best available evidence. Therefore, threshold models should be populated using the best available evidence. Our second view is that the key ingredients of decision models, such as benefits and harms, should be expressed in specific EBM metrics instead of (or in addition to) global, holistic utilities for benefits and harms (see Preface and Chap. 1). That is, we propose the threshold decision-making model as a crucial link between two important fields in clinical medicine: EBM and decision analysis.2 The problem is that there are many theories of decision-making that can serve as the basis of decision analysis. That is, developing a unified, one-size-fits-all theory of rationality does not appear to be possible. As a result, what is considered rational under one decision theory may not be regarded as rational under a different theory. To date, threshold models in clinical medicine have been developed using expected utility theory (EUT), regret theory, and dual-processing theories (DPT). As explained in Chap. 1, DPT assumes that so-called type I (popularly known as system I) and type II (system II) cognitive processes govern human cognition. Type (system) I relies on intuitive, automatic, fast, narrative, experiential, and affect-based processes. Type (system) II refers to analytical, slow, verbal, deliberative, and logical cognitive processes.

8.1.2 Different Theoretical Models Generate Different Recommendations

As noted in Chap. 1, EUT is the only theory that satisfies all statistical and mathematical axioms of rational decision-making, such as the transitivity and cancellation principles (see Table 2, Chap. 1). Therefore, the EUT-based threshold model seems the natural choice for bedside decision-making. The problem is that it is well documented that people routinely violate EUT and do not behave according to its rationality principles. Additionally, the threshold models described in the earlier chapters assumed a single system of human thought. For example, we applied EUT or regret theory to derive decision thresholds. We tacitly assumed that a decision-maker would define the situations in which one or the other model would be more appropriate. However, there are settings in which analytical processes and cognitive emotions

2 We refer to decision analysis in general terms, to encompass all decision-making theories, although some authors reserve the term for analyses based on EUT only.


may need to be combined in the same model. Indeed, as noted by Kahneman (see also Chap. 7), "theory of choice that completely ignores feeling such as the pain of losses and the regret of mistakes is not only descriptively unrealistic but also might lead to prescriptions that do not maximize the utility of outcomes as they are actually experienced". This led to the development of the regret-based and DPT threshold models. As theoretically predicted and illustrated throughout the book, these three models generate three different thresholds. Consequently, recommendations based on one model may be construed as underuse or overuse under another threshold model. Can we say which model is more accurate? We contend that it is impossible to answer this question in universal terms, as the "truth" depends on a theoretical choice, the way we see the world. Moreover, selecting the theory under which we operate determines both policy and our individual decision-making. However, we believe we can identify some general principles for choosing one model over another, or for using more than one model to assess the stability of our recommendations. Context is of paramount importance to rationality. We suggest using EUT informed by the best research evidence in context-poor situations, particularly when time and resources to inform decision-making are less of an issue. This would apply to policy decision-making, practice guidelines, and hospital policies and procedures. However, in context-rich circumstances, other types of rationality, informed by intuition and emotions, such as the aim to minimize regret, may provide a more optimal solution. Given that emotions are unavoidable in most clinical encounters, the regret and DPM threshold models seem more applicable to bedside decision-making. The dual-processing threshold model may be especially beneficial in circumstances dominated by high uncertainty, where we tend to rely on intuition. Similarly, because affect and emotions underpin our values and preferences (V&P; the third important principle of EBM), the regret threshold model is often applicable to bedside clinical decision-making (see Chaps. 3 and 7 for elicitation of V&P).

8.1.3 Contemporary Clinical Practice Represents an Environment Favoring the Overuse of Diagnostic and Treatment Interventions

Despite the importance of context, described in the previous paragraph, we do suggest always starting with the EUT threshold model, using objective data on morbidity and mortality as described in Chap. 2 and in the Appendix. The next step is to complement the results of the EUT threshold model with the regret and/or DPM threshold model to assess the sensitivity of the decision. If the EUT model agrees with the regret and/or DPM threshold model, we can be reassured about the robustness of our decisions. Note that we are not interested in the


absolute agreement; we are only interested in agreement on the direction of the recommendations (i.e., whether all models are for or against a given management option). This is because even in context-poor situations, relying solely on the EUT can be pragmatically irrational. For example, the EUT model indicates that testing for pulmonary embolism (PE) should be withheld only if the prior probability of PE is < 0.23% (Chap. 4). This is a clinically untenable threshold, as acting according to it implies that we should test everyone the minute the diagnosis of PE crosses our mind. This can hardly be considered rational behavior. As a result, the American Society of Hematology Clinical Practice Guidelines Panel raised the threshold for ruling out PE to 2%. In doing so, the panel employed the regret and acceptable-regret concepts. As demonstrated in the earlier chapters, the larger the benefits compared to the harms, the lower the threshold above which we should act. In other words, the EUT threshold model, the axiomatically most rational model, provides the impetus for overtesting and overtreatment. Paradoxically, this applies mainly to treatments approved by regulatory agencies such as the US FDA (Food and Drug Administration). Because the FDA approves only "safe and effective" treatments, the benefits of approved treatments usually significantly outweigh their harms; in addition, most diagnostic tests are perceived to be harmless and to have decent sensitivity and specificity. As a result, the testing and treatment thresholds are predictably low for the majority of tests and interventions employed in today's practice. Therefore, the use of most FDA-approved drugs3 within a carefully construed FDA regulatory framework, combined with EUT, the "gold standard" theory of rationality, creates an environment favoring the overuse of diagnostic and treatment interventions. Consequently, it is unsurprising that clinicians do not act according to the EUT.
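The inverse relationship between benefits/harms and the action threshold can be illustrated with the basic EUT formula p_t = 1/(1 + B/H) (see Chap. 2); the benefit/harm ratios below are illustrative values of our own, not taken from the text.

```python
def eut_threshold(benefit, harm):
    """Basic EUT action threshold: act when P(disease) exceeds p_t."""
    return 1 / (1 + benefit / harm)

# As net benefit grows relative to net harm, the threshold collapses
# toward zero -- the mechanism behind EUT-driven overtesting and overtreatment.
for b_over_h in (1, 4, 20, 100, 400):
    print(b_over_h, round(eut_threshold(b_over_h, 1), 4))
```

A benefit/harm ratio of several hundred, plausible for a sensitive and essentially harmless test aimed at a lethal disease, yields thresholds of the same order as the 0.23% PE figure discussed above.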
When, emotionally or intuitively, we assign greater weight to harms than to benefits, the threshold is always greater than the EUT threshold (see Fig. 1, Chap. 7). Thus, relying on regret and intuition, clinicians can increase the threshold for diagnostic testing and the administration of treatments (see Fig. 2, Chap. 2). When V&Ps are properly elicited, such a threshold should be construed as prescriptively correct. Interestingly, however, for drugs used in so-called "off-label" settings,4 this consideration does not necessarily apply. This is because many treatments, particularly for cancer, have very narrow benefit/harm ratios. As a result, the EUT threshold is predictably high, indicating an increased need for certainty in the diagnosis or predicted outcomes before acting. As shown in Chap. 7, under such considerations, unless we are certain that the disease will relapse, we should not administer an allogeneic stem cell transplant as consolidative treatment to

3 The FDA does not consider costs in its decision-making. Considerations of costs may change this calculus.
4 This means that even if the FDA did not formally approve a particular drug or drug combination for a specific disease, doctors are at liberty to use it as long as they believe that such treatment would benefit their patients. Doctors also make their judgments based on other scientific studies that are not necessarily reviewed by the FDA.


a 45-year-old patient whose acute myeloid leukemia (AML) achieved remission after induction therapy. However, when the regret or DPM threshold models were used, the action threshold dropped to 53% and 30%, respectively. When, emotionally or intuitively, we assign greater weight to benefits than to harms, the threshold is always lower than the EUT threshold (see Fig. 1, Chap. 7). Thus, relying on regret and intuition, clinicians can also decrease the threshold for diagnostic testing and the administration of treatments (see Fig. 2, Chap. 2). While in the case of PE regret and intuition corrected the EUT threshold upward in an attempt to reduce overuse, in the case of AML the correction was downward, aiming to minimize underuse. Again, we assume that when V&Ps are properly elicited, such a threshold should be construed as prescriptively correct. As argued above, this position is defensible on normative grounds. However, in practice, we realize that time constraints, financial incentives, and conflicts of interest often subvert the accurate assessment of V&P, and of net benefits and net harms, for that matter.

8.1.4 Simple Versus Complex Models

Nevertheless, we should acknowledge that the threshold models discussed in this book are mathematically simple models. During the last couple of decades, formidable advances in statistical and mathematical techniques have given rise to the development of much more complex models than those described in this book. They are based on various techniques, including standard regression statistical techniques, Markov modeling, Monte Carlo and microsimulation modeling, and most recently, machine learning/artificial intelligence (AI) modeling (see Chap. 9). Are these complex models superior to the simple models described here? We do not think so, for the following reasons. First, complex models are not transparent and are difficult to understand, creating skepticism about their accuracy among end-users. Second, they typically model probabilities and utilities far into the future, based on assumptions not supported by reliable empirical findings and supporting evidence. Most of the time, simple models can be informed by empirical evidence as it was actually collected in clinical research studies. Empirical data supporting complex models are characteristically lacking; it is challenging to inform each parameter in a complex model using randomized trials, systematic reviews, or even observational studies. Third, adding more variables to a model increases the total error rate in the analysis (see Chap. 9). Fourth, complex models are probably better than simple models at prediction estimates (e.g., life expectancy). However, there is no evidence that they are better than simple models at improving decision-making. As discussed in Chap. 9, as a rule, complex models do not perform well in an unstable, rapidly changing world characterized by emotions and affect, an integral aspect of human decision-making. In fact, given that science is an open-ended system, only about 22% to 50% of clinical research conclusions remain recognizably correct after just a few decades.
As a result, models trained on obsolete data


will inevitably be wrong. Updating complex models with new evidence is a difficult, if not impossible, task. On the other hand, simple, transparent, easily understood models can be updated with minimal effort to increase our confidence in the correctness of a decision. In addition, most complex decision models can be reduced to simple models (see Appendix for a conceptual illustration). Finally, the simple models advocated in this book can be used within minutes at the bedside, while complex models take months or even longer to develop.

8.1.5 Adhering to Practical Wisdom

Therefore, we do not advocate replacing physicians with complex models, as some AI proponents seem to propose. As our discussion above illustrates, human intuition and affect remain irreplaceable for adjusting, upward or downward, decision thresholds derived using rational mathematical calculus. We therefore see the use of (simple) decision models as a way to complement human decision-makers, along the lines of the old proverbial medical wisdom: "A good doctor knows how to treat/order a diagnostic test, a better one knows when to treat/order a test, but the best one knows when not to do it …". That is, practical wisdom is the hallmark of rationality. In this sense, pragmatically rational medical decision-making crucially depends on integrating the evidence related to the problem at hand with the patient's goals, values, and preferences to guide our decisions and actions while taking context into account. Simple decision models can provide a general framework for more accurate and trustworthy clinical decision-making.

References

1. Cornelissen JJ, van Putten WL, Verdonck LF, Theobald M, Jacky E, Daenen SM et al (2007) Results of a HOVON/SAKK donor versus no-donor analysis of myeloablative HLA-identical sibling stem cell transplantation in first remission acute myeloid leukemia in young and middle-aged adults: benefits for whom? Blood 109(9):3658–3666
2. Djulbegovic B, Hozo I, Schwartz A, McMasters K (1999) Acceptable regret in medical decision making. Med Hypotheses 53:253–259
3. Djulbegovic B, Lyman G (2006) Screening mammography at 40–49 years: regret or regret? Lancet 368:2035–2037
4. Djulbegovic B, Hozo I (2007) When should potentially false research findings be considered acceptable? PLoS Med 4(2):e26
5. Djulbegovic B, Beckstead J, Nash DB (2014) Human judgment and health care policy. Popul Health Manag 17(3):139–140
6. Djulbegovic B, Paul A (2011) From efficacy to effectiveness in the face of uncertainty: indication creep and prevention creep. JAMA 305(19):2005–2006
7. Djulbegovic M, Djulbegovic B (2011) Implications of the principle of question propagation for comparative-effectiveness and "data mining" research. JAMA 305(3):298–299
8. Djulbegovic B, Beckstead JW, Elqayam S, Reljic T, Hozo I, Kumar A et al (2014) Evaluation of physicians' cognitive styles. Med Decis Making 34(5):627–637

9. Djulbegovic B, Elqayam S, Reljic T, Hozo I, Miladinovic B, Tsalatsanis A et al (2014) How do physicians decide to treat: an empirical evaluation of the threshold model. BMC Med Inform Decis Mak 14(1):47
10. Djulbegovic B, Hamm RM, Mayrhofer T, Hozo I, Van den Ende J (2015) Rationality, practice variation and person-centred health policy: a threshold hypothesis. J Eval Clin Pract 21(6):1121–1124
11. Djulbegovic B, Tsalatsanis A, Mhaskar R, Hozo I, Miladinovic B, Tuch H (2016) Eliciting regret improves decision making at the end of life. Eur J Cancer 68:27–37
12. Djulbegovic B, Elqayam S (2017) Many faces of rationality: implications of the great rationality debate for clinical decision-making. J Eval Clin Pract 23(5):915–922
13. Djulbegovic B, Elqayam S, Dale W (2018) Rational decision making in medicine: implications for overuse and underuse. J Eval Clin Pract 24(3):655–665
14. Djulbegovic B, Hozo I, Lizarraga D, Guyatt G (2023) Decomposing clinical practice guidelines panels' deliberation into decision theoretical constructs. J Eval Clin Pract
15. He L, Zhao WJ, Bhatia S (2020) An ontology of decision models. Psychol Rev 129(1):49–72
16. Hozo I, Djulbegovic B (2008) When is diagnostic testing inappropriate or irrational? Acceptable regret approach. Med Decis Making 28(4):540–553
17. Hozo I, Djulbegovic B (2009) Clarification and corrections of acceptable regret model. Med Decis Making 29:323–324
18. Hozo I, Schell MJ, Djulbegovic B (2008) Decision-making when data and inferences are not conclusive: risk-benefit and acceptable regret approach. Semin Hematol 45(3):150–159
19. Kahneman D (2003) Maps of bounded rationality: psychology for behavioral economics. Am Econ Rev 93:1449–1475
20. Kahneman D (2012) Thinking, fast and slow (UK edition). Penguin, London
21. Lim W, Le Gal G, Bates SM, Righini M, Haramati LB, Lang E et al (2018) American Society of Hematology 2018 guidelines for management of venous thromboembolism: diagnosis of venous thromboembolism. Blood Adv 2(22):3226–3256
22. Stanovich KE (2011) Rationality and the reflective mind. Oxford University Press, Oxford
23. Stanovich KE (2013) Why humans are (sometimes) less rational than other animals: cognitive complexity and the axioms of rational choice. Think Reason 19(1):1–26
24. Stanovich KE (2018) How to think rationally about world problems. J Intell 6(2):25
25. Tsalatsanis A, Hozo I, Djulbegovic B (2017) Acceptable regret model in the end-of-life setting: patients require high level of certainty before forgoing management recommendations. Eur J Cancer 75:159–166

9 Medical Decision-Making and Artificial Intelligence

9.1 Introduction

In this chapter, we discuss the potential role that artificial intelligence (AI) may have in medical decision-making, its pros and cons, and the limitations and biases that might be introduced when using these novel techniques. As computing becomes more powerful and models continue to grow increasingly complex, the potential of AI to improve decision-making is increasingly promising. Large language models such as ChatGPT (released November 2022) have demonstrated both impressive capability (the model-based chatbot is capable of passing the United States Medical Licensing Examination) and, at times, illogical "hallucinations". Within many medical fields, however, at the time of this writing (September 2023), the promise of AI is yet to translate into everyday reality. Here, we summarize the role of AI in medical decision-making (diagnosis, prognosis, and treatment).

9.2 Machine Learning

AI is a broad term referring to a wide range of fields defined by systems or technologies with the capacity to make decisions similar to human decision-making. Here we primarily focus on machine learning (ML), a subset of AI that allows a machine to "learn" from past data without explicitly programming all mathematical and statistical relationships between input variables. From a statistical point of view, we can think of ML as elaborate regression methods. Machine learning algorithms are usually classified according to the type of task they perform: supervised, semi-supervised, and unsupervised. These tasks

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2_9


differ according to whether the desired output is already known, i.e., whether gold standard ("truth") data are available to guide the matching of output to a given input. The most common type of computational learning is supervised, meaning that a learning function is matched to a training dataset with defined input and output categories. We employ the training dataset to develop a model or system, which we then use to evaluate a novel testing dataset. This supervision relies heavily on human expertise and engineering to properly design models or algorithms and optimize them for accuracy. One extension of this approach is the SuperLearner algorithm, which estimates the performance of multiple machine learning models and creates an optimally weighted average of the "ensemble" of models, which is then applied to the test data. In addition to standard regression techniques, the most popular supervised ML algorithms comprise the support vector machine (SVM), random forest (RF), gradient boosting, and neural networks (NN), including popular deep learning techniques consisting of multiple hidden layers between inputs and outputs. As discussed below, deep learning has become a standard in image recognition analyses and is a rapidly growing field. Unsupervised ML algorithms employ input data with no defined output categories; the algorithm categorizes output based on the distribution and structure of the data. This exploratory machine learning method may identify unique patterns within a dataset that might otherwise be missed. Principal component analysis (PCA) and K-means clustering are common methods for discovering such hidden patterns, typically based on some measure of similarity in the provided data. Note that NN, including deep learning techniques, can also be based on an unsupervised algorithm.
A hybrid of supervised and unsupervised learning, referred to as semi-supervised learning, arises when a training dataset includes defined input and output categories for only some of the data; natural language processing is one example. A fourth primary class of machine learning is reinforcement learning, where models take previous experience into account and "learn" through multiple interactions with an environment over time; Q-learning is a prototypical reinforcement learning algorithm. Finally, the aforementioned ChatGPT, which stands for "Generative Pre-Trained Transformer", is also trained with reinforcement learning and human feedback, even though the exact details of how it provides specific answers are often not transparent, or even not known.
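The supervised/unsupervised distinction above can be made concrete with a toy sketch in pure Python. All data, labels, and starting values here are hypothetical, and the two algorithms are deliberately minimal stand-ins: a one-nearest-neighbour rule for supervised learning and a one-dimensional K-means for unsupervised clustering.

```python
# Supervised: a 1-nearest-neighbour classifier "trained" on labelled points
# (hypothetical risk scores paired with hypothetical labels).
train = [(1.0, "low risk"), (1.2, "low risk"), (8.9, "high risk"), (9.3, "high risk")]

def predict(x):
    # Return the label of the closest training example
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised: a minimal 1-D K-means (k = 2) that discovers groups without
# any labels. Assumes both clusters stay non-empty for the given data and
# starting centres.
def kmeans(xs, c1, c2, steps=10):
    for _ in range(steps):
        g1 = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return c1, c2

centers = kmeans([1.0, 1.2, 8.9, 9.3], c1=0.0, c2=10.0)
# predict(1.1) returns "low risk"; the centres converge near 1.1 and 9.1
```

In practice, libraries such as scikit-learn provide production implementations of these and the other algorithms mentioned above; the point of the sketch is only that the supervised rule needs labels while the clustering step does not.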

9.3 Machine Learning and Clinical Care

Many studies exist in which machine learning or deep learning algorithms have purportedly led to accurate clinical predictions (see Table 9.1). At the time of this writing, most AI studies are used to predict either a clinical diagnosis or a prognosis. We are unaware of any studies that successfully report AI evaluating treatment effects. Nevertheless, ongoing research has shown promise in several fields, including biosignal analysis in computer-aided diagnosis and pattern recognition


in interpreting medical scans, pathology slides, electrocardiograms, vital signs, and retinal imaging. Arguably, the greatest promise machine learning brings to clinical practice lies in identifying and processing medical images. For example, machine learning algorithms have improved the accuracy of predicting post-operative acute kidney injury. Machine learning algorithms have been shown to be relatively equivalent to medical experts in providing accurate diagnoses based on medical images. They are increasingly applied to electronic health record (EHR) data to identify patterns of clinical importance. In recent years, there has been a "fusion" of approaches in which medical imagery and electronic health data are used together to train deep learning neural networks. New integrative models combine medical imagery with molecular data, such as genomic or proteomic data, and/or environmental and demographic data. To date, the US Food and Drug Administration (FDA) has authorized 521 AI-based products, the vast majority in radiology (75%) and cardiology (11%).

9.4 AI Challenges and Limitations

Despite its promise, critics have increasingly pointed to the poor-quality evidence supporting AI applications in the clinical setting. The progress of machine learning in clinical settings outside of medical image analysis has been hampered by both limits on data availability and susceptibility to bias. The latter includes spectrum bias (bias arising from a different patient "spectrum", or demographics, between clinical sites), reporting bias (bias introduced when reporting misrepresents the magnitude and/or direction of the results), gender bias (bias introduced when gender is inappropriately weighted), etc. Even for the nine AI-based products for breast cancer screening approved by the FDA, critics have highlighted important methodological issues, including reliance on retrospective designs, suboptimal external validation, and the lack of assessment of clinical utility. The creation of neural networks is often challenging because of the sheer volume of data required to obtain meaningful predictions. In some neural networks, millions of data points are used to train algorithms, but within a limited number of institutions. Concerns about the reproducibility of findings, the conduct of external validation, overall transparency, and the reluctance to share such datasets with other investigators have all been raised. Thus, multiple sources of bias exist in machine learning methodologies (see Table 9.1). Measurement bias (particularly across different institutions) and spectrum bias (see definition above), or cohorts that fail to accurately represent the range of patients or symptoms, limit AI's relevance and impact. Social bias (bias resulting from selectively over-weighting certain patient demographics, thereby misrepresenting patients who do not share those demographics) can cause one particular group to be disproportionately represented in the data used to train AI-based systems.
These biases could be introduced statistically or by human error, leading to the incorrect diagnosis of non- or under-represented groups in the training dataset and potentially to inferior health outcomes. Addressing AI bias by conducting rigorous studies evaluating the impact of AI on health care should become one of the key research priorities if we are to improve decision-making and health outcomes. Adhering to the recently published DECIDE-AI guidelines may facilitate the critical evaluation of AI studies and their reproducibility.

Table 9.1 Reviews and exemplary studies

Reviews

[7] Liu et al. (2019). Systematic review (n = 82); included retrospective and prospective studies; clinical care assessed: diagnosis.
Overall summary: This review found that the diagnostic performance of deep learning models and healthcare professionals was relatively equivalent for diagnoses based on medical imagery (in areas including, but not limited to, cancer diagnoses such as breast cancer, lung cancer, dermatological cancer, thyroid cancer, and nasopharyngeal cancer). Specifically, hierarchical ROC curves showed that deep learning models were more accurate (sensitivity 85.7%, specificity 93.5%; n = 31) than healthcare professionals (sensitivity 79.4%, specificity 87.5%; n = 54). However, the authors noted that many deep learning studies were poorly reported, limiting the reliability of the reported diagnostic accuracy.
Key methodological issues noted: Inadequate data availability, primarily due to poor reporting of AI model accuracy and the lack of external validation.

[8] Loftus et al. (2020). Narrative review (n = 49); included observational, prospective, and retrospective study designs as well as interviews and six systematic reviews; clinical care assessed: diagnosis.
Overall summary: In this narrative review, AI (primarily machine learning) aided surgical decision-making by accurately predicting pre- or post-operative complications (including patient morbidity and mortality). Note that this study did not include a meta-analysis.
Key methodological issues noted: Electronic health record data need to be standardized to produce reliable models, and models need to be interpretable so that clinicians can identify why a model produced a specific result and quickly identify erroneous or misrepresentative data.

[10] Roberts et al. (2021). Systematic review (n = 62); retrospective studies; clinical care assessed: diagnosis and prognosis.
Overall summary: In a review of 62 studies reporting the use of machine learning algorithms, none had the potential for clinical use due to methodological flaws and/or underlying biases.
Key methodological issues noted: Many studies lacked documentation for AI algorithms, thus obfuscating their impact. Furthermore, the results of some studies were not reproducible because they relied on publicly available datasets that were later updated. Some papers failed to report methodology (e.g., data pre-processing, details of algorithm training, sensitivity analysis, and demographics of patients in each partition) or to assess the biases of their own studies against a previously established framework.

[13] Varoquaux and Cheplygina (2022). Narrative review (n = 6 systematic reviews); prospective and retrospective studies (n = 478); clinical care assessed: diagnosis and prognosis.
Overall summary: Machine learning seldom achieves a clinical impact and is hindered by small datasets and challenges including biases introduced with the choice of datasets. A meta-analysis across six systematic reviews indicated that larger datasets do not improve machine learning diagnosis of Alzheimer's disease based on medical imaging, cognitive measures, genetic data, and/or patient demographic data; studies with larger sample sizes tended to have worse predictive accuracy than those with smaller datasets.
Key methodological issues noted: There is a need for randomized clinical trials of AI algorithms, but a double-blind study design is not possible. Furthermore, many AI studies suffer from poorly reported algorithm evaluation and unsuitable evaluation metrics. There may be reporting bias, as some studies only included "positive" studies. Lastly, quality data are limited and may be biased by failing to include the entire range of possible patient demographics (i.e., spectrum bias).

Exemplary studies

Leger et al. (2017). Machine learning and radiomics; retrospective; clinical care assessed: prognosis.
Major findings: A subset of algorithms was identified for future radiomics studies to develop models for time-to-event endpoints.
Key study limitations: The algorithm failed to use censored time-to-event survival data.

Rajkomar et al. (2018). Deep learning and electronic health record (EHR) data; retrospective; clinical care assessed: diagnosis and prognosis.
Major findings: Models to assist clinicians in identifying clinical problems from EHR data were developed and outperformed existing EHR models for predicting mortality, unexpected readmission, and increased length of stay.
Key study limitations: The algorithm relied on many rare variables unique to a given medical site's electronic health record rather than common variables used across several sites; the algorithm did not harmonize data between clinical sites.

Adhikari et al. (2019). Machine learning and intra-operative time series data; retrospective; clinical care assessed: diagnosis.
Major findings: Post-operative acute kidney injury prediction was improved with a high degree of accuracy.
Key study limitations: The training dataset only included data from a population in North Central Florida, USA, which may not be applicable to other populations.

9.5 Statistical and Decision-Theoretical View of Artificial Intelligence Modeling

As mentioned above, AI/ML techniques can be considered elaborate regression methods. Note, however, that no firm classification rules separate what constitutes "traditional" statistical models (SM) from ML algorithms. Unlike standard statistical methods, however, AI/ML techniques aim solely at prediction, not estimation or explanation. In other words, if explanation, transparency, and inferences focusing on the effects of relevant variables are essential, then traditional SM may be preferable to ML; if prediction is all that matters, ML may be superior to SM. Harrell provides a number of recommendations for when to use SM versus ML, the most important of which concerns the availability of an adequate sample size. SM is superior when interactions can be prespecified and are relatively few in number, and the sample size is not large (~10–20 events per candidate predictor). However, when nonlinearity is expected to be strong and cannot be isolated to a few pre-specified variables, the sample size is large (i.e., > 200 events per variable), and one does not care whether the model is a "black box", ML may be the better modeling choice. From a decision-theoretical point of view, one would expect AI/ML algorithms to work well in stable worlds, such as face recognition or clinical imaging. However, simpler models may outperform ML models in unstable, uncertain worlds, which invariably include changing estimates of the benefits and harms of available treatments and the role of affect and emotions that unavoidably shape patients' values and preferences. This can also be explained by the so-called bias-variance dilemma. To estimate whether a model predicts an event of interest (e.g., diagnosis, prognosis, the choice of treatment), we can calculate the total prediction error of the algorithm. This error can be decomposed into bias and random error (see also Glossary for definitions):

Total error = (bias)² + variance + irreducible error    (9.1)

Bias and variance move in opposite directions: as one increases, the other decreases. The goal of modeling is to find the optimal model complexity at which bias and variance are jointly minimized. While simpler models can be more biased because they have few parameters, they are less affected by variance (random error). On the other hand, variance increases as we increase the number of free parameters (variables) in the model and as the training sample size shrinks. As a result, the total error can often be smaller in simpler models than in complex ML models, a phenomenon known as "less is more".
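The decomposition in Eq. (9.1) can be illustrated with a small Monte Carlo sketch. The event probability, sample size, and shrinkage constants below are all hypothetical: a "simpler" estimator accepts some bias in exchange for lower variance and can end up with a smaller total (reducible) error, the "less is more" effect described above.

```python
# Estimate an event probability p from small samples with two estimators:
# the raw frequency (unbiased, higher variance) and a "simpler" shrinkage
# estimate pulled toward 0.5 (biased, lower variance).
import random

random.seed(42)
p_true, n, runs = 0.30, 20, 20000

def simulate(estimator):
    # Repeatedly draw n Bernoulli(p_true) observations, apply the estimator,
    # and decompose its error into bias^2 and variance.
    estimates = [estimator(sum(random.random() < p_true for _ in range(n)))
                 for _ in range(runs)]
    mean = sum(estimates) / runs
    bias2 = (mean - p_true) ** 2
    var = sum((e - mean) ** 2 for e in estimates) / runs
    return bias2, var

for name, est in [("raw frequency", lambda k: k / n),
                  ("shrunk toward 0.5", lambda k: (k + 5) / (n + 10))]:
    bias2, var = simulate(est)
    # Total reducible error = bias^2 + variance, per Eq. (9.1)
    print(f"{name}: bias^2={bias2:.5f} variance={var:.5f} total={bias2 + var:.5f}")
```

With these (hypothetical) settings, the shrunk estimator has a larger squared bias but a smaller variance, and its total error is lower than that of the raw frequency.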

9.6 Conclusions

AI has slowly made its way into health care and holds considerable potential to improve medical decision-making and health outcomes. However, for AI to be routinely used in clinical practice, AI models should be rigorously validated and transparently reported.

References

1. Bzdok D, Ioannidis JP (2019) Exploration, inference, and prediction in neuroscience and biomedicine. Trends Neurosci 42(4):251–262
2. Efron B, Hastie T (2016) Computer age statistical inference: algorithms, evidence, and data science. Cambridge University Press, Cambridge, UK
3. FDA list of Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. Accessed January 18, 2023
4. Gottlieb S, Silvis L (2023) How to safely integrate large language models into health care. JAMA Health Forum 4(9):e233909. https://doi.org/10.1001/jamahealthforum.2023.3909
5. Harrell F (2019) Road map for choosing between statistical modeling and machine learning. Statistical Thinking
6. Katsikopoulos KV, Simsek O, Buckmann M, Gigerenzer G (2021) Classification in the wild: the science and art of transparent decision making. MIT Press
7. Liu X, Faes L, Kale A, Wagner S, Fu D, Bruynseels A et al (2019) A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health
8. Loftus TJ, Tighe PJ, Filiberto AC, Efron PA, Brakenridge SC, Mohr AM et al (2020) Artificial intelligence and surgical decision-making. JAMA Surg 155(2):148–158
9. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996) A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 49(12):1373–1379
10. Roberts M, Driggs D, Thorpe M, Gilbey J, Yeung M, Ursprung S et al (2021) Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3(3):199–217
11. Shah NH, Entwistle D, Pfeffer MA (2023) Creation and adoption of large language models in medicine. JAMA 330(9):866–869. https://doi.org/10.1001/jama.2023.14217
12. van der Ploeg T, Austin PC, Steyerberg EW (2014) Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol 14(1):1–13
13. Varoquaux G, Cheplygina V (2022) Machine learning for medical imaging: methodological failures and recommendations for the future. NPJ Digit Med 5(1):1–8
14. Vasey B, Nagendran M, Campbell B, Clifton DA, Collins GS, Denaxas S et al (2022) Reporting guideline for the early stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. BMJ 377

Appendix

A.1 Expected Utility Theory

A.1.1 A Decision About Treatment (Rx) Versus No Treatment (NoRx): When the Diagnosis (Clinical Event1) Is Not Certain and No Further Diagnostic (dx) Test Is Available

We derive our results from the simple decision tree pictured in Fig. A.1. For our situation,

EU(Rx) = p · U(Rx, D+) + (1 − p) · U(Rx, D−)    (A.1)

and

EU(NoRx) = p · U(NoRx, D+) + (1 − p) · U(NoRx, D−)    (A.2)

where U(Rx, D+) = U1, U(Rx, D−) = U2, U(NoRx, D+) = U3, and U(NoRx, D−) = U4 are the utilities (or utility functions) for the results of Treatment and No Treatment in the presence (D+) and absence (D−) of disease, respectively, and p = pD is the probability that the disease is present. The net benefit of the treatment is defined as the difference between the utility of the outcomes of giving and withholding treatment in patients with disease:

B = U(Rx, D+) − U(NoRx, D+) = U1 − U3    (A.3)

1 See Glossary for definition; event can refer either to diagnosis/prognosis (e.g., breast cancer), which is typically modeled in the probability branches of the decision tree, or to outcomes (e.g., death), which are modeled at the utility nodes of the tree. Note that events expressed as composite outcomes (such as breast cancer or VTE recurrence) often refer to both probabilities and utilities. To avoid double counting, the model shown under Section B should be used.


Fig. A.1 Consider a clinical situation when no further diagnostic test is available to a physician, and he/she can choose between two alternatives only: to treat (Rx), or not to treat (NoRx), a patient who may or may not have a disease. Expected utility (EU) is the average utility of all possible results, weighted by their corresponding probabilities. The decision-maker should select the option with the larger expected utility

The net harms are defined as the difference between the utility of the outcomes associated with withholding and administering treatment in patients without disease:

H = U(NoRx, D−) − U(Rx, D−) = U4 − U2    (A.4)

Often we are interested in knowing at which probability of disease p the expected utilities are equal: EU(Rx) = EU(NoRx). Equations A.1–A.4 can be solved as follows:

p · U1 + (1 − p) · U2 = p · U3 + (1 − p) · U4
p · (U1 − U3) + p · (U4 − U2) = U4 − U2

pt = H / (B + H) = 1 / (1 + B/H)    (A.5)

where pt is the threshold probability at which the expected value of Treatment is equal to the expected value of No Treatment. Equation (A.5) shows a generic version of the threshold equation based on the generic definitions of net benefits and harms given above. Here, B and H refer to a subjective, holistic assessment of net benefits and harms. To obtain a specific version of the threshold equation, we substitute net benefits and harms through disutilities related to morbidity or mortality (M) associated with treating versus not treating (or using one treatment versus another). We distinguish M, the morbidity or mortality that occurs in the absence of treatment, from Mrx, the morbidity/mortality that occurs while on treatment; we also define Hrx as the harms that occur due to treatment, as well as a relative value that a decision-maker places on avoiding disease burden versus tolerating harms of treatment (RVH). We also assume that most medical interventions have constant (relative) effects over the range of predicted absolute risk and are conveniently modeled in decision analyses as a risk ratio (RR), or relative risk reduction, RRR = 1 − RR.2 Thus, we express the utilities in the following way3:

U1 = U(Rx, D+) = 1 − Mrx − RVH · Hrx = 1 − M · (1 − RRR) − RVH · Hrx
U2 = U(Rx, D−) = 1 − RVH · Hrx
U3 = U(NoRx, D+) = 1 − M
U4 = U(NoRx, D−) = 1

Substituting these values into the definition of net benefit, B = U1 − U3 = M · RRR − RVH · Hrx, and net harms, H = U4 − U2 = RVH · Hrx, we obtain a new version of the threshold probability:

pt = 1 / (1 + (M · RRR − RVH · Hrx)/(RVH · Hrx)) = (RVH · Hrx) / (RRR · M)    (A.6)
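The generic (A.5) and specific (A.6) forms of the threshold can be cross-checked numerically. The following sketch uses hypothetical values for M, RRR, Hrx, and RVH; the variable names mirror the appendix symbols.

```python
# Threshold probability p_t at which EU(Rx) = EU(NoRx).

def threshold_generic(B, H):
    """Eq. (A.5): p_t = H / (B + H) = 1 / (1 + B/H)."""
    return H / (B + H)

def threshold_specific(M, RRR, Hrx, RVH=1.0):
    """Eq. (A.6): p_t = RVH * Hrx / (RRR * M)."""
    return (RVH * Hrx) / (RRR * M)

# Hypothetical inputs: untreated morbidity/mortality M, treatment efficacy
# RRR, treatment harms Hrx, and relative value RVH placed on avoiding harms.
M, RRR, Hrx, RVH = 0.20, 0.50, 0.02, 1.0
B = M * RRR - RVH * Hrx          # net benefit, as derived above
H = RVH * Hrx                    # net harms
p_generic = threshold_generic(B, H)
p_specific = threshold_specific(M, RRR, Hrx, RVH)
# Both forms give p_t = 0.2: treat when the probability of disease exceeds 0.2
```

The agreement of the two forms is just the algebra of (A.5) with B and H substituted, but checking it numerically guards against sign errors when the formulas are re-derived for other scenarios.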

A.1.2 Rx Versus NoRx

1. When diagnosis (clinical event)4 is certain and no further dx test is available

If the diagnosis is certain, i.e., p = pD = 1 = 100%, calculating the threshold probability of disease makes no sense. However, we could determine the threshold morbidity or mortality M = Mt, above which the decision to treat will be rational. The expected values are then EU(Rx) = U1 and EU(NoRx) = U3, and will be in equilibrium when U1 = U3:

1 − M · (1 − RRR) − RVH · Hrx = 1 − M
M · RRR = RVH · Hrx

Mt = (RVH · Hrx) / RRR    (A.7)

2 In this book, we modeled treatment effects mostly in the utility nodes of decision trees. It is possible to model treatment effects in the probability branches of the trees (or in both the probability branches and the utility nodes), but this is rarely done. Therefore, we have adhered to contemporary accepted practice (see also Chap. 6 for additional explanations).
3 In reality, the utility U1 should be defined as U1 = (1 − Mrx) · (1 − Hrx) = 1 − Mrx − Hrx + Mrx · Hrx; however, we assumed that the simultaneous occurrence of the effect of the disease and the harms of treatment is a clinically rare occurrence and mathematically negligible (Mrx · Hrx ≈ 0), which is why we did not include this product in the definition.
4 See footnote #16.

2. When diagnosis (clinical event) is not certain, outcomes are expressed in binary (yes/no) terms, and no further dx test is available

An equivalent result can be obtained with slightly different assumptions. Assume that the probability of the event is affected by treatment, i.e., that we have two different probabilities of the event: po for patients who are not treated and pRX for patients who are treated. Then, if RRR is the relative risk reduction differentiating these probabilities, we have pRX = (1 − RRR) · po. In this case, the expected values for each strategy are given as

E[Rx] = pRX · U1 + (1 − pRX) · U2 = (1 − RRR) · po · U1 + (1 − (1 − RRR) · po) · U2
E[NoRx] = po · U3 + (1 − po) · U4

If we are only interested in whether the event (e.g., death) happens or not, we have U1 = 1 − 1 − RVH · Hrx = −RVH · Hrx; U2 = 1 − RVH · Hrx; U3 = 1 − 1 = 0; and U4 = 1. In that case, the threshold for the probability of the event in the control group is the solution of the equation

(1 − RRR) · po · (−RVH · Hrx) + (1 − (1 − RRR) · po) · (1 − RVH · Hrx) = po · 0 + (1 − po) · 1

After a bit of algebra, we obtain:

pt = (RVH · Hrx) / RRR    (A.8)

Thus, we should commit to treatment if the probability of the event for an untreated patient, po, is greater than the threshold probability, pt.
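The derivation of (A.8) can be verified numerically. In this sketch (hypothetical values for RRR, Hrx, and RVH), the expected values of the two strategies coincide exactly at po = pt and favor treatment above it.

```python
# Numerical check of Eq. (A.8) for binary (event/no event) outcomes.

RRR, Hrx, RVH = 0.40, 0.04, 1.0
p_t = (RVH * Hrx) / RRR          # Eq. (A.8): threshold untreated event probability

def expected_values(po):
    # Event probability on treatment, per p_RX = (1 - RRR) * p_o
    p_rx = (1 - RRR) * po
    # Binary-outcome utilities: U1 = -RVH*Hrx, U2 = 1 - RVH*Hrx, U3 = 0, U4 = 1
    U1, U2, U3, U4 = -RVH * Hrx, 1 - RVH * Hrx, 0.0, 1.0
    e_rx = p_rx * U1 + (1 - p_rx) * U2
    e_norx = po * U3 + (1 - po) * U4
    return e_rx, e_norx

e_rx, e_norx = expected_values(p_t)
# At p_o = p_t = 0.1, E[Rx] equals E[NoRx]; above it, treatment is preferred
```

Evaluating `expected_values` at a probability above the threshold (say 0.2) shows E[Rx] exceeding E[NoRx], matching the decision rule stated in the text.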

A.1.3 Rx1 Versus Rx2: When Diagnosis Is Not Certain and No Further dx Test Is Available

We derive our results from the simple decision tree pictured in Fig. A.2. For our situation, all the formulas are the same as above except that U(Rx1, D+) = U1, U(Rx1, D−) = U2, U(Rx2, D+) = U3, and U(Rx2, D−) = U4. The net benefit is defined as the difference between the utility of the outcomes of treatment Rx1 and treatment Rx2 in patients with disease:

B = U(Rx1, D+) − U(Rx2, D+) = U1 − U3    (A.9)


Fig. A.2 Consider a clinical situation when no further diagnostic test is available to a physician, and he/she can choose between two alternatives only: to treat using a particular treatment (Rx1), or to treat using an alternative treatment (Rx2), a patient who may or may not have a disease

The net harms are defined as the difference between the utility of the outcomes associated with treatment Rx2 and treatment Rx1 in patients without disease:

H = U(Rx2, D−) − U(Rx1, D−) = U4 − U2    (A.10)

The threshold formula is the same as before, using global, holistic metrics:

pt = H / (B + H) = 1 / (1 + B/H)    (A.11)

When we express the utilities in specific terms of mortalities, morbidities, and treatment efficacy (relative risk reduction), we get:

U1 = U(Rx1, D+) = 1 − Mrx1 − RVH · Hrx1 = 1 − M · (1 − RRR1) − RVH · Hrx1
U2 = U(Rx1, D−) = 1 − RVH · Hrx1
U3 = U(Rx2, D+) = 1 − Mrx2 − RVH · Hrx2 = 1 − M · (1 − RRR2) − RVH · Hrx2
U4 = U(Rx2, D−) = 1 − RVH · Hrx2

Substituting these values into the definition of net benefit, B = U1 − U3 = 1 − M · (1 − RRR1) − RVH · Hrx1 − 1 + M · (1 − RRR2) + RVH · Hrx2 = (Mrx2 − Mrx1) − RVH · (Hrx1 − Hrx2) = M · (RRR1 − RRR2) − RVH · (Hrx1 − Hrx2), and net harms

H = U4 − U2 = 1 − RVH · Hrx2 − 1 + RVH · Hrx1 = RVH · (Hrx1 − Hrx2), we obtain a new version of the threshold probability:

pt = 1 / (1 + (M · (RRR1 − RRR2) − RVH · (Hrx1 − Hrx2))/(RVH · (Hrx1 − Hrx2))) = (RVH · (Hrx1 − Hrx2)) / ((RRR1 − RRR2) · M)    (A.12)
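Equation (A.12) can likewise be checked with a short sketch (all inputs hypothetical). Note that only the differences in efficacy (RRR1 − RRR2) and harms (Hrx1 − Hrx2) between the two treatments enter the threshold.

```python
# Threshold for choosing between two active treatments, per Eq. (A.12).

def threshold_two_rx(M, RRR1, RRR2, Hrx1, Hrx2, RVH=1.0):
    """p_t = RVH * (Hrx1 - Hrx2) / ((RRR1 - RRR2) * M)."""
    return RVH * (Hrx1 - Hrx2) / ((RRR1 - RRR2) * M)

# Hypothetical scenario: Rx1 is more effective but more toxic than Rx2
p_t = threshold_two_rx(M=0.30, RRR1=0.60, RRR2=0.40, Hrx1=0.05, Hrx2=0.02)
# p_t = 0.5: above this probability of disease, the harsher Rx1 is preferred
```

As a consistency check, setting RRR2 = 0 and Hrx2 = 0 (i.e., Rx2 is "no treatment") reduces (A.12) to the single-treatment threshold of (A.6).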

A.1.4 Rx1 Versus Rx2: When Diagnosis Is Certain and No Further dx Test Is Available

If the diagnosis is certain, i.e., p = pD = 1 = 100%, calculating the threshold probability of disease makes no sense. However, as before, we could determine the threshold morbidity or mortality Mt, above which the decision to treat will be rational. The expected values are then EU(Rx1) = U1 and EU(Rx2) = U3, and will be in equilibrium when U1 = U3:

1 − M · (1 − RRR1) − RVH · Hrx1 = 1 − M · (1 − RRR2) − RVH · Hrx2

Mt = (RVH · (Hrx1 − Hrx2)) / (RRR1 − RRR2)    (A.13)

A.1.5 Rx Versus NoRx or Rx1 Versus Rx2: Number Needed to Treat and Number Needed to Harm

Popular indices of therapeutic benefit often include the number of patients who need to be treated to prevent one bad outcome or attain one good outcome (NNT). NNT represents the reciprocal of the difference in event rates between the treatment alternatives:

NNT = 1 / (M − Mrx) = 1 / (RRR · M)
NNT = 1 / (Mrx2 − Mrx1) = 1 / ((RRR1 − RRR2) · M)    (A.14)

The first formula represents the number needed to treat (NNT) in the Treatment versus No Treatment scenario (Rx versus NoRx), while the second formula in expression (A.14) represents the scenario with two alternative treatments. The harmful effects of treatment can be presented in a similar way. The common approach is to assess the rates of adverse effects due to treatment or to calculate the NNH (the number of patients who must be treated for one individual to experience a harmful event):

NNH = 1 / Hrx or NNH = 1 / (Hrx1 − Hrx2)    (A.15)


Fig. A.3 A decision tree conceptualizing three decisions (treat, perform a diagnostic test and act according to the test result, or withhold test/treatment) and all outcomes resulting from a chance event. The small black square represents a decision, and the small black circular nodes represent events due to chance (i.e., test results or a patient's outcomes). Utilities (U1–U8) for each outcome are shown enclosed by a rectangle. Each is represented in terms of disutilities by subtracting evidence-based measures of the effects of morbidity or mortality (M), harms of treatment (Hrx), and harms of testing (Hte) from perfect health (customarily set at 1). S is the test sensitivity (the frequency of true positives, TP), Sp is the test specificity (or the frequency of true negatives, TN), FN is the frequency of false negatives (FN = 1 − S), FP is the frequency of false positives (FP = 1 − Sp), and RRR refers to the relative risk reduction

The formula on the left gives the number needed to harm (NNH) in the Treatment versus No Treatment scenario (Rx versus NoRx), while the formula on the right in expression (A.15) covers the scenario with two alternative treatments. Using these indices, we can combine Formulas (A.6) and (A.10) (the scenario in which the diagnosis is not certain and no further dx test is available) into a single expression for the threshold probability:

pt = RVH·NNT / NNH   (A.16)
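As a consistency check on (A.16), the sketch below recomputes the threshold from NNT and NNH and compares it with the direct form RVH·Hrx/(RRR·M). The inputs reuse the PE example values from Sect. A.3 (M = 0.05, RRR = 0.61, Hrx = 0.0031); setting RVH = 1 and the function name are our own assumptions.

```python
# Threshold probability via NNT and NNH (Eq. A.16).
# M: baseline mortality; RRR: relative risk reduction; Hrx: treatment harm.

def pt_from_nnt_nnh(M, RRR, Hrx, RVH=1.0):
    NNT = 1.0 / (RRR * M)   # patients treated to prevent one death (A.14)
    NNH = 1.0 / Hrx         # patients treated to harm one patient (A.15)
    return RVH * NNT / NNH  # Eq. A.16

M, RRR, Hrx = 0.05, 0.61, 0.0031
pt_pe = pt_from_nnt_nnh(M, RRR, Hrx)
print(round(pt_pe, 4))  # → 0.1016, i.e., the 10.16% action threshold of Sect. A.3
```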

A.1.6  Rx Versus NoRx: When Diagnosis Is Not Certain and a dx Test Is Available

We derive our results from the simple decision tree pictured in Fig. A.3. The expected utilities of the three possible decisions are:

EU(Rx) = p·U1 + (1 − p)·U2


EU(Test) = S·p·U3 + (1 − Sp)·(1 − p)·U4 + (1 − S)·p·U5 + Sp·(1 − p)·U6
         = S·p·(U1 − Hte) + (1 − Sp)·(1 − p)·(U2 − Hte) + (1 − S)·p·(U7 − Hte) + Sp·(1 − p)·(U8 − Hte)
         = S·p·U1 + (1 − Sp)·(1 − p)·U2 + (1 − S)·p·U7 + Sp·(1 − p)·U8 − Hte

EU(NoRx) = p·U7 + (1 − p)·U8

Usually, when the probability of the disease is very small, the best decision is not to administer treatment (as U8 is the largest quantity); when the probability of the disease is very large, the best decision is to administer treatment whenever the treatment is reasonably effective; for moderate values of the probability, the default action is to test. The obvious question is then to find the actual threshold probabilities at which the expected utility of one action becomes the largest. Using our usual definitions of net benefit B and net harms H:

B = U(Rx, D+) − U(NoRx, D+) = U1 − U7
H = U(NoRx, D−) − U(Rx, D−) = U8 − U2

we get that EU(NoRx) = EU(Test) at the threshold probability p = ptt:

p·U7 + (1 − p)·U8 = S·p·U1 + (1 − Sp)·(1 − p)·U2 + (1 − S)·p·U7 + Sp·(1 − p)·U8 − Hte
Hte = S·p·(U1 − U7) − (1 − Sp)·(1 − p)·(U8 − U2)
Hte = S·p·B − (1 − Sp)·(1 − p)·H
(1 − Sp)·H + Hte = S·p·B + (1 − Sp)·p·H

ptt = [(1 − Sp)·H + Hte] / [(1 − Sp)·H + S·B] = (FP·H + Hte) / (FP·H + TP·B)   (A.17)

Similarly, at the threshold probability p = prx, the expected utilities of administering treatment and testing are equal, EU(Rx) = EU(Test):

p·U1 + (1 − p)·U2 = S·p·U1 + (1 − Sp)·(1 − p)·U2 + (1 − S)·p·U7 + Sp·(1 − p)·U8 − Hte
p·(U1 − U7) = S·p·(U1 − U7) + Sp·(1 − p)·(U8 − U2) − Hte
p·B − S·p·B + Sp·p·H = Sp·H − Hte

prx = (Sp·H − Hte) / [Sp·H + (1 − S)·B] = (TN·H − Hte) / (TN·H + FN·B)   (A.18)

If we use utilities defined through mortality/morbidity measures (as indicated in Fig. A.3), we have B = RRR·M − Hrx and H = Hrx, and Formulas (A.17) and (A.18) become:

ptt = [(1 − Sp)·Hrx + Hte] / [(1 − Sp)·Hrx + S·(RRR·M − Hrx)]   (A.19)


prx = (Sp·Hrx − Hte) / [Sp·Hrx + (1 − S)·(RRR·M − Hrx)]   (A.20)

When the test is harmless, Hte = 0, the formulas can be simplified using the test's likelihood ratios:

LR+ = S / (1 − Sp)   and   LR− = (1 − S) / Sp

Then the thresholds are given by:

ptt = 1 / [1 + (S / (1 − Sp))·(B/H)] = 1 / (1 + LR+·B/H)

prx = 1 / [1 + ((1 − S) / Sp)·(B/H)] = 1 / (1 + LR−·B/H)   (A.21)

Note that when the net benefit of a treatment is negative (B < 0, i.e., when RRR·M < Hrx), Formulas (A.17)–(A.21) all indicate that the treatment should not be administered (prx > 1) and that even a harmless test (Hte ≈ 0) should never be ordered (ptt > 1). In fact, ptt will be greater than 1 unless the net benefit of treating those with a positive test, i.e., S·(RRR·M − Hrx), is greater than the harm of testing Hte.
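When Hte = 0, (A.21) should agree with (A.17) and (A.18). The quick check below verifies this numerically for one illustrative set of inputs (again borrowing the PE example's values; the comparison itself is ours).

```python
# Likelihood-ratio form of the thresholds (Eq. A.21) versus the direct
# forms (Eqs. A.17-A.18) with a harmless test (Hte = 0).

S, Sp, B, H = 0.93, 0.98, 0.0274, 0.0031

lr_pos = S / (1 - Sp)   # LR+
lr_neg = (1 - S) / Sp   # LR-

ptt_lr = 1 / (1 + lr_pos * B / H)
prx_lr = 1 / (1 + lr_neg * B / H)
ptt_direct = (1 - Sp) * H / ((1 - Sp) * H + S * B)
prx_direct = Sp * H / (Sp * H + (1 - S) * B)
print(abs(ptt_lr - ptt_direct) < 1e-12, abs(prx_lr - prx_direct) < 1e-12)  # → True True
```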

A.2  Expected Regret Theory (ERT)

A.2.1  Rx Versus NoRx: When Diagnosis Is Not Certain and No Further dx Test Is Available

Accordingly, the regrets of giving or withholding treatment in the presence or absence of disease are equal to:

Rg(Rx, D+) = max[U(Rx, D+), U(NoRx, D+)] − U(Rx, D+) = 0
Rg(Rx, D−) = max[U(Rx, D−), U(NoRx, D−)] − U(Rx, D−) = U4 − U2 = H
Rg(NoRx, D+) = max[U(Rx, D+), U(NoRx, D+)] − U(NoRx, D+) = U1 − U3 = B
Rg(NoRx, D−) = max[U(Rx, D−), U(NoRx, D−)] − U(NoRx, D−) = 0

Just as with expected utility, we can calculate expected regret as the sum of all regrets weighted by their probabilities:

ERg(Rx) = p·Rg(Rx, D+) + (1 − p)·Rg(Rx, D−) = (1 − p)·H
ERg(NoRx) = p·Rg(NoRx, D+) + (1 − p)·Rg(NoRx, D−) = p·B

The threshold probability is the probability at which the expected regret of the two actions is the same, i.e., when ERg(Rx) = ERg(NoRx), and we obtain the same threshold formula:

(1 − p)·H = p·B


Fig. A.4 A decision tree showing a clinical situation when the diagnosis is not certain and no further diagnostic test is available to a physician, who has to choose between two alternatives only: to treat (Rx) or not to treat (NoRx). The tree is solved using regret theory. We define regret as the difference between the utility of the outcome of the action taken and the utility of the outcome of the action we should have taken, in retrospect (see text for details)

H − p·H = p·B

pt = H / (B + H) = 1 / (1 + B/H)   (A.22)

At the intersection point, where p = pt, the expected regret is maximal:

ERg(Rx) = (1 − p)·H = ERg(NoRx) = p·B = pt·B = B·H / (B + H)

That is, unlike the threshold probability, the maximal level of expected regret depends not only on the benefit/harm ratio but also on the absolute magnitude of the treatment's net benefit. For example, for B/H = 10, the same threshold will be obtained for net benefits of 10%, 1%, or 0.01%, as long as the absolute harms are 1%, 0.1%, or 0.001%, respectively. According to EUT, we should treat for any p > pt, regardless of the absolute magnitude of benefits. However, because expected regret depends not only on the benefit/harms ratio but also on the absolute magnitude of treatment benefit, under ERT we may, for example, rationally withhold treatment if the absolute magnitude of treatment benefits is small. Below we further discuss this violation of EUT under the concept of acceptable regret. When we express the utilities using mortalities or morbidities and treatment efficacy/effectiveness (relative risk reduction), we get net benefit B = U1 − U3 =


M·RRR − RVH·Hrx and net harms H = U4 − U2 = RVH·Hrx. From this, we obtain another version of the threshold probability:

pt = RVH·Hrx / (M·RRR)   (A.23)
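The point that the threshold depends only on the B/H ratio while the maximal expected regret depends on absolute magnitudes can be made concrete. The sketch below compares two hypothetical treatments with the same B/H = 10 but very different absolute benefits; all numbers are illustrative.

```python
# At p = pt the expected regrets of Rx and NoRx are equal and maximal:
# ERg = B*H / (B + H) (see Eq. A.22 and the discussion above).

def regret_at_threshold(B, H):
    pt = H / (B + H)
    return pt, B * H / (B + H)

pt1, rg1 = regret_at_threshold(B=0.10, H=0.01)       # large absolute benefit
pt2, rg2 = regret_at_threshold(B=0.0001, H=0.00001)  # tiny absolute benefit
print(round(pt1, 4), round(pt2, 4))  # → 0.0909 0.0909 (same threshold)
print(rg1 / rg2)                     # maximal regret differs 1000-fold
```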

A.2.2  Rx Versus NoRx: When Diagnosis Is Certain and No Further dx Test Is Available

If the diagnosis is certain, i.e., p = pD = 1 = 100%, the threshold probability of disease makes no sense; however, we can determine the threshold mortality Mt above which the decision to treat is rational (for M > Mt) and below which we should withhold treatment. The expected regrets are ERg(Rx) = 0 and ERg(NoRx) = B; they are in equilibrium when B = 0. In other words, the threshold mortality is M = Mt for which M·RRR − RVH·Hrx = 0, and:

Mt = RVH·Hrx / RRR   (A.24)

A.2.3  Rx1 Versus Rx2: When Diagnosis Is Not Certain and No Further dx Test Is Available

In the case when we have two treatments, the decision tree is analogous to the tree above, except that the net benefit is given by B = (Mrx2 − Mrx1) − RVH·(Hrx1 − Hrx2) = M·(RRR1 − RRR2) + RVH·(Hrx2 − Hrx1) and the net harms are H = RVH·(Hrx1 − Hrx2), giving us the familiar threshold:

pt = 1 / (1 + [M·(RRR1 − RRR2) − RVH·(Hrx1 − Hrx2)] / [RVH·(Hrx1 − Hrx2)]) = RVH·(Hrx1 − Hrx2) / [(RRR1 − RRR2)·M]   (A.25)

A.2.4  Rx Versus NoRx: When Diagnosis Is Not Certain and a dx Test Is Available

The expected regrets of the three possible decisions are:

ERg(Rx) = (1 − p)·H
ERg(Test) = (1 − Sp)·(1 − p)·H + (1 − S)·p·B + Hte
ERg(NoRx) = p·B


Fig. A.5 A decision tree conceptualizing three decisions (treat, perform a diagnostic test and act accordingly, or withhold test/treatment) and all outcomes resulting from a chance event. The small black square represents a decision, and the small black circular nodes represent events due to chance (i.e., test results or a patient's outcomes). Utilities for each outcome are shown enclosed by a rectangle. Each is represented in terms of disutilities by subtracting evidence-based measures of the effects of morbidity or mortality (M), harms of treatment (Hrx), and harms of testing (Hte) from perfect health (customarily set at 1). S is the test sensitivity (the frequency of true positives, TP), Sp is the test specificity (or the frequency of true negatives, TN), FN is the frequency of false negatives (FN = 1 − S), FP is the frequency of false positives (FP = 1 − Sp), and E = RRR refers to the efficacy or effectiveness of treatment, expressed as relative risk reduction

To find the threshold probability between the NoRx and Test decisions, p = ptt, we set ERg(NoRx) = ERg(Test):

p·B = (1 − Sp)·(1 − p)·H + (1 − S)·p·B + Hte
(1 − Sp)·p·H + S·p·B = (1 − Sp)·H + Hte

ptt = [(1 − Sp)·H + Hte] / [(1 − Sp)·H + S·B] = (FP·H + Hte) / (FP·H + TP·B)   (A.26)

Similarly, the threshold probability delineating the Test and Rx decisions, p = prx, is obtained by solving ERg(Test) = ERg(Rx):

(1 − Sp)·(1 − p)·H + (1 − S)·p·B + Hte = (1 − p)·H
Sp·p·H + (1 − S)·p·B = Sp·H − Hte

prx = (Sp·H − Hte) / [Sp·H + (1 − S)·B] = (TN·H − Hte) / (TN·H + FN·B)   (A.27)


Just as in Chap. 2 (expected utility theory), we can rewrite Formulas (A.26) and (A.27) using utilities expressed through mortality/morbidity measures:

ptt = [(1 − Sp)·Hrx + Hte] / [(1 − Sp)·Hrx + S·(RRR·M − Hrx)]   (A.28)

prx = (Sp·Hrx − Hte) / [Sp·Hrx + (1 − S)·(RRR·M − Hrx)]   (A.29)

When the test is harmless, Hte = 0, the formulas can be simplified using the test's likelihood ratios:

LR+ = S / (1 − Sp)   and   LR− = (1 − S) / Sp

Then the thresholds are given by:

ptt = 1 / [1 + (S / (1 − Sp))·(B/H)] = 1 / (1 + LR+·B/H)

prx = 1 / [1 + ((1 − S) / Sp)·(B/H)] = 1 / (1 + LR−·B/H)   (A.30)

Note that, under typical conditions, ERT is the mirror image of EUT: in most cases, minimizing expected regret results in the same decision as maximizing expected utility. Accordingly, the threshold formulas shown above under ERT and EUT are identical. We now turn to the important concept of acceptable regret.

A.2.5  Acceptable Regret: Rx Versus NoRx, When the Diagnosis Is Not Certain and a Diagnostic Test Is Available

In this section, we ask the following question: Which decision should we make if we want to ensure that the regret is less than a predetermined level of acceptable regret, R0, even if, in hindsight, such a decision proved wrong? Solving the inequalities

ERg(Rx) = (1 − p)·H ≤ R0
ERg(Test) = (1 − Sp)·(1 − p)·H + (1 − S)·p·B + Hte ≤ R0
ERg(NoRx) = p·B ≤ R0

for the probability p, we obtain the following acceptable regret thresholds:

Parx = 1 − R0/H   (A.31)

Patt = [R0 − (1 − Sp)·H − Hte] / [(1 − S)·B − (1 − Sp)·H]   (A.32)

Pawh = R0/B   (A.33)

Note that the actions below the acceptable regret line (broken red line in Fig. A.6) do not necessarily agree with decisions under EUT (see Chap. 3 for further interpretation). If the probability of the disease is larger than Parx, we can administer treatment assured that the regret will be at most R0. Similarly, if the probability of the disease is less than Pawh, we can withhold treatment and rest assured that our regret will be at most R0. The situation with the test is a little more complicated. If (1 − S)·B > (1 − Sp)·H, we can order a test without fearing regret greater than R0 whenever the probability of the disease is less than Patt. On the other hand, if (1 − S)·B < (1 − Sp)·H, we can order a test without fearing regret greater than R0 whenever the probability of the disease is more than Patt. Most commonly, however, (1 − S)·B > (1 − Sp)·H, as the net benefit is usually significantly larger than the net harms of treatment. In these cases, whenever R0 − (1 − Sp)·H − Hte > (1 − S)·B − (1 − Sp)·H > 0 (or, assuming Hte ≈ 0, whenever R0 > (1 − S)·B > (1 − Sp)·H), the corresponding acceptable regret threshold is Patt > 1, meaning that we can always order a diagnostic test without feeling unacceptable regret. Similarly, whenever (1 − S)·B − (1 − Sp)·H > 0 > R0 − (1 − Sp)·H − Hte (or, assuming Hte ≈ 0, whenever (1 − S)·B > (1 − Sp)·H > R0), the corresponding acceptable regret threshold is Patt < 0, indicating that we can never order a diagnostic test without feeling unacceptable regret. It is important to note that the threshold probabilities Parx, Patt, and Pawh in Eqs. (A.31)–(A.33) need not follow any particular order; depending on our acceptable regret value, the threshold Parx may be smaller than Patt or even Pawh.
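The three acceptable regret thresholds (A.31)-(A.33) can be sketched as follows. We again borrow the PE example's values for S, Sp, B, and H, and pick an acceptable regret level of R0 = 0.001 purely for illustration.

```python
# Acceptable regret thresholds (Eqs. A.31-A.33).
# R0: the largest regret the decision-maker is willing to tolerate.

def acceptable_regret_thresholds(S, Sp, B, H, Hte, R0):
    p_arx = 1 - R0 / H                                                # Eq. A.31
    p_att = (R0 - (1 - Sp) * H - Hte) / ((1 - S) * B - (1 - Sp) * H)  # Eq. A.32
    p_awh = R0 / B                                                    # Eq. A.33
    return p_arx, p_att, p_awh

p_arx, p_att, p_awh = acceptable_regret_thresholds(
    S=0.93, Sp=0.98, B=0.0274, H=0.0031, Hte=0.0, R0=0.001)
print(round(p_awh, 4), round(p_att, 4), round(p_arx, 4))
```

For this choice of R0 the thresholds happen to be ordered p_awh < p_att < p_arx, but, as noted above, no particular ordering is guaranteed.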

A.3  Fast-and-Frugal Trees (FFT and FFTT): An Example

Chapter 5 outlines FFT and FFTT (fast-and-frugal tree with threshold) as a powerful tool for linking a series of decisions into a decision strategy (pathway). Here we illustrate the details of the calculations. Let's assume you are seeing a patient who was recently diagnosed with pancreatic cancer and presents with sudden onset of shortness of breath. You suspect pulmonary embolism (PE). You are aware of the evidence-based guidelines by the American Society of Hematology (ASH) panel, which recommend considering computed tomography pulmonary angiography (CTPA) and, if negative, the D-Dimer (DD) test for the diagnosis of PE (https://ashpublications.org/bloodadvances/article/2/22/3226/16134/American-Society-of-Hematology-2018-guidelines-for).


Fig. A.6 a A graphical representation of hypothetical expected regret thresholds ptt and prx, as well as the acceptable regret thresholds Parx, Patt, and Pawh. b Further shows the relationship between EUT, ERT (acceptable regret), and the overuse or underuse of healthcare interventions. If the probability of the disease is smaller than the acceptable expected regret theory (AERT) testing threshold, Pat, AERT would lead to test ordering, whereas according to EUT, we should not test. This, in turn, leads to overtesting. If the probability of the disease is larger than the AERT testing threshold, Pat, according to AERT, we would be reluctant to order the test, whereas according to EUT, we should test. As a consequence, this would lead to undertesting. We think that overtesting typically occurs in the “rule out worst-case scenarios” in which physicians cannot afford to miss a particular diagnosis. Once a serious diagnostic possibility enters the physician's mind, every patient with chest pain or shortness of breath gets a computed tomography (CT) angiogram to rule out pulmonary embolism (PE), every patient with headache receives a CT of the head to rule out brain tumor, every patient with an “incidentaloma” (an incidental and unexpected finding of a mass on imaging studies performed for different reasons) gets a biopsy, and so on. We think undertesting typically occurs when ordering a diagnostic test is perceived as not needed (i.e., consciously or subconsciously felt to be risky or associated with an unacceptable level of regret). Hence, patients with atypical chest pain will not get a CT angiogram to rule out PE, patients with headache do not get a CT of the head, patients with an “incidentaloma” will not get a biopsy, and so on. Ptt, testing threshold according to EUT; Prx, treatment threshold according to EUT; Pat, testing threshold according to the acceptable regret (R0) model; Pawh, threshold probability below which treatment can be withheld without experiencing regret (if the decision was wrong) (see also Chap. 3) (reproduced from Hozo and Djulbegovic with permission)

Let’s illustrate how can we convert these guidelines into FFT. You start by estimating the patient’s prior probability of PE. Using a popular Geneva score for PE (https://www.mdcalc.com/calc/1750/geneva-score-revised-pulmonary-emb olism), you estimate the prior PE in your patient as 45% ( p D = Pr(D+) = 0.45). The ASH panel appraised evidence on age-adjusted D-Dimer as high with estimated sensitivity = 0.99 (0.98 to 1) and specificity = 0.47 (0.45 to 0.49) based on one large study that enrolled 2885 patients. Certainty of evidence of CTPA was appraised as moderate; the panel estimated sensitivity of CT = 0.93 (0.88 to 0.96) and specificity = 0.98 (0.96 to 0.99) based on 15 studies that enrolled 3929 patients. Figure A.7 shows how this information can be organized into an FFT. Let’s show how the numbers shown in Fig. A.7 were derived. To calculate posterior probability at each exit, we apply Bayes’ theorem: Pr(D + |CT+) =

SensCT · p D ( ) SensCT · p D + 1 − SpecCT · (1 − p D)

0.93 · 0.45 = 0.9744 ≈ 97.44% 0.93 · 0.45 + (1 − 0.98) · (1 − 0.45) (1 − SensCT ) · SensDD · p D ( ) Pr(D + |CT − &DD+) = (1 − SensCT ) · SensDD · p D + SpecCT · 1 − SpecDD · (1 − p D) =

(1 − 0.93) · 0.99 · 0.45 = 0.0984 ≈ 9.84% (1 − 0.93) · 0.99 · 0.45 + 0.98 · (1 − 0.47) · (1 − 0.45) (1 − SensCT ) · (1 − SensDD ) · p D Pr(D + |CT − &DD−) = (1 − SensCT ) · (1 − SensDD ) · p D + SpecCT · SpecDD · (1 − p D) =


Fig. A.7 A graphical representation of the FFT model for the management of pulmonary embolism. The computed tomography pulmonary angiography (CTPA or CT) test is administered and, if negative, the D-Dimer (DD) test is administered. If either test is positive, the FFT recommends treating the patient; if the D-Dimer test is negative, we withhold treatment

= (1 − 0.93)·(1 − 0.99)·0.45 / [(1 − 0.93)·(1 − 0.99)·0.45 + 0.98·0.47·(1 − 0.45)] = 0.0012 ≈ 0.12%
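The three posterior probabilities can be reproduced with a short Bayes' theorem helper; the function is our own generic sketch of the calculations shown above.

```python
# Posterior probability of PE at each FFT exit (Bayes' theorem), using
# the prior and test characteristics from the ASH-based example.

pD = 0.45                       # prior probability of PE (Geneva score)
sens_ct, spec_ct = 0.93, 0.98   # CTPA
sens_dd, spec_dd = 0.99, 0.47   # age-adjusted D-Dimer

def posterior(p_result_given_d, p_result_given_nod, prior):
    """Generic Bayes update: Pr(D+ | result)."""
    num = p_result_given_d * prior
    return num / (num + p_result_given_nod * (1 - prior))

p_ct_pos = posterior(sens_ct, 1 - spec_ct, pD)
p_ctneg_ddpos = posterior((1 - sens_ct) * sens_dd, spec_ct * (1 - spec_dd), pD)
p_ctneg_ddneg = posterior((1 - sens_ct) * (1 - sens_dd), spec_ct * spec_dd, pD)
print(round(p_ct_pos, 4), round(p_ctneg_ddpos, 4), round(p_ctneg_ddneg, 4))
# → 0.9744 0.0984 0.0012, matching the three exits in Fig. A.7
```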

The accuracy of the entire model (decision strategy) is usually defined as

ACC = #correct classifications / #total classifications

So, in general, if the FFT (or FFTT) has n independent cues, cue1, cue2, …, cuen, then the expected accuracy can be defined as

ACC = Σ(exit = Rx) Pr(exit = Rx ∩ D+) + Σ(exit = NoRx) Pr(exit = NoRx ∩ D−)
    = Σ(exit = Rx) Pr(exit = Rx | D+)·Pr(D+) + Σ(exit = NoRx) Pr(exit = NoRx | D−)·Pr(D−)

In our specific example, the FFT has three exits. Two recommend treatment (CT+ and CT−&DD+), and one is negative (CT−&DD−). Therefore, in this case, we can calculate the accuracy of the FFT as

ACC_FFT = Pr(CT+ | D+)·pD + Pr(CT−&DD+ | D+)·pD + Pr(CT−&DD− | D−)·(1 − pD)
        = SensCT·pD + (1 − SensCT)·SensDD·pD + SpecCT·SpecDD·(1 − pD)
        = 0.93·0.45 + (1 − 0.93)·0.99·0.45 + 0.98·0.47·(1 − 0.45) = 0.7030 = 70.30%
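The FFT accuracy calculation above is easy to reproduce in code; this sketch simply re-adds the three exit terms.

```python
# Classification accuracy of the FFT (exits CT+, CT-&DD+, CT-&DD-),
# reproducing the 70.30% figure from the text.

pD = 0.45
sens_ct, spec_ct = 0.93, 0.98
sens_dd, spec_dd = 0.99, 0.47

acc_fft = (sens_ct * pD                      # CT+ exit: diseased, treated
           + (1 - sens_ct) * sens_dd * pD    # CT-&DD+ exit: diseased, treated
           + spec_ct * spec_dd * (1 - pD))   # CT-&DD- exit: healthy, untreated
print(round(acc_fft, 4))  # → 0.703
```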


As explained in Chap. 5, FFT provides information on the accuracy of classification (e.g., diagnosis of PE), but it does not take the consequences of our (treatment) actions into consideration. FFTT, however, allows taking the effects of treatment into account. The difference between FFT and FFTT is that FFT assigns the treatment simply based on whether the test was positive or negative, while FFTT re-evaluates the probability of disease at each exit and compares it with the action threshold. The action threshold, as discussed throughout the book, depends on the net benefits and harms of the treatment. In the case of PE, we are mostly interested in the effects of anticoagulant (AC) treatment on mortality. Based on our best assessment of the evidence, we determined that the net benefits (B) and net harms (H) of AC in the treatment of PE are equal to B = M − M·(1 − RRR) − Hrx = 0.05 − 0.05·(1 − 0.61) − 0.0031 = 0.0274 and H = 0.0031, giving us a B/H ratio = 8.84, which converts into an action threshold (AT) = 1/(1 + 8.84) = 10.16%. In other words, if the probability of PE is larger than 10.16%, we should administer the treatment; otherwise, we should withhold treatment. Note that while at the exits CT+ and CT−&DD− the two procedures agree (administer treatment in the former and withhold treatment in the latter case), when we consider the CT−&DD+ exit, we have conflicting recommendations based on FFT (classification accuracy) versus FFTT (based on considerations of the benefits and harms of AC). Since the D-Dimer test is positive after the negative CT, according to the FFT diagram in Fig. A.7, we should administer the treatment. However, for this exit, the posterior probability of PE is 9.84%, which is less than the action threshold of 10.16%. Therefore, according to the threshold model, we should withhold treatment. So, in this case, for FFTT, we can calculate the treatment-decision accuracy as

ACC_FFTT = Pr(CT+ | D+)·pD + Pr(CT−&DD+ | D−)·(1 − pD) + Pr(CT−&DD− | D−)·(1 − pD)
         = SensCT·pD + SpecCT·(1 − SpecDD)·(1 − pD) + SpecCT·SpecDD·(1 − pD)
         = 0.93·0.45 + 0.98·(1 − 0.47)·(1 − 0.45) + 0.98·0.47·(1 − 0.45) = 0.9575 = 95.75%
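Finally, the action threshold and the FFTT's treatment-decision accuracy can be recomputed together; as above, the code just restates the book's arithmetic.

```python
# Action threshold for anticoagulation (AC) and treatment-decision
# accuracy of the FFTT, reproducing the 10.16% and 95.75% figures.

pD = 0.45
sens_ct, spec_ct = 0.93, 0.98
sens_dd, spec_dd = 0.99, 0.47
M, RRR, Hrx = 0.05, 0.61, 0.0031

B = M - M * (1 - RRR) - Hrx  # net benefit of AC = 0.0274
H = Hrx                      # net harms of AC
at = 1 / (1 + B / H)         # action threshold
# At the CT-&DD+ exit the posterior (9.84%) is below this threshold,
# so the FFTT withholds treatment there (unlike the plain FFT).
acc_fftt = (sens_ct * pD                           # CT+: treat, diseased
            + spec_ct * (1 - spec_dd) * (1 - pD)   # CT-&DD+: withhold, healthy
            + spec_ct * spec_dd * (1 - pD))        # CT-&DD-: withhold, healthy
print(round(at, 4), round(acc_fftt, 4))  # → 0.1016 0.9575
```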

A.4  Complex Versus Simple Models

As discussed in Chaps. 8 and 9, the decisions we presented in this book using simple models could also be represented by means of complex models. These models can employ various techniques, including standard regression statistical techniques, Markov modeling, Monte Carlo and microsimulation modeling, and, most recently, machine learning/artificial intelligence (AI) modeling. However, as we explain in Chaps. 8 and 9, decision-making based on these complex models is not necessarily superior to decision-making using the simple models described in this book.


Fig. A.8 Simple (on the left) and complex (on the right) decision models. U1 to U4 refer to the utilities (health outcomes) associated with each decision; p denotes the probability of disease or outcome (p1 to p3). Rx, treatment; NoRx, no treatment. The inset in the upper right corner illustrates how complex models differ from simple models by expanding utilities and introducing new probabilities describing clinical events of interest. For illustration purposes, we only show the expansion of U1, but any outcome or probability can be expanded/modified as needed. Importantly, the calculation of the events depicted in the inset can be conducted separately from the analysis in the main tree. As a result, complex models can be reduced to simple decision models (in this case, U1 based on the derivation of expected utility in the Box on the right can be used to replace U1 in the simple tree on the left)

In general, the main difference between complex and simple decision models is that the former usually model events of interest far in the future, typically based on assumptions not supported by reliable empirical findings and supporting evidence. This is done by expanding the utilities and/or probabilities of interest. Figure A.8 shows a conceptual illustration of how this is done. The utility denoted as U1 can be further expanded by n additional clinical events (see Box in Fig. A.8). Perhaps, for example, the outcome of treatment captured as U1 was associated with long-term complications (at probability p1) not initially recorded in utility U1 in the simple model. Some of these patients may end up hospitalized (at probability p2), of whom a proportion may end up in the intensive care unit (at probability p3), etc. These events, shown in the inset in Fig. A.8, can be separately modeled using various techniques. Almost universally, modeling is based on the expected utility (EU) calculus (see Chaps. 1 and 8). That is, we can simply calculate the EU of the events displayed in the Box and use it as U1. Thus, by separately aggregating the events in the Box, we can reduce complex models to simple models (see Chaps. 8 and 9 for a further explanation of why simple models may be more accurate decision-making tools than complex decision models).

Glossary

Each entry lists the name (and any alternative names), the common abbreviation, symbol, or letter, a definition, and an equation and/or examples.

Absolute harm difference (AHD): An expression of treatment effect between two groups where absolute harm (e.g., adverse events) is assessed between experimental and control groups. AHD = Harme − Harmc, e.g., the harms of treatment in the experimental and control groups (Harme and Harmc, respectively).

Absolute risk difference (ARD): A common expression of treatment effect between two groups where absolute risk is assessed between control and experimental groups. ARD = Riskc − Riske, e.g., the risk of a disease in the control and experimental groups (Riskc and Riske, respectively).

Bias: A systematic, distorted, or erroneous conclusion resulting from some aspect of the study design, data composition, data analysis, etc.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 B. Djulbegovic and I. Hozo, Threshold Decision-making in Clinical Medicine, Cancer Treatment and Research 189, https://doi.org/10.1007/978-3-031-37993-2



Case–control study: An observational study that compares people with a specific disease or outcome of interest (cases) to people from the same population without that disease or outcome (controls), and which seeks to find associations between the outcome and prior exposure to particular risk factors. This design is particularly useful where the outcome is rare and past exposure can be reliably measured. Case–control studies are usually, but not always, retrospective.*

Case series: A study reporting observations on a series of individuals, usually all receiving the same intervention, with no control group.*

Clinical event: Refers to an outcome, or the probability of an event of interest; for example, diagnosis, risk of developing breast cancer recurrence, or death.

Cohort study: A study design where participants (cohorts) are sampled on the basis of exposure. At baseline, all exposed or unexposed persons, or both, may be included. Exposure can be a risk factor, disease, or treatment. If a comparison group is included, relative risk can also be calculated. The design can be prospective or retrospective.


Clinical practice guidelines (CPG): The National Academy of Medicine defines clinical practice guidelines as “statements that include recommendations intended to aid decision-making and optimize patient care that is informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options.”

Confounding factor: An indirect association that affects the disease or outcome of interest through some other factor. Many experimental designs aim to reduce the impact of potentially confounding factors, as they can obfuscate experimental results. E.g., in some epidemiological studies, demographic factors (such as socioeconomic status or country of birth) may indirectly impact the outcome of interest.

Cross-level bias: A bias that introduces unwanted variation at the population level that differs from that found at the individual level.

Decision theory (expected utility): A theory stating that the optimal choice rests with the selection of the strategy associated with the optimal (greatest or smallest) expected value, calculated by averaging values across all possible outcomes, weighted by the corresponding probabilities.

Effect (of treatment): Generally understood as the magnitude of a treatment's effect on clinical outcomes.


Error (variation): The random or systematic (i.e., non-random) difference that measurements may have from a true outcome of interest. Many forms of measurement error are often reported, including, but not limited to: the 95% confidence interval (95% CI), standard error of the mean (sem), standard deviation (SD), and range. Note that the range is often considered an unreliable form of error for use in meta-analyses.

Evidence-based medicine (EBM): A movement started in the early 1990s to evaluate and, in turn, acquire a better empirical basis for the practice of medicine by focusing on critical appraisal, the development of systematic reviews, and clinical practice guidelines.

Fast-and-frugal tree (FFT): Fast-and-frugal trees are decision trees that allow decision-makers to reach decisions quickly and accurately based on limited information.

Grading of Recommendations, Assessments, Development, and Evaluations (GRADE): A system used to evaluate the quality of evidence and strength of recommendations used in systematic reviews and for guideline development.

Hazard ratio (HR): A relative measure of the effect produced by a time-to-event (survival) analysis, representing the increased risk with which one group is likely to experience the outcome of interest versus control. E.g., if the hazard ratio for death with treatment is 0.5, then treated patients are likely to die at half the rate of untreated patients. Note that RR does not consider the timing of the event but only whether the event occurred or not. In contrast, HR considers both the total number of events and the duration/timing of each event. Because of similarities in interpretation between HR and RR, the proportional reduction (1 − HR) also has a meaning similar to RRR (see below).


Kaplan–Meier curve



A graphical representation of survival by plotting the proportion of the patients who remain without event (“survived”) on the y-axis at different time points throughout the study (time plotted on the x-axis)



Lead-time bias



Any error or variation associated with screened and unscreened individuals where some screened individuals appear to be doing better partly (or wholly) due to a disease or outcome of interest that was detected as part of the screening process (i.e., given the extra lead time as compared to unscreened individuals)



Measurement bias (Information bias)



Any error or variation associated with measurements of disease or an outcome of interest



Meta-analysis



The statistical synthesis of data from separate but similar (i.e., comparable) studies to generate a quantitative summary of the pooled results



Net benefit or net harm



Takes into account the difference between raw benefits and harms depending on the metrics used. Usually defined using utilities or disutilities using metrics such as morbidity, mortality, survival, life expectancy, quality-adjusted life expectancy, cost

Number needed to treat to avoid one adverse outcome (NNT): The number of patients who need to be treated to prevent one patient from experiencing a bad outcome or for one patient to attain a good outcome. Equation: NNT = 1/ARD
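A quick worked example of NNT = 1/ARD (the risks below are hypothetical):

```python
# Hypothetical risks: 10% of untreated and 6% of treated patients
# experience the bad outcome.
risk_control = 0.10
risk_treated = 0.06

ard = risk_control - risk_treated   # absolute risk difference = 0.04
nnt = 1 / ard                       # NNT = 1/ARD
print(round(nnt))                   # 25 patients treated to prevent one bad outcome
```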


Number needed to treat to harm one individual (NNH): The number of patients who must be treated for one patient to experience a harmful event. Equation: NNH = 1/AHD

Odds ratio (OR): The odds of the event of interest in a population exposed to a potential risk factor, divided by the odds in a control or reference group. Much like the hazard ratio, this is a relative measure of the effect of exposure. Equation: OR = (exposed with outcome × unexposed without outcome)/(unexposed with outcome × exposed without outcome), where Odds = risk/(1 − risk)

p-value (p): A statistical measure used to determine the strength of evidence against a null hypothesis, formally defined as the probability of obtaining test results at least as extreme as those observed, assuming the null hypothesis is true. Historically, the threshold (called α) for deciding whether to reject the null hypothesis is 0.05; in some cases a more conservative threshold (e.g., 0.01 or even lower) may be appropriate. α is the predetermined false-positive error rate that a researcher accepts when rejecting the null hypothesis

Power (1 − β): The probability that a given experiment will detect a true effect, given the number of individuals sampled from a population. A power analysis is often used to keep sampling error at an acceptable level without oversampling. β is the false-negative error, often tolerated at the 20% level
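The odds ratio and the risk–odds conversion can be checked numerically from a hypothetical 2×2 table (all counts are made up):

```python
# Hypothetical 2x2 table of exposure vs. outcome:
#                outcome   no outcome
# exposed            30           70
# unexposed          15           85
a, b = 30, 70   # exposed with / without outcome
c, d = 15, 85   # unexposed with / without outcome

odds_exposed = a / b            # 30/70
odds_unexposed = c / d          # 15/85
or_ = (a * d) / (b * c)         # cross-product form of odds_exposed/odds_unexposed

# The glossary's risk <-> odds identity: Odds = risk/(1 - risk)
risk_exposed = a / (a + b)      # 0.30
assert abs(odds_exposed - risk_exposed / (1 - risk_exposed)) < 1e-12

print(round(or_, 2))
```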


Publication bias: A bias in which the most promising results are published while negative or unpromising results often remain unpublished. Publication bias is often assessed in meta-analyses and is sometimes represented graphically in many ways, including (but not limited to) funnel plots. It can occur at the level of individual outcomes or of entire studies.



Randomized controlled trial (RCT): An experimental design in which two or more interventions are compared by randomly allocating them to participants. In most trials, one intervention is assigned to each individual, but sometimes assignment targets groups of individuals (e.g., households; a cluster RCT design) or interventions are assigned within individuals (for example, in different orders or to different parts of the body). RCTs often yield high-quality evidence and are considered less biased than cohort, case–control, or case-series studies, because well-done randomization ensures that the compared groups are balanced on all factors that may affect the treatment results except the intervention of interest.




Relative risk reduction (efficacy) (RRR, or E): The proportion or percentage reduction in the risk of an outcome when one treatment is compared with another [i.e., when the event rate in one group (Risk1) is compared with that in another (Risk2)] (not to be confused with treatment efficacy related to study design; see below). Equation: RRR (or E) = 1 − Risk2/Risk1

Recall bias: A bias introducing unwanted variation into a study when one group tends to recall an event or outcome of interest differently than another group.

Relative value (RV): A patient's relative preference for avoiding treatment harms versus avoiding the burden of disease. Equation: RV = (1 − value of adverse effect)/(1 − value of disease event)

Risk (R): The proportion of individuals with a disease or an outcome of interest within a group; often referred to as a frequentist measure of probability (risk). Equation: R = n_outcome of interest/n_total, e.g., the number of patients with the measured outcome of interest divided by the total number of patients. Note that Risk (probability) = Odds/(1 + Odds)

Selection bias: Bias due to systematic, non-random selection of individuals from a population.

Sensitivity (S): The probability of a positive test result when the disease is present. Equation: S = true positives/(true positives + false negatives)

Specificity (Sp): The probability of a negative test result when the disease is absent. Equation: Sp = true negatives/(true negatives + false positives)

Survival analysis: An analysis that determines the proportion of individuals still alive (or event-free) over time with a disease, or after exposure to a potential risk factor or a treatment.
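Several of the formulas above can be verified together on hypothetical counts (all numbers are made up for illustration):

```python
# Hypothetical diagnostic-test counts.
tp, fn = 90, 10     # diseased patients: test positive / test negative
tn, fp = 160, 40    # disease-free patients: test negative / test positive

sensitivity = tp / (tp + fn)    # 0.90
specificity = tn / (tn + fp)    # 0.80

# Risk and the risk <-> odds identity: Risk = Odds/(1 + Odds)
n_outcome, n_total = 20, 100
risk = n_outcome / n_total              # R = 0.20
odds = risk / (1 - risk)                # 0.25
assert abs(risk - odds / (1 + odds)) < 1e-12

# Relative risk reduction comparing two event rates:
risk1, risk2 = 0.20, 0.15               # e.g., control vs. treatment groups
rrr = 1 - risk2 / risk1                 # 0.25, i.e., a 25% relative reduction

print(sensitivity, specificity, round(rrr, 2))
```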


Systematic review: The application of strategies that limit bias in the assembly, critical appraisal, and synthesis of all relevant studies on a specific topic. A meta-analysis may (but need not always) be part of this process.



Threshold probability (pt): The probability of a clinical event (e.g., disease) at which the net benefits and net harms of treatment are equivalent (see various definitions in the Appendix and throughout the book). Equation: pt = Net Harm/(Net Benefit + Net Harm)

Treatment effect (effect size) (TE, ES): A generic term for the relative estimate of the effect of treatment generated in a given study. The term sometimes refers to the standardized mean difference (effect size). Cohen (Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed; 1988) interpreted a standardized mean difference of approximately 0.2 as a small effect, 0.5 as a moderate effect, and 0.8 or higher as a large effect.* Common estimates are the standardized mean effect sizes sometimes called Cohen's d or Hedges' g. In clinical medicine, effect measures often refer to ratio measures (e.g., risk ratio, odds ratio, hazard ratio) or difference measures (e.g., mean difference, risk difference). Equations: Cohen's d = (x̄c − x̄t)/Swithin, where x̄c and x̄t are the mean values of the outcome of interest (e.g., infection load or disease severity) for the control and treatment groups, respectively, and Swithin is the within-groups standard deviation pooled across all groups; Hedges' g = J · d, where d is Cohen's d and J = 1 − 3/(4df − 1) is a correction factor, with df the degrees of freedom.
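Both formulas are easy to check numerically; the utilities and sample values below are made up for illustration:

```python
from statistics import mean, stdev

# Threshold probability: treat when p(disease) exceeds pt.
# Hypothetical values: net harm 1, net benefit 9 (arbitrary utility units).
net_harm, net_benefit = 1.0, 9.0
pt = net_harm / (net_benefit + net_harm)    # 0.10

# Cohen's d and Hedges' g for two small, made-up samples.
control = [5.0, 6.0, 7.0, 8.0]
treated = [3.0, 4.0, 5.0, 6.0]

def pooled_sd(x, y):
    """Within-groups standard deviation pooled across both groups."""
    nx, ny = len(x), len(y)
    return (((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2)
            / (nx + ny - 2)) ** 0.5

d = (mean(control) - mean(treated)) / pooled_sd(control, treated)
df = len(control) + len(treated) - 2
g = (1 - 3 / (4 * df - 1)) * d              # Hedges' small-sample correction

print(pt, round(d, 2), round(g, 2))
```

Note that g < d: the correction factor J shrinks the estimate slightly, which matters most for small samples like this one.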


Treatment efficacy: Refers to treatment effects under ideal circumstances, such as controlled randomized experiments. (Can the intervention work in an ideal study setting?)

Treatment effectiveness: Refers to treatment effects under non-experimental conditions, often described as “real-world data” or “real-world evidence” (typically generated in observational studies). (Does the intervention work in real-world settings? Can it be generalized to non-trial patients and applied to individual patients?)

Treatment efficiency: Typically refers to cost-effectiveness studies. (Is the treatment worth paying for?)

Utilities: The value associated with each clinical outcome, expressed in different units such as length of life, morbidity or mortality rates, absence of pain, cost, or the strength of individual or societal preference for an outcome.

* https://training.cochrane.org/handbook/current/chapter-06

(See also Chap. 1.) Definitions sourced from the GRADE Handbook and the Cochrane Handbook for Systematic Reviews of Interventions.