Data-Guided Healthcare Decision Making 100921201X, 9781009212014

How does data evidence matter in decision-making in healthcare? How do you implement and maintain cost effective healthc

English Pages 528 [529] Year 2023

Data-Guided Healthcare Decision-Making

Published online by Cambridge University Press

Data-Guided Healthcare Decision-Making
Ramalingam Shanmugam
Texas State University

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
103 Penang Road, #05–06/07, Visioncrest Commercial, Singapore 238467

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781009212014
DOI: 10.1017/9781009212021

© Cambridge University Press 2023

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2023
Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall

A catalogue record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Shanmugam, Ramalingam, author.
Title: Data-guided healthcare decision making / Ramalingam Shanmugam.
Description: Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2022. | Includes bibliographical references and index.
Identifiers: LCCN 2022024953 | ISBN 9781009212014 (hardback) | ISBN 9781009212021 (ebook)
Subjects: MESH: Health Services Administration – statistics & numerical data | Decision Support Techniques | Probability | Models, Statistical | Mathematical Computing | BISAC: MEDICAL / Epidemiology
Classification: LCC RA409 | NLM W 26.55.D2 | DDC 362.102/1–dc23/eng/20220727
LC record available at https://lccn.loc.gov/2022024953

ISBN 978-1-009-21201-4 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Every effort has been made in preparing this book to provide accurate and up-to-date information that is in accord with accepted standards and practice at the time of publication. Although case histories are drawn from actual cases, every effort has been made to disguise the identities of the individuals involved. Nevertheless, the authors, editors, and publishers can make no warranties that the information contained herein is totally free from error, not least because clinical standards are constantly changing through research and regulation. The authors, editors, and publishers therefore disclaim all liability for direct or consequential damages resulting from the use of material contained in this book. Readers are strongly advised to pay careful attention to information provided by the manufacturer of any drugs or equipment that they plan to use.

This book is dedicated to my wife, Malarvizhi, and my son, Kathir; to my daughter, Vithya, my son-in-law, Ajay, and their son, Naveen; to my brother, Anna; and to my teacher Dr. Jagbir Singh, who all have been constantly encouraging me to write this textbook.

Contents
List of Figures
List of Tables
About the Author
Preface
Glossary
List of Notations and Symbols
Prologue
Introduction

1 Why and How Healthcare Decisions Are Made
2 Are Data-Guided Healthcare Decisions Superior?
3 Software: Excel, Microsoft Mathematics, and JASP
4 How to Collect Authentic Data
5 Uncertainties and Their Impact on Healthcare Decisions
6 Why Models Are Important in Healthcare
7 How Healthcare Decision Trees Emerge and Function
8 How Are Group Decisions Practiced in Healthcare?
9 Tracing and Remedying Root Causes of Adversities
10 Healthcare Decision-Making for Cost-Effectiveness
11 Risk Analysis in Healthcare Decision-Making
12 Evaluation of Healthcare Programs
13 Six Sigma and Lean Management in Healthcare Sectors
14 Forecasting in Healthcare Sectors

Epilogue
Appendix: Statistical Tables
Index

Figures
2.1 Locations where the Zika virus occurred
2.2 Path diagram among the variables in Zika virus data
2.3 Stagnating patient activities in the hospital
2.4 Time proximities among the activities
3.1 Opening page of Excel
3.2 The options in the Home menu
3.3 The structure and menus in Excel
3.4 The options in the Insert menu
3.5 The options in the Data menu
3.6 The data entry view and menus in JASP
3.7 Opening up Excel data in .CSV format for JASP
3.8 The view of data in JASP
3.9 Entering and identifying the data variable type in Excel (.CSV file)
3.10 Exercising descriptive in JASP
3.11 Exercising descriptive in JASP
3.12 Exercising descriptive in JASP
3.13 Exercising correlation in JASP
3.14 Exercising correlation in JASP
3.15 Exercising principal component analysis in JASP
3.16 Exercising principal component analysis in JASP
3.17 Distribution to check the model for the data in JASP
3.18 Checking data distribution in JASP
3.19 Checking data distribution in JASP
3.20 Other programs in JASP
3.21 The classical versus the Bayesian option in a t-test in JASP
3.22 The classical versus the Bayesian option in ANOVA in JASP
3.23 The classical versus the Bayesian option in mixed models in JASP
3.24 The classical versus the Bayesian option in regression in JASP
3.25 The classical versus the Bayesian option in frequencies in JASP
3.26 The classical versus the Bayesian option in meta-analysis in JASP
3.27 The classical versus the Bayesian option in reliability in JASP
3.28 The classical versus the Bayesian option in structural equation modeling (SEM) in JASP
3.29 Summary of t-tests, regression, and frequencies in JASP

3.30 The flexplot, linear, mixed, and generalized linear modeling options in JASP
3.31 The classical versus the Bayesian option in binomial data in JASP
3.32 The opening view of Math Solver
3.33 Entering data or specifying the analytic function for Math Solver
3.34 Specifying matrix data for Math Solver
3.35 The calculating view of Math Solver
3.36 When A and B are not independent: their visual dependency using MM 4.0
3.37 When A and B are independent: a visual of their probabilities using MM 4.0
5.1 (a) Complementary. (b) Disjoint outcomes. (c) Overlap outcomes.
5.2 (a) Evidence in favor of the outcome. (b) Evidence against the outcome.
5.3 Multiple probabilities
5.4 Likelihood ratios
5.5 Harms versus benefits
5.6 Receiver operating characteristic curve
5.7 Lab diagnostic times (in minutes) for patients
5.8 Quantified rating by two experts
6.1 Relation between the mean and the variance
6.2 Heterogeneity versus average incubation time
6.3 Comparison of survival functions with ϕ̂ = 0.252 and ϕ = 0
6.4 How patients compare in the time they spend on activities in the hospital
6.5 Comparison of number of ambulance services per day
6.6 Comparison of time patients spend in the care unit
6.7 Path diagram of the activities (in minutes) in the patient healthcare unit
6.8 Developed countries’ GDP spent on healthcare, 2000
6.9 Comparison of countries’ GDP spent on healthcare
6.10 How a few countries spend their GDP on healthcare
6.11 Number of words remembered by Alzheimer’s patients in a control group
6.12 Number of words remembered by Alzheimer’s patients in a treatment group
6.13 Comparison among ages Y0, Y1, Y2, Y3, and Y4 in a control group

6.14 Path diagram of the variables of age Y0, Y1, Y2, Y3, and Y4 in a control group
6.15 Comparison of the variables of age Y0, Y1, Y2, Y3, and Y4 in a treatment group
6.16 Path diagram of the variables of age Y0, Y1, Y2, Y3, and Y4 in a treatment group
6.17 The number of users of drug types
6.18 Comparison of waste proposals generated, shipped, and received in US states
6.19 Correlations among the types of waste proposals
6.20 Path among the types of waste proposals
6.21 Expenses ($) for usual and severe patients in six hospitals
6.22 Length of stay and expenses ($) for inpatients versus outpatients
6.23 Path diagram for the length of stay and expenses ($) for inpatients versus outpatients
6.24 Veney, p. 439
6.25 Veney, p. 439, path diagram
6.26 Hospital site infection to nurses from patients
6.27 Residual plot of exposed nurses
6.28 Probability mass plot of exposed nurses
6.29 Why Poisson is a better model for X = # exposed nurses
7.1 Decision tree with respect to offering services to regular and Medicaid patients
7.2 Decision tree to rent a building, to buy land and construct a facility, or to sell the land for profit
7.3 Comparison of minutes spent on activities by patients in a hospital
7.4 Correlation among the time spent on activities at a hospital. Pearson’s r heat map
7.5 Factor loadings for the time spent on activities by patients in a hospital
7.6 Path diagram for the time spent on activities by patients in a hospital
9.1 Serial structure of the root causes
9.2 Divergent structure of the root causes
9.3 Convergent structure of the root causes
9.4 Data defects in healthcare
9.5 Path diagram among the variables Y1, Y2, Y3, Y4, Y5, Y6, Y7, and Y8
9.6 The number of complaints about the nurses in a hospital
9.7 Path diagram for the number of complaints about the nurses in a hospital
9.8 QMHM
9.9 The number of times patients fell in three units
9.10 Correlations among adversities in a hospital

9.11 Path diagram for correlations among adversities in a hospital
10.1 The outcomes of a cost-effectiveness analysis (Cookson et al., 2021)
10.2 Decision tree for selecting a vendor
10.3 Path diagram among the following variables: demand per week, unit cost (in $), carrying rates (in %), and order cost (in $) of items in a hospital
10.4 Comparison of implantation annual demand, cost, carrying, and order cost
10.5 Path diagram of implantation annual demand, cost, carrying, and order cost
10.6 The correlation plot (sometimes called correlation heat map) of the variables.
10.7 Number of dissatisfied inpatients at discharge and at three-month follow-up
10.8 Dissatisfied inpatients
10.9 Comparison of the percent of insurers in each category in US states
10.10 Path diagram of the percent of insurers in each category in US states
11.1 Comparison of the terrorism trend in the United States
11.2 Strengths, weaknesses, opportunities, and threats (SWOT) analysis
11.3 The similarity score and the probability of cesarean section in San Antonio, Texas
11.4 Box plot to compare the risk of getting AIDS through various activities
11.5 The risk of getting AIDS through various activities
11.6 Path diagram of the risk of getting AIDS through various activities
11.7 Risk comparison for a positive result in a COVID-19 test
11.8 Percent receiving a positive test result who contract COVID-19
12.1 Schematic interpretations of the process of program evaluation
12.2 Healthcare performance across hospitals
12.3 Supply versus days across hospitals
12.4 Healthcare professionals
12.5 Proximity among healthcare professionals
12.6 Path diagrams among hospitals
12.7 Performance of doctors and nurses versus outpatients and inpatients in 14 hospitals
12.8 Pearson’s correlation of the performance of doctors and nurses versus outpatients and inpatients in 14 hospitals
12.9 Path diagram to indicate interrelation of the performance of doctors and nurses versus outpatients and inpatients in 14 hospitals

12.10 Comparisons of state/local, nonprofit, and for-profit hospitals in an emergency, 2012
12.11 Maps to compare the numbers going to state/local, nonprofit, and for-profit hospitals in an emergency, 2012
12.12 Path diagram of the variables in healthcare
12.13 Confirmed COVID-19 cases in 177 countries
12.14 COVID-19 deaths in 177 countries
12.15 Mortality rate due to COVID-19
12.16 Principal components to group confirmed cases, deaths, mortality rates, and longitude and latitude of the COVID-19 pandemic around the world
12.17 Vaccination for adults, elders (65+): share of adults aged 65+ who have received a pneumonia vaccine
12.18 Influenza versus pneumonia deaths in the United States
12.19 Principal components loadings: path diagram of influenza versus pneumonia deaths in the United States
13.1 Integrated lean, green, and Six Sigma strategies: a systematic literature review and directions for future research
13.2 Control charts and brainstorming versus Kanban visuals
13.3 Tele nephrology dashboard
13.4 Suppliers, inputs, process, outputs, and customers (SIPOC) analysis
13.5 Ishikawa diagram
13.6 Six Sigma method to reduce medication errors
13.7 Six Sigma method in pharmacy operations
13.8 Six Sigma method in a Thai hospital
13.9 Six Sigma method in a Thai hospital
13.10 Genesis of lean Six Sigma
13.11 Lean Six Sigma in a glucose experiment
13.12 Surge planning of the COVID-19 pandemic
13.13 Organizational resilience during the COVID-19 pandemic
13.14 Two pharmacological therapies according to Six Sigma methods
13.15 Two pharmacological therapies according to Six Sigma methods
13.16 Use of lean Six Sigma in knee replacement surgery
13.17 A systematic review of lean Six Sigma in healthcare
13.18 A systematic review of Six Sigma methods in healthcare
13.19 Six Sigma methods in healthcare
13.20 Six Sigma methods in healthcare

13.21 Six Sigma methods in healthcare: a systematic review
13.22 Control chart applications in healthcare
13.23 Control chart applications in healthcare
13.24 Control chart applications in healthcare
13.25 Control chart builder
13.26 Control charts versus surveillance of epidemics
13.27 Optimum quality costs
13.28 Cyclic relations
13.29 Property of the Gaussian (normal) frequency pattern
13.30 Sketch of interrelations
13.31 Hierarchy of training
13.32 Lean Six Sigma methods interrelations
13.33 The cyclic operational structure
13.34 Normal approximation and the binomial distribution histogram
13.35 Histogram from binomial data
13.36 Violin plot of the binomial data
13.37 Quantile-quantile plot of the binomial data
13.38 Poisson quantile-quantile plot
13.39 Poisson probability pattern of the data
13.40 Cumulative Poisson probability pattern of the data
13.41 Poisson survival function
13.42 The geometric distribution
13.43 COVID-19 cases examined until r = 3 hospitalized cases in a city
13.44 Quantile-quantile plot of negative binomial distribution
13.45 Survival function of the inverse binomial (r = 3; p = 0.728)
13.46 Survival function of the hypergeometric function
13.47 Wait times to consult a pediatrician
13.48 Survival curve depicting the chance of waiting a specified number of minutes
13.49 Wait times to consult a pediatrician
13.50 Survival curve depicting the chance of waiting a specified number of minutes
13.51 The chi-squared density changes its shape in accordance with its degrees of freedom
13.52 Wait times to consult a pediatrician
13.53 Survival curve depicting the chance of waiting a specified number of minutes
13.54 Wait times to consult a pediatrician
13.55 Survival curve depicting the chance of waiting a specified number of minutes
13.56 Normal wait time for a pediatrician
13.57 Survival curve depicting the chance of waiting a specified number of minutes

13.58 Control chart based on average COVID-19 cases in the United States
13.59 Control chart based on the standard deviation for average COVID-19 cases in the United States
13.60 Control charts for the proportion of COVID-19 cases in the United States
13.61 Probability-probability plot for the normal wait times of morning patients
13.62 Probability-probability plot for the normal wait times of evening patients
13.63 Proximity among the attributes (with 69% of the variations)
13.64 Comparison of the wait times to be seen by physician/surgeon
13.65 Average chart for patient 1
13.66 Average chart for patient 2
13.67 Standard deviation chart for patient 1
13.68 Standard deviation chart for patient 2
13.69 Range chart for patient 1
13.70 Range chart for patient 2
13.71 Box plot for wait times in morning versus evening
13.72 Box plot for wait times in clinics
14.1 COVID-19 cases and deaths reported in India, March 1–31, 2020
14.2 Comparative causes of deaths in the United States, 2020

14.3 Original, moving averages versus trend of COVID-19 deaths
14.4 Exponential smoothing with 0.6
14.5 How does differencing create more stationarity?
14.6 COVID-19 deaths, 2020
14.7 Autocorrelations of COVID-19 deaths at different lags
14.8 Partial autocorrelations of COVID-19 deaths, 2020
14.9 Forecast of COVID-19 deaths
14.10 Dots are original series, the solid line is point forecast, and the blue segment is interval forecast.
14.11 One-step-ahead forecast error
14.12 When were COVID-19 deaths excessive?
14.13 Forecast error
14.14 Pneumonia deaths, 2020
14.15 Autocorrelations of pneumonia deaths at different lags
14.16 Partial autocorrelations of pneumonia deaths at different lags, 2020
14.17 When were pneumonia deaths excessive?
14.18 Influenza deaths, 2020
14.19 Autocorrelations of influenza deaths at different lags
14.20 Partial correlations of influenza deaths at different lags
14.21 Forecast of influenza deaths
14.22 When were influenza deaths excessive?

Tables
2.1 Simulated data on the percent favoring the options and their costs
2.2 The country, its longitude, latitude, land area (1,000 square miles), percent water area, population, and number of Zika cases
2.3 Ambiguity in the Zika virus data
2.4 Correlation among the variables
2.5 Fit of models in Zika virus data
2.6 The number of minutes patients spend in the hospital
2.7 Pearson’s correlation heat map of activity times
2.8 Exponential distribution fit to the time patients spend in a hospital
2.9 Hospital beds, ICU beds, community health centers (CHCs), and service sites, 2018
2.10 Number of physicians, nurses, and stand-by medical professionals per 10,000 in US states, 2020
2.11 Percent of health insurance coverage in US states, 2018
2.12 Amount spent on health insurance in US states, 2020
2.13 Number of deaths due to influenza and pneumonia, the vaccination rates among adults and among seniors (65+), and the share who received vaccines in US states, 2018
2.14 Concentration index, number of staffed beds, bed utility rate, complication rate, number of physicians, days with cash, debt-to-equity ratio, and asset turnover rate in US states, 2018
2.15 Number of confirmed COVID-19 deaths, mortality rate, latitude, and longitude
2.16 Per capita healthcare expenditure, external healthcare expenditure, physicians, nurses, surgical specialists per 1,000 in selected countries, 2016
2.17 Deaths per 100,000 from lung cancer in selected countries, 2000–2005
2.18 Number of earthquakes, epidemics, floods, storms, and transport accidents, 2018
2.19 Ultraviolet of type A and B per square meter (m²), ozone level, and men and women’s mortality rate per 10,000 in selected countries, 2007
3.1 Excel/Word shortcut keys and effects
3.2 Selected Excel commands
3.3 Deaths from lung cancer per 100,000 inhabitants
3.4 Number of earthquakes, epidemics, floods, storms, and transport accidents, 2000–2010

3.5 Latitude, ultraviolet A and B, ozone level, mortality rate
4.1 Sample size depends on confidence level, power, contiguity, and heterogeneity.
4.2 Biological data on age, waist, pulse, systolic, diastolic, cholesterol, and body mass index
4.3 Seizure data of placebo and treatment groups
4.4 List of web pages for data
5.1 Data on age (x), blood pressure (y), and body mass index (z).
5.2 Intersection of test results and state of the participants
5.3 (a) Intersection of test results and state of the participants. (b) Closeness between diagnostic test results and reality of being with or without disease.
5.4 Smoking versus malignant lung cancer
5.5 Intersection of chain smoking and getting malignant lung cancer according to participants
5.6 Lab diagnostic time (in minutes) for patients
5.7 Lab diagnostic times (in minutes) for patients
5.8 Smoking versus malignant lung cancer
5.9 Hospital data, 2020
5.10 Cesarean births versus Medicaid data
5.11 Breast cancer incidence versus screening data
5.12 Number of discovered (X) and number of malignant cells in a group of n = 100 patients
5.13 Recovery of SARS cases in 33 countries
5.14 Number of siblings (X) versus number of children with childhood cancer (Y) in n = 24 families
5.15 Total number of adults with diagnosed diabetes by age group, 2010
5.16 Reading, math, and science skills by country
5.17 Percent of obese women whose BMI is greater than 30 per 1,000
5.18 Percent of women with knowledge that HIV/AIDS spreads through unprotected sex
5.19 Country, Y = Per capita expenditure on health; X = Gross national income per capita (in $)
5.20 Percent of population with improved drinking water (Y1) and sanitation facilities (Y2) in urban and rural areas
5.21 Percent of male (A), female (B), or both (AB) who have abstained from sex in the past 12 months
5.22 Influenza versus pneumonia deaths in US states (as of January 11, 2021)

5.23 Physicians and nurses in US states
5.24 Hospitals and beds in US states, 2018
5.25 Details of population sizes and their risks, 2018
5.26 Country, year, rural, and urban infant mortality rate per 1,000 live births
5.27 Human development index
5.28 Population size and the number at risk of ill health
5.29 Number among 50 interviewed persons who suspected AIDS might spread
5.30 Total hospital beds, ICU beds, CHCs, and CHC service delivery sites in US states
5.31 Intensivist (critical care) physicians per 10,000 adults, critical care nurses and CRNAs per 10,000 adults, and second-line critical care physicians per 10,000 adults in US states
5.32 Average family deductible for employer-sponsored insurance, average exchange deductible for Affordable Care Act (ACA) Marketplace plans, and share of private sector enrollees enrolled in self-insured plans in US states
5.33 Influenza deaths, pneumonia deaths, flu vaccination rate for adults, flu vaccination rate for adults aged 65+, and share of adults aged 65+ who have received a pneumonia vaccine
5.34 Safety versus motor accidents
5.35 Smoking versus breathing troubles
5.36 Smoking versus infant survival
5.37 Premarital sex versus extramarital sex
5.38 Time to fall asleep in treatment and placebo groups
5.39 Mother’s age, smoking status, and child’s respiratory illness
5.40 Cancer versus treatment
5.41 Did aspirin help myocardial infarction?
5.42 Is sex liked by both wife and husband?
6.1 Orthogonal design for collecting frequencies of COVID-19 cases
6.2 Orthogonal data collection for emergency arrivals
6.3 Number infected among n = 32 nurses who provided care to SARS patients
6.4 A statistical comparison of the exponentiality of the times taken by 11 patients
6.5 (a) Number of ambulance services per day. (b) Correlation.
6.6 Time spent on patient activity in a hospital
6.7 Correlation of activities (in minutes) in a patient healthcare unit
6.8 Are the activity times in patient healthcare units part of an exponential probability pattern?

6.9 Number of words remembered by Alzheimer’s patients in a control group
6.10 Correlation of the number of words remembered by Alzheimer’s patients in a control group
6.11 Do the number of words remembered by Alzheimer’s patients in a control group follow a Poisson probability pattern?
6.12 Number of words remembered by Alzheimer’s patients in a treatment group
6.13 Correlation among the number of words remembered by Alzheimer’s patients in a treatment group in years Y0, Y1, Y2, and Y4
6.14 Do the number of words remembered by Alzheimer’s patients in a treatment group follow a Poisson probability pattern?
6.15 Seizures of patients in a control group
6.16 Correlation among ages Y0, Y1, Y2, Y3, and Y4 in a control group
6.17 The variables ages Y0, Y1, Y2, Y3, and Y4 in a control group
6.18 Seizures of patients in a treatment group
6.19 Correlation among ages Y0, Y1, Y2, Y3, and Y4 in a treatment group
6.20 The variables ages Y0, Y1, Y2, Y3, and Y4 in a control group
6.21 Number of users of drugs in US states
6.22 Correlation among the number of users of various types of drugs
6.23 Are the drug users in these types normally distributed?
6.24 Waste generated, shipped, and received in US states
6.25 Does waste disposal follow a normal probability pattern?
6.26 Expenses, usual, and severe patients in six hospitals
6.27 Correlations among expenses, usual, and severe patients in six hospitals
6.28 Do expenses, usual, and severe patients in six hospitals follow normal, Poisson, and Poisson probability patterns, respectively?
6.29 Inpatient versus outpatient length of stay and expenses ($)
6.30 Correlations among inpatient versus outpatient length of stay and expenses ($)
6.31 Do inpatient versus outpatient length of stay and expenses ($) follow a distribution?
6.32 Age, grade completed, pregnancy order, and immunity
6.33 Correlation of data in Veney, p. 439

6.34 Do variables age, grade completed, and pregnancy order follow normal, Poisson, and Poisson distributions?
6.35 Hospital site infection to nurses from patients
6.36 Correlations among hospital site infection to nurses from patients
6.37 Are hospital site infections Poisson distributed?
6.38 Correlation between number of exposed and infected nurses
6.39 Regression of number of infected nurses in terms of number of exposed nurses (R² = 60%)
6.40 Regression coefficients for number infected in terms of exposed nurses
6.41 Regression fit and residuals for number infected in terms of exposed nurses
6.42 Fit of Poisson for X = exposed nurses and binomial for Y = infected nurses
6.43 Mortality rates per 1,000 among selected countries, 2010
7.1 Time it takes for activities by the patients in a hospital
7.2 Number of anthrax cases, length of suffering, and number cured in five US regions
7.3 Diagnostic test results among two groups
7.4 Rehospitalizations among diabetic patients for not adhering to prescribed medicines
8.1 Hospice patients who received the right amount of medicine for pain management in US states, 2010 and 2011
8.2 Number of hospital patients who never had good communication with doctors in US states, 2010 and 2011
8.3 Hazardous waste generated, shipped, and received by US states, 2005 (in 1,000 tons)
8.4 Cesarean rates per 1,000 live births in selected countries
9.1 Medical errors in the state of Indiana, 2006–2014
9.2 Correlation Pij, i, j = 1, 2, 3, 4, 5, 6, among the variables
9.3 Shanmugam’s (2015) example of preventable or systemic errors
9.4 Zhang and Horn’s (2020) data on defects in healthcare
9.5 Correlation among the variables y1, y2, y3, y4, y5, y6, y7, y8
9.6 Poisson rate
9.7 Principal component loadings
9.8 Number of complaints about the nurses in a hospital
9.9 Correlation between the number of complaints received Monday through Sunday

9.10 Principal component loadings for complaints about the nurses in a hospital
9.11 95% confidence interval for the complaint rate
9.12 QMHM data
9.13 QMHM data
9.14 Number of patient falls in three units over 15 weeks
9.15 Correlations among number of patient falls in three units
9.16 Poisson fit of number of patient falls in three units
9.17 Correlations among adversities in a hospital
9.18 Component loadings for correlations among adversities in a hospital
9.19 Test of normality of correlations among adversities in a hospital
9.20 Age characteristics of patients arriving in an emergency department
9.21 Starting and ending diagnosis of cases in an emergency department
9.22 Four emergency departments: A, B, C, and D
9.23 Stillbirths and related issues
9.24 Critical incidents with insufficient data
9.25 Costs and deaths
9.26 Number of stillbirths, age, place of delivery, and gravida
9.27 Costs and deaths of patients with stated conditions
9.28 Weissman et al. (2007) data on the number of hospital adversities
9.29 Prior, likelihood ratio, and posterior probabilities for cesarean versus natural birth, 2006
9.30 Percent of pharmacies with generic medicines
9.31 Number of drug users, 2020
9.32 Number of persons (in thousands) with a health problem limiting work, 2010
9.33 Television, physicians, and life expectancy
9.34 Relationship between IQ and brain size
9.35 Root causes of hypertension, 2010
9.36 Number of visits versus number of adverse events
9.37 Number of ambulance pickups in a hospital in Austin, Texas, 2011
9.38 Time spent (minutes) in a hospital by 11 patients (P1 through P11)
9.39 Entry is R (Readmitted) over N (Number of admissions)
9.40 Number (in %) of adverse drug events by drug class
10.1 Number of items in demand per week, unit cost (in $), carrying rates (in %), and order cost (in $) of items in a hospital

10.2 Correlations among the variables of demand per week, unit cost (in $), carrying rates (in %), and order cost (in $) of items in a hospital
10.3 Confidence intervals, Kolmogorov–Smirnov score, and p-values for normality
10.4 Implantation annual demand, cost, carrying, and order cost
10.5 Correlation among the variables
10.6 Confidence interval for annual demand, cost, carrying rate, and order cost
10.7 Number of dissatisfied inpatients at discharge and at three-month follow-up
10.8 Number of dissatisfied inpatients at discharge and at three-month follow-up
10.9 Poisson fit for number of dissatisfied inpatients at discharge and at three-month follow-up
10.10 Percent of insurers in each category in US states
10.11 Correlation of the percent of insurers in each category in US states
10.12 Principal component loading levels of the percent of insurers in each category in US states
10.13 Do the percentages of the insurers in each category in US states follow a beta distribution?
10.14 Impression and support for rationing cost-effectiveness analysis
10.15 Personal support for rationing cost-effectiveness analysis
10.16 Percent of insured, noninsured, Medicaid, Medicare, other insurance, and uninsured, 2011
10.17 Healthcare costs ($) in US states, 2009
10.18 Medical tourism costs for several procedures
10.19 Out-of-pocket expenses for healthcare in selected countries
10.20 Medicaid Physician Fee Index, 2010
10.21 Health insurance coverage of adults age 19–64, 2013
10.22 Disability numbers (in thousands) with a health problem limiting work, 2010
10.23 Total expenditure on health as a percentage of GDP, 2000–2005
10.24 Number of physicians by type, 2012
10.25 Percent of Medicaid spending by enrollment group, 2012
10.26 Hospital inpatient days per 1,000 population by ownership type, 2012
10.27 Medicare service use: hospital inpatient services, 2012
10.28 Uninsured rates for nonelderly adults by gender, 2014

10.29 Hospital beds per 1,000 population by ownership type, 2012 276 10.30 Number of heart disease deaths per 100,000 population by gender, 2012 277 10.31 Hospital emergency room visits per 1,000 population by ownership type 277 10.32 Hospital-adjusted expenses per inpatient day by ownership type, 2012 278 10.33 Retail drug prescriptions filled at pharmacies (annual per capita by age), 2013 279 10.34 Donors recovered: January 1, 1988–September 30, 2014 279 10.35 Donors recovered: January 1, 1988–September 30, 2014 280 10.36 Number of organ transplants as of December 12, 2014 281 10.37 Medical tourism costs for several procedures 282 10.38 Data on all services, primary, obstetric care, and other services 283 11.1 Vulnerability versus security violations 289 11.2 Checking the independence of blood pressure and body weight in the background, knowing and not knowing the patient’s age, in a random sample of size n = 7 289 11.3 Is staffing the cause of medication error in a hospital? 290 11.4 Terrorism incidents in the United States, 1970–2017 291 11.5 Correlation among the number of terrorism incidents, injuries, and deaths in the United States 291 11.6 Strengths, weaknesses, opportunities, and threats (SWOT) discussions with respect to COVID-19 293 11.7 Entries in 2 × 2 table true state and test result, where I is Youden’s index 293 11.8 Indices to capture and interpret similarities 294 11.9 Test results on a random sample of mammograms 294 11.10 Common versus unique features in Austin and San Antonio, Texas 294 11.11 The values of I and the probability Pr(C-section in San Antonio), assuming the probability Pr(C-section in San Antonio) 295 11.12 Risk of getting AIDS through activities, according to 50 respondents in selected countries 296 11.13 Correlation heat map about the risk of getting AIDS through various activities 297 11.14 Principal component loading (based on Varimax) about getting AIDS through various activities 297 11.15 Risk comparison for getting a positive result in a COVID-19 test 298


11.16 Data on spread of COVID-19 across days 1, 2, 4, 5, and 24 300 11.17 Vaccinations administered, January 2021 300 11.18 COVID-19 data and policy actions, 2020 301 11.19 COVID-19 data, January 2021 301 11.20 Number of menstrual cycles missed to pregnancy among smokers and nonsmokers 302 11.21 Number of cured anthrax cases (hypothetical data) 302 11.22 Number of pregnancies (Y) versus number of abortions (X) among Romans 303 11.23 Number of nurses infected while they provided personal care to patients 303 11.24 Rehospitalization among diabetic patients due to not adhering to prescribed medicines 303 11.25 Adherence to medicines versus nonadherence 303 11.26 Reported and unreported rapes across continents 304 11.27 Conditional and unconditional risk assessment 305 11.28 Hospital beds and services delivered in US states 306 11.29 Number of closed claims, paid claims, indemnity, and allocated loss adjustment expenses (ALAE) 307 11.30 Data on Medicaid and counseling 308 11.31 Number of words remembered by Alzheimer’s patients in a control group 309 12.1 Components of deliverables versus their metrics 315 12.2 Hospitals’ relative performance 318 12.3 Principal component loadings 318 12.4 Doctors and nurses versus outpatients and inpatients in 14 hospitals 321 12.5 Principal component loadings 321 12.6 Normality of number of doctors, nurses, inpatients, and outpatients 323 12.7 Number of patients who went to state/local, nonprofit, and for-profit hospitals in an emergency, 2012 323 12.8 For emergency patients 326 12.9 Correlation of the variables 326 12.10 Correlation values of the variables in healthcare 326 12.11 Correlation among confirmed cases, deaths, and mortality rates of the COVID-19 pandemic 329 12.12 Principal components loadings among confirmed cases, deaths, mortality rates, and longitude and latitude of the COVID-19 pandemic around the world 329 12.13 Correlation among influenza versus pneumonia deaths, vaccination rate for adults, elders (65+), and


share of adults aged 65+ who have received a pneumonia vaccine 329 12.14 Principal component loadings 331 12.15 Physicians by specialty in US states 332 12.16 Outpatient hospital visits per 1,000 population by ownership type, 2010 334 12.17 Hospital inpatient days per 1,000 population by ownership type, 2010 334 12.18 Hospital emergency room visits per 1,000 population by ownership type, 2010 335 12.19 Births attended by skilled health personnel (%) 336 12.20 Number of cancer deaths per 100,000 population by gender, 2010 337 12.21 Number of diabetes deaths per 100,000 population by gender, 2010 338 12.22 Number of heart attack deaths per 100,000 population by gender, 2010 338 12.23 Number of heart attack deaths per 100,000 population by gender, 2010 339 12.24 Physicians by specialty, November 2012 340 12.25 Number of persons using intoxicants in US states, 2010 341 12.26 Number of outpatient visits per 1,000 population, 2010 343 12.27 Number of inpatient visits per 1,000 population, 2010 344 12.28 Number of emergency room visits per 1,000 population, 2010 346 13.1 Strengths, weaknesses, opportunities, and threats (SWOT) analysis in healthcare 353 13.2 The surge of COVID-19 according to Six Sigma methods 360 13.3 Lessons learned from lean Six Sigma 369 13.4 Roles and responsibilities of different belts 371 13.5 Number (x) among citizens (m = 10) in cities who fear a pandemic like COVID-19 might recur 373 13.6 Summary from JASP 374 13.7 Random number of persons in a survey fearing another pandemic 375 13.8 Number of persons fearing a repeat of COVID-19 375 13.9 COVID-19 cases (r = 3) examined until hospitalization in a selected city 376 13.10 Number (x) of COVID-19 cases examined until hospitalization in a selected city (r = 10) 376 13.11 Summary of charts 384 13.12 Values of needed variables to construct control charts 385 13.13 Wait time (in minutes) to see a physician/surgeon in five hospitals 387


13.14 Wait time of two patients in the morning and in the evening at 10 clinics 393 13.15 Two-way analysis of variation (AOV) with interaction 393 13.16 Centers for Disease Control and Prevention, 2019–2020 Influenza Season Vaccination Coverage Dashboard 395 13.17 Adults aged 65+ who report having a pneumonia vaccine by gender 395 13.18 Adults who report not seeing a doctor in the past 12 months because of cost by gender 396 13.19 Adults reporting symptoms of anxiety or depressive disorder during the COVID-19 pandemic by household job loss 397 13.20 Adults reporting symptoms of anxiety or depressive disorder during the COVID-19 pandemic by age 397 13.21 Average annual deductible per enrolled employee in employer-based health insurance for single and family coverage 398 13.22 Average Marketplace deductible 399 13.23 Child flu vaccination rates by age 399 13.24 Doses received per million 400 13.25 Daily COVID-19 cases and deaths 403 13.26 Deaths caused by influenza and pneumonia 404 13.27 Community health center delivery sites and patient visits 405 13.28 Distribution of the nonelderly uninsured by federal poverty level (FPL) 407 13.29 Hospital emergency room visits per 1,000 population by ownership type 407 13.30 Flu vaccination rates for high-risk adults 408 13.31 Health insurance coverage of the total population, multiple sources of coverage 409 13.32 Hospital admissions per 1,000 population by ownership type 410 13.33 Total hospital beds 411 13.34 Hospital beds per 1,000 population by ownership type 412 13.35 Intensive care unit (ICU) beds 412 13.36 Total number of certified nursing facilities 413 13.37 Total number of residents in certified nursing facilities 414

13.38 Percent change in state tax revenue 415 13.39 Population distribution by age 415 13.40 Adults at higher risk of serious illness if infected with coronavirus 417 13.41 Adults who are severely obese 417 13.42 Share of private-sector enrollees enrolled in self-insured plans 418 13.43 Adults who report being told by a doctor they have diabetes 419 13.44 Adults who report being told they have COPD, emphysema, or chronic bronchitis 419 13.45 Adults who report being told they have kidney disease 420 13.46 Unemployment rate (seasonally adjusted) 421 13.47 Uninsured rates for the nonelderly by federal poverty level (FPL) 422 13.48 COVID-19 vaccines delivered and administered 423 14.1 Tendency of autocorrelations and partial autocorrelations for stationarity 434 14.2 Comparative causes of death in the United States, 2020 435 14.3 Comparison of models fitting the COVID-19 data 438 14.4 Summary of forecasting models and their scores for pneumonia deaths 442 14.5 Models of influenza deaths and their scores 447 14.6 Severity of illness, satisfaction of patients with their ages 448 14.7 Meteorological index over the years 449 14.8 Number of COVID-19 cases earlier in the pandemic, 2020 450 14.9 Number of tetanus cases in India, Japan, Mexico, Nepal, Philippines, the United States, and Vietnam, 1974–2020 451 14.10 Number of deaths due to COVID-19, pneumonia, and influenza, 2019–2020 452 14.11 Number of deaths due to COVID-19, pneumonia, and influenza, 2021 454 14.12 Actual versus predicted COVID-19 cases in Saudi Arabia 455 14.13 Weekly deaths due to COVID-19, pneumonia, and influenza in New York City, January–December 2020 455


About the Author

Ramalingam (Ram) Shanmugam received a PhD degree in statistics from the business school at Temple University. Since 2016, he has been an honorary professor of international studies in the School of Health Administration at Texas State University, San Marcos. Ram is passionate about motivating and teaching students how to critically think outside the box and efficiently practice. He believes learning without ambiguities and practicing without omissions is more of an art than a science. An analogy might be this: many students (including Ram himself) take courses on how to draw pictures, but only a few of them perhaps can surpass the world-famous Michelangelo’s (March 6, 1475–February 18, 1564) painting and sculpture in spite of the many modern tools now available. Ram’s research and teaching interests include data envelopment analysis, multivariate data collection and analysis, decision-making in healthcare, probability modeling of emerging diseases, refining diagnostic methodologies, and modeling cyber security issues, among others. For his innovative and exciting teaching, Ram was nominated by his undergraduate students to the Honor Society of Phi Kappa Phi. Ram has been selected to be a fellow of the American Statistical Association (ASA) – of which he has been an active member since 1971 – for his professional achievements in teaching, research, consulting, and service, and he has been the president of the ASA’s San Antonio chapter since 2020. He participated on the International Relations Committee of the ASA during 2003–2007 and on the Advisory Committee of the ASA’s Austin chapter during 2003–2005.


In 1984, Ram was elected to be a fellow of the International Statistical Institute. In 2015, he published four books, including the comprehensive introductory textbook Statistics for Scientists and Engineers (Wiley) and Generating Functions in Engineering and the Applied Sciences, Discrete Distributions in Engineering and the Applied Sciences, and Continuous Distributions in Engineering and the Applied Sciences (Morgan & Claypool). Ram has also published or edited 161 book reviews in 21 major journals, including the Journal of the American Statistical Association (JASA), Statistics in Medicine, the Journal of Quality Technology, and Technometrics. Ram serves as a book review editor for the Journal of Statistical Computation and Simulation and on the editorial boards of 25 leading international journals across several disciplines. Ram has been invited to give keynote speeches at international conferences in the USA, Singapore, India, and Canada, among other places. Ram has also served as a reviewer/panel member for the National Aeronautics and Space Administration, the National Institutes of Health, and the National Science Foundation on various occasions. He provides consulting services to researchers in other disciplines. He has been recognized multiple times by the dean of the College of Health Professions, Texas State University, for his excellence in research and creativity. Ram currently serves on two important committees for Texas State University: the Senate’s Survey Committee and the Search Committee for the Vice President for International Affairs.

Preface

During the past several years of teaching decision-making courses, I have often heard my students express their desire that I write a textbook containing what I explained in the class. Consequently, I began to compile and compose the materials for writing this applied decision-making book for healthcare students and professionals. I have tried my best to motivate, define, explain, illustrate, and challenge the readers to think critically and to put these concepts and analytical tools to judicious use in order to reach optimal healthcare decisions in their professional work. The examples are selected from health insurance records, hospital emergency rooms, public health surveillance, government healthcare budgets, physicians’ treatment of illnesses, malpractice lawsuits, clinical trials, globalization versus patent issues, medical tourism, drug manufacturers, long-term care facility financing, pharmacy operations, ambulance services, traffic accidents and fatalities, stocking of medicines, and so on. The data are from books, web pages, or research articles in journals I have published over the years in order to encourage the readers to use these decision-making tools. Several books have already been written on the topic of healthcare decision-making, and some of them are suitable for students who have a calculus background. Other books are too verbal as they concentrate on applications, not providing much training for students to construct new, innovative methodology. I felt the importance of these concepts and skills for the construction of novel methodology in the future. In this twenty-first century, much of the decision-making in many fields, including healthcare, is based on data. This book therefore takes an innovative approach to educate readers/students on how healthcare decision-making evolves to be optimal when the process utilizes data-guided evidence. Not many books are available now on this approach. 
For this approach to be effective, this book concentrates on the central concepts of asking how much available data could be educational when the data are correctly analyzed and interpreted, training minds to extract evidence from the data, and providing some motivational exercises.

This book is designed to accommodate uncertainty in healthcare and how healthcare decision-making is enhanced to match the reality of life. Basic knowledge of algebra, precalculus, and statistical thinking might ease readability as well as help students comprehend the contents of the book’s chapters. This book is composed of 14 chapters covering the following topics: How do healthcare professionals make decisions? Are data-guided decisions superior? What details should be included on an Excel spreadsheet? How can healthcare professionals collect authentic data? What uncertainties can be expected in healthcare, and what are their impacts on decision-making? Why are models important in decision-making? How do decision trees emerge? How can group decisions be practiced? How can the root causes of adversities be remedied? What are the most cost-effective operations in decision-making? What risks come with healthcare decisions? How can readers evaluate the programs covered in this book? Advanced topics such as the advantages of data envelopment techniques, the benefits of using stochastic frontier analytics in decision-making, Six Sigma approaches in decision-making, analysis of waiting times, game theory–based conflict management in decision-making, scheduling versus PERT, inventory versus storage, simulation methods for destructive studies, and time series and forecasting methods are to appear in a future book. At the end, this book provides an Epilogue. The Appendix contains tables for normal, t, chi-squared, and F conversions of the correlation coefficient to Fisher’s Z score. The book is organized as follows. 
Every chapter briefly states what readers will have learned after reading the chapter, the motivation for covering the specific topic addressed in the chapter, core concepts related to that topic, the necessary (may not be sufficient) analytical tools to extract the evidence from the available data in order to effectively apply the concepts in different scenarios, a few practical examples to illustrate both the concepts and the tools, a summary of the main ideas discussed in the chapter, and a set of theoretical and applied problems. Finally, selected references for the chapter are included.

xix https://doi.org/10.1017/9781009212021.001 Published online by Cambridge University Press


This book contains 14 chapters. Chapter 1, “Why and How Healthcare Decisions Are Made,” emphasizes the need to make better decisions in life in general and in healthcare in particular. The readers will learn from several scenarios how making optimal decisions is feasible. The illustrations formulate the approaches to synthesize a complex situation into several simple components and to seek the optimal decision for each situation. The chapter demonstrates how to integrate several optimal decisions into an overall decision policy, how to interpret the concepts and methodologies to attain better healthcare decisions, and how to practice both the concepts and the methods in other healthcare settings. Chapter 2, “Are Data-Guided Healthcare Decisions Superior?” advocates the importance of data evidence in healthcare decision-making by comparing the advantages of using data with the disadvantages of not using them. Readers will learn and practice data-guided healthcare decisions, how to integrate several data analytic methodologies in order to make optimal healthcare decisions, how to interpret the concepts and methodologies needed in healthcare decision-making, and how the concepts and methods work in several healthcare scenarios. Chapter 3, “Software: Excel, Microsoft Mathematics, and JASP,” offers tutorials on how to use free available software such as Excel, Microsoft Math Solver, and JASP to extract and interpret data evidence for the sake of making better healthcare decisions. Chapter 4, “How to Collect Authentic Data,” outlines and demonstrates statistical principles and practices of collecting authentic healthcare data from various web pages, public domain databases, and reports. The importance of random sampling, of remedying the existence of the length or size bias in the collected healthcare data, and of strategies to adjust the bias in order to make healthcare decisions is explained. 
Chapter 5, “Uncertainties and Their Impact on Healthcare Decisions,” convinces the readers of the unavoidability of uncertainty in dealing with the healthcare system. The rules of computing the probabilities of a specified outcome from the collected data and the interpretation of such probabilities, including Bayesian reasoning and divide and rule, are emphasized. Chapter 6, “Why Models Are Important in Healthcare,” emphasizes the utility of the value system. Several specific probability distributions and statistical methods are introduced and explained. Readers will


learn and apply such models in order to refine their decisions in the chance-oriented healthcare system. Chapter 7, “How Healthcare Decision Trees Emerge and Function,” provides exposure to decision trees, teaches readers to appreciate the advantages of the concepts in decision trees, and familiarizes readers with strategies to optimize their final decision. Chapter 8, “How Are Group Decisions Practiced in Healthcare?” offers several competing approaches that can harmonize any confusions, conflicts, or pandemonium that may arise in the context of group decision-making. Chapter 9, “Tracing and Remedying Root Causes of Adversities,” demonstrates the Bayesian networks in root-cause modeling. The utility of correlation coefficients and causal networks to conduct the characterization of root-cause analysis is featured. Chapter 10, “Healthcare Decision-Making for Cost-Effectiveness,” highlights the importance of decision trees in providing cost-effectiveness in healthcare operations. Chapter 11, “Risk Analysis in Healthcare Decision-Making,” invokes the basic concepts of Bayesian probability models in order to quantify the risk of healthcare operations, privacy, and security. Chapter 12, “Evaluation of Healthcare Programs,” summarizes and explains decision analytic tools to evaluate the worthiness of existing programs and the viability of new services in the healthcare setting. Chapter 13, “Six Sigma and Lean Management in Healthcare Sectors,” provides the concepts and methodologies of quality enhancement in order to offer efficient and superior healthcare services. The chapter includes several illustrations of activities in hospitals, clinics, and other service centers. 
Chapter 14, “Forecasting in Healthcare Sectors,” illustrates with examples, graphs, charts, and formulas the techniques hospital administrators, medical/healthcare professionals, pharmacists, and insurance professionals use to forecast future outcomes (including trends, seasonality, cyclic behavior, etc.) and to advance and maintain efficient healthcare services. Finally, an epilogue is included in each chapter that summarizes conclusive thoughts on learning and practicing the existing concepts and methodologies and on the potential for creating new, innovative methodologies for future advancement in this discipline. The appendices in each chapter contain statistical tables that are helpful to address the significance of statistics computed from the available data. To be specific, the tables pertain to the use of normal, t, chi-squared, and F conversions of the correlation coefficient to Fisher’s Z score for significance testing.

Glossary

Absorbing state
In a stochastically changing dynamic process, a situation in which a patient stays forever in a state with no escape from it.

Accuracy
A measure of closeness to the true value of the parameter.

Adversity
An undesirable outcome due to a decision or on its own. Another term used to refer to it is sentinel outcome.

Age-specific mortality rate
Number of people (usually per 1,000) who died in an age bracket.

Akaike information criterion
A computed value used to judge whether one model performs better than another with respect to forecasting in time series data analysis.

Alternative optimal solution
There might be more than one optimal solution in linear programming; each additional one is called an alternative optimal solution.

Analyst
A technical expert who provides details to the decision maker on a topic. More than one analyst may advise a decision maker in some special studies.

ARIMA
Abbreviation of autoregressive integrated moving average model in time series data analysis.

Arrival rate
The number of patients per defined time duration arriving at a healthcare facility.

Artificial intelligence
Supervised or unsupervised learning by a machine.

Attribute
An observable aspect among several characteristics. Sometimes an attribute is dichotomous if it is one over another complementary outcome.

Autocorrelation
A Pearson-type correlation for a time series value and its counterpart at a specified lag time earlier or later. When it is a significant amount, the system is thought to have a memory for that long lag period.

Autoregressive model
A specialized regression model in which the past values of the response (another name is dependent) variable are the predictor variables. In an interpretive sense, the current value of the response variable is decided by the past incidences of itself, meaning there is a finite memory in the healthcare system.

Average chart
A graph used in a quality control method.

Bayes
Thomas Bayes (1701–1761). English statistician and philosopher. He introduced a (then controversial but now well accepted) probability concept. His formula updates an existing (prior) chance to a posterior chance by blending it with currently available data evidence. The data evidence often moderates any bias in the prior chance.

Bayes formula
\[ \Pr(A \mid B) = \frac{\Pr(AB)}{\Pr(B)} = \frac{\Pr(A)\Pr(B \mid A)}{\Pr(A)\Pr(B \mid A) + \Pr(\bar{A})\Pr(B \mid \bar{A})}, \quad \text{if } \Pr(B) \neq 0. \]

Bayes’s Rule
A working formula to update a collection of prior probability values (symbolically, $\Pr(\theta)$, where $\theta$ portrays unknown entities controlling the uncertainty) with the current data evidence (symbolically by the likelihood, $\mathrm{Lik}(x_1, x_2, \ldots, x_n \mid \theta)$); the update is recognized as a collection of posterior values (symbolically, $\Pr(\theta \mid x_1, x_2, \ldots, x_n)$, which is recognized as conditional probability). The data evidence is more often dominant to refine even the tilted prior opinion. The rule is $\Pr(\theta \mid x_1, x_2, \ldots, x_n) \propto \Pr(\theta)\,\mathrm{Lik}(x_1, x_2, \ldots, x_n \mid \theta)$, with the constant of proportionality chosen so that the posterior values total one. This process was designed by philosopher Thomas Bayes (1701–1761). This process was controversial in the beginning but is now well-accepted practice in scientific endeavors.

Bell-shaped curve
A symmetric frequency curve meaning that 99.73% of the data values are within three standard deviations from the mean at the center of the scale.

Benchmarking
A process in which the decision maker and analyst compare their system’s productivity, services, and practices against the counterparts of the leading competitors.

Beta distribution
A probability distribution for the proportion. Symbolically, it is
\[ f(\pi) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \pi^{\alpha - 1} (1 - \pi)^{\beta - 1}; \quad 0 < \pi < 1; \ \alpha, \beta > 0. \]

Bias
Intended or observational tilt in measuring a variable or attribute.

Binomial distribution
The spread of the total probability of one over a finite collection of mass points (integers) based on (1) independent and (2) identical $n \geq 1$ trial cases with a stable parameter $0 < p < 1$ in all cases, where $p = \Pr(\text{a dichotomous outcome})$. The binomial probability mass function is
\[ \Pr(y \mid n, p) = \frac{n!}{y!\,(n - y)!}\, p^{y} (1 - p)^{n - y}; \quad y = 0, 1, 2, \ldots, n; \ n \geq 1; \ 0 < p < 1. \]
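As a minimal sketch of how the Bayes formula and the binomial probability mass function defined in the entries above can be evaluated, the following Python fragment may help. The function names and the screening-test numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) are hypothetical choices for illustration only and are not taken from the book.

```python
from math import comb


def bayes_posterior(prior_a: float, pr_b_given_a: float, pr_b_given_not_a: float) -> float:
    """Return Pr(A|B) from Pr(A), Pr(B|A), and Pr(B|not A)."""
    numerator = prior_a * pr_b_given_a
    # Pr(B) by the law of total probability over A and its complement.
    pr_b = numerator + (1.0 - prior_a) * pr_b_given_not_a
    if pr_b == 0:
        raise ValueError("Pr(B) must be nonzero")
    return numerator / pr_b


def binomial_pmf(y: int, n: int, p: float) -> float:
    """Return Pr(y | n, p) = n! / (y! (n-y)!) * p^y * (1-p)^(n-y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# Hypothetical inputs: Pr(A) = 0.01, Pr(B|A) = 0.90, Pr(B|not A) = 0.05.
posterior = bayes_posterior(0.01, 0.90, 0.05)
print(f"Pr(A|B) = {posterior:.4f}")

# Hypothetical inputs: chance of exactly 2 outcomes in n = 10 trials with p = 0.1.
print(f"Pr(y=2 | n=10, p=0.1) = {binomial_pmf(2, 10, 0.1):.4f}")
```

With these assumed inputs the posterior chance stays well below the test's sensitivity because the prior chance is small, which is the moderating effect of data evidence on a prior that the Bayes entry describes.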

Black or green belt Bootstrapping Box-Jenkins model

Break-even point Central limit theorem

Chance for threshold threat Chance node Chi-squared test

Cohort Complementary outcome Conditional probability

Confidence interval

xxii Published online by Cambridge University Press

A recognition of Six Sigma knowledge. The green belt is the first stage and the black belt is the final stage of knowledge. A simulation method to generate data so as to notice a pattern under various scenarios. A time domain–based stochastic approach to obtaining the underlying model for time series data. It captures the memory-oriented regressive, smoothingbased moving averages, and stationarity, invertibility, and seasonality in the data. An intersection of the cost and benefit curves. It reflects a fact that the frequency pattern of a measurable outcome is the wellknown symmetric, bell-shaped Gaussian (also called normal in statistical discussions) provided the sample size n is larger. The ratio of harm over the sum of harm and benefits. A symbol (often a circle) to portray the natural outcome due to a decision in decision tree analysis. A statistical procedure to test the null hypothesis of no significant association X ðobs  expÞ2 between two outcomes or variables. The test statistic is χ 2df ¼ , exp where df, obs and exp refer to degrees of freedom and to observed and expected frequency. A group of subjects used in an analysis. Absence of an outcome Probability of an outcome occurring in the presence/absence of a different outcome. Symbolically, it is PrðAjBÞ. Notice that PrðAjBÞ ≥ PrðAÞ, where PrðAÞ is recognized as an unconditional probability, implying that the conditional probability is more accurate than the unconditional probability, and that is the basis of all science. That is why researchers in all fields including healthcare seek a clue, B, to accurately assess the likelihood of an outcome. This idea occurs in data analysis, with a confidence level, 0 < 1  α < 1, where α is recognized as the risk level of a speculative bracket for an unknown to go wrong. It is a bracket for an unknown parameter based on a level of confidence.

Glossary

Constraint Convergent root cause Correlation coefficient

Correlation

Cost curve Cost-benefit analysis

Cost-effectiveness analysis

Critical path method (CPM)

The bracket widens when the confidence level is higher. The bracket shrinks when the sampling error is lower. Any limitation of a resource can be formulated in terms of decision variables in a technique called linear programming. A situation in which two root causes, A and B, trigger an adversity, C. Their mutual correlations provide a diagnosis. The measure of a linear relationship between two continuous variables. Its domain is [−1, 1]. A correlation coefficient of zero does not mean that the two variables are independent unless they are jointly bivariate normal (also called Gaussian). Conversely, the correlation coefficient between two independent variables is necessarily zero. A quantitative measure of interassociation between two variables (attributes) expressed in the domain of minus one to plus one. A population measure, $\rho_{x,y}$, of the linear relation between two quantitative variables, x and y. The sample estimate is indicated by $\hat{\rho}_{x,y} = r_{x,y}$. Note that $-1 \le \rho_{x,y} \le 1$. A positive correlation means that x and y increase or decrease together; a negative correlation means that x increases when y decreases, or vice versa. A configuration depicting the change in expense as either the outcome or the decision varies. A worthy situation in a decision-making process in which the measurable benefits might be more than the incurred measurable cost. Otherwise, the decision-making process is futile. An analysis of the costs of healthcare operations with an objective to identify the choice that yields the maximum effectiveness achievable for a given amount of spending, or another alternative that minimizes the cost of achieving a stipulated level of effectiveness. The analysis is generally used when it is not possible to measure benefits in dollar units. Critical path method is used in scheduling activities. The longest path in a project is called the critical path. The activities on a critical path are labeled as critical activities.

Crashing: Refers to shortening the activity time in a project by adding resources.

Cyclical component: Refers to when and how often a phenomenon in a time series analysis repeats in the long run. It is different from the seasonal component in a time series.

Data envelopment: A special case of linear programming to optimize the objective function subject to several constraints involving decision variables, limited resources, and minimum demands.

Data type: There are four types of data variables: nominal, ordinal, interval, and ratio. The nominal variable has no hierarchy while the ordinal variable has a defined hierarchy. The interval data are often denoted in a defined bracket. The ratio variable is expressed as the value of one variable over the value of another variable. For example, the blood types (O, A, B, AB) of the patients in a clinic constitute the nominal variable. The patients’ body mass index (BMI) is an ordinal variable. The aggregate (systolic plus diastolic) blood pressure helps classify patients into healthy (< 200 mmHg), pre- (200–230 mmHg), stage 1 (230–258 mmHg), or final-stage hypertension (> 258 mmHg) groups. Blood pressure is an interval variable. An example of the ratio type of variable is speed (miles per hour). Why is it important to know the type of data? The usual regression is devised for ordinal data while the logistic regression is meant for nominal data.

Decision: Selecting one choice over others based on criteria such as cost, time, benefit, or something else. To reach an optimal decision, one ought to seek answers to (1) what needs to be done, (2) what is good for the majority (if not for all), (3) the



action plans, (4) the responsibilities of the decision maker and analyst and (5) their communication styles, (6) how to view outcomes as opportunities rather than problems, (7) what could provide a win-win solution for everyone, and (8) how to avoid dominance by any group.

Decision analysis: A process of selecting (with help from one or more experts labeled as analysts) one option over the others with an aim to increase the benefits by minimizing the cost or time. It is a discipline comprising the philosophy, concept, methodology, and professional practice that are necessary to discuss important decisions in a formal manner.

Decision analyst: A technical expert who can synthesize the complex information in data or statements and explain it in a simple, understandable manner to a decision maker.

Decision maker: A person who makes an optimal decision with an objective to maximize the benefits and/or minimize the cost or time. When the decision turns out to be wrong (correct), the decision maker rather than the analyst is blamed (praised). A single person or a collection of persons with the responsibility to select from among alternative options.

Decision-making: A collective process of selecting an option over others based on a value system.

Decision node: Symbolically denoted by a square, it denotes one action over others.

Decision tree: A schematic display of decisions (symbolically a square) versus outcomes (symbolically a circle) in a sequential path of consequences (symbolically an arrow). The inherent idea is that every decision could result in one or another outcome and every outcome is followed by a decision. It is a graphical illustration of a sequence of actions that result in utility or loss.

Decision variable: A controllable variable pertaining to a resource in linear programming.

Degrees of freedom: Refers to the effective sample size, abbreviated as df.

Delphi process: An approach used to reach a decision among conflicting members in a group without a face-to-face meeting. It is a systematic process to elicit subjective opinions from experts on a theme of interest. The facilitator of this process then anonymously feeds back to the group members the experts’ opinions. This process can be repeated if there is a need for further improvement.

Delta methods: When a function depends on n random inputs from an entirety, i.e., $f(\cdot) = f(x_1, x_2, \ldots, x_n)$, then, to first order, $\mathrm{Var}[f(\cdot)] \approx [\partial_{x_1} f(\cdot), \partial_{x_2} f(\cdot), \ldots, \partial_{x_{n-1}} f(\cdot), \partial_{x_n} f(\cdot)] \; \mathrm{COV}[x_1, x_2, \ldots, x_n] \; [\partial_{x_1} f(\cdot), \partial_{x_2} f(\cdot), \ldots, \partial_{x_{n-1}} f(\cdot), \partial_{x_n} f(\cdot)]^{T}$.

Deming Principle: A collection of recommendations by the well-known statistician Dr. W. Edwards Deming (1900–1993) to enhance the improvement in a system, including healthcare. The principles include (1) lack of constancy to deliver services, (2) importance of emphasizing short-term benefits, (3) performance appraisals, (4) mobility of management, (5) use of data evidence, and (6) addressing excessiveness in costs. Deming articulated 14 points: (1) Apply constancy to improve service and productivity. (2) Adopt a new philosophy to minimize delays, mistakes, defects, and so forth. (3) Cease dependence on mass inspection. (4) Prioritize quality, value, speed, and long-term benefits. (5) Continuously improve the system. (6) Train employees. (7) Institute supervision. (8) Create a fear-free environment. (9) Break down barriers. (10) Eliminate slogans-based operations. (11) Eliminate numerical goals, work standards, and quotas. (12) Remove barriers that hinder efficiency. (13) Encourage vigorous programs and self-improvements. (14) Act to transform toward accomplishments.
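The delta-method entry on this page can be illustrated with a small numerical sketch. The function f(x1, x2) = x1 / x2, the mean values, and the covariance matrix below are hypothetical choices made only for illustration; the helper name delta_method_variance is likewise an assumption, not something defined in the book.

```python
def delta_method_variance(gradient, covariance):
    """First-order variance approximation: gradient * COV * gradient^T."""
    n = len(gradient)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += gradient[i] * covariance[i][j] * gradient[j]
    return total


# Hypothetical inputs: f(x1, x2) = x1 / x2 evaluated at the means (10, 5).
mean_x1, mean_x2 = 10.0, 5.0
covariance = [[4.0, 1.0],
              [1.0, 1.0]]

# Partial derivatives of x1 / x2 at the means: 1/x2 and -x1/x2**2.
gradient = [1.0 / mean_x2, -mean_x1 / mean_x2 ** 2]

variance = delta_method_variance(gradient, covariance)
print(round(variance, 4))
```

The double loop is simply the quadratic form written out term by term, so the off-diagonal covariance between the inputs contributes twice, as the matrix expression implies.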


Dependent variable: A measurable, varying aspect in a data analysis. This variable is connected to one or more measurable independent variables in some fashion, which is also decided with the data evidence. In economic discussions, the dependent variable is called the endogenous variable.

Deterioration probability: Difference between post-positive probability and pretest probability.

Deterministic model: A model whose components might vary but in a nonrandom manner.

Diagnostic odds ratio: The ratio $\frac{\text{Sensitivity} \times \text{Specificity}}{(1 - \text{Sensitivity})(1 - \text{Specificity})}$.

Diagnostics: A medical test done on both ill and healthy patients. The proportion of the positive results in ill patients is called sensitivity. The proportion of the negative results in healthy patients is called specificity. Separately, sensitivity and specificity are each expected to be near one in their domain [0, 1] if a diagnostic is superior.

Differencing: In Box-Jenkins’s approach of modeling time series data, the differences among the successive values are obtained in order to establish the stationarity. This is called the differencing operator, d, which could be one, two, three, and so on.

Direct costs: The costs of treating a patient in a hospital or clinic. When many patients are treated, the direct costs are likely less. The expenses of tests, drugs, personnel, facilities, and so forth are examples of direct costs.

Disability-adjusted life years: An adjustment made to the age-specific life expectancy for loss of healthy life due to disability.

Disease-specific mortality: The number of deaths per 1,000 due to a specific disease; recognized also as “excess mortality.”

Divergent root cause: A situation in which one root cause, A, triggers two different adversities, B and C.

DMAIC: Summarizes the cyclic operational structure in Six Sigma methods.

Dual variable: The complement to the decision variable in primal linear programming (LP). When the objective function (OF) in the primal LP is to be maximized, the OF in the dual LP is to be minimized, and vice versa.

Earliest finish time: The soonest an activity can be completed.

Earliest start time: The soonest an activity can start.

Economic order quantity: An amount that minimizes both annual holding and ordering costs.

Effectiveness: The extent to which a medical treatment achieves health improvements.

Efficacy: The extent to which a medical treatment attains health improvements under ideal circumstances.

Efficiency index in LP: A percent of an operating unit in data envelopment analysis. A larger value in the efficiency index indicates greater unit efficiency.

Empirical rule: Under a normal (bell-shaped) frequency curve, about 99.7%, 95%, and 68% of the total area is contained within three, two, or one standard deviations from the mean, respectively.

Etiology: In a medical treatment, the cause of an illness.

Evaluation: A systematic determination of a subject’s merit, worth, and significance, using criteria governed by a set of standards. It can assist an organization to assess any aim, realizable concept or proposal, or any alternative. It can help in decision-making or in ascertaining the degree of achievement or value in regard to the aim, objectives, and results of any action that has been completed.

Excel function/commands: An algorithm to get results when using the Excel software. Excel is an aggregation of several subroutines executed behind the screen.

Expectation: Hypothetical average of an uncertain random variable, X, whose possible values occur with a chance.

Expected time: Average activity time.

Expected value: A weighted average of varying values with probability as weights.
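As a hedged illustration of the diagnostics and diagnostic odds ratio entries on this page, the sketch below computes sensitivity, specificity, and the diagnostic odds ratio from a 2×2 table of test results. The counts and the function name diagnostic_summary are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def diagnostic_summary(tp, fn, fp, tn):
    """Return (sensitivity, specificity, diagnostic odds ratio).

    tp: ill patients with a positive result; fn: ill patients with a negative result
    fp: healthy patients with a positive result; tn: healthy patients with a negative result
    """
    sensitivity = tp / (tp + fn)   # proportion of positives among the ill
    specificity = tn / (tn + fp)   # proportion of negatives among the healthy
    dor = (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))
    return sensitivity, specificity, dor


# Hypothetical counts: 100 ill and 100 healthy patients.
se, sp, dor = diagnostic_summary(tp=90, fn=10, fp=20, tn=80)
print(f"sensitivity={se:.2f}, specificity={sp:.2f}, DOR={dor:.1f}")
```

The ratio is undefined when sensitivity or specificity equals exactly one; a practical version would need to decide how to handle that case.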



Exponential distribution: A continuous probability distribution used to portray an uncertain service time for a patient in a healthcare setup. Symbolically, its probability density function is $f(x) = \lambda e^{-\lambda x}$; $x > 0$; $\lambda > 0$.

Exponential smoothing: A special case of the weighted moving average procedure to notice the trend of a phenomenon in time series analysis. In this method, a recent observation receives a high weight in order to play up its importance, and distant observations receive a low weight in order to downplay their importance. The weights are selected in a linear-convex manner.

FCFS: An abbreviation for first come, first served, a scenario in which the first patient to arrive is the first to be served in a healthcare facility.

Feasible solution: Value(s) of decision variables that satisfy every constraint in a linear program.

Folding back: In a decision tree analysis, the process of selecting the decision with the highest expected value after calculating the expected values of each alternative.

Forecasting: Projecting future values based on past trends. The forecast becomes more accurate if (1) the past repeats, (2) the horizon is shorter, (3) it is done aggregately rather than individually, or (4) an interval forecast rather than a single forecast is done.

Frequency domain: A methodology based on the frequencies of stochastic time series values. Fourier series (with trigonometric functions like sine, cosine, etc.) are often involved in this approach. Because it requires calculus knowledge, this approach is not explained in this book.

Game theory: A mathematical approach to understand the sequence of operations performed by two feuding persons to harness maximum benefits. One person’s gain is the other person’s loss, resulting in a zero-sum result. In a sense, most decision-making scenarios can be cast as game theory. The combination of their strategies is called the value of the game.

Gold standard: In diagnostics, a medical test used to determine a patient’s true disease state.

Group decision-making: A participatory process in which several persons collectively analyze problems, consider competing options along with advantageous and undesirable outcomes, and select one option over others for a set of reasons. It is also known as collaborative decision-making. This decision is no longer attributable to any single member of the group.

Hazards function: The instantaneous probability of dying at any point in time.

HDI: Refers to the human development index, in the bracket [0%, 100%], that the United Nations has developed for each country based on per capita income and life expectancy.

Hedge: A principle in inventory to optimize the cost against price increase and inflation.

Holding cost: The cost for storing item(s) in a storage facility due to insurance, taxes, warehouse overhead, and so forth.

Hypothesis: Refers to a conjecture or opinion in a statistical scrutiny with the data evidence. There are two complementary (mutually exclusive) versions of this term, named the null hypothesis (the status quo) and the research (or alternative) hypothesis. The probability of incorrectly rejecting the true null hypothesis based on the data evidence (called type I error) is the significance level, $\alpha$ (the threshold against which the p-value is compared), where $1 - \alpha$ is the confidence level. Likewise, the probability of incorrectly not accepting the true research hypothesis based on the data evidence (called type II error) is the $\beta$ value, where $1 - \beta$ is the data’s power. For a fixed sample size, the data’s power tends to be lower as the confidence level increases. Realize that $\alpha + \beta$ need not be equal to one because they denote different scenarios that are not additive in the probability context. Likewise, the data’s power plus the confidence level do not have to equal one.

Incidence: Number of new cases of an illness.
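The exponential smoothing entry on this page describes weights that favor recent observations. A minimal sketch of simple exponential smoothing follows, assuming a single smoothing constant alpha in (0, 1] and a hypothetical series of weekly clinic visits; neither the data nor the function name comes from the book.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    Each smoothed value is alpha * (current observation)
    plus (1 - alpha) * (previous smoothed value), so the weight on an
    observation decays geometrically as it recedes into the past.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = [float(series[0])]          # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly patient visits.
visits = [100, 110, 105, 120]
print(exponential_smoothing(visits, alpha=0.5))   # [100.0, 105.0, 105.0, 112.5]
```

A larger alpha tracks recent changes more closely, while a smaller alpha gives a smoother series that reacts more slowly.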


Independence: This scenario occurs when one among the two outcomes (symbolically, A and B) is probabilistically undisturbed in the presence or absence of the other. Its probability rule, for example, is $\Pr(A \mid B) = \Pr(A)$. Two outcomes, A and B, are considered independent if the occurrence of one does not influence the probability of the occurrence of the other. That is, $\Pr(A \mid B) = \Pr(A)$, if A and B are two separate outcomes, where $\Pr(\cdot)$ refers to probability. The probability of A and B both occurring is given by the product of the individual probabilities: $\Pr(A \cap B) = \Pr(A)\Pr(B)$. (See conditional independence.)

Independent variable: Another name for the predictor variable. In economic discussions, the independent variable is called the exogenous variable.

Index type: An entity to be compared with the prototype.

Indirect costs: Costs incurred irrespective of how many patients are treated. Water and electricity bills, waste disposal fees, and so forth are indirect costs.

Infeasible solution: Value(s) of a decision variable that violate one or more constraints in linear programming.

Influence diagrams: An alternate graphical representation of decision-making problems in a model.

Integrative group process: An optimal mix of Delphi and nominal processes with an intention to attain a win-win solution for all conflicting members in a group.

Intersection: The simultaneous occurrence of two different outcomes, symbolically indicated by $A \cap B$.

Inventory: Cataloging and storing needed items, weighing purchasing versus storage costs, deciding on a discount to reduce storage costs, and implementing an efficient policy for lead time.

Inventory cost: The costs of purchasing and storing items, including price, set-up costs, holding costs, and so forth.

Invertibility: A required characteristic in the modeling of Box-Jenkins’s approach for time series data. All autoregressive models are invertible to a corresponding moving average version.

Irregular component: A random fluctuation without any known reasons such as a season or cycle of a phenomenon in a time series analysis.

Kaizen technique: A top-down approach in Six Sigma techniques.

LCL: The low line in a quality control chart.

Latest finish time: The longest an activity can take without delaying an entire project.

Latest start time: The latest an activity can start without delaying completion of a project.

Lead time: The duration between ordering and receiving an item.

Length bias: An increase in the probability of observing a sample unit that occurs when the unit stays longer in the sampling frame. For a variety of reasons, the collected data portray a different population than the targeted population, and this scenario is recognized as size bias in integer data and length bias in continuous data. A proper adjustment of the sampling bias is a necessity to make apt interpretations for the targeted rather than the actually sampled population.

Life expectancy: Average lifetime of a person at a specified age.

Likelihood ratio: Ratio of the probability of an outcome in the presence of a symptom to a counter probability of the same outcome in the absence of the symptom.

Likert scale: A scale consisting of integers that is hierarchically indicative of a respondent’s opinion in a survey.

Linear programming: An optimization technique to configure the best values for a set of decision variables whose values ought to meet the constraints.

Linear regression: A mathematical linear expression, $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$, used to predict the value of a response (main) variable, y, based on the value of one or more predictors, $x_1, x_2, \ldots, x_p$, where $\beta_0$ is recognized as the threshold amount and $\beta_1, \beta_2, \ldots, \beta_p$ are regression coefficients.
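To make the linear regression entry concrete, here is a minimal sketch for the single-predictor case (p = 1), using the ordinary least-squares closed form. The data points and the function name fit_simple_regression are hypothetical; the book's general expression allows several predictors, which this sketch does not cover.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (intercept, slope) for y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the "threshold amount" b0
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_simple_regression(x, y)
print(b0, b1)              # 1.0 2.0
print(b0 + b1 * 5)         # predicted response at x = 5
```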



Loss: Any disadvantageous consequence of a decision is recognized as a loss. The negative utility due to a decision is, in a way, the loss. A loss is the opposite of gaining a utility. The loss is tied to a wrong decision.

LR−: The likelihood ratio of one minus sensitivity over specificity.

LR+: The likelihood ratio of sensitivity over one minus specificity.

Markov cycle: A time interval during which a patient in a cohort transits from one state to another or even remains in the same state.

Markov model: An abstraction of reality in which an entity moves on to the next state from its current state with a probability. It is a decision-analytic model that characterizes the prognosis of a patient and models the transitions between states.

Maximin strategy: When a player seeks to maximize the value of the game in a game theory context and it results in minimum payoff to the opponent.

Mean squared error: A positive amount as it is the sum of the variance and squared value of the bias.

Mean survival: The average length of life a patient lives among a number of patients after they all have received a certain therapy.

Median: A robust measure of the central tendency that competes with the mean as well as the mode.

Medicare: A federal government healthcare program in the United States to pay the medical expenses of elderly citizens.

Meta-analysis: A statistical analysis that formulates a comparative confidence interval for the true mean value of a population when each study among many has drawn a sample of different size, made different assumptions, and found different significance levels in their findings.

Minimax strategy: When a player seeks to minimize the value of a game as it minimizes the maximum payoff to the opponent.

Mixed strategy: When a player randomly selects a strategy in a game theory context based on a probability distribution.

MLE: This abbreviation refers to the maximum likelihood estimate of the parameters in an underlying model for the data. The most celebrated virtue of this statistic (a summary of data values) is that the MLE of a function of the parameters is the function of the MLEs of the parameters.

Mode: A measure of the central tendency that competes with the mean as well as the median.

Model: An abstraction of the reality of a situation in which an optimal decision is to be constructed.

Monte Carlo simulation: A simulation technique named after the well-known Monte Carlo casinos in the Mediterranean region. It is often done to observe how differently the sampling outcomes could turn out on a totally random basis. It is a systematic computing algorithm under repeated sampling, and the name indicates that the next value is as uncertain as what a gambler sees in a casino.

Morbidity: The proportion of a population living with a specific disease at a defined time.

Mortgage payment: When a loan amount is obtained under an agreement to pay back an additional amount at an annual rate along with the principal, how much is to be paid in a period is called the mortgage payment.

Most probable time: The most likely length of time an activity will take under usual conditions.

Moving average: A floating average to capture a clear trend in a time series data analysis, usually done with equal weights. It is called a weighted moving average when the weights used are not equal.

Multicollinearity: The existence of a correlation among the predictor variables, x. The estimation of a model’s parameters in regression building becomes difficult because the incidence matrix $x'x$ is singular (not invertible). Statisticians have developed a


remedy named ridge regression that injects a constant into the diagonal elements of the incidence matrix.

Multiple correlation: The correlation between an observed variable and its predicted value.

Mutually exclusive: When the occurrence of one outcome excludes the occurrence of a different outcome, both outcomes are considered mutually exclusive. For example, the presence of a disease like COVID-19 and its absence in an individual at a given time are mutually exclusive.

Negative predictive value: A conditional probability of being healthy for a person with a negative test result. It is expressed as a conditional probability, $\Pr(\text{Healthy} \mid \text{negative test result})$, of being healthy for a person who has a negative test result in a diagnostic. It is the reverse form of specificity, $\Pr(\text{negative test result} \mid \text{Healthy})$.

Negotiation: An approach to reach an agreement in a conflict involving different constituencies within the healthcare sector.

Net benefit: The difference in utility between two ill patients when one received a treatment while the second one received no treatment.

Net harm: The difference in health damage between two ill patients when one received a treatment while the second one received no treatment.

Nominal process: An approach to reach an agreeable decision among conflicting members in a group with a face-to-face meeting.

Normal distribution: A symmetric, bell-shaped frequency curve from three standard deviations below the mean to three standard deviations above the mean. It is uniquely defined by its mean and standard deviation. The area under the frequency curve refers to probability, and the total area is 100%.

Null hypothesis: The current status quo of the population. Recall that the term population refers to the frequency curve of the entirety. Symbolically, it is denoted by $H_0$.

Objective function: An expression to be maximized if it is a profit function or minimized if it is a cost function.

Observer bias: In an observational study, when the data collector knowingly or unconsciously adds a slant to the observable outcome.

Odds: The ratio of the chance of something happening over the chance of it not happening. The ratio of the probability of the presence of an outcome of interest to the probability of the absence of the same outcome. Symbolically, it is expressed as $\mathrm{Odds}_A = \frac{\Pr(A)}{\Pr(\text{not } A)}$. For example, if the odds of getting lung cancer is 0.00001 (i.e., $\frac{1}{100{,}000}$), it implies that for every one person with lung cancer, there might be 100,000 persons without it.

Odds of an outcome: The ratio of the probability for an outcome to appear over the probability for that outcome to not appear.

Odds ratio A → B: The ratio of the odds for A to occur over the odds for B to occur. For example, if the odds of getting lung cancer is 0.00001 among nonsmokers and the odds of getting lung cancer is 0.0001 among smokers, then the odds ratio of getting cancer among nonsmokers in comparison to smokers is $\frac{0.00001}{0.0001} = \frac{1}{10}$, meaning that for every person with lung cancer among nonsmokers, there might be 10 persons with lung cancer among smokers.

Opportunity cost: The value of a foregone activity. It is an integral part of decision-making. Another name for opportunity cost is regret.

Optimal solution: Values of the decision variables that cannot be bettered in maximizing or minimizing the objective function.

Optimistic time: The minimum time an activity will take if a project proceeds ideally.

Ordering costs: Costs such as salaries, costs for paper, transportation fees, and so forth when item(s) are ordered.
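The odds and odds ratio entries on this page can be checked with exact arithmetic. The sketch below uses Python's standard fractions module; the lung cancer odds are the figures quoted in the odds ratio entry, while the probability of 1/5 and the function names are hypothetical illustrations.

```python
from fractions import Fraction


def odds(probability):
    """Odds of an outcome: Pr(A) / Pr(not A)."""
    return probability / (1 - probability)


def odds_ratio(odds_a, odds_b):
    """Odds ratio: the odds for A divided by the odds for B."""
    return odds_a / odds_b


# Hypothetical probability of 1/5 corresponds to odds of 1/4 (one case for every four non-cases).
print(odds(Fraction(1, 5)))                      # 1/4

# Figures quoted in the glossary entry: odds of lung cancer among nonsmokers and smokers.
nonsmoker_odds = Fraction(1, 100000)             # 0.00001
smoker_odds = Fraction(1, 10000)                 # 0.0001
print(odds_ratio(nonsmoker_odds, smoker_odds))   # 1/10
```

Using Fraction rather than floating-point numbers keeps tiny odds such as 1/100,000 exact, which makes the 1/10 result easy to verify by hand.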



Orthogonality: Nonoverlapping attributes.

Outcome: Something that occurs beyond the decision maker’s control. A noticeable feature from a unit.

P-P plot: A graphical display (based on the theoretical and sample percentiles) to check the fit of a chosen probability distribution for the data.

Parameter: In a chance-oriented healthcare system, there might be unobservable latent variable(s) that may impact the outcome. This impact is called the parameter of the healthcare system.

Pareto Priority Index: A quantity used to portray the cost-to-benefit ratio in Six Sigma methods.

Partial correlation: Correlation between two variables (attributes) when a third variable’s influence on both of them is at a fixed level.

Perfect information: The difference between what happens under a diagnostic scenario and what happens under no diagnostic scenario.

PERT: Abbreviation of program evaluation and review technique.

Pessimistic time: Maximum time an activity will take if a significant delay occurs.

Poisson distribution: The spread of the total probability of one over the infinitely many countable collections of mass points (integers) based on a large number ($n \to \infty$) of independent and identical trial cases with a stable but negligible parameter (i.e., $p \to 0$) such that it results in a new parameter, $\lambda = np$. The Poisson probability mass function is $\Pr(y \mid \theta) = e^{-\theta}\theta^{y}/y!$; $y = 0, 1, 2, \ldots$; $\theta > 0$, where $\theta$ plays the role of $\lambda$.

Positive predictive value: A conditional probability of illness for a person with a positive test result. It is a conditional probability, $\Pr(\text{Disease} \mid \text{positive test result})$, of having the disease for a person who has a positive test result in a diagnostic. It is the reverse form of sensitivity, $\Pr(\text{positive test result} \mid \text{Disease})$.

Posterior odds: $\text{Posterior Odds} = (\text{Likelihood Ratio})(\text{Prior Odds})$.

Posterior probability: An update, $\Pr(\theta \mid x_1, x_2, \ldots, x_n)$, of the prior probability, $\Pr(\theta)$, based on collected data evidence, $\mathrm{Lik}(x_1, x_2, \ldots, x_n)$, using the Bayes theorem. That is, $\Pr(\theta \mid x_1, x_2, \ldots, x_n) \propto \Pr(\theta)\,\mathrm{Lik}(x_1, x_2, \ldots, x_n)$, where $\theta$ portrays the parameter that governs the chance-oriented healthcare mechanism. It is the aftermath predictable chance for an outcome, and it is based on a blend of prior probability and new data.

Post-negative probability: The conditional probability of illness for a person with a negative test result.

Prediction: A forecast of the future value of a random variable.

Predictive function: Alternately referred to as the marginal probability function. It is the probability of unconditionally observing the sample, $x_1, x_2, \ldots, x_n$, over all possibilities of the parameter in a chance-oriented healthcare mechanism. That is, $\Pr(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} \Pr(\theta \mid x_1, x_2, \ldots, x_n)\, d\theta$, where the integral refers to a smoothing operation.

Pretest probability: Marginal probability of illness.

Prevalence: Number of existing and new cases of an illness. This is a proportion of the existing and new cases in a defined population.

Principal component: A multivariate statistical technique to orthogonalize different linear combinations of correlated variables in an observational study in terms of what is called principal components using significant eigenvalues.

Prior probability: The probability of an outcome arising, according to current knowledge prior to pertinent data collection. Symbolically, it is denoted by $\Pr(\theta)$, where $\theta$ portrays the parameter that governs the chance-oriented healthcare mechanism. It is the predictable chance for an outcome to occur without any data information.
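The posterior odds entry on this page (posterior odds = likelihood ratio × prior odds) can be traced through a positive test result. The sketch below assumes the positive likelihood ratio LR+ = sensitivity / (1 − specificity), as defined in this glossary, and uses a hypothetical pretest probability, sensitivity, and specificity; the function name is an illustration only.

```python
def post_positive_probability(pretest_probability, sensitivity, specificity):
    """Probability of illness after a positive test, via posterior odds."""
    prior_odds = pretest_probability / (1 - pretest_probability)
    lr_positive = sensitivity / (1 - specificity)      # LR+
    posterior_odds = lr_positive * prior_odds          # posterior odds = LR x prior odds
    return posterior_odds / (1 + posterior_odds)       # convert odds back to probability


# Hypothetical values: 10% pretest probability, sensitivity 0.9, specificity 0.8.
ppv = post_positive_probability(0.10, 0.90, 0.80)
print(round(ppv, 4))
```

With these inputs the result agrees with applying the Bayes theorem directly, 0.09 / (0.09 + 0.18), so the odds form is only a convenient rearrangement of the same update.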


Probability: A normed measure in the closed interval [0, 1] to indicate the likelihood of the occurrence of an outcome. It is connected to the odds of the outcome. That is, the probability of an outcome is the ratio of the odds over one plus the odds of that outcome. It is a quantitative and relative description in a unit interval [0, 1] of an uncertain outcome. When the probability is zero, the outcome is unlikely. When the probability is one, the outcome is certain to be noticed.

Prognosis: A prediction of the probable course and outcome of an illness.

Proportion chart: A graphical display in quality control.

Prototype: A reference entity with which an entity of interest is compared.

Publication bias: Only when the results are statistically significant (i.e., with a smaller p-value for the null hypothesis to be true) are studies published. It implies that no one knows how many unpublished studies might have been performed, and this process leads to what is called publication bias.

P-value: The proportion of the time the status quo (another name is null hypothesis) is likely to be true, according to the data evidence.

Q-Q plot: A graphical chart (based on theoretical and sample quantiles) to check the distributional pattern of the data.

Quality-adjusted life year: A measure of the value of a health outcome that reflects both longevity and morbidity. It is the expected length of life in years, adjusted to account for diminished quality of life due to a disease.

Random sample: A segment from the entirety called the population is selected with an equal chance for every drawn case. When every member of the entirety receives an equal chance of being selected in a sampling process with replacement of drawn items before a new item is drawn, it is called a random sample.

Random walk model: A specialized time series model for the successive differences of the values with no memory.

Randomization: A premium for the sake of equalizing, if not eliminating, any bias that may exist either in a control or in a treatment group under experimental study.

Range chart: A chart based on the maximum minus the minimum value of the data in a quality control method.

Range of feasibility: The range of values over which the dual variable is interpretable.

Range of optimality: A bracket of values over which the objective function remains optimal for the identified solution.

Receiver operating curve (ROC): A graphical plot of the true positive rate against the false positive rate at various thresholds.

Reduced cost: An amount by which an objective function coefficient has to increase in maximization or decrease in minimization of linear programming (LP) before the decision variable takes a positive value in the optimal solution.

Regression analysis: A statistical analysis intended to project a response (dependent) variable, y, in terms of known values for the predictor (independent) variables, $x_1, x_2, \ldots, x_p$, using a linear regressive relationship between the response and predictor variables. Symbolically, it is portrayed by $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p$, where $\hat{y}$, $\hat{\beta}_0$, and $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are called the predictive value, initial prediction, and regression coefficients of the corresponding predictors.

Regret: A measurable quantity of what might happen if the chosen decision falls short of the optimal decision.

Relative risk: The ratio of two probabilities in which one probability refers to one scenario and the other probability refers to a different scenario.

Relevant cost: This amount depends on the optimal solution for the decision variables.

Reorder point: The level in inventory at which a new order is placed to replenish the storage.
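The receiver operating curve entry on this page plots the true positive rate against the false positive rate at various thresholds. The sketch below computes those points for a handful of hypothetical test scores; treating a score at or above the threshold as a positive result is an assumption of this illustration, as are the data and the function name.

```python
def roc_points(scores, labels, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for patients with the disease, 0 for healthy patients.
    A score >= threshold is treated as a positive test result (an assumption).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical scores for two ill (label 1) and two healthy (label 0) patients.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
points = roc_points(scores, labels, thresholds=[0.85, 0.5, 0.35, 0.0])
print(points)   # [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

The true positive rate here is the sensitivity and the false positive rate is one minus the specificity, so lowering the threshold trades specificity for sensitivity along the curve.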



Research hypothesis: A new opinion (conjecture) to be verified with the help of the data evidence. Symbolically, it is denoted by $H_1$.

Risk: Liability in a defined situation. Sometimes it is expressed as a proportion or probability for easier comprehension. An unanticipated and undesirable adversity, it is expressed as a probability in an interval from zero to one.

Risk averse: Refers to a decision maker who is unwilling to take a risk in order to attain an uncertain outcome.

Risk aversion: It refers to a tendency to avoid adversity in a gamble. To avoid a gamble, a patient may accept less than the expected value to be gained from the gamble.

Risk neutral: A decision maker who decides based on the average selection of others. It refers to a situation in which a decision maker makes her/his choice based on what others perform on the average. A risk-neutral decision maker is indifferent to taking a risk by selecting the expected value of others’ choices and their outcomes.

Risk seeking: Refers to a decision maker who is willing to undergo a risk in order to attain a defined benefit in a gamble.

Robust: Refers to a pragmatic scenario in which no extremity (best or worst) is involved. For example, the median is more robust than the mean as the median is least affected by an extremely low or extremely high value in a random sample.

Root cause analysis: A method of problem-solving that tries to identify the root causes of problems.

Root causes: Factors that might have caused a chain of adversities.

Run chart: A graph based on range in Six Sigma methodologies.

Saddle point: A state in game theory in which neither player can improve or worsen the value of the game.

Safety probability: Difference between post-negative probability and pretest probability.

Safety stock: The level of inventory in which the items in storage are more than the demand for them.

Sample mean: A measure of the central tendency of a frequency curve. The mean is not a robust measure as it is vulnerable to extremely low or high values.

Sample space: All possible combinations of outcomes.

Sampling error: A measure that portrays the fluctuations that could occur in the extract (otherwise called statistic) of the collected sample data due to its randomness.

SARIMA: An abbreviation for seasonal autoregressive integrated moving average model in time series data modeling.

Schematic review: Another version of meta-analysis used to evaluate and interpret available research evidence relevant to an investigation.

Schwarz Bayesian criterion: It is an idea to compute a value in order to judge whether one model performs better than another with respect to forecasting in time series data analysis.

Seasonal component: The fluctuation over the calendar year of a phenomenon in a time series analysis.

Seasonal differencing: With respect to a defined seasonal timescale, the time series data might lack the required stationarity to build the Box-Jenkins model. In such situations, the differences among the seasonally successive time series values are obtained using a differencing operator, D, which can be one, two, three, and so on only in the context of the seasonal timescale.

Seasonal index: The season’s impact on a phenomenon in time series analysis. When the index value is more (less) than one, it refers to positive (negative) impact by the season. When the index value is just one, the season has no effect on the phenomenon.

Security risk: Employing a security risk management paradigm to make a particular determination of security-orientated outcomes.

Sensitivity: The proportion of patients getting positive test results in a diagnostic test to those with the disease. Symbolically, it is $\mathrm{Se} = \Pr(\text{positive test result} \mid \text{having the disease})$. Sensitivity is the reverse form

Glossary

Sensitivity analysis

Serial root cause Service rate Setup cost Shanmugam’s index Similarity index Simulation

Six Sigma

Slack variable Sources of conflict

Specificity

Stationarity Statistic Statistical process control Steady state probability Stochastic frontier

Sunk cost

of positive predictive value, PPV ¼ Prðhave the diseasejobtained positive test resultÞ. The study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs. When one or more assumptions are no longer true, the optimal decision may not be still the best. Sensitivity analysis describes how much the optimality of the decision changes due to any change in the framework. In the discussion of root causes, when a root cause, A, triggers a direct cause, B, which results in adversity, C. The number of patients on average during a defined time period at a healthcare facility. The amount spent on items such as labor, materials, building, and so forth that is spent to set up a facility to produce or reorder item(s). S ¼ PPV þ NPV  1. When Shanmugam’s index is positive (negative), the uncertain outcome is ill (health). A percent in the interval from zero to one as a measure of similarities between the prototype and the index type. Using advanced computing methodologies, a decision maker may select an approach to undertake a fictitious data collection, analysis, and finding in the simulation. This method is appreciated when the study is costly or too destructive to perform in real-life settings. An approach to improve performance in a system originally credited to the efforts of Joseph Juran and W. Edwards Deming. The title emanated from the statistical idea that one must go beyond six standard deviations higher than the mean to spot any failure and from the fact that it is so rare. At the 6 σ level, the number of defects would be 3.4 per million. A variable whose value denotes the unused resource in a constraint. Conflict is a mismatch in the preferences of several constituencies in any healthcare setting. 
The sources of such conflict include (but are not necessarily limited to) scarce resources, ambiguities, personality clashes, power or status differences, fractured goals, communication breakdowns, lack of collaboration, and so forth. The five strategies to resolve conflict are unbiased assessment, acknowledgment of merits in the opposing group, change in our own attitude, the positive approach (win-win), and comprehensive actions. A conditional probability of obtaining a negative result by a healthy person. It is the proportion of healthy patients getting negative test results in a diagnostic test without the disease. Symbolically, it is Sp ¼ Prðnegative test resultjhealthy without the diseaseÞ. Specificity is the reciprocal of the negative predictive value, NPV ¼ Prðhealthy without diseasejnegative test resultÞ. A required characteristic in the time series data to capture the memory-oriented regressive component. All moving average models are stationary. An extract of the random sample drawn from a population. A sampling process in a study used to check whether the intended quality is maintained. When a healthcare system settles in a state in which the transition probability from one state to another remains without a change. A multivariate stochastic programming technique to attain the best possible objective function that is expressed as a linear model of predictable and unpredictable components. A cost that is not affected by the optimal solution.

Supply chain: A concept and operation style to meet efficiently the demand for items/services.

Surplus variable: A variable whose value in a linear program portrays the excessive (overused) amount of a limited resource.

Survival curve: A monotonic probability curve used to indicate the chance for surviving a time point.

SWOT analysis: Strategies to attain the stated missions and goals based on the best combination of strengths, weaknesses, opportunities, and threats (SWOT). It was developed by Ken Andrews in the early 1970s. There are four quadrants based on internal strength or weakness and external opportunities or threats. Quadrant 1 reflects internal strengths matched with external opportunities. Quadrant 2 includes internal weaknesses combined with external opportunities. Quadrant 3 groups internal strengths and external threats. Quadrant 4 identifies a scenario of internal weaknesses and external threats.

Systematic sample: A sampling in which every new case is drawn from the entirety (recognized as population) with the same regular gap in the selection process. When a few members of the entirety are skipped before a member is selected for observation in a study, the draw is called a systematic sample. The systematic sample is biased about the population characteristic while the random sample is unbiased.

T-distribution: A probability distribution used when the sample size is small and/or when the population standard deviation is replaced with the unbiased sample estimate. This result is credited to a statistician named William Sealy Gosset (June 13, 1876–October 16, 1937). When the degrees of freedom (df) are large, the t-distribution and the Gaussian distribution nearly coincide.

Threshold odds: The ratio of probability of harm over the probability of benefit.

Time series: A collection of observed values of a phenomenon over successive periods of time in a healthcare facility.

Total probability: $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$, where A and B are two different outcomes and ∪ and ∩ are symbols that refer to “either or” and “together,” respectively.

Transition probability: The chance that a patient in a particular health state will transfer to another health state during a disease cycle.

Trend: A pattern over a long period of time in time series analysis.

Tunnel state: A Markov situation in which the entity under investigation moves on from one situation to another.

Type-I error: Because of the randomness in collected data evidence, there is a chance to incorrectly reject the true null hypothesis, H0. The chance for a Type-I error to occur is called the level of significance, against which the p-value computed from the data is compared. Symbolically, $\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ is true})$ in its domain [0, 1]. A smaller value of α demands stronger data evidence before the research hypothesis, H1, is accepted.

Type-II error: The outcome of not accepting the true research hypothesis based on data evidence. The chance for a Type-II error to occur is $\beta = \Pr(\text{not accepting } H_1 \mid H_1 \text{ is true})$ in its domain [0, 1]. The smaller the value of β, the higher the power of the data.

UCL: Upper control limit; the high line in a quality control chart.

Unboundedness: In some cases of linear programming, a resource might not be limited but rather unbounded.

Uncertainty: The existence of unpredictability in a chance-oriented system that produces different outcomes. For known or hidden reasons, the outcomes (not the decision) in a healthcare or hospital system are not predictable and hence the system is called uncertain. When a feature is not quite deterministic in its occurrence or absence, there could be an associated, uncertain reason. The lack of certainty is attributed to a state of having limited knowledge where it is impossible to exactly describe the existing state, a future outcome, or more than one possible outcome.

Union: A possibility of either one or the other or even both outcomes, indicated by A ∪ B.

Utility: The measurable consequence of a correct decision.

Utility curve: A configuration depicting a change in gain due to a change in outcome or decision. Any measurable benefit due to a decision is called utility. The negative loss is recognized as utility in an indirect manner.

Validity: The extent to which a decision-making technique measures what it is intended to measure.

Value: An indicator of priority; it can be quantitative or qualitative.

Value model: A description of the hierarchical preference of the outcomes in a chance-oriented system.

Variance: A measure of spread among the values. It is an expected weighted spread of potential values around the expected value of a random variable with probability as the weight.

Variance chart: A graphical display in Six Sigma methodologies.

Venn diagram: A schematic graphical display of all possible combinations in an uncertainty-oriented system with several potential outcomes. When there is only one trial (i.e., n = 1), the binomial distribution becomes a special scenario called the Bernoulli distribution with dichotomous outcomes. Another name is dummy variable with zero and one coded values for the dichotomous outcomes.

Winters’ method: A deterministic modeling approach in a time series data analysis used to capture the initial level, trend, season, and their frequencies along with smoothing parameters.

Youden’s index: $Y = \text{Sensitivity} + \text{Specificity} - 1$. When Youden’s index is positive (negative), the diagnostic is superior (inferior).

Zero-sum game: In game theory, the gains by the two players sum to zero as one player’s gain is the other player’s loss.
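The diagnostic quantities defined in the glossary (sensitivity, specificity, PPV, NPV, Youden’s index, and Shanmugam’s index) can all be read off a two-by-two table of test results against true disease status. The following Python sketch is only an illustration under stated assumptions: the class name `DiagnosticTable`, its field names, and the patient counts are hypothetical and do not come from the book, and the sketch does not guard against empty rows or columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticTable:
    """Hypothetical 2x2 table of diagnostic test results (counts of patients)."""
    tp: int  # diseased patients with a positive test
    fn: int  # diseased patients with a negative test
    fp: int  # healthy patients with a positive test
    tn: int  # healthy patients with a negative test

    def sensitivity(self) -> float:
        # Se = Pr(positive test result | having the disease)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Sp = Pr(negative test result | healthy without the disease)
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # PPV = Pr(having the disease | positive test result)
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # NPV = Pr(healthy without the disease | negative test result)
        return self.tn / (self.tn + self.fn)

    def youden_index(self) -> float:
        # Y = Sensitivity + Specificity - 1
        return self.sensitivity() + self.specificity() - 1

    def shanmugam_index(self) -> float:
        # S = PPV + NPV - 1
        return self.ppv() + self.npv() - 1


# Made-up counts, for illustration only.
table = DiagnosticTable(tp=90, fn=10, fp=30, tn=170)
print(f"Se={table.sensitivity():.3f}  Sp={table.specificity():.3f}")
print(f"PPV={table.ppv():.3f}  NPV={table.npv():.3f}")
print(f"Youden={table.youden_index():.3f}  Shanmugam={table.shanmugam_index():.3f}")
```

With these counts the sensitivity (0.90) and the PPV (0.75) differ, which shows that the two conditional probabilities answer different questions even though they are built from the same table.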


Notations and Symbols

$A \to B$, $A \to C$: Divergent root cause analysis (a cause A results in two outcomes B and C)

$A \to B \to C$: Serial root causes in the sense that A causes B, which causes C

$CDF(x)$: Cumulative distribution function of an event X ≤ x

$\bar{A}$: Absence of the outcome A

$A \to C$, $B \to C$: Convergent root cause analysis (two causes A and B cause one outcome C)

$\rho(A, B)$: Correlation measure ρ between outcomes A and B expressed in the domain [−1, 1]

$\Pr(A \mid E) / \Pr(\bar{A} \mid E)$: Divide-and-conquer rule based on the ratio of the probability of the existence of an outcome A against its absence $\bar{A}$ given the evidence E

$Odds_A = \Pr(A) / \Pr(\bar{A})$: Odds of an outcome A are the ratio of $\Pr(A)$ over $\Pr(\bar{A})$

$OR_{A \to B}$: Odds ratio of the $Odds_A$ in one group over the $Odds_B$ of another group

ACB: An outcome A convicts another outcome B

ASB: An outcome A supports another outcome B

BINO(n, p): Binomial probability pattern with n cases and probability of an event p

CA: Conflict analysis

CF: Cost-effectiveness

CPr(A|B): Conditional probability of a yet-to-happen outcome A given already-happened outcome B

DEA: Data envelopment analysis to rate the sampling units

DM: Decision maker

DP: Delphi process

DT: Decision tree

E(X): Expected value of the random variable X (also recognized as mean)

Excel: A Microsoft software used to calculate and plot graphs

GD: Group decision

GEO(θ): Geometric probability pattern with incidence parameter θ > 0

HG(N, n, p): Hypergeometric probability pattern with population size N, sample size n, and parameter p

IA: Information analyst

IB(r; θ): Inverse binomial probability pattern with r ≥ 1 cases and incidence parameter θ > 0

IGP: Integrative group process

LR: Likelihood ratio for an outcome with or without evidence

LSS: Lean Six Sigma

MC: Matthews correlation

NGP: Nominal group process

NPV: Negative predictive value in proportion

OD: Orthogonal design to collect data on clues

OVERL: Overlap probability

PCA: Principal component analysis

PDF: Probability density function

PERREL: Percent relevance probability

POI(λ): Poisson probability pattern with incidence rate λ > 0

POST: Posterior probability structure after infusing the data evidence

PPV: Positive predictive value in proportion

Pr(A): Probability of an outcome A in the domain [0, 1]

PREC: Precision probability

PRIOR: Prior probability structure before data collection

PV: Program evaluation

RA: Risk analysis

RAV: Risk averse

RCA: Root cause analysis

RECALL: Recall probability

RN: Risk neutral

RV: Random variable

Se: Sensitivity in proportion

SF: Survival function

SFA: Stochastic frontier analysis

SIML(i; j): Similarity between proto case i and index case j

Sp: Specificity in proportion

SPC: Statistical process control

SRCA: Serial root cause analysis

VAL: Value of an option

Var(X): Variance of the random variable X (also recognized as dispersion)


Prologue

Basic knowledge of college algebra, precalculus, and statistics is enough to understand this book. The knowledge of data science (including statistics) is prerequisite background to comprehend and benefit from the contents of this book. Many feel quantitative topics and statistics are distasteful, difficult, or a nightmare. This feeling emerges unfortunately due to a lack of exposure to counterfactual thinking. For anyone who wonders what is counterfactual, the following example might clarify. A resident comes out of her/his home and notices the lawn is wet. The knowledge of the state of the wet lawn is synonymous with having data. If the resident quickly comes to a judgment that it must have rained in the night because the lawn is wet, that is an erroneous analytic process. The reality could have been one or another unknown cause. A possibility is that a neighbor might have forgotten to switch off the automatic sprinkler and it was a windy night. Another possibility is that the resident’s own sprinkler might have broken. The actual cause needs to be traced out based on an analytical process. Such a process involves concept, data, and methodology. All of these factors complicate the situation, making the judgment process distasteful. What is needed in decision-making, especially in healthcare, is therefore really the mindset to think counterfactually. This book aims to explain these counterfactual notions and their expressions using examples from the healthcare field.


The contents of Data-Guided Healthcare Decision-Making are oriented to using the software programs Excel, Microsoft Math Solver, and JASP. Excel is a module in Microsoft Office that almost every graduate student, professor, and professional uses in these days of advanced computing. Microsoft Math Solver (freely downloadable from the web page www.microsoft.com/en-us/download) is helpful for constructing two- or three-dimensional graphs/charts from mathematical expressions or numbers. JASP is free, open-source software that can be downloaded from the web page https://jasp-stats.org. To be used as input in JASP, the data ought to be saved from Excel with the file extension .csv. The menu in the Windows-based JASP contains statistical procedures called descriptive, tests, ANOVA, regression, nonparametric, factor analysis (including principal components), machine learning, meta-analysis, neural networking, and structural equation modeling, among others, to analyze the data. The concepts and analytics are focused on making decisions in hospital, clinical, and long-term care facilities. The illustrations in every chapter show how data are entered in an Excel spreadsheet, how a technique in JASP is selected and used to get the results for the data, and how optimal decisions are made in healthcare settings. Conceptual and analytic exercises are constructed in every chapter for readers to capture and practice the decision-making process.

Introduction

This textbook is intended for graduate students in healthcare, public health, health services research, or other medical programs. Often graduate students in the health disciplines happen to be less equipped in mathematical/quantitative reasoning or skill compared to those in mathematics or statistics disciplines. This textbook is therefore written prioritizing basic concepts and methodologies (with the essential expressions and their illustrations) to practice in hospitals, clinics, insurance industries, nursing homes, or other healthcare organizations. Mathematical derivations of the expressions are omitted. For the sake of comprehension, real-life scenarios are discussed. In the illustrative examples, the motivations for selecting a particular concept and its expressions are clearly stated and explained. The data analytic results in the illustrations are interpreted. In every chapter, several challenging conceptual and analytic exercises are added to help professors who teach courses using this book and to help graduate students/researchers extend the frontiers of data-guided decision-making. Comparable books to this are outdated and do not reflect the current state of healthcare. I have tried to use publicly available free software like JASP, Excel, and Microsoft Math Solver to analyze the data, to interpret the generated results, and then to utilize the interpretations to discuss the merits and deficiencies in both the concept and the analytic expressions in every chapter. This textbook contains 14 chapters.
The chapters cover diverse topics as building blocks to achieve the ultimate aim of making optimal decisions in healthcare by any one of three constituencies: patients; healthcare administrators, including physicians/nurses, pharmacists, front desk professionals, and so forth; and the related supporters, including reimbursing insurance experts, policy-making federal, state, local, or hospital organizations, and academic institutions that train healthcare professionals.

Chapter 1 starts with asking why and how healthcare decisions are made. Decision-making is challenging in both professional and personal life. Why is this so? The background is complex. Decisions need to be carefully constructed. When the decision happens to be the best, several benefits are harvested. If the decision unfortunately ends up wrong, then it results in bad consequences or losses.

Chapter 2 articulates the theme that data-guided decisions are superior to what they could have been otherwise. Deciding without data is perhaps analogous to walking through a dark space without a flashlight. Intuitively, a person might make the best decision based on her/his natural intelligence. This might not be the case for everyone. For decision-making to be scientific, the process needs to be objective, not subjective. To be objective, decision-making ought to be rule based, not guess based. Also, reality has been and will be continuously refined in a Bayesian style. What is Bayesian style? Knowledge is updated to a posterior level from a prior level as a gradual buildup every time new data come into the decision-making process. More details are stated in Chapter 2.

In any data-oriented scenario, it is unavoidable for the decision makers to involve software. Several software programs these days charge an annual fee for installation and renewal. In some cases, such fees are unfortunately too high to consider. Excel is available to almost everyone who uses Microsoft Office, of which it is a component. Microsoft’s Math Solver is free software. JASP is a free, open-source program for doing statistical analysis prepared and provided by the University of Amsterdam. It is designed to be user friendly as an alternative to other software such as SPSS, SAS, STATISTICA, and so forth. An advantage of JASP is that it offers both classical and Bayesian analysis. Chapter 3 illustrates with examples how to use these three software programs.

In the current age of advanced computing, data are generated fast and archived in web pages that can be easily accessed. In other words, healthcare practitioners are drowning in large amounts of data, and the prudent thing to do is to appraise every piece of data for validity. Chapter 4 concentrates on how to collect authentic and useful data from public or private domains for learning and practicing a particular healthcare topic.
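The Bayesian style of updating mentioned above for Chapter 2, in which knowledge moves from a prior level to a posterior level each time new data arrive, can be sketched with a small Python example. This is only an illustration under stated assumptions, not the book’s own method: it assumes a beta prior for an unknown success proportion with binomial data, and the function names and the batch counts are hypothetical.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Return the posterior beta parameters after observing one batch of data.

    Assumes a beta prior on an unknown proportion and binomial data, so the
    posterior is again a beta distribution (conjugate updating).
    """
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a beta distribution: the current point estimate of the proportion."""
    return alpha / (alpha + beta)


# Start from a flat prior: no initial preference for any proportion.
belief = (1.0, 1.0)

# Hypothetical batches of (successful, unsuccessful) treatment outcomes.
batches = [(8, 2), (15, 5), (40, 10)]

history = [beta_mean(*belief)]
for successes, failures in batches:
    # The posterior from the previous batch becomes the prior for the next one.
    belief = update_beta(*belief, successes, failures)
    history.append(beta_mean(*belief))

for step, estimate in enumerate(history):
    print(f"after {step} batch(es): estimated success proportion = {estimate:.3f}")
```

Each pass through the loop treats the previous posterior as the new prior, which is the gradual buildup of knowledge that the paragraph describes.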


With all such data, healthcare educators and administrators of healthcare services in hospitals and in clinics suspect the existence of uncertainty. While the preference is to involve and utilize data evidence to formulate the optimal decision, uncertainty is inevitable. Chapter 5 introduces definitions and properties, explains the concept of uncertainty, and demonstrates probability tools to quantify and interpret uncertainty measures in the data.

Technical experts reveal that data, in general and in healthcare particularly, come in four formats: nominal, ordinal, interval, or ratio. One size (approach) does not fit all (types of data). Chapter 6 introduces popularly used integer and continuous models with their properties for the four types of data and shows through examples the usefulness of such models. Conceptually, the term model refers to an abstraction of the underlying uncertain chance mechanism and its generated data. Also explained in the chapter is why and how the models play an integral role in healthcare decision-making.

More often than not, healthcare decision-making is not a one-snap, terminal choice but is ongoing, depending on what outcome occurs from among many others that are possible. At every stage, the outcome is tracked. It is a sort of game played between the healthcare decision maker and nature. Of course, the outcome following a decision is uncertainty oriented while the decision is not wavering (i.e., deterministic). At a stage when the outcome due to the most recent decision is undesirable, the decision maker tries to rectify the problem by choosing one decision instead of another. The ongoing process of making subsequent decisions based on favorable or undesirable outcomes constitutes decision trees. Chapter 7 demonstrates the role of decision trees in the healthcare field.

The planning and practice of healthcare require a harmonious decision among several constituencies in the hospital or clinic. The constituencies might be composed of patients, physicians, nurses, pharmacists, support personnel, policy makers, insurers, and so on. Any constituency is likely to safeguard its interest and therefore will accordingly present a preference and/or a rejection of a choice over others in a group discussion. Consequently, the decision maker might not attain a consensus and will then contact all of the constituencies in an effort to reach an agreeable decision. This is called a group decision, and it is explained and illustrated in Chapter 8.

For a variety of reasons, undesirable events occur in healthcare operations. Such undesirable events are recognized as sentinel or adverse events. Healthcare administrators cannot ignore sentinel events; the consequences of adverse events include lawsuits, lost licenses, and a bad image among the public or future potential patients, and administrators must avoid these consequences. Healthcare administrators may appoint an interdisciplinary investigative team to identify the root causes of adverse events that have already occurred. The aim is to formulate strategies in order to prevent future incidences. How healthcare professionals ought to deal with patients and with providing services is a crucial question. These issues are discussed in full detail in Chapter 9.

A crucial aspect of healthcare operations is cost-effectiveness. Cost is of concern for the sponsors and supporters of these services. Healthcare services cannot be a losing proposition. The benefits to the patients, healthcare professionals, and hospital establishment need to outweigh the operational costs. These and related matters are addressed in Chapter 10.

Appraising these situations in the hope of providing better services involves a risk analysis. There are three orientations to the comprehension as well as the operation of healthcare services. They are risk averse, risk neutral, and risk taker. The approaches that are helpful to understand a scenario and apply the best possible decision with respect to risk are covered in Chapter 11.

On an annual basis or at a critical time, healthcare administrators must evaluate the current state of existing healthcare programs or the viability of new programs. Such an assessment is easily described but poorly adapted for a variety of reasons. Chapter 12 formulates ideas and methods to make a mock evaluation first and then a full-scale actual evaluation later. The practical difficulties and their resolutions are spelled out. Chapter 13 illustrates the concepts and methods called Six Sigma and Lean management in healthcare sectors, which enhance efficient and superior healthcare services. Chapter 14 illustrates, with examples, graphs, charts, and formulas, the time series techniques that hospital administrators, medical/healthcare professionals, pharmacists, and insurance professionals use to make forecasts.
An Epilogue is included in order to summarize and highlight the important concepts and tools featured in this book. All of these topics are the foundation for further understanding of several advanced healthcare topics in a future book. The advantages and shortcomings of advanced topics in healthcare administration like data envelopment methods, stochastic frontier analysis, Six Sigma approaches, analysis of waiting times, game theory and conflict management, scheduling and program evaluation and review technique (PERT), inventory versus storage, simulations, and time series and forecasting methods are to be studied in another future volume.

The flow in textbooks like this one is often not fast or smooth. Mathematically or data-oriented presentations require full comprehension beyond the symbols and notations in order to internalize the concepts and to realize their applications in real-life scenarios. This is quite different from reading a novel. To facilitate readability, a glossary, a list of notations and symbols, a list of tables, and a list of figures are included with their explanations. Furthermore, to assist readers (healthcare graduate students, researchers, and professionals) in these practices, the three publicly available software programs outlined in the prologue are involved. Healthcare professionals who work in hospitals, clinics, insurance, nursing homes, and so forth will benefit from the contents in Volume 1 (Basic) and Volume 2 (Advanced). Though at times healthcare decision-making appears to be easy, it is really a complex process of thought and action. It is with this optimism that both volumes of this book are written.

References

Anderson, D. R., Sweeney, D. J., Williams, T. A., Camm, J. D., & Cochran, J. J. (2018). An Introduction to Management Science: Quantitative Approach. Independence, KS: Cengage Learning.

Dodge, Y., & Commenges, D. (eds.). (2006). The Oxford Dictionary of Statistical Terms. Oxford: Oxford University Press on Demand.

Everitt, B., & Skrondal, A. (2002). The Cambridge Dictionary of Statistics (Vol. 106). Cambridge: Cambridge University Press.

Helms, M. M. (2006). Encyclopedia of Management. 5th edition. Farmington Hills, MI: Thomson Gale.

Taha, H. A. (2011). Operations Research: An Introduction. Upper Saddle River, NJ: Prentice Hall Press.

Zedeck, S. (ed.). (2014). APA Dictionary of Statistics and Research Methods. Washington, DC: American Psychological Association.


Chapter 1

Why and How Healthcare Decisions Are Made

After studying the chapter, readers will be able to:
• Appreciate and understand the importance of decision-making in healthcare settings.
• Identify scenarios in which optimal decisions are feasible.
• Formulate approaches to break down a complex situation into several simple components and seek optimal decisions for each component.
• Integrate several optimal decisions into an overall decision policy.
• Interpret the concepts and methodologies to attain healthcare decisions.
• Train in and practice both the concepts and the methods in other healthcare settings.

1.1 Motivation

What is a decision? A decision is the process of selecting one option over others for the sake of some advantages. The advantages might be a minimal use of time or human resources, or constrained and/or gained profits. When no alternative appears that is better than the chosen decision in any sense, that decision is called the optimal decision. Attaining the optimal decision is not quite trivial at times due to the complexity of reality. The decision maker is often in need of technical experts called analysts who can simplify, organize, and make the decision easier for the decision maker. Decision-making is a cooperative effort to reach the optimal decision. When these joint efforts succeed, the decision maker and the analyst are appreciated and applauded. When the decision results in failure with a measurable loss, the decision maker is blamed. The decision maker therefore undertakes full responsibility for directing the decision-making process. The analyst’s role in the process, however, is not to be minimized or ignored.

What makes healthcare decision-making hard? It requires decision makers to think about the interests of various groups while working with only limited information and resources. There are four sources of difficulty.

First, a decision is hard simply because of its complexity: there are many possible courses of action, and simply remembering them all is nearly impossible. Why not break the complex problem into a structure that can be analyzed and decided in parts: the possible outcomes, the probabilities of those outcomes, and their eventual consequences (e.g., costs or benefits)? Structuring tools include decision trees and influence diagrams. Second, a healthcare decision is difficult because of uncertainty about the outcomes. Third, a healthcare decision maker pursues multiple objectives and may have to trade off benefits in one area against costs in another. Fourth, decisions involve conflict: when several decision makers take part, they form diverse opinions. Some healthcare decision makers complain that the process ignores subjective opinions.

Decision-making is an iterative process with several steps: learning the scenario, identifying the aims, viewing the viable options, assigning values to the outcomes, formulating a model to capture different scenarios, quantifying the uncertainty, choosing from the alternatives, measuring the outcomes, performing a sensitivity analysis, and writing a report for future occasions. See Ozcan (2005) for types of quantitative data that are helpful in making healthcare decisions.

Making decisions is a fundamental part of both our personal and our professional lives. The problem in professional settings might differ from that in personal settings. Nevertheless, the principles and strategies we adopt when seeking the optimal decision are parallel, if not identical. Personal decision-making is not addressed in this book, but decision-making in professional life follows a pattern and offers a promising scope.

Is there a history of decision-making by chief executive officers (CEOs)? The answer is affirmative (see Goodwin and Wright, 2004). DuPont’s nuclear plants seeking to avoid a Chernobyl-like disaster, ICI America, Phillips Petroleum, the US military, ATM Limited, and Massachusetts General Hospital School are examples of enterprises that have benefited from a disciplined decision-making process. This book

1 https://doi.org/10.1017/9781009212021.003 Published online by Cambridge University Press


exposes basic concepts and analytical tools with illustrations from healthcare settings. The ultimate beneficiaries will be patients and their loved ones in addition to physicians, laboratory researchers, nurses, allied staff, professors who teach/train future professional graduate students in healthcare professions, health insurance agents, healthcare administrators, and healthcare policy makers in government, among others. As much as decision-making opens opportunities to be innovative in order to resolve issues, to rectify past mistakes, and to promote new ideas with rewards for the betterment of patients, there are limitations to any decision, including the optimal one. When the background shifts with significant changes in available resources and/or the required productivity/services, the decision may fall short of optimality. A thorough description of such possibilities is called sensitivity analysis and is discussed in detail later in this volume. How do we really define a decision? A decision is the process by which one decision maker or decision makers collectively select one option over others in order to harvest some advantage. Of course, decision-making on a collective basis might undergo operational stumbles due to conflicts of personality or dogma. Later I discuss in detail how to harmonize conflict. Irrespective of whether the decision-making is by one person or a collective, the responsibilities are similar. In other words, when the decision is correct, yielding one or more benefits, the decision maker is credited with the success. When the decision goes wrong, the decision maker is blamed for the failure, sometimes even incurring penalties. Such possibilities induce the decision maker to be cautious. The decision maker is risk-averse (unwilling to take any risk), risk-neutral (willing to be centrist based on how others in the field have behaved), or risk-taking (quite amenable to implementing risky options). 
Psychologists and portfolio analysts in the healthcare field contend that a risk-taker might hit a jackpot if his or her decision happens to be the best. If the risk-taker’s decision happens to be wrong, it may bring costly or disastrous consequences. A risk-neutral decision maker might receive modest benefits when his or her decision happens to be correct but experience a modest loss when it happens to be wrong. There might be upward or downward fluctuation in the benefits or losses due to the risk-neutral decision maker’s action, but the consequences would be bearable. In contrast, a decision by the risk-averse decision maker results in a small benefit when it is correct and a small loss when it is wrong. There is no exciting up or depressing down from the risk-averse decision maker’s action. Such variation dictates that we ought to learn more about risk, and that is covered later in this book.
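
The three risk attitudes described above can be made concrete with expected utility, a standard device in decision analysis that this passage does not itself spell out. The sketch below is illustrative only: the three options, their payoffs and probabilities, and the three utility functions (concave for risk-averse, linear for risk-neutral, convex for risk-taking) are all assumptions invented for the example.

```python
import math

# Hypothetical options: each is a list of (probability, payoff) pairs.
OPTIONS = {
    "certain": [(1.0, 50.0)],                 # guaranteed modest payoff
    "moderate": [(0.5, 90.0), (0.5, 20.0)],   # some spread, mean 55
    "long_shot": [(0.2, 250.0), (0.8, 0.0)],  # rare jackpot, mean 50
}

# Assumed utility functions, one per risk attitude.
UTILITIES = {
    "risk-averse": math.sqrt,          # concave: dislikes spread
    "risk-neutral": lambda x: x,       # linear: cares only about the mean
    "risk-taking": lambda x: x ** 2,   # convex: drawn to the jackpot
}


def expected_utility(lottery, utility):
    """Probability-weighted average of the utility of each payoff."""
    return sum(p * utility(x) for p, x in lottery)


def preferred_option(utility):
    """Name of the option with the highest expected utility."""
    return max(OPTIONS, key=lambda name: expected_utility(OPTIONS[name], utility))


if __name__ == "__main__":
    for attitude, utility in UTILITIES.items():
        print(attitude, "prefers", preferred_option(utility))
```

With these made-up numbers, the same three options are ranked differently by each decision maker, which is the point of the passage: the attitude toward risk, not only the data, shapes the choice.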


Until then, we need to understand the problem and resolve it to its optimal solution if that is feasible. At times, such an exercise might not be trivial due to technical difficulties and/or constraints. In healthcare sectors, one might consider more than one attribute, a composite model of the attributes, or knowledge building to reach the best decision at every stage in the decision-making process. New data are gathered at some point in time. With the evidence provided in the recently gathered data, the existing prior opinion is updated to a posterior opinion via the Bayesian conquer-and-rule principle. Thomas Bayes (1702–1761) was a British mathematician with probability and philosophy orientations who came up with the idea that new evidence can moderate and update even an improper, less accurate prior opinion to a proper, more accurate posterior opinion. This continuous, time-oriented process of updating was controversial due to opposition from physics experts who argued that time is not a random entity. This process was not well received until statisticians accepted it and promoted it as natural and scientific. Bayesian thinking reappears later in several chapters in this book. To ease the challenges faced by the decision maker, he or she has no choice other than to appoint one or more analysts depending on the fields to be covered. The fields might range from finance, uncertainty, subsequent decisions, conflict management, adversities, risks, evaluating current programs versus creating new ones, operational efficiency, quality in service, wait time, scheduling, reviewing techniques, storage, and simulations, to time series data analytics. Neither the decision maker nor any one analyst is going to have mastered all of these, necessitating a team of several analysts. Natural by-products of multiple analysts could result in conflict, disagreement, and chaos. 
This creates a need to harmonize, and the strategies for doing so are addressed in Chapter 8 on group decision-making. In healthcare, the prevalence of illness, treatment types, and regulations on medical services periodically change. Consequently, the administration of healthcare services is subject to a dynamic process that transforms into a complex operation requiring judicious decision-making. Hence, data-guided decision-making (the focus of this book) makes more sense.
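
The prior-to-posterior updating attributed to Bayes earlier in this section can be sketched with the simplest conjugate case: a Beta prior on an unknown proportion (say, the chance that a treatment succeeds) updated by newly gathered success and failure counts. This is a minimal sketch under stated assumptions. The Beta-binomial model is a common textbook choice rather than something this chapter specifies, and the class name `BetaBelief` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Opinion about a proportion, held as a Beta(alpha, beta) distribution."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: add observed counts to the prior parameters.
        return BetaBelief(self.alpha + successes, self.beta + failures)


# Hypothetical prior opinion: success rate around 0.5, weakly held.
prior = BetaBelief(2.0, 2.0)

# Hypothetical new evidence: 14 successes and 6 failures.
posterior = prior.update(successes=14, failures=6)

if __name__ == "__main__":
    print("prior mean:", prior.mean)
    print("posterior mean:", posterior.mean)
```

Because the update only adds counts, feeding in the data in two batches gives the same posterior as feeding it in all at once, which mirrors the continuous updating over time that the text describes.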

1.2 Concepts

To facilitate understanding, appreciating, and applying the needed concepts and methods, an appropriate sequence is followed in this book. In this chapter, motivations for data-guided decision-making are presented so as to set the stage for grasping the importance of decision-making based on evidence.

Why and How Healthcare Decisions Are Made

Chapter 2 articulates the benefits of making decisions founded on data. Chapter 3 includes tutorials with illustrations for acquiring skill in Microsoft Excel (which almost everyone has via Microsoft Office), the freely available Microsoft Math Solver, and the statistical software JASP. Chapter 4 addresses the fact that, in the current, technologically advanced age of web pages and information flow, healthcare decision makers need exposure to authentic data sources and sampling methods. Chapter 4 also emphasizes the principle of data orthogonality, which the statistics community exercises in its procurement of data.

Uncertainties in healthcare settings (whether referring to small clinics, medium or large hospitals, supportive establishments like pharmacies, health insurance providers, emergency ambulance services, etc.) are recognized by healthcare administrators and researchers, but they sometimes do not account for such uncertainties in their decision-making. This state of affairs reflects too many technicalities in probability concepts. To ease the decision maker’s feeling of insufficiency, Chapter 5 presents basic concepts, methods, and interpretations based on real-life examples.

Chapter 6 builds on the uncertainty principles examined in Chapter 5 to describe the implications of coding and basing values for establishing priorities. The chapter also explains how statistical models sharpen decision-making in healthcare. Though the details of composing a value system are thoroughly described later in this book, the value system created by the decision maker is indicative of his or her preferences. The analyst can therefore use that value system so as to be more efficient in providing pertinent information to the decision maker. The value system is an integral part of a larger concept called a model. What is a model? A model is an abstraction of reality. A well-known statistician, Dr. George E. P.
Box (1919–2013), contended that all models are wrong, but some are useful. Per his definition, models are presented with illustrations in Chapter 6. Often in healthcare settings, decisions are made sequentially by patients and service providers. The decisions are connected and interdependent. Their outcome is random, captured, and measurable and sometimes results in a setback (a loss) or a forward positive push (a gain). The loss or gain is kept in mind when making the next decision. This collection of decisions constitutes what is called a decision tree, described with illustrations in Chapter 7. The notions of expected value and independent or conditional probabilities featured in Chapter 5 are utilized in Chapter 7. As alluded to earlier, when a decision is made with one or more analysts and decision makers, conflict is

inevitable. Chapter 8 focuses on finding the optimal method to resolve these differences.

Due to a variety of shortcomings in the healthcare setting, unanticipated adversities occur, causing damage to the reputation of healthcare services and/or even the deaths of patients. Such tragedies can trigger a class-action suit against the system administrators. Decision makers need to learn about the bad consequences and how all employees can be best trained so as to avoid the occurrence (not the reporting) of adversities. Concepts and practical methods that can be adapted for the healthcare setting are highlighted in Chapter 9.

A major concern among patients and medical/healthcare providers (including local, state, and federal government healthcare policy makers) is that healthcare has become too costly. Almost everyone involved in healthcare seeks to maintain or even lower its cost. Attaining this goal requires many efforts. For these efforts to succeed, a comprehensive understanding of cost cutting is the starting point, as outlined in Chapter 10.

No matter how scientific the healthcare researcher’s decisions may be, they are vulnerable to a variety of controllable causes and unpredictable events. Decision makers must educate themselves about controllable causes but prepare to deal with unpredictable events. In this process, similarity coefficients between the proto event in the past and the index event in the current scenario might be helpful. These and other related ideas with methods and illustrations are described in Chapter 11.

Healthcare administrators need to periodically evaluate the currently available programs in terms of their cost-effectiveness, their demand, their quality, and so forth, for the sake of license renewal by accrediting agencies. They also need to seek expansion of the healthcare services they offer. Chapter 12 describes such concepts and methods that are currently practiced in the healthcare sector.
In the process of renewing licenses, raising funds, making reforms to stay competitive in the market, and working with the governing board of directors, healthcare administrators compare their own institutions with those of their competitors. When the relevant data are deterministic (precisely measurable with no interrelations among the aspects), the comparisons in an efficiency scale [0, 1] among the selective units are assessed as described with illustrations in Chapter 13. When the data are stochastic (subject to random measurement errors and/or statistical interdependencies among the aspects), the comparisons are performed as presented with illustrations in Chapter 14. Mainly due to an American-born world-famous statistician, William Edwards Deming (1900–1993), who professed, practiced, and taught the importance of


sampling techniques and statistical reasoning to identify inefficient systems, the quality of production has been enhanced in the industrial sectors. His ideas and 14 points for management, often grouped in practice with later quality programs such as “six sigma,” garnered popularity among members of the service sectors, including those in the healthcare field. Deming’s principles with reference to healthcare operations are illustrated in Chapter 13.

1.3 Illustration

In this section, I begin with what has been done in the literature. Optimal decision-making in healthcare is narrated first, and a range of decision-making techniques is covered. Burkholder et al. (2020) recommend mandating counselor competency in using ethical and decision-making models. Based on a random sample of 245 students, passages describing a thyroid scan, and basic healthcare insurance information, Dolezel et al. (2020) establish that age is related to healthcare literacy, healthcare work experience, and healthcare credentials. However, these demographic disparities are not well understood. Galetsi et al. (2020) reveal that clinicians, healthcare providers, policy makers, and patients are experiencing exciting opportunities due to big-data analytics. Greenberg et al. (2020) highlight the difficulties healthcare providers faced in making healthcare decisions during the global COVID-19 pandemic. Those hard decisions included assigning limited resources to equally needy patients, balancing their own physical and mental healthcare needs with those of patients, and providing care for all severely unwell patients with constrained resources.

Healthcare is a limited resource. Its rational and fair allocation requires an evidence-based decision-making analytical model. What is an analytical model? An analytical model utilizes available data from different sources, projects alternative decisions, and produces information on healthcare costs and benefits. The increasing complexity of decision-analytic methodology has raised the need for guidelines for model development. Treskova (2020) outlines a framework as follows:

◦ Obtaining data on the topic of interest.
◦ Researching the knowledge base.
◦ Writing and programming a mathematical formulation.
◦ Re-parameterizing the model.
◦ Conducting an economic evaluation.
◦ Analyzing uncertainty.
◦ Confirming that informed decision-making is superior.
The importance of the healthcare decision-making process cannot be overstated.
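
An analytical model of the kind defined above (data in, alternative decisions projected, costs and benefits out) can be sketched in a few lines. The example below is a hedged illustration, not the framework of Treskova (2020): the two programs, their costs and benefits, and the single uncertain input `p_success` are all hypothetical. It computes an expected net benefit for each alternative and then repeats the comparison over a range of `p_success` values, which is a simple one-way form of uncertainty analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    probability: float
    cost: float     # hypothetical monetary units
    benefit: float  # hypothetical benefit, valued in the same units


def expected_net_benefit(outcomes):
    """Probability-weighted benefit minus cost over all outcomes."""
    total = sum(o.probability for o in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * (o.benefit - o.cost) for o in outcomes)


def build_alternatives(p_success):
    """Two made-up programs; only the new one depends on p_success."""
    return {
        "current program": [Outcome(1.0, cost=100.0, benefit=150.0)],
        "new program": [
            Outcome(p_success, cost=200.0, benefit=400.0),
            Outcome(1.0 - p_success, cost=200.0, benefit=50.0),
        ],
    }


def best_alternative(p_success):
    """Alternative with the highest expected net benefit."""
    alternatives = build_alternatives(p_success)
    return max(alternatives, key=lambda name: expected_net_benefit(alternatives[name]))


def one_way_sensitivity(p_values):
    """Map each assumed success probability to the preferred alternative."""
    return {p: best_alternative(p) for p in p_values}


if __name__ == "__main__":
    for p, choice in one_way_sensitivity([0.3, 0.5, 0.7, 0.9]).items():
        print(f"p_success={p}: choose {choice}")
```

In this toy setup the preferred program switches once `p_success` is high enough, which shows why the quality of the input data matters: a decision that looks optimal under one estimate can stop being optimal under another.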


Pain management is of interest in healthcare services. Excellent pain management involves pain assessment and utilizing efficient strategies to attain less or even no pain. Educating patients on pain reduction is also a strategy. Healthcare providers must consider the use of opioids in pain reduction. In combination with opioid analgesics, non-pharmacological treatments and specific exercise regimens have proven beneficial in reducing pain. With patients who present with a history of opioid abuse, treatment choices should focus on beneficence, nonmaleficence, advocacy, patient autonomy, nurse autonomy, and veracity. Non-maleficence means not inflicting harm or pain. Advocacy is an act or process of supporting a cause or proposal. When a nurse identifies a potentially harmful situation regarding the use of opioid analgesics, healthcare professionals can educate the patient by suggesting lower-level opioid doses or non-opioid interventions. Patient autonomy refers to patients’ freedom of choice. To promote cooperation and satisfaction in pain management, nurses ensure patient involvement. Nurse autonomy denotes nurses’ obligation to provide accurate information to their patients regarding the pain regimen, including side effects, risks and benefits, and non-pharmacological treatment options. The information should be conveyed to the patient without bias or judgment. Though nurses may disagree with the pain management regimen or believe clinical findings do not correlate with the patient’s stated pain intensity, patient autonomy is not appreciated if nurses make healthcare decisions for the patient. Veracity refers to openness and honesty. Nurses operate under an ethical obligation to demonstrate veracity regarding ordered medications, their side effects, and healthcare discussions affecting the provider. Ethical pain management requires a fair approach and attention to patients’ physiological condition, potential treatment outcomes, and personal bias. 
Opioid-related inpatient hospitalizations continue to rise in the United States. Refer to Sturdivant et al. (2020) for a discussion. A concept called surrogate decision-making is also worth consideration. Surrogacy in this instance refers to the involvement of relatives. Parents need to be informed and empowered to select alternate surrogates, or healthcare proxies. A variety of reasons exist why patients cannot make legal decisions for themselves: they are unconscious, they have severe cognitive disabilities, or they are minors. The need for surrogates raises several questions, including who should speak for the patient (authority) and what principle should lead them (guidance). The surrogate is expected to be guided by a living will, if one has been completed, and substituted judgment is selected. In pediatric healthcare, the biological parents are the presumptive


decision makers and legally authorized representatives for their children unless they relinquish their rights or have their rights terminated. When the parents are not reachable, they can select extended family members or close friends. Sometimes, court-appointed guardians make healthcare decisions. An employee of the healthcare-providing organization cannot serve as the surrogate healthcare decision maker. For further discussion, refer to Fishman et al. (2020).

A novel concept called decision fatigue encompasses self-regulatory, cognitive, and physiological fatigue. Decision fatigue is a widespread phenomenon in healthcare decision-making. Consult Pignatiello et al. (2020) for more details. Shang et al. (2020) promote the importance of healthcare decisions for patients moving between the hospital and the community. Infection prevention is a high priority in home healthcare decision-making.

A complexity arises due to the availability of big data in healthcare settings. It raises the need for artificial intelligence (AI). Several types of AI are helpful to healthcare providers. In their practice, diagnosis and treatment recommendations, patient engagement and adherence, and administrative activities are vital. There are many instances in which AI outperforms humans. Refer to Davenport and Kalakota (2019) for an appraisal. Lysaght et al. (2019) articulate how AI transforms healthcare decision-making to encompass accountability and transparency. Of course, a concern about AI exists with respect to balancing clinical practice ethically and responsibly.

Avi (2019) identifies three different, strictly interconnected facets that need to be considered. The first facet is sociopsychological and considers the imperfections of human nature and its connected instincts, behaviors, and problems. The second facet shows how mathematics tries to resolve problems by proposing different models and theories, each with a different level of “denaturation” from reality.
The last facet is the weighted mean between the first two, and it results in a series of instruments tailored to each peculiar managerial problem. The use of information technology is emphasized in this facet so as to create a smart healthcare setting accommodating the Internet of Things, big data, and cloud computing besides AI. This information technology aims to transform the traditional medical system. This concept is evaluated in order to explain the consequences of these model changes (from disease-centered to patient-centered care), changes in informatization construction (from clinical informatization to regional medical informatization), changes in medical management (from general management to personalized management), and changes in prevention and treatment (from focusing on disease treatment to

focusing on preventive healthcare). Refer to Tian et al. (2019) for a list of advantages and disadvantages.

Machine-learning techniques employed for decades are now expanding into healthcare (Ahmad et al., 2018). Clinical providers and healthcare decision makers regard the interpretability of these models as a priority for implementation and utilization. As machine-learning applications become increasingly popular and more deeply integrated into the patient care continuum, interpretable prediction is imperative. FitzGerald and Hurst (2017) stress the importance of correcting physicians’ personal bias in diagnosis and treatment selection. Hawk et al. (2017) illustrate the need for harm reduction without necessarily extinguishing problematic health behaviors completely. They cover drug and tobacco use, syringe exchange, risky behaviors in sex work, and eating disorders. These and other healthcare professionals outline the following six principles for harm reduction: humanism, pragmatism, individualism, autonomy, incrementalism, and accountability without termination.

Brabers et al. (2017) illustrate health literacy, referring to the personal characteristics and social resources people need in order to understand and use information to make healthcare decisions. Hawley and Morris (2017) discuss the importance of awareness of cultural differences in healthcare decision-making. Innovative and sustained efforts are needed to educate and train care providers to communicate effectively and provide culturally competent healthcare. Surgeons’ intraoperative decision-making is also a key factor in clinical practice. The four strategies surgeons can use are intuitive (recognition-primed), rule-based, option comparison, and creative decision-making. Refer to Flin et al. (2007) for details. Arvai et al. (2004) articulate ways and means of making environmental decisions with respect to healthcare.
Davenport (2009) provides a list of four steps: identifying and prioritizing the decisions that must be made; examining the factors involved in each; designing roles, processes, systems, and behavior to improve decisions; and institutionalizing the new approach through training, refined data analysis, and outcome assessment. To help patients perfect shared healthcare decision-making, Elwyn et al. (2001) mention the importance of required skills and technical knowledge. Last, there is a subtle difference between a healthcare decision and a reached conclusion, as explained eloquently by Tukey (1960). The conclusion stands firm beside the decision in scientific inference making. Like in any human endeavor, science progresses through a build-up of knowledge. A conclusion is established with careful regard to the evidence, but without regard to the consequences of specific actions in specific circumstances. Conclusions are


withheld until adequate data have been accumulated. Both the decision and the conclusion are required in healthcare endeavors. The healthcare decision is built upon pure and applied science.

The decision-making process varies across contexts. In psychology, decision-making refers to a cognitive process. In a healthcare setting, it describes a reasoning process based on assumptions, values given to the potential outcomes, preferences narrated by management, experts’ data, and the decision maker’s preferences. Nevertheless, decision-making is a problem-solving operation carried out within the available resources and time. What is problem-solving? When performance deviates from routine standards, that is considered a problem. Identification of the problem is the initial step to solving it. More often than not, problems can be traced to changes in distinctive features of the system. Probing will reveal what has been and what has not been affected by a root cause. Some root causes can be pinpointed from the data. The so-called Occam’s razor (attributed to the English philosopher William of Ockham), a law of parsimony that favors the simplest adequate explanation, is applied.

During this process, the decision maker might encounter analysis paralysis, a state of indecision. A major cause is the overwhelming flood of data characterized as information overload. Speier et al. (1999) define information overload as the state in which input exceeds processing capacity. Experts suggest three types of analysis paralysis: repetitive confusing information, seeking additional information rather than deciding, and uncertainty. Each type leads to extinction by instinct, that is, making a careless decision without any systematic planning. A remedy in this state requires implementing a structured system.

Negotiation is a branch of decision-making. What is negotiation? It is a collaborative way of making an optimal decision that seeks to avoid conflict and to agree on matters of interest so as to maximize mutual benefits.
Negotiation differs from coercion. Mediation is a special form of negotiation that includes a third party. When the conflicting parties accept the option given to them by the third party, it is called arbitration. Negotiation becomes distributive when one of the conflicting parties gains an amount that the other party loses. To make negotiation successful, the following strategies might help. One should be open-minded to the opponent’s view. The parties should listen to each other’s perceptions. Each party should seek opportunities to act consistently with the opposing party. Both sides should actively listen, articulate a purpose, and consider face-saving options. An integrative rather than a distributive negotiation is a wise approach. Negotiators often have strong instincts to win by compromising. They are at times soft and at other times hard, but they are


principled always. Some negotiators apply tactics. Barriers to successful negotiation include die-hard bargaining, lack of trust, informational vacuums, the negotiator’s dilemma, structural impediments, negative attitude, disordered communication, and lack of dialogue. Emotions can be constructive or destructive, so negotiators should be rational. Negative emotions include anger, pride, guilt, regret, worry, and disappointment. Positive emotions include explicitness and patience.

In the medical profession, decision-making involves evaluating diagnostic test results so as to select one treatment over others. In dealing with the natural world, decision-making competes with time pressure, ambiguities, and high stakes. When no other choice yields more benefits, the selected choice is declared the optimal solution. No matter who makes a decision or in what scenario, the decision needs to be ethical. What is ethics? It is a philosophical and behavior-oriented judgment to distinguish right from wrong. Ethics seeks to resolve issues triggered by wrong choices. There are three subdivisions of ethics: metaethics, normative ethics, and applied ethics.

Data-based decision-making provides a remedy for all of these issues. This approach requires high-quality data and statistical knowledge. Participative decision-making might be the best approach because of its inclusive nature. Participative decision-making suits the healthcare setting as it involves multidisciplinary stakeholders, commitment to patients’ welfare, client–provider relations, service satisfaction, hospitals’ performance, and hospitals’ financial strength. Participative decision-making does present some disadvantages, however. One of them is that the inclusiveness may not be genuine. Time can be an issue, and concerns about inefficiency, indecisiveness, or incompetence may go unheard. Knowledge, empowerment, and experts’ suggestions may not be sufficient.
The provided information, communication, and mediation might be insufficient for participatory decision-making to succeed. There is no database in favor of or against participatory decision-making. A democratic operation is needed, encouraging communication. A consensus is also needed, and reaching it is not easy. Recruiting experts to assist the decision maker is costly and time-consuming. Issues can arise due to work conflicts among the participants, lower level of influence, short-term or informal participation, insubordination, or lack of policy on representative participation, among others.

1.4 Summary

Decision-making is an art. Mastering it or applying it in a particular situation relies on intelligence and/or skill level. The presentation in this book is intended to be


thought-provoking. I believe in a philosophy of listening or reading less, but thinking more outside the prescriptive norms so as to be creative and innovative. In my humble opinion, that approach has been instrumental in breakthroughs in humanity, science, art, culture, and even in life itself. See Chattamvelli and Shanmugam (2020, 2021) for a list of discrete and continuous probability distributions over the domain of possible values. See Goodwin and Wright (2004), Gray (2009), Marchau, Walker, Bloemen, and Popper (2019), and Panesar (2019) for decision-making steps.

1.5 Exercises 1. Define the concept called machine learning and describe its role in healthcare decision-making. 2. What are the technical difficulties in making better decisions with respect to poor environment and unhealthy living? Articulate strategies to overcome them.

14. Give and articulate specific examples for each one of Hawk et al.’s (2017) six principles: humanism, pragmatism, individualism, autonomy, incrementalism, and accountability without termination in a healthcare setting. 15. Narrate real-life scenarios of effective communication by caregivers to provide culturally competent healthcare. 16. Narrate an example of a good healthcare decision in the face of some uncertainty. Give an example of a poor healthcare decision whose outcome was lucky. 17. How is the concept of modeling involved in healthcare decision-making? What role do subjective judgments play in this process? 18. Illustrate a scenario in which a healthcare decision is complicated because of difficult preferences, tradeoffs, and uncertainties.

3. Match Avi et al.’s (2019) three facets in the healthcare setting.

19. Some believe crime would decrease if drugs were legalized. How would you proceed to investigate this? Describe a decision-theoretic point of view.

4. What is self-reported patient involvement in a healthcare setting?

20. Narrate the importance of negotiations in the offering of a healthcare service in a hospital.

5. Articulate the role of health literacy in healthcare decision-making.

Selected References

6. Compare and contrast the ethical model and the decision-making model.

Ahmad, M. A., Eckert, C., & Teredesai, A. (2018). Interpretable machine learning in healthcare. IEEE Intelligent Informatics Bulletin, 19(1), 559–560.

7. What different types of decision-making arise in the healthcare setting? Give some examples with data. 8. Define and comment on the vital role big data analytics play in healthcare. 9. Select a case study of the healthcare manager of a hospital and articulate his or her responsibilities, challenges in making decisions, and sources of information that might ease the process. 10. Articulate a participatory decision-making scenario in healthcare operations. 11. Identify two variables whose values are measured in a healthcare study. What are their potential values? 12. Suggest a probability distribution for the variables identified in Question 11. Are they discrete probability distributions or continuous probability distributions? Why? 13. Elaborate on why AI does a better job than humans in healthcare settings. Is it true AI can make better decision than humans? If so, give some examples.

Arvai, J. L., Campbell, V. E., Baird, A., & Rivers, L. (2004). Teaching students to make better decisions about the environment: lessons from the decision sciences. Journal of Environmental Education, 36(1), 33–44.
Avi, M. (2019). An analysis of the decision-making process from a mathematical, socio-psychological, and managerial perspective. European Journal of Economics, Finance and Administrative Sciences, 102, 65–76.
Brabers, A. E., Rademakers, J. J., Groenewegen, P. P., Van Dijk, L., & De Jong, J. D. (2017). What role does health literacy play in patients’ involvement in medical decision-making? PLoS One, 12(3), e0173316.
Burkholder, J., Burkholder, D., & Gavin, M. (2020). The role of decision-making models and reflection in navigating ethical dilemmas. Counseling and Values, 65(1), 108–121.
Chattamvelli, R., & Shanmugam, R. (2020). Discrete Distributions in Engineering and the Applied Sciences, Williston, VT: Morgan & Claypool.
Chattamvelli, R., & Shanmugam, R. (2021). Continuous Distributions in Engineering and the Applied Sciences, Williston, VT: Morgan & Claypool.
Davenport, T. (2009). Make better decisions. Harvard Business Review, 87(11), 117–123.
Davenport, T., & Kalakota, R. (2019). The potential for artificial intelligence in healthcare. Future Healthcare Journal, 6(2), 94–98.
Dolezel, D., Shanmugam, R., & Morrison, E. E. (2020). Are college students health literate? Journal of American College Health, 68(3), 242–249.
Elwyn, G., Edwards, A., Eccles, M., & Rovner, D. (2001). Decision analysis in patient care. The Lancet, 358(9281), 571–574.
Fishman, M., Paquette, E. T., Gandhi, R. et al. (2020). Surrogate decision making for children: who should decide? Journal of Pediatrics, 220, 221–226.
FitzGerald, C., & Hurst, S. (2017). Implicit bias in healthcare professionals: a systematic review. BMC Medical Ethics, 18(1), 1–18.
Flin, R., Youngson, G., & Yule, S. (2007). How do surgeons make intraoperative decisions? BMJ Quality & Safety, 16(3), 235–239.
Galetsi, P., Katsaliaki, K., & Kumar, S. (2020). Big data analytics in health sector: theoretical framework, techniques, and prospects. International Journal of Information Management, 50, 206–216.
Goodwin, P., & Wright, G. (2004). Decision Analysis for Management Judgment, Hoboken, NJ: Wiley.
Gray, J. A. M. (2009). Evidence-Based Healthcare and Public Health: How to Make Decisions about Health Services and Public Health, Amsterdam: Elsevier Health Sciences.
Greenberg, N., Docherty, M., Gnanapragasam, S., & Wessely, S. (2020). Managing mental health challenges faced by healthcare workers during the COVID-19 pandemic. BMJ, 368, m1211.
Hawk, M., Coulter, R. W., Egan, J. E. et al. (2017). Harm reduction principles for healthcare settings. Harm Reduction Journal, 14(1), 1–9.
Hawley, S. T., & Morris, A. M. (2017). Cultural challenges to engaging patients in shared decision making. Patient Education and Counseling, 100(1), 18–24.
Lysaght, T., Lim, H. Y., Xafis, V., & Ngiam, K. Y. (2019). AI-assisted decision-making in healthcare. Asian Bioethics Review, 11(3), 299–314.
Marchau, V. A. W. J., Walker, W. E., Bloemen, P. J. T. M., & Popper, S. W. (2019). Decision Making under Deep Uncertainty: From Theory to Practice, Berlin: Springer.
Ozcan, Y. A. (2005). Quantitative Methods in Healthcare Management, San Francisco: Jossey-Bass.
Panesar, A. (2019). Machine Learning and AI for Healthcare (pp. 1–73), Coventry, UK: Apress.
Pignatiello, G. A., Martin, R. J., & Hickman Jr., R. L. (2020). Decision fatigue: a conceptual analysis. Journal of Health Psychology, 25(1), 123–135.
Shang, J., Russell, D., Dowding, D. et al. (2020). A predictive risk model for infection-related hospitalization among home healthcare patients. Journal for Healthcare Quality, 42(3), 136.
Speier, C., Valacich, J. S., & Vessey, I. (1999). The influence of task interruption on individual decision making: an information overload perspective. Decision Sciences, 30(2), 337–360.
Sturdivant, T., Seguin, C., & Amiri, A. (2020). Ethical decision-making for nurses treating acute pain in patients with opioid abuse history. MEDSURG Nursing, 29(1), 9–17.
Tian, S., Yang, W., Le Grange, J. M. et al. (2019). Smart healthcare: making medical care more intelligent. Global Health Journal, 3(3), 62–65.
Treskova, M. (2020). “Application of Statistical and Decision-Analytic Models for Evidence Synthesis for Decision-Making in Public Health and the Healthcare Sector” (Doctoral dissertation, Hannover: Institutional Repository of Leibniz University Hanover).
Tukey, J. W. (1960). Conclusions vs. decisions. Technometrics, 2(4), 423–433.

Chapter 2

Are Data-Guided Healthcare Decisions Superior?

After studying the chapter, readers will be able to:
• Appreciate and practice data-guided healthcare decision-making.
• Integrate several data-analytical methods in the decision-making process in order to make optimal healthcare decisions.
• Gather and interpret the necessary concepts and methods to resolve issues in healthcare decision-making.
• Train coming generations in the importance of data-analytical concepts and methods across several healthcare scenarios.

2.1 Motivation

First, let us examine why decisions become more authentic and accurate when they are based on evidence. Not all healthcare professionals are convinced that data-based decision-making is prudent. To make such decision-making feasible, knowledge of data collection and its analysis is a necessity. Analysts help decision makers handle these complex technicalities. Rarely does a decision maker encounter only one technicality. Consequently, the decision maker may need more than one analyst. How do we define an analyst? An analyst has the expertise and the knowledge base to study the available information and to understand the choices in front of the decision maker along with the consequences of each. How do we define the healthcare administrator as a decision maker? The healthcare decision maker has the responsibility to make the best decision in his or her healthcare practices. When a decision results in more benefit than harm, the decision maker is applauded. If the decision brings harm or disadvantage, the decision maker is blamed, perhaps even penalized in some manner. Hence, the decision maker has to be cautious. The decision maker at times faces technical stumbling blocks and may therefore recruit one or more analysts who can resolve such technical matters. Likewise, the decision-making process may involve more than one

decision maker and an analyst might aid more than one decision maker. In this overall view, decision-making is teamwork, where the team consists of several analysts and decision makers. Later, we learn the ground rules on how to make the team harmonious and functional in reaching the best decision in a timely manner.

What is the best decision? A general definition of the best decision is that it results in more benefit than harm. Benefit and harm have to be enumerated based on the given scenario. The analytical model helps decision makers prepare a list of potential benefits and harms. Of course, debates usually take place during the consideration of the benefits and harms. Both the data and the model play a crucial role. However, a decision has to pass the following five tests:
1. Multiple alternatives should be carefully constructed.
2. The consequences of each choice should be clearly listed.
3. Though the decision maker functions with understanding of the data, he or she should acknowledge the uncertainty surrounding the consequences of the decision.
4. The decision maker should be aware of the potential consequences for each option and should express his or her preferences to the analyst. The analyst should have provided the entire team with the relevant technical information, the available choices, and their possible consequences. Decision-making should be a joint venture.
5. The decision-making process should track uncertain outcomes, with the outcomes having different values indicative of preferences.

What is a decision analysis? It is a separation of a complicated problem into several simple parts using a mathematical model. Optimal solutions and their potential consequences can then be listed for each part. The analyst can also aggregate the optimal solutions of the separate parts into an overall solution with its potential consequences and then communicate the solution to the decision maker.

How should a model be defined? A model is an abstraction of the outcomes and their importance for

9 https://doi.org/10.1017/9781009212021.004 Published online by Cambridge University Press


decision-making that uses mathematical principles at times. Of note here is that the data, whether quantitative numbers or qualitative information, play a vital role via the chosen model. The decision-making team ought to be informed about the whole aggregated decision.

What are values? Each outcome may not yield the same advantages or disadvantages. The decision maker expresses his or her preferences by assigning different values to the outcomes. The concept of values is analogous to a two-sided coin. The cost is one side and the benefit (utility) is the other side. Some outcomes, like the loss of confidence, cannot be directly quantified. The utility for a constituency might be contextual. The benefit should not be treated as subservient to the cost in the comparisons. Besides the cost and utility, the decision-making process should take into account the concerns of the constituencies. Analysts do not always have the capability to capture all the consequences. In other situations, analysts are uncomfortable simplifying an analysis for fear of reducing its usefulness or accuracy. The analysts might want to interrelate the outcomes using a decision tree.

Because decisions are complicated and at times daunting, decision makers occasionally refer to a similar situation that arose for a competitor. This allows the decision makers to learn from their competitors’ experience, which is technically called a prototype. The analysts ought to encourage the decision makers to adapt the prototype for the sake of convenience. However, the prototype should feature the following five principles:
1. The prototype should have started as an unstructured problem. A lack of understanding of the problem usually contributes to disagreements, limited perspectives, and pandemonium. Analysts can promote better understanding.
They can help check the suitability of the assumptions, causes, objectives, perceptions, constitutional similarity, available choices, outcomes desirability, and limitations of the prototype. The analysts should prepare a summary in whole and parts and bring their report to the attention of the decision makers so they can emulate it.
2. The prototype should address uncertainty with respect to several outcomes. In this step, the analysts explain what could happen if an action is undertaken. The analysts inform the decision makers about potential consequences, how to use probability models, and how to apply Bayes’ theorem.
3. The prototype ought to have begun with unclear values. Compared to the values, uncertainty is considered to play a minor role. The values can be negative as well as positive. In this context, a negative benefit is a loss and a negative cost is a profit. The decision makers


should be straightforward and explicit with the analysts about their objectives. The decision makers should inform the analysts on whether they prefer to collapse subvalues into an overall value for the project.
4. The prototype should have dealt with several conflicts in the decision-making process. With models, uncertainty, and different values, the analysts educate the decision makers on how to minimize conflict. The analysts help the decision makers play up the importance of a win-win scenario. More important, the analysts identify administrative issues, highlight the values, present the constituents’ preferences, educate the decision makers on methods to reduce conflict, and discourage unwanted interruptions during the negotiation.
5. The prototype must not have attempted to do everything. A tendency to do it all arises in the minds of analysts and decision makers, but it is counterproductive at times. Practicality is preferable to perfection in decision-making.

A good decision-making process does not focus on only the end results. The analysis is about the ideas, not the numbers. The findings are valuable. The interaction between analyst and decision maker is vital. Both should listen to each other. The analyst should have the habit of summarizing his or her viewpoints. The analyst should create a model and investigate its usefulness before communicating his or her findings to the decision maker. The analyst should follow many important steps for the sake of best advising the decision maker. The following 13 steps refine the analyst’s thinking and lessen risk. Recall that the analysis is done jointly between the analyst and the decision maker.
1. The analysis has a specified aim with hypotheses to be verified before applying a model. All decision makers, constituencies, perspectives, and time frames are clearly and concisely defined. Turnover among analysts and decision makers is common in healthcare.
For the sake of continuity, each step of decision-making is recorded in a report that includes the period, uncertainty of the outcomes, sponsors of the investigation, selected methods for the analysis with their rationale, perspectives, limitations, existing agreements versus disagreements, assumptions, and experts’ opinions.
2. The analyst and the decision maker agree on the rationale of the model. The problem is fully explored. The role of the model is unambiguously spelled out. The analyst clarifies what the resolution of the described problem should attain. Are creative options

allowed? The analyst outlines these efforts and provides their ratings. The purpose of the aforementioned actions includes but is not limited to staying on track, replacing a repetitive decision-making process with a mathematical formula, warning the decision maker of potential issues, ensuring all communication is clear to all relevant parties, authenticating and documenting the results, and finding ways to diminish uncertainty. It is important for the decision-making team to comprehend the objectives, to inform themselves about any frequently misunderstood terms, to become aware of practices that trigger problems, to learn about reasons for bad practice, and to separate the desirable from the undesirable. In other words, the decision-making process should establish protection for all patients and offer an opportunity to correct mistakes.
3. The team designs a model for analysis before starting the decision-making process. The model defines the problem and explains why it exists, details whom the problem affects, takes into account the assumptions and objectives of each constituency, provides a creative set of available options, and lists the potential outcomes to be sought or avoided. The analyst and the decision maker expound the period for each activity. The analyst helps the decision maker break the complex outcomes down into simple manageable components with weights so their relative importance can be understood. The components of the problem need not be measured on the same scale, in which case a standardization of the scale (z-metric or percent metric) is needed.
4. The investigative team clearly specifies the decision maker’s chosen perspectives and confirms his or her target. The uncertainty concept is not easy to understand or tackle. However, the analyst eases the uncertainty for the decision maker. Bayes’ theorem is helpful here.
5. The team describes the alternatives under consideration.
6. The analyst analyzes the available data and makes recommendations for the decision maker. After quantifying the values and uncertainties, the analyst uses

the model to score the relative importance of each action. This scoring is done using expected value. Let us consider a simple example in order to better understand these ideas: deciding whether to implement an electronic health record (EHR) system, assuming the hospital currently maintains hard copies of patient records. The decision involves great costs. There are three mutually exclusive options when considering EHR. Option 1 is transferring completely from paper records to EHR, which may require spending up to $100 million for new computers, laptops, printers, and the staff needed to maintain and operate them. Option 2 is transferring to EHR in two phases; some of the hospital’s employees are trained in phase 1 and the remainder are trained in phase 2. This option may require spending up to $25 million. Option 3 is keeping the paper record system, which will require no spending. Though hospital management has the authority here, let us assume the board of directors of the hospital and the analyst advise the CEO of the hospital to decide based on the results of two surveys. Survey 1 is to be conducted among hospital employees. Survey 2 is to be conducted among patients, community leaders, and so forth. Note the survey results will be summarized as the proportion of respondents favoring each of the three options. The analyst summarizes the results from survey 1 and reports to the CEO that 70%, 20%, and 10% of survey respondents support option 1, option 2, and option 3, respectively. The analyst summarizes the results from survey 2 and reports to the CEO that 50%, 30%, and 20% of survey respondents support option 1, option 2, and option 3, respectively. This information is displayed in Table 2.1. The expected value, $E(\mathrm{Cost}_i) = \sum_{j=1}^{3} \mathrm{cost}_j P_{ij}$, for survey $i$, $i = 1, 2$, is the weighted average of the option costs, where $\mathrm{cost}_j$ is the cost of option $j$ and $P_{ij}$ is the proportion of respondents in survey $i$ favoring option $j$. Note the expected cost is $75.00 million if survey 1 is utilized and $57.50 million if survey 2 is utilized. The optimal decision is the one that subscribes to the minimal cost or to the maximal benefit. Hence, the CEO,

Table 2.1 Simulated data on the percent favoring the options and their costs

Decision ↓ / Outcome →                     O1         O2        O3       Total
Cost (in millions)                         $100.00    $25.00    $0.00    $125.00
Percent favoring in survey 1               0.70       0.20      0.10     1.00
Expected cost (in millions) in survey 1    $70.00     $5.00     $0.00    $75.00
Percent favoring in survey 2               0.50       0.30      0.20     1.00
Expected cost (in millions) in survey 2    $50.00     $7.50     $0.00    $57.50


the board of directors, and the analyst would also choose to switch to EHR based on survey 2.
7. Both the decision maker and the analyst clearly declare the sources of the data and the rationale for the model. In this stage, the decision maker and the analyst perform a sensitivity analysis by considering changes in the cost and/or changes in the percentages in the surveys to check whether any shift has occurred. The analyst might wish to expand the surveys to include more outcomes with new percentages and more constituencies.
8. The investigative team collects the potential outcomes and prepares a report that presents the probability of each. The team members (including analysts and decision makers) gather all details, assess the approach’s merits and deficiencies, and record the results of the sensitivity analysis, the experts’ opinion, and so forth. This report is a document for others to follow in the current round of discussion as well as in future deliberations.
9. The analyst and decision maker describe the utilization of each choice.
10. The analyst presents the analyses performed in the report containing the results.
11. The analyst performs a sensitivity analysis and elaborates on its implications.
12. The team publicly discusses the selected decisions and their consequences. The team is prepared to listen to both praise and criticism from the audience.
13. Last but not least is a disclosure of all positive and negative findings.

The final decisions may have limitations. It is difficult to evaluate their effectiveness because no information is available on what might have happened if the decision had not been followed. The following questions help assess the accuracy of the decision. Were the strategies real? Was the model appropriate for the situation under assessment? Were all important outcomes and their probabilities included? Did those outcomes combine evidence from several sources? Were the assigned values plausible and meaningful?
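The expected-cost calculation behind Table 2.1 and the sensitivity analysis described in steps 7 and 11 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure used in the text: the costs and survey shares are taken from Table 2.1, while the function names `expected_cost` and `sensitivity_to_cost` and the what-if costs of $80 million and $120 million for option O1 are hypothetical and introduced here only for illustration.

```python
# Costs (in $ millions) of the three options, taken from Table 2.1.
costs = {"O1": 100.0, "O2": 25.0, "O3": 0.0}

# Share of respondents favoring each option in each survey (Table 2.1).
surveys = {
    "survey 1": {"O1": 0.70, "O2": 0.20, "O3": 0.10},
    "survey 2": {"O1": 0.50, "O2": 0.30, "O3": 0.20},
}


def expected_cost(costs, shares):
    """Weighted average of the option costs, weighted by the survey shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(costs[option] * shares[option] for option in costs)


def sensitivity_to_cost(costs, shares, option, scenarios):
    """One-way sensitivity: expected cost for each trial cost of one option."""
    return {c: expected_cost({**costs, option: c}, shares) for c in scenarios}


baseline = {name: expected_cost(costs, shares) for name, shares in surveys.items()}
for name, value in baseline.items():
    print(f"{name}: expected cost = ${value:.2f} million")

# Hypothetical what-if values for the cost of option O1 (not from the text).
what_if = sensitivity_to_cost(costs, surveys["survey 2"], "O1", [80.0, 100.0, 120.0])
for trial_cost, value in what_if.items():
    print(f"O1 cost ${trial_cost:.0f} million -> expected cost ${value:.2f} million")
```

With the figures from Table 2.1, the sketch reproduces the expected costs of $75.00 million for survey 1 and $57.50 million for survey 2. The what-if loop shows how the survey 2 figure moves as the assumed cost of option O1 changes, which is the kind of shift a sensitivity analysis is meant to reveal.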
What was the impact of uncertainty on the estimated values? Decisions might have limitations, including the following. The decision analysis may have oversimplified the intensity of the problem. The utilized data may have been inadequate. The values assessment may have been controversial. The outcomes of the decision analysis may not have been amenable to traditional statistical analysis. For example, quantitative analysts mention that survival rates with cancer screening are higher, and those who


undergo mammography have a 25% lower risk of dying from breast cancer, meaning 1 fewer woman out of 1,000 will die of breast cancer. The importance of statistical literacy in the healthcare setting cannot be overstated. One may make healthcare decisions without leaning on data, but that is not a wise approach. An analogy is walking through untested territory in the dark without a flashlight. The walk might be safe, with no hassle or challenge, but that is by sheer luck. We can easily visualize how data can help in making better, quicker healthcare decisions if those data are relied on in an organized manner. First, we ought to define the problem. Second, we ought to collect pertinent data. Third, we ought to construct a model. Fourth, we ought to check the validity of the model in the current healthcare system because the healthcare system, like anything on earth, changes periodically, even abruptly at times. If need be, we may modify a previously used model in accordance with recent changes in the healthcare system. Fifth, based on the available resources, we ought to propose a suitable healthcare decision for the defined problem. Sixth, we as healthcare decision makers (with the entire team of analysts) ought to present our recommended solution to representatives of the organization and seek their opinion. Seventh, we ought to update the model based on that opinion. Via modeling, the analyst deconstructs a complex situation into several simple parts and then makes recommendations to the healthcare decision maker. Some decisions are harder for several nontrivial reasons. The problem may be poorly articulated by the decision maker. The causes and effects of the decisions may be uncertain. There might be a lack of clarity about how the decisions affect several constituencies, and hence the decision maker, with help from the analyst, devises an approach to integrate all the constituencies in the decision-making process.
The approach ought to clearly outline the steps, define the goals, identify the decision maker, structure the problem, reduce uncertainty, assign values for the outcomes as an indication of preferences, analyze the available courses of action, select the optimal decision, reduce conflict, and increase the probability of consensus. Data are the lifeblood of scientific, technical, and day-to-day life. Human intelligence is built on two entities: a number system and language. Data are the natural by-product of the number system, which is intended to indicate measurable attributes. Language is sound- and sight-based communication of thoughts. There are four types of data: nominal, ordinal, interval, and ratio variables. The nominal variable has no meaningful hierarchy. For example, blood type (A, B, AB, or O) is a nominal variable as there is no order among the four blood types. Age recorded in groups is an ordinal variable because the divisions exhibit a


hierarchy. The Likert scale in a patient survey is another example of ordinal data. An interval variable has ordered values with meaningful differences between them but no true zero; temperature in degrees Celsius is a standard example. The blood pressure classification (healthy, prehypertension, stage 1 hypertension, or stage 2 hypertension) is derived from numerical measurements: it has a hierarchy and is based on combined measurements (in mmHg) of systolic and diastolic blood pressure. When the combined score is less than 200, the patient is classified as healthy. When the combined score falls in [200, 210), [210, 230), or 230 and above, the patient is classified as having prehypertension, stage 1 hypertension, or stage 2 hypertension, respectively. A ratio variable has a true zero, so ratios of its values are meaningful. A good example is driving speed: a speed of 60 miles per hour indicates moving 60 miles in 60 minutes and is twice as fast as 30 miles per hour. Does the type of data matter in data-guided decision-making? The answer is affirmative. Why is it so? To extract and interpret evidence, we need an appropriate methodology. The methodology differs in accordance with the data type. One methodology does not suit all types of data. Hence, the first thing to do is to appraise the data type before selecting a methodology. Until 1960, the healthcare systems in Canada and the United States were similar, but the Canadian federal government now covers up to 70% of healthcare in Canada. Government-controlled enterprises finance 99% of services provided by physicians and 91% of hospital costs. In contrast, the healthcare system in the United States is of a mixed type. Medicare, Medicaid, and the State Children’s Health Insurance Program (SCHIP), which, respectively, cover eligible senior citizens (65 years or over), disabled persons living in poverty, and children, are programs that impact the US healthcare system. Medical centers and clinics facilitated by the US federal government provide healthcare to veterans (retired or disabled), their families, and their survivors.
The military healthcare system, also operated by the US government, provides coverage for approximately 25% of uninsured US citizens who meet the eligibility requirements. US patients have contended that healthcare coverage, safety and compliance, clinical effectiveness, medication adherence, healthcare regimens, routine healthcare, and quality assurance measures need improvement. Whether in Canada, the United States, or elsewhere, it is quite common and understandable that patients exercise healthcare decision-making as much as physicians, nurses, pharmacists, insurance providers, and local, state, and federal policy makers. Healthcare decision makers are not necessarily physicians, nurses, or allied professionals. International cooperation is vital for better healthcare in the twenty-first century because of globalized healthcare operations, medical tourism (see

Shanmugam, 2012, 2018a), high competition, traceability, and comparability, among other factors. As much as healthcare decision-making is a complex and time-intensive exercise, its authenticity, rigor, and reasonable level of perfection are within reach when it is allied with evidence. This means we ought to look for pertinent data on the topic of decision-making, analyze those data with appropriate methods, and interpret the results. Searching for suitable methods is an integral part of the process. There might be no suitable method provided in the literature, in which situation the healthcare decision maker may have to devise a new method with or without analytically minded experts. If the literature provides many meaningful methods, the next task for the healthcare decision maker is to understand their differences and to assess which among them are suitable and superior to employ in the working scenario. Whether healthcare decision-making is based on an existing method or on a newly invented one, the healthcare decision maker ought to be familiar with the field. To navigate further, a literature search on healthcare decision-making is required. In healthcare, the use of big data is unavoidable in the modern computer age. Galetsi et al. (2020) describe opportunities in the analysis of big data sets to create organizational values/capabilities in healthcare by portraying instances of healthcare advances. Ethics, even in tough and uncomfortable scenarios, cannot be sacrificed in healthcare decision-making because it relates to dichotomies, including but not limited to the following: life or death, profit or bankruptcy, building a promising career or failing professionally, trust or suspicion, exposure or protection, and safety or danger. Roshanzadeh et al. (2020) stress that the healthcare system should comply with ethical principles by being assertive about improvements. Braithwaite et al.
(2020) elaborate a paradox in healthcare decision-making in regard to flat performance. Only 60% of healthcare is perceived as in concordance with evidence- or consensus-based parameters, while 30% is perceived as low quality and the remaining 10% is perceived as harmful. Zhang and Koru (2020) outline a systematic approach to evaluate Medicaid data in the United States. They identified 5 major categories and 17 subcategories of defects. Missingness, incorrectness, syntax violation, semantic violation, and duplicity are their major categories. Format mismatch, invalid code, dependency-contract violation, and implausible value types account for most of the data defects. They recommend healthcare organizations concentrate on data quality improvement. A great many studies concern healthcare topics. Some of the studies are transparent. Others are secretive studies unknown to the competitors


for a variety of practical reasons. An umbrella-type understanding of them is essential to compare their findings. They vary with respect to their target populations, sample sizes, and/or data-analytic methods. Their comparison requires a common denominator, which is achieved by a statistical technique called meta-analysis. Based on a meta-analysis of 112 articles, Gray et al. (2019) identify six attributes of a decision-making partner as follows: connected with the patient, demonstrated engagement in healthcare decision-making, articulated discernment of the patient’s medical issues and how to address them, demonstrated confidence in making decisions, exemplified professionalism in the decision-making process, and served in any capacity to ensure patient education and advocacy. Pollock et al. (2019) point out that evidence-based healthcare decision-making provides healthcare/medical professionals the necessary knowledge and tools. Artificial intelligence (AI) has gained popularity in healthcare services. Artificial intelligence is meant to mimic human thinking patterns. Professionals in healthcare have some concerns about AI as the technique focuses on finding the right data, the right area for application, and the right approaches (see Maddox et al., 2019 for details). When working with couples seeking to increase their chance of conceiving, I developed the jumping at zero mass point convex Poisson model with a convexity property to assist them in making decisions (see Shanmugam 2018b). Healthcare decision-making is multifaceted. The medical community is not the sole decider. Healthcare regulations forbid personal bias on the part of healthcare providers. The US Congress passed the Health Insurance Portability and Accountability Act (HIPAA) to protect patients’ privacy. A critic of HIPAA might argue it is an impediment to the dissemination of medical breakthroughs, but that is not so.
In other words, reaching a healthcare decision is often a joint venture involving patients (and their loved ones), caregivers like physicians, nurses, and pharmacists, and health insurance agencies. Brabers et al. (2017) examined the role played by patients whose inclinations vary when it comes to making decisions about healthcare. Their health literacy is important. People with higher health literacy are more involved in making better healthcare decisions. There is a positive association between health literacy and involvement in healthcare decision-making. See Dolezel et al. (2020) for the state of health literacy among university students. Capan et al. (2017) clarify that healthcare decision-making is a multidisciplinary effort leveraging mathematical methods and data so as to address intricate healthcare challenges that involve diagnosing and treating diseases, organ transplants, and patient flow optimization.


Mustafa et al. (2017) give a synopsis of the uses of diagnostic tests in order to reach better healthcare decisions. They itemize the issues that come with administering the tests. Does healthcare decision-making occur when there is a volcanic eruption? The answer is affirmative. As an example of global healthcare data analysis, three main observations stem from volcanic eruptions: the amount of ash, the distance of the lava flow, and the direction of the wind. Although they appear distinct from each other, these variables are correlated. A conscientious choice of model for evaluating these data is necessary to see these correlations (Shanmugam and Singh, 2017). The more debris a location accumulates, the more likely that location is to be added to the data. This is called sampling bias, which is covered in Chapter 4. The flexing parameter describes the presumed sampling bias in data collection and enhances the adaptability of the model. The authors named this concept the flexing and bonding trivariate distribution. The data helped healthcare providers make optimal decisions for the victims of the volcanic eruption that occurred in Iceland on April 14, 2010. Shanmugam (2016) analyzes healthcare decisions with respect to the life-threatening Zika virus spread via mosquito bites (especially concerning the likelihood of microcephaly in infected unborn babies). Human fertility is an area in which the medical community and the couple who desire to have a baby make decisions jointly because fertilization is more likely on some days of the woman’s cycle. Ţăranu (2016) demonstrated the aptness of applying data-mining techniques to make better decisions. As pointed out in Shanmugam (2014b), decision-making with respect to arrival rate, service rate, and wait time is essential. Accommodating patients would pose a great challenge amid chaos if, for example, terrorists waged a bioterrorism attack.
How many of the 50 states in the United States are prepared to deal with such an emergency? According to the US General Accounting Office (USGAO), only 73% of hospitals responded to its survey about these issues (Shanmugam, 2014b). Shanmugam (2014a) appraises operational efficiency with respect to melanoma due to ozone leaks. The data envelopment technique (DET) used in this case is based on linear programming (LP) concepts. In an alternative approach using what is called stochastic frontier analysis (SFA), Shanmugam (2014c) ranked nations in terms of their efficiency with respect to melanoma due to ozone leaks. Stochastic frontier analysis is a statistical methodology to compare units based on a linear model with two error components. The error is the discrepancy between the observed unit and its estimate based on predictors. The two errors portray, respectively, noise and technical

Are Data-Guided Healthcare Decisions Superior?

inefficiency. The errors are assumed to follow a probability distribution. Young and Kaffenberger (2013) introduce counselors to a theoretical model that fosters utilization of data for decision-making, along with guidance on how to facilitate this development. Provost and Fawcett (2013) recommend hiring data scientists because data science is intricately intertwined with massive data and data-directed decision-making. The decision-making discipline should be improved as decision-making errors are costly. Another example of global data analysis concerns drinking water. Quality drinking water supports healthy living and deters sickness and hence hospitalization. Data-based decision-making is fundamental to assessing the quality of drinking water, and this problem is not unique to one part of the world. Shanmugam and Singh (2012) utilized a new model in order to make decisions on the quality of drinking water. The World Health Organization (WHO) cautioned that poor drinking water conditions correlate with approximately 88% of diseases (Shanmugam and Singh, 2012). Access to drinking water is a concern for a significant number of citizens living in urban or rural areas, propelling the respective governments to action. The data in the WHO report depict the correlation of the urgency level with the locality’s heterogeneity and lack of access to drinking water. The patterns illustrate how an increase in heterogeneity is accompanied by a decrease in urgency levels. In discerning the urgency levels, Y represents the number of citizens in the area who do not have access to drinking water; such an observation is subject to potential sampling bias. A modified beta distribution makes a suitable underlying model for the collected data. I evaluated a sampling issue in the structure of data collected as an infectious disease was transmitted. I provided a new model and named it a spun Poisson distribution (see Shanmugam, 2011a).
I illustrated it with the prevalence of smallpox in Abakaliki, Nigeria. The incidental number of cases in a discernible environment with undetermined parameters Θ ∈ {θ, ρ} results in a Poisson-type chance mechanism, where the completeness of the data is affected by the early removal of infected cases by the health administration. In the model, the parameters θ > 0 and ρ ≥ 0 show, respectively, the incidence rate and the impact level of removing infected cases. Accordingly, θ becomes size/length biased for both the random observation count Y and the incidence rate. Only the size/length bias on the observation Y is addressed in the statistical literature. The sampling bias on the incidence rate, θ, is not covered in detail in any literature. This creates a need to acknowledge sampling bias on both the observation and the incidence rate so as to construct the environment impacted by the deletion of infected cases. The observation y is influenced by the

removal bias ρ ≥ 0 caused by the deletion. To avoid degeneration of probabilities to zero, ρ = 0 warrants a scale shift of one in the weight factor. The denominator (1 + ρθ) in the weight factor reflects the effect of the removal bias on the incidence parameter θ. In this weighted Poisson sampling framework, ρ = 0 nullifies the effect of removing cases, and ρ = 1 is designated as size/length-biased sampling. Only data-guided evidence helps refine the decision-making in this case. I have also examined how healthcare decisions are made about the efficacy of a treatment with the help of data on epilepsy patients (see Shanmugam, 2011b). In this information age, healthcare is surrounded by a vast amount of data. Modern data-mining methods could help extract pertinent information, which would refine healthcare decisions (Han et al., 2011). What is data mining? It is an interdisciplinary approach to extract and distill evidence. A more appropriate name might be “knowledge mining from data.” The following steps occur in data mining:

1. Data are cleaned to eliminate noise and inconsistent information.
2. Data are integrated as a preprocessing phase in a data repository.
3. Relevant data are retrieved from the database for intensive analysis.
4. Data are converted and consolidated into forms fit for mining by performing summary or aggregation. Occasionally, data reduction is implemented to acquire a smaller representation of the initial data while maintaining their integrity.
5. Data patterns are identified.
6. The patterns are evaluated based on interestingness measures.
7. The knowledge is presented via visualizations.

Decision makers are receptive, and future research offers improvement strategies (Milkman et al., 2009). Shanmugam and Li (2009) demonstrate salient features in logistic regression via a case study and show how the features helped physicians make healthcare decisions for diabetic patients. Woolf et al.
(2005) raise the expectation for high-quality information about clinical options. Using a receiver operating characteristic (ROC) curve, Swets et al. (2000) devised a procedure using DET to make better decisions step by step with respect to glaucoma patients. David and Brown (2012) emphasize the value of thinking critically. To augment critical thinking skills, using Excel, other computer-based operating skills, web-based data collection, and discussions in smaller groups is helpful. This process requires data collection, synthesizing evidence from those data, and illustrating the evidence and its limitations via sensitivity analysis.


Data-Guided Healthcare Decision Making

Pharmaceutical companies engage in drug discovery, which involves finding the optimal combination of ingredients in drugs. Clinical trials comprise three phases. Phase 1 examines the safety of the drug on a select sample of healthy participants. Phase 2 determines the suitable dose of the new drug on participants with the disease under examination. Phase 3 employs a statistical approach to defend the potency and safety of the new drug on a bigger sample of participants with the disease. The healthcare decision-making steps in this case are the following:

1. Develop a list of goals. What are their values?
2. Classify the alternatives for each aim in terms of perfect, terrible, or reasonable.
3. Consider problems and shortcomings.
4. Predict their consequences.
5. Identify goals, constraints, guidelines, aspirations, and limitations.
6. Allow different perspectives based on the competitor or your constituency.
7. Determine strategies to attain objectives.
8. Determine generic objectives from your own interests, customers, employees, shareholders, and environmental, social, economic, safety, or health considerations.

We live in a time when individuals expect quality patient education. With medical practitioners, healthcare systems, and individuals facing difficulties, informing patients about healthcare options is not clear-cut and is sometimes even hindered. Although written and digital forms of communication are being utilized more, such assistive decision-making tools are not a substitute for human agency. What can be proposed instead is to combine knowledge with high-quality decision counseling to advise patients. Medical professionals who provide decision counseling include clinicians who do not have formal informed-choice training (usual care), clinicians with formal informed-choice training, and trained third parties who serve as objective decision counselors.
Any barrier that hinders patient access to high-quality information on clinical options needs to be removed before the healthcare system can efficiently reinforce informed decision-making. This requires new information technology, training strategies, and reimbursement plans. In response to the rampant global knowledge surge, clinical options have also surged, thus increasing patient demand for counseling. Confronted by the concurrence of the two growing domains of global knowledge and clinical options, the current healthcare system lacks proactivity in responding to its pressing systemic insufficiencies. For more on learning and critical thinking, see Celi et al. (2020), Clemen and Reilly (2013), Devore (2007), Dix (2020), Frize (2013),


Giudici and Figini (2009), Lee and Wang (2003), Marchau et al. (2019), Munier (2011), Owens and Sox (2001), Shanmugam and Chattamvelli (2015), Stuart et al. (1994), and Weisberg (2014).

2.2 Concepts

Healthcare consists of riddles. While changes in healthcare services are occurring, their performance levels stay flat. The 60–30–10 rule suggests that about 60% of healthcare is on average consistent with the evidence, about 30% exerts a low-value impact, and about 10% is actually harmful. To assess these numbers, a data-guided approach is necessary. Healthcare professionals, managers, and policy makers grow frustrated when they are not fully informed about the state of healthcare. New technologies increase the complexity of healthcare delivery. Does the 60–30–10 rule challenge the way progress is made? Clearly, data-supported decision-making is a solution in this context. See Braithwaite et al. (2020) for additional details. When big data are involved, a fast number-crunching methodology becomes a necessity. With the inclusion of electronic health records (EHR) in healthcare, the healthcare industry is drowning in pertinent data that could strengthen the decision-making process. Extracting and utilizing clues hidden in the data requires data science. Nontrivial counterfactuals play a key role here. When an outcome occurs in healthcare, healthcare professionals wonder whether it is likely to be repeated, whether it is explainable, and where it will lead. Answers to these questions construct a path to pertinent decision-making. See Dix (2020) for additional details on the importance of evidence to refine healthcare decision-making. See Celi et al. (2020) for details on extracting and applying data. Guidance is the process of computing changes in the healthcare system only to modify healthcare decisions for the system’s improvement. Any data guidance would involve collecting pertinent data (see Chapter 4), extracting and interpreting evidence (see Chapters 3, 5, and 6), and making optimal healthcare decisions (see Chapters 8 through 12). Data guidance starts with inspection, cleansing, transforming, and modeling data.
Data cleansing is the process of detecting and removing inaccurate records. The quality of the data is assessed according to validity, satisfaction of constraints, accuracy, completeness, consistency, and uniformity. Validity refers to the degree to which data measure and conform to current knowledge. Accuracy is the degree to which data conform to the true value. Completeness means the level at which all necessary information has been confirmed. Consistency refers to the degree to which a set of measures are equivalent across the entire system. Uniformity denotes the level at which a


set of data measures are quantified across the same units in the system. Data visualization is an interdisciplinary approach used to portray evidence. Data blending is the process of integrating data from different sources using software like Tableau or Google Data Studio. Demand forecasting involves quantitative techniques applied to several kinds of data. Information technology makes use of computing facilities to store, retrieve, transmit, and analyze data. Fraud, including false claims, is common in healthcare. The statistical methods that help detect fraud include error correction, tornado diagrams, and data mining. Risk analysis is all about the causes of harm and their impact on healthcare values. The ambiguity in data is defined as $\text{Ambiguity(Data)} = \max[\mathrm{Var}(\hat{\theta}_{mle})] - \min[\mathrm{Var}(\hat{\theta}_{mle})]$, where the parameter $\theta$ of the healthcare system is estimated, perhaps using maximum likelihood estimation (MLE) (see Hessling, 2020). Why is MLE preferred? It is preferred for its desirable properties, such as invariance: the MLE of a function of the parameter is simply that function of the MLE. When the chosen data exhibit less ambiguity, the data guidance is likely more efficient. Otherwise (i.e., when the data ambiguity is higher), data guidance will not be of much value and hence is discouraged. In the event that the data exhibit more ambiguity or do not satisfy the requirements of the chosen model, one wonders what ought to be done. Data analysts recommend transforming the collected data. Two transformations can be used. One is called a z-transformation and its formula is $z = \frac{y - \text{mean}}{\text{std. deviation}}$, where the domain for $z$ is $(-\infty, \infty)$. To use a z-transformation, the data ought to be symmetric, which can be appraised by making a histogram.
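As a minimal sketch, the ambiguity score and the z-transformation described above, together with a min-max rescaling as the second transformation, might be computed as follows in Python. The function names are illustrative and the sample `ys` is hypothetical. Using the population standard deviation (`pstdev`) is an assumption, since the text does not say which version of the standard deviation is intended. The variance estimates are the ones reported in Table 2.3.

```python
from statistics import mean, pstdev


def ambiguity(variances):
    """Ambiguity(Data): largest minus smallest estimated variance of the MLE."""
    return max(variances) - min(variances)


def z_transform(ys):
    """z = (y - mean) / std. deviation (population version assumed here)."""
    m, s = mean(ys), pstdev(ys)
    return [(y - m) / s for y in ys]


def min_max_transform(ys):
    """U = (y - min(y)) / (max(y) - min(y)); every value lands between 0 and 1."""
    lo, hi = min(ys), max(ys)
    return [(y - lo) / (hi - lo) for y in ys]


# Estimated variances by continent, as reported in Table 2.3.
with_outlier = [4.36, 182.25, 55131.69]
without_outlier = [4.36, 182.25, 0.42]
print(round(ambiguity(with_outlier), 2))     # 55127.33
print(round(ambiguity(without_outlier), 2))  # 181.83

# Hypothetical sample, used only to show the two transformations.
ys = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_transform(ys))
print(min_max_transform(ys))
```

The two printed ambiguity scores agree with the last row of Table 2.3, which is a quick check that the definition is being read as a maximum minus a minimum.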
By the empirical rule for Gaussian (also known as normal) data, about 99.7%, 95%, and 68% of the total observations fall within three, two, and one standard deviations of the mean, respectively, provided that the frequency pattern of the data is Gaussian, in which case the mean, median, and mode of the data are almost equal. Otherwise (i.e., when the chosen data are asymmetric, as spotted by constructing a histogram or comparing the closeness of the mean, median, and mode), analysts recommend a min-max transformation for the data using the formula $U = \frac{y - \min(y)}{\max(y) - \min(y)}$, whose range is $[0, 1]$. This type of transformation is meant for nonsymmetric data. When the mean, median, and mode of the data are not equal to each other, the data are considered asymmetric. The decision maker ought to compute the ambiguity score and decide whether the data should be included in healthcare decision-making. In addition, a Markov chain

might exist in the data. What is a Markov chain? The Markov chain is attributed to the Russian mathematician Andrey Markov (1856–1922). It describes a sequence of outcomes in which the probability of the next outcome depends on the current one. A popular example of a Markov chain is weather prediction: given today’s weather, the probability of a sunny day tomorrow is 0.9, while the probability is 0.1 for a rainy day. The Markov chain is also seen in Bayesian data analysis, physics, chemistry, economics, signal processing, information theory, and AI. In the healthcare field, the probability of survival for a patient with a terminal illness depends on the proportion of patients with that terminal illness at the current time. In other words, the expected time to be in a healthy state is implicit in the probability of surviving in that state, and it is $E(\text{time in a state}) = (1 - \Pr(\text{surviving in the state}))^{-1}$. Another measure of quantifying the information gained in chosen data is the Shannon approach, first utilized by electrical engineer Claude Shannon (1916–2001). See Shanmugam (1999, 2014, 2015, 2016a, 2016b) for additional concepts and methods to deal with measuring information in data. With the notations $\hat{\theta}_{current}$, $\hat{\theta}_{afterData}$, and $m(\cdot)$ to denote the estimate from the current data, the estimate after using the data, and the model for the healthcare mechanism, the quantitative information in the chosen data is:

$$\text{infGained} = m(y_1, y_2, \ldots, y_n \mid \hat{\theta}_{current}) \sum_{i=1}^{n} \Pr\{\hat{\theta}_{afterData} \mid y_i\} \ln\!\left(\frac{\Pr\{\hat{\theta}_{afterData} \mid y_i\}}{\Pr\{\hat{\theta}_{current}\}}\right) + m(y_1, y_2, \ldots, y_n \mid \hat{\theta}_{afterData}) \sum_{i=1}^{n} \Pr\{\hat{\theta}_{afterData} \mid y_i\} \ln\!\left(\frac{\Pr\{\hat{\theta}_{afterData} \mid y_i\}}{\Pr\{\hat{\theta}_{current}\}}\right),$$

where $m(y_1, y_2, \ldots, y_n \mid \hat{\theta}_{current})$ and $m(y_1, y_2, \ldots, y_n \mid \hat{\theta}_{afterData})$ denote the likelihood functions based on $\hat{\theta}_{current}$ and $\hat{\theta}_{afterData}$. Another measure of the evidence in the data in favor of the research hypothesis, $H_1$, against the opposite, null hypothesis, $H_0$, is the Bayes factor (BF). The BF is a ratio, $BF = \frac{\Pr(y_1, y_2, \ldots, y_n \mid \hat{\theta}_1)}{\Pr(y_1, y_2, \ldots, y_n \mid \hat{\theta}_0)}$. When the BF is in (1, 3], (3, 20], (20, 150], or (150, ∞), the evidence in the data is, respectively, not worthwhile, positive, strong, or very strong in favor of the research hypothesis, $H_1$, against the opposite, null hypothesis, $H_0$. The BF is an easier alternative to the classical frequentist approach. Two different goodness-of-fit tests can be exercised to check whether the collected data follow a chosen population frequency pattern. They are the chi-squared and


Kolmogorov–Smirnov (KS) tests. In the case of integer data, the chi-squared test is appropriate. If the integer data have come from a specified population frequency pattern (binomial, Poisson, inverse binomial, or something else), then $\chi^2_{(v-1)df} = \sum_{i=1}^{v} \frac{(O_i - E_i)^2}{E_i}$ is smaller than a critical value, $\chi^2_{(v-1)df,\alpha}$, from the chi-squared table at a specified significance level, $0 < \alpha < 1$, where $v$, $(v - 1)$, $O_i$, and $E_i$ are, respectively, the number of categories, the degrees of freedom (df), and the observed and expected counts for the $i$th category, provided every expected count is at least five. The expected count is computed from the sample size, $n$, and the MLE of the model parameter $\theta$. In the case of a binomial or Poisson frequency pattern, the parameter is the proportion $\pi$ or the rate $\lambda$. In the case of continuous data, the KS score is used to check whether the collected data have been drawn from the specified probability density function. The KS statistic quantifies a distance $D_n = \sup_y |F_n(y) - F(y; \theta)|$ between the empirical distribution function, $F_n(y)$, of the sample and the cumulative distribution function, $F(y; \theta)$, of the reference distribution. When the KS score exceeds a critical score, $D_{n,\alpha}$, from the table, the reference population frequency pattern, $f(y; \theta)$, is rejected with a confidence level of $0 < 1 - \alpha < 1$. For example, if the reference population frequency pattern is the Poisson probability function, $f(y|\theta) = e^{-\theta}\theta^{y}/y!$; $y = 0, 1, 2, \ldots$; $\theta > 0$, its mean and variance are both equal to the rate, $\theta$. In other words, the variance of the Poisson frequency pattern increases along with its mean, implying the heterogeneity of the reference population increases along with the mean value of its prevalence. The survival function, $S(y|\theta) = \Pr(Y > y|\theta)$, is a non-increasing function. The hazard function, $h(y|\theta) = \frac{f(y|\theta)}{S(y|\theta)}$, is indicative of an instantaneous infection rate. As another example, suppose the time $y$ spent on an activity in a hospital by patients follows an exponential frequency pattern $f(y|\theta) = \theta e^{-\theta y}$. The mean and variance of such an exponential frequency pattern are, respectively, $\frac{1}{\theta}$ and $\frac{1}{\theta^2}$. In this case, the survival function and the hazard function are, respectively, exponential, $S(y|\theta) = e^{-\theta y}$, and constant, $\theta$.
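The two goodness-of-fit scores described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for the tables in this chapter: the function names are made up for the example, and both data sets (`freq` and `times`) are hypothetical. The sketch only computes the chi-squared statistic and the KS distance. Comparing them with critical values still requires the tables mentioned in the text.

```python
import math


def poisson_pmf(y, theta):
    """Poisson probability f(y | theta) = exp(-theta) * theta**y / y!."""
    return math.exp(-theta) * theta ** y / math.factorial(y)


def poisson_expected_counts(n, theta, last_bin):
    """Expected counts for y = 0, ..., last_bin - 1 plus a pooled 'last_bin or more' category."""
    probs = [poisson_pmf(y, theta) for y in range(last_bin)]
    probs.append(1.0 - sum(probs))  # tail probability Pr(Y >= last_bin)
    return [n * p for p in probs]


def chi_squared_statistic(observed, expected):
    """Sum over the categories of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def ks_distance(sample, cdf):
    """D_n = sup_y |F_n(y) - F(y)|, checked just before and at each ordered observation."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        f = cdf(y)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(theta):
    """Return F(y; theta) = 1 - exp(-theta * y) as a function of y."""
    return lambda y: 1.0 - math.exp(-theta * y)


# Hypothetical integer data: how often each count y was observed.
freq = {0: 18, 1: 30, 2: 27, 3: 15, 4: 7, 5: 3}
n = sum(freq.values())
theta_hat = sum(y * f for y, f in freq.items()) / n  # MLE of the Poisson rate (sample mean)
observed = [freq[0], freq[1], freq[2], freq[3], freq[4] + freq[5]]  # pool y >= 4
expected = poisson_expected_counts(n, theta_hat, last_bin=4)
print("chi-squared statistic:", round(chi_squared_statistic(observed, expected), 3))

# Hypothetical continuous data: time spent on a hospital activity.
times = [0.4, 1.1, 1.9, 2.5, 3.8, 5.2, 7.0, 9.6]
rate_hat = len(times) / sum(times)  # MLE of the exponential rate (1 / sample mean)
print("KS distance:", round(ks_distance(times, exponential_cdf(rate_hat)), 3))
```

Pooling the counts of four or more into one category is a design choice made so that every expected count stays at or above five, as the chi-squared test requires.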

2.3 Illustration

This section features two examples of data-guided healthcare decisions. The first is the occurrence of the Zika virus and how healthcare professionals ought to be prepared for such an event. The second is the time patients spend in a hospital. Time is of the essence for patients, who may be in pain and desire to complete treatment as quickly as they can. Reduction in the time patients spend in a hospital might be indicative of an efficient healthcare operation (see Table 2.2).


Why was the Zika virus named as such? In 1947, the virus was first noted near the Zika forest in Uganda. In the Luganda language, zika means overgrown. The Zika virus is transmitted when daytime-biting mosquitoes inject it into the human body. A low titer of two antibodies, IgM and IgG, confirms the presence of the Zika virus. A small body of stagnant water the size of a bottle cap is enough to spread the virus in its early stages. Lately, the infection has spread throughout dense Brazilian forests, affecting thousands of individuals. Alerted by the proliferation of the Zika virus, the WHO has cautioned international travelers, especially pregnant women, to be vigilant in protecting their entire bodies from Aedes mosquitoes. The WHO strongly advises anyone who has contracted the virus to seek immediate medical attention. If a pregnant woman gets infected by the Zika virus, her baby may suffer birth defects such as microcephaly, which is a shrunken head and/or brain. Symptoms an infected person may encounter include headaches, rashes, fever, joint pain, and red eyes. Regarding the victims in Brazil during the 2015 outbreak, note the following results. The symptoms of the Zika virus are close to those of dengue or yellow fever. Within the first week of showing symptoms, the victim infected by a mosquito bite should provide a sample of blood, serum, urine, or tissue to be tested in the lab for the Zika virus. Because no antidote for the Zika virus is known as of this printing, the Centers for Disease Control and Prevention (CDC) strongly recommends preventive measures against mosquito bites, such as mosquito netting, screened or air-conditioned rooms, insect repellent, wearing pants, long sleeves, socks, and head coverings, and staying nourished. Although the Zika virus is not communicable by contact such as touching or hugging an infected person, it is communicable through sexual transmission as originally suspected.
The source of the Zika virus is still a mystery, but residents in Brazil and surrounding countries seem to encourage a particular rumor. A greater population resides on the east side of the longitude. More Zika cases occur on the south side of the latitude. More Zika cases occur where there is greater land area. The longitude with population and latitude connect to different groups. The number of Zika cases, land area, and the percentage of wetland belong to one group (see Tables 2.2 and 2.3). Notice the Zika virus has occurred (see Figure 2.1) in the Southern United States, in the entire country, and in the northern part of South America. Hospital administrators and medical professionals, especially in the emergency wing, along with maternity and pediatric departments, were preparing to admit and treat Zika virus patients. The known symptoms that typically occur in less than a week include fever, red eyes, joint pain, headache, and rash.


Table 2.2 The country, its longitude, latitude, land area (1,000 square miles), percent water area, population, and number of Zika cases. (Source: www.who.int)

Country

Continent

Longitude (+ is west, − is east)

Latitude (+ is north, − is south)

Land area (1,000 square miles)

% Water area

Barbados

South America

59

13

166

0

Bolivia

South America

63

−17

424

1.29

Brazil

South America

47

−15

3,287

Cambodia

Asia

−104

11

181

Canada

North America

75

45

3,854

Colombia

South America

74

4

440

Dominican Republic

South America

70

19

18

Ecuador

South America

78

0

275

El Salvador

South America

89

13

8

Guatemala

South America

90

14

108

Guyana

South America

58

6

214

65

Population (million) 0.2

# Zika cases 3

11

1

204

4,000 1

2.5

15

8

35

4

8.8

42

98

70

9

8

5

15

6

7

3

1.4 40 8.4

14 0.7

1 1

Haiti

South America

72

18

10

70

10

Honduras

South America

87

14

43

0

8

5 2

Indonesia

Asia

−106

−6

735

4.85

255

9

Malaysia

Asia

−101

3

329

0.3

30

1 3

Mexico

South America

99

19

761

2.5

112

Panama

South America

79

8

29

2.9

4

3

Paraguay

South America

57

−25

157

2.3

6

6

300

Philippines

Asia

−120

14

Puerto Rico

South America

66

18

Suriname

South America

55

5

102

1

1.6

4

1

63

1.1

0.5

6

0.4

3.5

61

Thailand

Asia

−100

13

513

68

10

United States

North America

77

38

3,794

7

321

31

Venezuela

South America

66

10

353

32

27

7

Microcephaly and other brain malformations in babies are common defects that arise when the mother transmits the virus to the child during pregnancy. The pediatric wings of hospitals should prepare to deal with more microcephaly cases as the number of Zika virus cases increases. From the variables (see Table 2.4), note that the number of Zika virus cases is significantly correlated with land area, meaning healthcare decision makers in land areas near water ought to be more prepared to deal with Zika virus patients. According to the results presented in Table 2.5, the 95% confidence interval estimate for the land area affected by the Zika virus is between 131.2 and 891.7 thousand square miles, although the KS test rejects a normal (i.e., Gaussian) frequency pattern for the land area variable (p-value = 0.007). The 95% confidence-based estimated population under the risk of

the Zika virus is between 20 million and 88 million. Healthcare professionals should prepare and budget accordingly. The number of Zika virus cases is likely to be between 170 and 180. Such data results are helpful. The KS statistic is a test score that indicates whether the chosen model is suitable for the (continuous) variable in the model. For example, the p-value for rejecting the normal (Gaussian) distribution for the continuous land area is low (p-value = 0.007). The p-value for rejecting the normal (Gaussian) distribution for the continuous population size is 0.02. Suppose the (discrete) number of Zika cases in a region at a specified time follows a Poisson distribution with the incidence rate θ. Then the collected data in Table 2.2 estimate the unknown rate to be 170 < θ < 180 with 95% confidence. Because the number of Zika


Table 2.3 Ambiguity in the Zika virus data. Entries are the estimated variance, $\widehat{\mathrm{Var}}(\hat{\theta})$, of the estimate of the Poisson parameter. (Source: www.who.int)

Continent                     With outlier         Without outlier
Asia                          4.36                 4.36
North America                 182.25               182.25
South America                 55,131.69            0.42
Minimum                       4.36                 0.42
Maximum                       55,131.69            182.25
Ambiguity in the Zika data    55,127.33 (large)    181.83 (smaller)

Figure 2.1 Locations where the Zika virus occurred (Zika virus cases plotted by longitude; + is west, − is east)

cases is a discrete variable, the chi-squared test score (not the KS score) is appropriate to judge the suitability of the Poisson distribution for the variable. The collected data in Table 2.5 suggest that the p-value for rejecting the Poisson distribution for the random number of Zika virus cases is less than 0.0001. According to the path diagram (see Figure 2.2), Zika virus cases occur more along the latitude, the population size increases more along the longitude, and Zika virus cases occur significantly more in the land areas around water reservoirs. The three data variables that are in close proximity are Zika virus cases, land area, and the percent of water area, according to RC1. You will come across an illustration in Chapter 3 on how to construct a path diagram using JASP. Note that the Zika virus data are ambiguous when Brazil and Colombia are included.
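The text does not state how the interval for the Poisson rate in Table 2.5 was obtained. One common choice is the large-sample (Wald) interval, $\hat{\theta} \pm 1.96\sqrt{\hat{\theta}/n}$, sketched below in Python. Two inputs are assumptions made only for this illustration: the point estimate 175.45 is taken as the midpoint of the interval reported in Table 2.5, and n = 24 is the number of countries listed in Table 2.2. The function name is also hypothetical.

```python
import math


def poisson_rate_interval(theta_hat, n, z=1.96):
    """Large-sample (Wald) interval for a Poisson rate: theta_hat +/- z * sqrt(theta_hat / n)."""
    half_width = z * math.sqrt(theta_hat / n)
    return theta_hat - half_width, theta_hat + half_width


# Assumed inputs: midpoint of the Table 2.5 interval and the 24 countries of Table 2.2.
lower, upper = poisson_rate_interval(theta_hat=175.45, n=24)
print(round(lower, 2), round(upper, 2))  # 170.15 180.75
```

Under these assumptions the sketch returns the limits shown in Table 2.5 for the number of Zika cases, which suggests, but does not confirm, that an interval of this form was used.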


Another example that highlights the importance of data-guided decision-making is worth mentioning. For this, consider the data in Table 2.6. A longer time spent by a patient in a hospital is indicative of that hospital’s inefficient operation. The significantly correlated variables in this case are examination time, reading vital signs, referral request with waiting for the physician, and following up with placement in an examination room. Reducing the time spent on one of these significantly correlated activities cannot be accomplished without reducing the time spent on the others. See Figure 2.3 and Table 2.7 for a comparison of the time patients spend on activities in a hospital. This understanding tells the decision maker which activity to concentrate on if a reduction of time is intended. For example, the examination by the physician takes most of the time, and it is quite understandably good


Table 2.4 Correlation among the variables: longitude (+ is west, − east), latitude (+ is north, − south), land area (1,000 sq miles), % water area, population (million), and # Zika cases.

Table 2.5 Fit of models in Zika virus data. (Source: WHO.org)

Model                             Lower estimate   Upper estimate   Test score             p Value
Land area (1,000 square miles)    131.21           891.72           KS = 0.33              0.007
Population (million)              20.26            88.10            KS = 0.31              0.02
# Zika cases                      170.15           180.75           Chi-squared = 250.00   < 0.0001

0; β > 0, where ΓðaÞ ¼ e u du? 0

39. Show that the mean $\mu = E(Y \mid \theta)$ and variance $\mathrm{Var}(Y \mid \theta) = E(Y^2 \mid \theta) - [E(Y \mid \theta)]^2$ of the beta density function are, respectively, $\mu = \frac{\alpha}{\alpha + \beta}$ and $\sigma^2 = \mu(1 - \mu)/(\alpha + \beta + 1)$.

40. Is $f^w(y \mid \theta_0; \alpha, \beta) > 0$ (non-negativity) and the area (probability) under $f^w(y \mid \theta_0; \alpha, \beta)$ equal to one, where $f^w(y \mid \theta_0; \alpha, \beta) = \alpha$, if $y = 0$, $f^w(y \mid \theta_0; \alpha, \beta) = (1 - \alpha - \beta)\,[\ldots]$, if $0 < y < 1$, and $f^w(y \mid \theta_0; \alpha, \beta) = \beta$, if $y = 1$? Shanmugam and Singh (2012) called $f^w(y \mid \theta_0; \alpha, \beta)$ an urgency-biased probability density function, and it is a length-biased version of the beta density.

41. Show the mean, $\mu^w = \int_0^1 y f^w(y \mid \theta_0; \alpha, \beta)\,dy = (1 - \alpha)\theta_0 + \beta(1 - \theta_0)$, if $y = 0$.

42. Show that the function $f(y \mid \theta) = e^{-\theta}\theta^{y}/y!$; $y = 0, 1, 2, \ldots$; $\theta > 0$ is non-negative and sums to one over the sample space.

43. Show that the mean $E(y \mid \theta)$ and variance $\mathrm{Var}(y \mid \theta)$ are equal to the same value $\theta$.

44. Discuss a healthcare scenario in which sampling bias exists for a Poisson outcome. The size-biased Poisson model is of the form $f^w(y \mid \theta) = y f(y \mid \theta)/E(y \mid \theta)$. Prove that the model is intact in spite of size bias.

45. The time between two storms follows the probability distribution $f(y \mid \theta) = \theta e^{-\theta y}$; $y > 0$; $\theta > 0$. Show that the mean $\mu = E(y \mid \theta)$ and variance $\sigma^2 = \mathrm{Var}(y \mid \theta)$ are equal to $\mu = 1/\theta$ and $\sigma^2 = 1/\theta^2 = \mu^2$.


46. Derive a sampling-biased exponential model using the definition $f^w(y \mid \theta) = y f(y \mid \theta)/E(y \mid \theta)$. Obtain the mean and variance under biased sampling.

Selected References

Albright, S. C. W. C., Winston, W., & Zappe, C. (2010). Data Analysis and Decision Making. Toronto: Nelson Education.

Brabers, A. E., Rademakers, J. J., Groenewegen, P. P., Van Dijk, L., & De Jong, J. D. (2017). What role does health literacy play in patients’ involvement in medical decision-making? PLoS One, 12(3), e0173316.

Braithwaite, J., Glasziou, P., & Westbrook, J. (2020). The three numbers you need to know about healthcare: the 60–30–10 challenge. BMC Medicine, 18, 1–8.

Capan, M., Khojandi, A., Denton, B. T. et al. (2017). From data to improved decisions: operations research in healthcare delivery. Medical Decision Making, 37(8), 849–859.

Celi, L. A., Majumder, M. S., Ordóñez, P. et al. (2020). Leveraging Data Science for Global Health (p. 475). Berlin: Springer Nature.

Chattamvelli, R., & Shanmugam, R. (2019). Generating Functions in Engineering and the Applied Sciences. Synthesis Lectures on Engineering. Williston, VT: Morgan & Claypool.

Chattamvelli, R., & Shanmugam, R. (2020). Discrete Distributions in Engineering and the Applied Sciences. Williston, VT: Morgan & Claypool.

Chattamvelli, R., & Shanmugam, R. (2021). Continuous Distributions in Engineering and the Applied Sciences. Williston, VT: Morgan & Claypool.

Clemen, R. T., & Reilly, T. (2013). Making Hard Decisions with Decision Tools. Boston: Cengage Learning.

David, I., & Brown, J. A. (2012). Beyond statistical methods: teaching critical thinking to first-year university students. International Journal of Mathematical Education in Science and Technology, 43(8), 1057–1065.

Devore, J. (2007). Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining. Hoboken, NJ: Wiley.

Dix, A. (2020). Statistics for HCI: making sense of quantitative data. Synthesis Lectures on Human-Centered Informatics, 13(2), 1–181.

Dolezel, D., Shanmugam, R., & Morrison, E. (2020). Are college students health literate? Journal of American College Health, 68(3), 242–249. https://doi.org/10.1080/07448481.2018.1539001.

Frize, M. (2013). Health care engineering, part I: clinical engineering and technology management. Synthesis Lectures on Biomedical Engineering, 8(2), 1–97.

Galetsi, P., Katsaliaki, K., & Kumar, S. (2020). Big data analytics in the health sector: theoretical framework, techniques, and prospects. International Journal of Information Management, 50, 206–216.

Giudici, P., & Figini, S. (2009). Applied Data Mining for Business and Industry (pp. 147–162). Chichester: Wiley.

Gray, T. F., Nolan, M. T., Clayman, M. L., & Wenzel, J. A. (2019). The decision partner in healthcare decision-making: a concept analysis. International Journal of Nursing Studies, 92, 79–89.

Han, J., Kamber, M., & Pei, J. (2011). Data Mining Concepts and Techniques. 3rd edition. Morgan Kaufmann Series in Data Management Systems. New York: Elsevier.

Hessling, J. P. (2020). Introductory chapter: ramifications of incomplete knowledge. Statistical Methodologies, Chapter 1. London: Intech Open.

Jaccard, P. (1912). The distribution of the flora in the alpine zone. New Phytologist, 11(2), 37–50.

Lee, E. T., & Wang, J. (2003). Statistical Methods for Survival Data Analysis (Vol. 476). Hoboken, NJ: Wiley.

Li, M., & Chapman, G. B. (2020). Medical decision making. In Wiley Encyclopedia of Health Psychology (pp. 347–353). Hoboken, NJ: Wiley.

Maddox, T. M., Rumsfeld, J. S., & Payne, P. R. (2019). Questions for artificial intelligence in health care. JAMA, 321(1), 31–32.

Marchau, V. A. W. J., Walker, W. E., Bloemen, P. J. T. M., & Popper, S. W. (2019). Decision Making under Deep Uncertainty: From Theory to Practice. Berlin: Springer.

Milkman, K. L., Chugh, D., & Bazerman, M. H. (2009). How can decision making be improved? Perspectives on Psychological Science, 4(4), 379–383.

Munier, N. (2011). A Strategy for Using Multicriteria Analysis in Decision-Making: A Guide for Simple and Complex Environmental Projects. Dordrecht: Springer.

Mustafa, R. A., Wiercioch, W., Cheung, A. et al. (2017). Decision making about healthcare-related tests and diagnostic test strategies. Paper 2: a review of methodological and practical challenges. Journal of Clinical Epidemiology, 92, 18–28.

Owens, D. K., & Sox, H. C. (2001). Medical decision-making: probabilistic medical reasoning. In Medical Informatics (pp. 76–131). New York: Springer.

Ozcan, Y. A. (2005). Quantitative methods in health care management: techniques and applications. Hoboken, NJ: Wiley.

Pollock, M., Fernandes, R. M., Newton, A. S., Scott, S. D., & Hartling, L. (2019). A decision tool to help researchers make decisions about including systematic reviews in overviews of reviews of healthcare interventions. Systematic Reviews, 8(1), 1–8.

Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision making. Big Data, 1(1), 51–59.

Roshanzadeh, M., Vanaki, Z., & Sadooghiasl, A. (2020). Sensitivity in ethical decision-making: the experiences of nurse managers. Nursing Ethics, 27(5), 1174–1186.

Shanmugam, R. (1999). Kullback–Leibler information and interval estimation. Communications in Statistics, 28(9), 2057–2063.

Shanmugam, R. (2011a). Spinned Poisson distribution with health management application. Healthcare Management Science, 14(4), 299–306.

Shanmugam, R. (2011b). What else do epileptic data reveal? American Medical Journal, 2(1), 13–28.

Shanmugam, R. (2012). Booming medical tourism and economic indices. International Journal of Research in Nursing, 3(2), 38–47.

Shanmugam, R. (2014a). Data envelopment analysis for operational efficiency. In Encyclopedia of Business Analytics and Optimization (Vol. 2), edited by J. Wang (pp. 18–28). New York: IGI Global.

Shanmugam, R. (2014b). Data guided public healthcare decision making. In Encyclopedia of Business Analytics and Optimization (Vol. 2), edited by J. Wang (pp. 30–43). New York: IGI Global.

Shanmugam, R. (2014c). Probing non-adherence to prescribed medicines? A bivariate distribution with information nucleus clarifies. American Medical Journal, 5, 54–60.

Shanmugam, R. (2014d). Stochastic frontier analysis and cancer survivability. In Encyclopedia of Business Analytics and Optimization (Vol. 5), edited by J. Wang (pp. 18–26). New York: IGI Global.

Shanmugam, R. (2015). Entropy nucleus and use in waste disposal policies. International Journal on Information Theory, 4(2), 1–12.

Shanmugam, R. (2016a). Data guided unraveling of mysteries in Zika virus incidences. Kenkyu Journal of Epidemiology & Community Medicine, SI 20161(100101), 1–12.

Shanmugam, R. (2016b). Entropy in Nucleus to tab data information and its illustration with Wolfram syndrome cases. International Journal of Ecological Economics and Statistics, 37(3), 44–63.

Shanmugam, R. (2018a). Bridging allure with importunity of medical tourism. Biostatistics & Biometrics Open Access Journal, 7(5), 555724.

Shanmugam, R. (2018b). Jumping at zero mass point: convex Poisson distribution and its fecund ability application. International Journal of Applied Mathematics & Statistics, 57(6), 21–35.

Shanmugam, R., & Chattamvelli, R. (2015). Statistics for Scientists and Engineers. Hoboken, NJ: Wiley.

Shanmugam, R., & Li, J. (2009). Diabetic risks assessed via logistic regression. International Journal of Ecological Economics and Statistics, 14, 70–86.

Shanmugam, R., & Singh, J. (2012). Urgency biased beta distribution with application in drinking water data analysis. International Journal of Statistics and Economics, 9(A12), 56–82.

Shanmugam, R., & Singh, K. P. (2017). Flexing and bonding with a trevorite probability density to explain health consequences of hazardous volcanic eruptions. Biostatistics & Biometrics International Journal, 6(2), 1–6.

Stuart, A., Arnold, S., Ord, J. K., O’Hagan, A., & Forster, J. (1994). Kendall’s Advanced Theory of Statistics. Hoboken, NJ: Wiley.

Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Better decisions through science. Scientific American, 283(4), 82–87.

Ţăranu, I. (2016). Data mining in healthcare: decision making and precision. Database Systems Journal, 6(4), 33–40.

Weisberg, H. I. (2014). Willful Ignorance. Hoboken, NJ: Wiley.

Woolf, S. H., Chan, E. C., Harris, R. et al. (2005). Promoting informed choice: transforming health care to dispense knowledge for decision making. Annals of Internal Medicine, 143, 293–300.

Young, A., & Kaffenberger, C. (2013). Making DATA work: a process for conducting action research. Journal of School Counseling, 11(2), n2.

Zhang, Y., & Koru, G. (2020). Understanding and detecting defects in healthcare administration data: toward higher data quality to better support healthcare operations and decisions. Journal of the American Medical Informatics Association, 27(3), 386–395.

https://doi.org/10.1017/9781009212021.004 Published online by Cambridge University Press

Chapter 3

Software Excel, Microsoft Mathematics, and JASP

After studying the chapter, readers will be able to:

• Appreciate the advantages of Excel software.
• Use Excel commands to calculate.
• Apply Excel to construct charts and graphs.
• Utilize Microsoft Math Solver to make three-dimensional graphs.
• Use JASP to analyze and interpret healthcare data.

3.1 Motivation

This chapter examines three software programs that help users compute the necessary quantities, summarize data, construct graphical displays (including charts), and create two- and three-dimensional pictures from expressions or numbers. Microsoft Excel is a module of the Microsoft Office suite of software programs (www.microsoft.com/en-us/download). Excel is composed of several subroutines, the Excel commands, through which users can perform calculations. JASP, freely available as open-source software (https://jasp-stats.org), was created and is supported by the University of Amsterdam in the Netherlands. Input data for analysis by JASP need to be entered in an Excel spreadsheet, saved as a comma-delimited (.CSV) file, and fed to JASP. Microsoft Math Solver, developed and maintained by the Microsoft Corporation, is free to download. It allows users to solve math and science problems.

First, let us consider Excel. Microsoft Excel is spreadsheet-based software featuring modules that calculate the value of a mathematical expression. It also allows users to construct two- or three-dimensional graphs and pivot tables, which save researchers time when carrying out data analysis. Excel uses a grid of cells arranged in numbered rows and lettered columns to perform data manipulations and arithmetic operations. It provides a battery of functions to answer statistical, engineering, and financial needs. In addition, Excel can make line graphs, histograms, charts, and limited three-dimensional graphical displays.
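The hand-off from a spreadsheet to JASP described above amounts to writing a rectangular table as comma-delimited text. The following Python fragment is a minimal sketch of what such a .CSV file contains; the variable names, the values, and the file name in the comment are made up for illustration and do not come from the book.

```python
import csv
import io

# Hypothetical illustrative records: one row per sampled unit,
# one column per data variable. Names and values are made up.
header = ["patient_id", "wait_minutes", "exam_minutes"]
rows = [
    [1, 12, 25],
    [2, 30, 18],
    [3, 7, 40],
]


def to_csv_text(header, rows):
    """Return the table as comma-delimited text, header line first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(header, rows)
print(csv_text)

# To save the text to disk (the file name is an assumption):
# with open("clinic_times.csv", "w", newline="") as f:
#     f.write(csv_text)
```

Each line of the output is one spreadsheet row, and the commas mark the column boundaries, which is why a statistics program can read the file without knowing anything about Excel.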

Excel has interactive features that can be hidden from the user. Users can create spreadsheets via a custom-designed user interface. Excel has at least 484 functions, and additional features, including an Analysis ToolPak that performs data analysis, are available as add-ins. Excel offers Visual Basic for Applications (VBA) for advanced analysis and the Solver add-in for optimization and equation solving (see Walkenbach, 2010). Excel also supports Monte Carlo probabilistic modeling and risk analysis. Excel is useful for storing data, performing data analysis, making graphs and charts, and visualizing data interpretations. Refer to Quintela del Río and Francisco-Fernández (2017) for a collection of free, open-source Excel templates. The templates are intuitive and can be used to obtain descriptive statistics at specified confidence levels, to carry out hypothesis testing of the chosen parameter values, and to make interactive Gaussian density charts, among other things.

See Ayer (2016) for the different types of errors displayed by Excel and the remedial actions. The formulas and functions in Excel expect input values. For example, the SUM function adds numbers, but the user must specify the starting and ending cells in the spreadsheet. Similarly, the LOOKUP function helps users find a specific value. An error message alerts users when the specification has an error, so users need to understand what causes errors and how to fix them.

Halpern, Frye, and Marzzacco (2018) explain how to efficiently utilize Excel’s Scientific Data Analysis Toolkit (SDAT). The SDAT carries out analytical calculations. A particular feature of SDAT enables the user to build and interpret rigorous regression models. This tool can handle up to seven parameters, provided the standard error values of regression and the covariance matrix of the parameter estimates fit. It can also fit a weighted regression.
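A weighted regression of the kind mentioned above can be illustrated outside Excel as well. The following Python fragment is a minimal sketch of weighted least squares for a straight line. It is not the SDAT implementation; the function name, the data, and the weights are assumptions chosen only for illustration.

```python
def weighted_line_fit(x, y, w):
    """Fit y = b0 + b1*x by minimizing sum(w_i * (y_i - b0 - b1*x_i)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept
    return b0, b1


# Made-up data; larger weights mean the observation is trusted more.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
equal_weights = [1.0, 1.0, 1.0, 1.0]
unequal_weights = [4.0, 1.0, 1.0, 0.25]

print("unweighted fit:", weighted_line_fit(x, y, equal_weights))
print("weighted fit:  ", weighted_line_fit(x, y, unequal_weights))
```

With equal weights the function reduces to ordinary least squares, so comparing the two printed fits shows how much the weighting shifts the line toward the heavily weighted observations.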
The SDAT can calculate descriptive statistics, integrate, differentiate, smooth with splines, plot, and fit regressions. The regression analysis tool in Excel can fit unweighted and weighted nonlinear regressions, fit linearized forms of an exponential decay function, and obtain the covariance matrix for the propagation of errors. Broman and Woo (2018) provide many examples of using the SDAT that involve chemistry, enzyme kinetics, vapor pressure, and quantum chemistry data analysis.

https://doi.org/10.1017/9781009212021.005 Published online by Cambridge University Press

Providing all details of Excel is beyond the scope of this chapter. Interested readers should consult specialized books or monographs. Basic functional knowledge is, however, outlined and explained in this chapter, which focuses on using Excel to analyze results for the sake of making better healthcare decisions.

Excel spreadsheets can be saved in different formats. See Figure 3.1 for the opening page of a .CSV Excel spreadsheet. One reason for selecting the .CSV format is to prepare the data in Excel so that they can be input into JASP, which can then be used to perform data analysis and model fitting. Right-clicking on the Excel sheet allows users to exercise several options: insert, delete, rename, and so forth. It is customary to designate the rows for sampled units and the columns for data variables. Excel is based on windows. Menus such as Home, Insert, Draw, and so forth are self-explanatory. The Home window offers users multiple options such as cut, copy, paste, and delete (see Figure 3.2). Figure 3.3 demonstrates how data are entered in Excel. The data in Figure 3.4 are the number of defects reported in a healthcare organization. Once the data are entered into an Excel spreadsheet, users can open the Insert menu and access any chart via the All Charts tab.

Figure 3.1 Opening page of Excel

Figure 3.2 The options in the Home menu

Figure 3.3 The structure and menus in Excel

Computer simulations are an alternative to traditional analytic methods. The simulations help us visualize the results of data analysis, sampling distributions, confidence intervals, hypothesis testing, and the central limit theorem, among others. See Chandrakantha (2014) for additional details. A comprehensive list of Excel formulas and functions is available in Bluttman (2013). Arsham (2011) provides a display of Excel-based data analysis options. Refer to Frye (2010) for a step-by-step guide to using Excel. Baier and Neuwirth (2007) explain how R simplifies statistical computing.

A great deal of literature addresses Excel’s flexibility across a wide range of applications. See Berk and Carey (2009) on Excel’s usefulness in healthcare data analysis. Schou (1999) provides examples of quantitative data analysis for simulation, applying linear programming methods for business forecasting. In a nutshell, Excel is versatile. See Fylstra, Lasdon, Watson, and Waren (1998) for how to use Excel to resolve optimization issues.

Statistical analyses are done via the Data menu. The Data Analysis menu needs to be added first because it is not a default feature in Excel (see Figure 3.5). The Data Analysis submenu offers choices such as Anova, Correlation, Covariance, Exponential Smoothing, Fourier Analysis, Histogram, Moving Averages, Random Number Generation, Rank & Percentiles, Regression, Sampling, T-Tests, and Z-Tests. The results appear in a separate spreadsheet. Excel is versatile enough to extract supportive data that can be used to refine healthcare decisions. Harmon (2011) and Jeschke, Reinke, Unverhau, and Pfeifer (2011) are also excellent sources on Excel. Heiberger and Neuwirth (2010) explain how to integrate the R language with Excel to perform data analysis and create graphics. Quirk (2020) presents additional examples of the advantages of Excel.

The free statistical software program Jeffreys’s Amazing Statistics Program, better known as JASP, offers an open-source alternative to high-priced software programs such as SAS and SPSS (https://jasp-stats.org). JASP is available for Microsoft Windows, macOS, and Linux. JASP implements various analytic methods, particularly Bayesian analysis and data science. These include t-tests, analysis of variance (ANOVA), regression model fitting, factor analysis, machine learning, meta-analysis, network analysis, and structural equation modeling (SEM). See Figure 3.6 for menus and data entry in JASP. Note that JASP is periodically updated. Data to be read by JASP should be provided in .CSV format (see Figure 3.7). JASP provides graphics associated with basic statistical procedures such as t-tests, ANOVA, linear regression models, cluster analytics, discriminant methods, principal component analysis, SEM, and analyses of contingency tables. JASP also offers recent developments in Bayesian hypothesis testing and parameter estimation. The analyses in JASP are implemented in R and a series of R packages. JASP allows users to perform frequentist and Bayesian analysis, hypothesis building, descriptive statistics, relationship analysis with categorical data, t-tests, analysis of variance and covariance, factorial analysis of variance, correlations, linear and logistic regressions, reliability, and factor analysis, among others.

See Figure 3.8 for a view of data in JASP. JASP allows each variable to be specified as nominal, ordinal, interval, or ratio (see Figure 3.9). Figures 3.10–3.15 illustrate how to exercise the descriptive options in JASP to obtain a frequency table, scatter graph, box plots, and partial and marginal correlations, among others. Figure 3.16 provides a visual of exercising the principal component, exploratory, or confirmatory factor analysis options in JASP. Figures 3.17 and 3.18 explain how to check the underlying probability distribution of the chosen data variable for analysis in JASP. See Figure 3.19 for other programs available in JASP.
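The descriptive quantities named above (a frequency table for a nominal variable, summary statistics, and a correlation between two numeric variables) can be sketched in a few lines of Python. This is only an illustration of what those quantities are, not of how JASP computes them; the data and variable names are made up.

```python
from collections import Counter
from statistics import mean, median, stdev

# Made-up data: a nominal variable and two ratio-scale variables.
triage = ["low", "high", "low", "medium", "low", "high"]
wait_minutes = [12.0, 30.0, 7.0, 22.0, 15.0, 41.0]
exam_minutes = [25.0, 18.0, 40.0, 21.0, 30.0, 12.0]


def pearson(u, v):
    """Pearson product-moment correlation between two equal-length lists."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


# Frequency table for the nominal variable.
freq = Counter(triage)
for level, count in freq.most_common():
    print(level, count)

# Summary statistics for one numeric variable.
print("mean:", mean(wait_minutes))
print("median:", median(wait_minutes))
print("sample sd:", stdev(wait_minutes))

# Marginal (zero-order) correlation between the two numeric variables.
print("correlation:", pearson(wait_minutes, exam_minutes))
```

A correlation close to −1 or +1 indicates a strong linear relationship, which is the kind of screening that precedes the model fitting described in this chapter.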


Figures 3.20–3.31 depict the classical versus the Bayesian option in a t-test, ANOVA, mixed models for data from longitudinal observations, regression fitting, nonparametric frequency analysis, meta-analysis, reliability analysis, SEM, visual models, and count analysis with the help of JASP. The R language (the latest version of which can be downloaded at http://cran.r-project.org) is the basis for calculations in JASP. JASP features a graphical user interface (GUI) so users can select the appropriate module or conveniently modify any option. To support users in academia, JASP presents analytic results as tables, charts, and graphs in the format preferred by the American Psychological Association (APA).

Figure 3.4 The options in the Insert menu

Figure 3.5 The options in the Data menu

Figure 3.6 The data entry view and menus in JASP

Figure 3.7 Opening up Excel data in .CSV format for JASP

Figure 3.8 The view of data in JASP

Figure 3.9 Entering and identifying the data variable type in Excel (.CSV file)

Figure 3.10 Exercising descriptive in JASP

Figure 3.11 Exercising descriptive in JASP

Figure 3.12 Exercising descriptive in JASP

Figure 3.13 Exercising correlation in JASP

Figure 3.14 Exercising correlation in JASP

Figure 3.15 Exercising principal component analysis in JASP

Figure 3.16 Exercising principal component analysis in JASP

Figure 3.17 Distribution to check the model for the data in JASP

Figure 3.18 Checking data distribution in JASP

Figure 3.19 Checking data distribution in JASP

Figure 3.20 Other programs in JASP

JASP implements various analytic methods and also lets users try specialized Bayesian methods. JASP allows users to perform both frequentist and Bayesian statistical analysis in all procedures, including t-tests, ANOVA, analysis of covariance (ANCOVA), multivariate analysis of variance (MANOVA), repeated-measures ANOVA, and logistic regression. JASP features an editor for visual inspection and preprocessing that facilitates data import and export. JASP permits both simple and complex (Bayesian) analyses in a user-friendly manner.

What is Bayesian analysis? The Bayesian approach starts with existing knowledge (which can be imperfect or even unacceptable). It then collects recent randomly sampled data and integrates this new information with the existing knowledge in order to update it into a posterior state. Bayesian concepts and methods are described in Chapter 6. JASP reports both the posterior distribution of the parameter of interest (with a Bayesian credible interval) and the Bayes factor, which play a vital role in Bayesian statistical testing. The analytic results are displayed as tables or figures in APA format. JASP can also generate additional plots for Bayesian analysis, such as prior and posterior distribution plots. The Bayes factor and its robustness can be visualized in the appropriate plot. All functionalities in JASP are implemented with a GUI, and users can perform analyses via simple drag-and-drop functions without any further coding.
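The prior-to-posterior update described above can be made concrete with the simplest conjugate case, a beta prior for a proportion combined with binomial data. The sketch below is an illustration under that assumption only; it is not JASP's code, and the prior parameters and the counts are made up.

```python
def beta_binomial_update(a_prior, b_prior, successes, failures):
    """Posterior Beta(a, b) parameters after observing binomial data."""
    return a_prior + successes, b_prior + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Hypothetical prior knowledge: a proportion believed to be near 0.2.
a0, b0 = 2.0, 8.0

# Hypothetical new sample: 12 events among 30 sampled units.
a1, b1 = beta_binomial_update(a0, b0, successes=12, failures=18)

print("prior mean:", beta_mean(a0, b0))
print("posterior mean:", beta_mean(a1, b1))
print("prior variance:", beta_variance(a0, b0))
print("posterior variance:", beta_variance(a1, b1))
```

The posterior mean lies between the prior mean and the sample proportion, and the posterior variance is smaller than the prior variance, which is the sense in which the data update the existing knowledge.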

Figure 3.21 The classical versus the Bayesian option in a t-test in JASP

Figure 3.22 The classical versus the Bayesian option in ANOVA in JASP

Figure 3.23 The classical versus the Bayesian option in mixed models in JASP

JASP supports several advanced statistical methods in SEM, psychometrics, and statistical testing with frequency data. The SEM module works with user-written lavaan syntax, which describes the model to be estimated. Users should be aware that a recent version (3.0.0 or higher) makes use of lavaan syntax error messages to help correct any error. So-called mediation analysis can be conducted in JASP once the predictors, mediators, outcome variables, and confounders are specified in the GUI. Users can perform diverse frequency testing methods such as binomial and multinomial regression, log-linear regression, and contingency table analysis. These methods are available for both frequentist and Bayesian analysis.

JASP provides several options for network analysis that enable users to examine the relations between discrete entities in a data set. JASP generates reports including visualized networks, centrality, clustering, and weighted matrix tables. The machine learning functionalities in JASP help build regression, classification, and clustering analyses. These methods draw on different algorithms such as boosting, K-nearest neighbors, random forest, and regularization (with both L1 and L2 regularization for the Lasso, Ridge, and Elastic-net principles). The machine-learning process is performed with cross-validation by default. To assist data preprocessing, JASP is equipped with a data editor that allows users to visually inspect and preprocess their data.
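To show what K-nearest neighbors combined with cross-validation involves, here is a minimal Python sketch. It uses leave-one-out cross-validation, which is a choice made for brevity and not necessarily the scheme JASP applies; the function names and the toy data are assumptions for illustration.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points nearest to query
    (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out cross-validation: hold out each point once,
    predict it from the remaining points, and report the hit rate."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            hits += 1
    return hits / len(xs)


# Made-up data: two well-separated groups of four points each.
xs = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]

for k in (1, 3, 7):
    print("k =", k, "leave-one-out accuracy =", loo_accuracy(xs, ys, k))
```

Comparing the accuracy across values of k illustrates why a tuning parameter is chosen by cross-validation rather than fixed in advance: a k that is too large lets the other group outvote the held-out point's own neighbors.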

Figure 3.24 The classical versus the Bayesian option in regression in JASP

Van Doorn et al. (2020) examine how Bayesian procedures are applied and how data analytic results are interpreted. The four stages of Bayesian reasoning are planning the framework, executing the analysis, interpreting the results, and reporting the findings. Additional resources on how to best utilize JASP include Love et al. (2019) and Wagenmakers et al. (2018). An attractive alternative to the p-value in hypothesis testing is presented in Halter (2018) and Han and Dawson (2020).

Microsoft Math (MM) Solver 4.0 (available for 32-bit and 64-bit) can be downloaded for free from the official Microsoft website (https://math.microsoft.com). It constructs three-dimensional graphs. The Equation Solver option is one of its extraordinary features (see Figures 3.32–3.35). In the following pages, the abbreviation $\Pr(\cdot)$ refers to a probability value on the closed scale $[0, 1]$. Probability is studied in detail in Chapter 5. We can visualize interdependence in the conditional probability statement

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \left\{\frac{\Pr(A)}{\Pr(B)}\right\} \Pr(B \mid A)$$

with the help of MM 4.0, after designating the ratio $\Pr(A)/\Pr(B)$, whose range is $(0, \infty)$, on the x-axis; the conditional probability $\Pr(A \mid B)$, with the range $(0, 1)$, on the z-axis; and the other conditional probability $\Pr(B \mid A)$, with a range of $(0, 1)$, on the y-axis, as displayed in Figures 3.36 and 3.37.
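The surface plotted in such a three-dimensional graph follows from the identity $\Pr(A \mid B) = \{\Pr(A)/\Pr(B)\}\Pr(B \mid A)$. A minimal Python sketch, with a hypothetical function name and made-up grid values, tabulates a few points of that surface and flags combinations that cannot occur because the product would exceed one.

```python
def conditional_from_ratio(ratio, pr_b_given_a):
    """Pr(A|B) = {Pr(A)/Pr(B)} * Pr(B|A).

    Returns None when the inputs are incoherent, that is, when the
    product would fall outside the closed scale [0, 1].
    """
    value = ratio * pr_b_given_a
    return value if 0.0 <= value <= 1.0 else None


# x-axis: the ratio Pr(A)/Pr(B); y-axis: Pr(B|A); z-axis: Pr(A|B).
ratios = [0.5, 1.0, 2.0]
pr_b_given_a_values = [0.2, 0.4, 0.8]

for r in ratios:
    for p in pr_b_given_a_values:
        print("ratio =", r, "Pr(B|A) =", p, "Pr(A|B) =", conditional_from_ratio(r, p))

# Cross-check against the definition with made-up probabilities:
# Pr(A) = 0.3, Pr(B) = 0.6, Pr(A and B) = 0.12.
pr_a, pr_b, pr_ab = 0.3, 0.6, 0.12
direct = pr_ab / pr_b                                        # Pr(A|B) by definition
via_ratio = conditional_from_ratio(pr_a / pr_b, pr_ab / pr_a)
print("direct:", direct, "via ratio:", via_ratio)
```

The two routes in the cross-check agree, which is the interdependence the graph is meant to display: for a fixed ratio, $\Pr(A \mid B)$ rises linearly with $\Pr(B \mid A)$.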


3.2 Concepts

To perform a statistical analysis, the user needs to install the module titled Data Analysis, which is not a default item in Excel. To add it to the menu, open File > Options > Add-ins > Excel Add-ins (in the Manage menu) > Go, and then select all four items in the window. Tables 3.1 and 3.2 present shortcut keys and selected commands for use in Excel.

3.3 Summary

This chapter has introduced several commands and their uses across three software programs: Excel, JASP, and Microsoft Mathematics. The three platforms complement each other in analyzing data, extracting data, interpreting results, and using results as supportive evidence when making healthcare decisions. Without these programs and the data support they provide, healthcare decisions might be tedious, less precise, or inaccurate.

3.4 Exercises 1. What is the difference between “save” and “save as” in Excel? 2. What is the difference between “copy” and “cut” in Excel?

Software

Figure 3.25 The classical versus the Bayesian option in frequencies in JASP

3. What is the statistical procedure under “Data Analysis Tool Pak” in Excel?

A 10. What does the command in the menu with icon “ ↓ ” Z mean in Excel?

4. What does the specification “B2:E10” mean in Excel?

11. In which menu can font type and size be selected in Excel?

5. What does the symbol “*” mean in Excel?

12. In which menu can a stem-leaf plot be selected in Excel?

6. What does the symbol “^” mean in Excel?

13. What does the notation “$A$5” mean in Excel?

7. What does the command “=SUM (B2:B20)” mean in Excel?

14. What does the notation “A$5” mean in Excel?

8. What is the difference between the commands “SUM (C2:C11)” and “=SUM (C2:C11)” in Excel? 9. When a saved data file has .CSV in its name, what does it mean? Could the data file be appropriate for using JASP software in Excel?

15. What does the notation “$A5” mean in Excel? 16. Give healthcare examples of nominal, ordinal, interval, and ratio variables. 17. What is the difference between a histogram and a stem-leaf plot?

71 https://doi.org/10.1017/9781009212021.005 Published online by Cambridge University Press

Data-Guided Healthcare Decision Making

Figure 3.26 The classical versus the Bayesian option in meta-analysis in JASP

18. Why is it important to look at skewness before applying a t-test? 19. Compare the locations of mean and median with respect to each other if the data exhibit negative skewness. n 1X 1 20. Is an average? What is its name? n i¼1 xi 21. What does the command “ ¼ ModeðD5 : D50Þ” mean in Excel? 22. Does the command “=STDEV (E2:E10)” in Excel compute the sample or population standard deviation? 23. What does the command “=AVEDEV (B2:B10)” mean in Excel? 24. What does the command “¼ KURTðC5 : C100Þ” mean in Excel? Is it important to compute and appraise before doing a z-test? Why? 25. What does the horizontal line between the upper and lower border of the box in the middle of the BoxWhisker plot refer to? In which menu is the icon for constructing a box plot in Excel? 26. Would a box plot give a hint about lack of symmetry in data? How?

72 https://doi.org/10.1017/9781009212021.005 Published online by Cambridge University Press

27. What is wrong with the following statement? “The probability of 10 or more accidents on Highway I-35 between San Marcos and San Antonio is 1.6.” Why? 28. What is the difference between the probability density function and the probability mass function? Give a healthcare example for each and justify them. 29. Is “f ðy; θÞ ¼ eθ θ y =y!” a probability mass function? Why? What is its name? Use Excel to compute the functional value with y ¼ 4 and θ ¼ 3. 30. Is “f ðy; θÞ ¼ θeθy ; 0 < y < ∞” a probability density function? Using Excel, obtain the value of the difference PrðY > 5Þ  PrðY > 15Þ, if θ ¼ 3. 31. Obtain the value of the command “¼ NORMDIST ð40;50;4;TRUEÞ” after entering it in Excel. From this command, could you recognize the value of the population mean and standard deviation? Does this command refer to PrðY ≤ 40Þ or PrðY > 40Þ? 32. Why is the sum of the commands “¼ NORMDIST ð40;50;4;TRUEÞ” and “¼ NORMDISTð40;50;4; FALSEÞ” in Excel equal to one?

Software

Figure 3.27 The classical versus the Bayesian option in reliability in JASP

33. Show that the value of the command “=NORMINV(0.90,50,4)” in Excel is 55.126. What does the number 0.90 refer to?
34. What is the interpretation of the Q-Q plot in which the copayments patients have paid in an emergency wing of a hospital are located around the positive diagonal line in the plot?
35. What is the probability statement $\Pr(\text{reject null hypothesis } H_0 \mid H_0 \text{ is true}) = \alpha$ called? What is the probability statement $\Pr(\text{not accepting research hypothesis } H_1 \mid H_1 \text{ is true}) = \beta$ called? Should $\alpha + \beta$ be equal to one? Give reasons.
36. What does the command “=CORREL(A2:B10)” compute when entered in Excel?
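The normal-distribution commands in Questions 31 through 33 can also be verified independently of Excel. The sketch below uses `NormalDist` from Python's standard `statistics` module, with the mean 50 and standard deviation 4 taken from the questions; the variable names are illustrative only.

```python
from statistics import NormalDist

# Normal distribution with mean 50 and standard deviation 4.
dist = NormalDist(mu=50, sigma=4)

# Counterpart of NORMDIST(40, 50, 4, TRUE): the cumulative area up to 40.
cumulative_at_40 = dist.cdf(40)

# Counterpart of NORMDIST(40, 50, 4, FALSE): the height of the density at 40.
density_at_40 = dist.pdf(40)

# Counterpart of NORMINV(0.90, 50, 4): the value with cumulative area 0.90.
percentile_90 = dist.inv_cdf(0.90)

print(cumulative_at_40, density_at_40, round(percentile_90, 3))
```

Printing the cumulative area and the density height side by side makes it easy to see how the TRUE and FALSE settings of the last argument differ.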

37. Write down the commands in Excel to compute the initial value $\beta_0$ and the regression coefficient $\beta_1$ in the regression equation, $\hat{y} = \beta_0 + \beta_1 x$, when the pertinent data are entered in spreadsheet B2 through B20 for x and A2 through for y.
38. To what does the Excel command “FREQUENCY” return the frequency distribution of data-array as a vertical array, on the basis of bins-array?
39. What does the Excel command “STANDARDIZE (x, mean, standard deviation)” offer?
40. Use Excel to find the value of “=ROUND (12.3456,2),” “=ROUND (12.3456,1),” “=ROUND (12.3456,2),” and “=ROUND (12.3456,3).” Why are they different?


Figure 3.28 The classical versus the Bayesian option in structural equation modeling (SEM) in JASP

41. Use Excel to find the value of “=ROUNDUP (12.3456,2),” “=ROUNDUP (12.3456,1),” “=ROUNDUP (12.3456,2),” and “=ROUNDUP (12.3456,3).” Why are they different?

Selected References

42. Use Excel to find the value of “=ROUNDDOWN (12.3456,2),” “=ROUNDDOWN (12.3456,1),” “=ROUNDDOWN (12.3456,2),” and “=ROUNDDOWN (12.3456,3).” Why are they different?

Ayer, G. (2016) Excel. Energy Engineering, 113(2), 63–74.

43. Use Excel to find the value of “=TRUNC (12.3456,2),” “=TRUNC (12.3456,1),” “=TRUNC (12.3456,2),” “=TRUNC (12.3456,3).” Why are they different?

Arsham, H. (2011). Excel for statistical data analysis. http://home.ubalt.edu/ntsbarsh/Business-stat/excel/excel.htm.

Baier, T., & Neuwirth, E. (2007). Excel COM. Computational Statistics, 22(1), 91–108.

Bartlett, J. (2017). An introduction to JASP: a free and user-friendly statistics package. https://osf.io/p2hzg.

Berk, K., & Carey, P. (2009). Data Analysis with Microsoft Excel: Updated for Office 2007. Scarborough, Canada: Nelson Education.

44. Use Excel to find the value of “=MOD (12.3456,2),” “=MOD (12.3456,1),” “=MOD (12.3456,2),” “=MOD (12.3456,3).” Why are they different?
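The differences among ROUND, ROUNDUP, ROUNDDOWN, TRUNC, and MOD in Questions 40 through 44 can be explored with exact decimal arithmetic. The sketch below uses Python's standard `decimal` module. The mapping of the Excel functions to the rounding modes `ROUND_HALF_UP`, `ROUND_UP`, and `ROUND_DOWN` is an assumption that holds for positive inputs such as 12.3456, and the helper name `excel_like_round` is illustrative.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, ROUND_UP


def excel_like_round(value: str, digits: int, mode: str) -> Decimal:
    """Round a decimal string to `digits` places using the given rounding mode."""
    step = Decimal(1).scaleb(-digits)  # 10**(-digits), e.g. 0.01 for digits=2
    return Decimal(value).quantize(step, rounding=mode)


value = "12.3456"
for digits in (1, 2, 3):
    nearest = excel_like_round(value, digits, ROUND_HALF_UP)  # like ROUND
    away = excel_like_round(value, digits, ROUND_UP)          # like ROUNDUP
    toward = excel_like_round(value, digits, ROUND_DOWN)      # like ROUNDDOWN/TRUNC
    print(digits, nearest, away, toward)

# Remainder after division, the analogue of MOD for positive arguments.
remainders = [Decimal(value) % Decimal(d) for d in (1, 2, 3)]
print(remainders)
```

The second argument in the rounding functions controls how many decimal places are kept, whereas in MOD it is the divisor, which is why the same pair of numbers produces such different results.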

Bluttman, K. (2013). Excel Formulas and Functions for Dummies. Hoboken, NJ: Wiley.

45. Use the Excel data as presented in Table 3.3 to obtain the mean and variance in the continents of Africa, America, Asia, Europe, and Oceania for the years 2000 through 2005 and plot them as a curve.
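As a cross-check for Question 45, the year-by-year mean and variance can be computed with Python's standard `statistics` module. The sketch below uses only the Australia and New Zealand rows of Table 3.3 as a small excerpt; treating these two rows as the Oceania group is an assumption, and the same loop applies to any other group of rows. Plotting is left out to keep the example dependency-free.

```python
from statistics import mean, variance

years = [2000, 2001, 2002, 2003, 2004, 2005]

# Excerpt of Table 3.3 (deaths from lung cancer per 100,000 inhabitants).
oceania = {
    "Australia": [35.6, 35.3, 35.0, 34.7, 34.27, 33.69],
    "New Zealand": [38.9, 39.8, 40.7, 41.5, 42.36, 43.17],
}

# Transpose the rows so that each tuple holds one year's values.
per_year = {}
for year, column in zip(years, zip(*oceania.values())):
    per_year[year] = (mean(column), variance(column))  # sample variance

for year, (avg, var) in per_year.items():
    print(year, round(avg, 3), round(var, 3))
```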

Chandrakantha, L. (2014). Excel simulation as a tool in teaching sampling distributions in introductory statistics. In Sustainability in Statistics Education: Proceedings of the Ninth International Conference on Teaching Statistics (ICOTS), edited by K. Makar, B. De Sousa, & R. Gould (pp. 1–3). New York: International Statistical Institute.

46. Use JASP to obtain the mean and variance of the variables in Table 3.4. 47. Use JASP to obtain the mean and variance of the variables in Table 3.5.


Broman, K. W., & Woo, K. H. (2018). Data organization in spreadsheets. American Statistician, 72(1), 2–10.

Frye, C. (2010). Microsoft Excel 2010 Step by Step: MS Excel 2010 SbS _ p1. Hoboken, NJ: Pearson Education.

Fylstra, D., Lasdon, L., Watson, J., & Waren, A. (1998). Design and use of the Microsoft Excel Solver. Interfaces, 28(5), 29–55.


Figure 3.29 Summary of t-tests, regression, and frequencies in JASP

Halpern, A. M., Frye, S. L., & Marzzacco, C. J. (2018). Scientific Data Analysis Toolkit: A Versatile Add-In to Microsoft Excel for Windows. Hoboken, NJ: Pearson Education.

Jeschke, E., Reinke, H., Unverhau, S., & Pfeifer, E. (2011). Microsoft Excel 2010 Formulas and Functions Inside Out. Hoboken, NJ: Pearson Education.

Halter, C. P. (2018). Exploring Statistical Analysis Using JASP. Independently published.

Kelter, R. (2020). Bayesian alternatives to null hypothesis significance testing in biomedical research: a non-technical introduction to Bayesian inference with JASP. BMC Medical Research Methodology, 20, 1–12.

Han, H., & Dawson, K. J. (2020). JASP (Software). Mark Harmon.

Harmon, M. (2011). Normality Testing in Excel: The Excel Statistical Master. Mark Harmon. www.ExcelMasterSeries.com.

Heiberger, R. M., & Neuwirth, E. (2010). R through Excel: A Spreadsheet Interface for Statistics, Data Analysis, and Graphics. New York: Springer Science & Business Media.

Love, J., Selker, R., Marsman, M. et al. (2019). JASP: graphical statistical software for common statistical designs. Journal of Statistical Software, 88(2), 1–17.

Quintela-del-Río, A., & Francisco-Fernández, M. (2017). Excel templates: a helpful tool for teaching statistics. American Statistician, 71(4), 317–325.


Figure 3.30 The flexplot, linear, mixed, and generalized linear modeling options in JASP

Figure 3.31 The classical versus the Bayesian option in binomial data in JASP


Figure 3.32 The opening view of Math Solver

Quirk, T. J. (2020). Excel 2019 for Engineering Statistics. Berlin: Springer International.

Quirk, T. J., & Cummings, S. (2016). Excel 2016 for Health Services Management Statistics. Berlin: Springer.

Schou, S. B. (1999). Data, statistics, and decision models with Excel. American Statistician, 53(4), 389.

Van Doorn, J., Van den Bergh, D., Böhm, U. et al. (2020). The JASP guidelines for conducting and reporting a Bayesian analysis. Psychonomic Bulletin & Review, 28(3), 813–826.

Veney, J. E., Kros, J. F., & Rosenthal, D. A. (2009). Statistics for Health Care Professionals: Working with Excel (Vol. 9). Hoboken, NJ: Wiley.

Wagenmakers, E. J., Love, J., Marsman, M. et al. (2018). Bayesian inference for psychology. Part II: example applications with JASP. Psychonomic Bulletin & Review, 25(1), 58–76.

Walkenbach, J. (2010). Excel 2010 Power Programming with VBA (Vol. 6). Hoboken, NJ: Wiley.


Figure 3.33 Entering data or specifying the analytic function for Math Solver


Figure 3.34 Specifying matrix data for Math Solver

Figure 3.35 The calculating view of Math Solver


Figure 3.36 When A and B are not independent: their visual dependency using MM 4.0

Figure 3.37 When A and B are independent: a visual of their probabilities using MM 4.0


Table 3.1 Excel/Word shortcut keys and effects. (Source: https://support.microsoft.com). (C, A, S refer to the Control, Alt, and Shift keys, respectively.)

Press | For
A+F8 | Create, run, edit, or delete a macro
A+N | Go to insert tab
C+0 | Hide the column
C+5 | Apply or remove strikethrough
C+A | Select the entire table
C+B | Bold
C+C | Copy
C+D | Select the table
C+End | Move to last cell
C+Home | Move to the top
C+R | Move the table
C+S | Save
C+S+& | Apply outline border
C+S+_ | Remove outline border
C+S+End | Extend to the last cell
C+V | Paste
C+W | Close workbook
C+X | Cut
C+Y | Redo the last action
C+Z | Undo
Esc | Return to previous navigation
F1 | Excel help
F12 | Open Save As dialog
Home | Go to the beginning of a row
S+F2 | Insert a note
S+Tab | Move to previous cell
Tab | Move to next cell on the right

Table 3.2 Selected Excel commands

Excel commands | Results
=AVERAGE (a:b) | The average of entries in cells “a” through “b”
=BINOMIAL (a, b, c) | The cumulative binomial probability with “a” trial, “b” probability of “success” up to “c”
=CHIDIST (a, b) | The cumulative area up to “a” under a chi-squared distribution with “b” df
=CHIINV (a, b) | The percentile of chi-squared distribution corresponding to the given cumulative area “a” with “b” df
=CHITEST (x, b) | The chi-squared value given the observed “x” and expected count “b”
=COUNT (a, b) | The number of observations in a column from cell “a” to cell “b”
=COUNTIF (range, criteria) | The number in a column ranging from cell “a” to “b” meeting the criterion
=FDIST (x, df1, df2) | The area “x” under the F-distribution with numerator df1 and denominator df2
=F.INV (x, df1, df2) | The percentile of F-distribution up to the area “x” with numerator “df1” and denominator “df2”
=IF (x1, x2) | The value “x1” if the condition is true and “x2” if the condition is false
=MIN (a, b) | Minimum in a column from cell “a” to cell “b”
=MAX (a, b) | Maximum in a column from cell “a” to cell “b”
=MEDIAN (a, b) | Median in a column from cell “a” to cell “b”
=MODE (a, b) | Mode in a column from cell “a” to cell “b”
=NORMDIST (x, mean, standard deviation, area) | The area up to “x” under a normal distribution with “mean” and “standard deviation”
=POISSON (x, mean, cumulative) | The cumulative Poisson probability up to “x” with a specified mean parameter
=NORM.INV (x, mean, standard deviation) | The percentile up to an area “x” under normal distribution with “mean” and “standard deviation”
=RAND () | A uniform random number in the bracket [0, 1]
=STDEV () | Finds standard deviation
=T.DIST (x, df) | Finds the area under t-distribution with df
=T.INV (prob, df) | Finds the critical value of t distribution with specified tail probability and df
=VAR (number1, number2) | Finds the variance of number 1 through number 2
=Z.TEST (array, x, sigma) | Performs Z test for the array, x, sigma

Table 3.3 Deaths from lung cancer per 100,000 inhabitants. (Source: www.who.org)

Deaths from lung cancer per 100,000 inhabitants | 2000 | 2001 | 2002 | 2003 | 2004 | 2005
Europe
Austria | 40.31 | 39.6 | 39 | 38.4 | 37.63 | 36.93
Belgium | 63.3 | 62.9 | 62.5 | 62.1 | 61.64 | 61.54
Denmark | 59.2 | 57.6 | 55.9 | 54.1 | 52.6 | 50.88
Finland | 35.66 | 34.3 | 33.5 | 32.6 | 31.48 | 30.32
France | 43.8 | 44.4 | 45.1 | 45.8 | 46.39 | 47.5
Germany | 46.3 | 46.5 | 46.7 | 46.9 | 46.91 | 46.77
Greece | 52.6 | 53.2 | 53.8 | 54.4 | 54.63 | 55
Iceland | 45.6 | 47.1 | 46.3 | 47.1 | 47.42 | 47.05
Ireland | 39.9 | 40.1 | 40.3 | 40.4 | 40.85 | 41.48
Italy | 55.3 | 55.6 | 55.9 | 56.2 | 56.16 | 55.9
Luxembourg | 45.66 | 43.5 | 41.8 | 40.1 | 38.44 | 36.81
Malta | 29.4 | 29.2 | 29 | 28.8 | 28.72 | 28.53
Netherlands | 55 | 54.9 | 54.7 | 54.4 | 54.53 | 53.97
Norway | 37.4 | 36.8 | 36.2 | 35.6 | 35.3 | 34.78
Portugal | 28.11 | 27.5 | 27.2 | 26.8 | 26.51 | 26
Spain | 45 | 45.3 | 45.3 | 45 | 44.91 | 44.74
Sweden | 33.8 | 33.9 | 34.1 | 34.2 | 34.61 | 34.81
Switzerland | 32.7 | 27 | 23.6 | 19.4 | 16.35 | 13.71
Turkey | 38.5 | 39.6 | 39.5 | 40.2 | 40.38 | 40.27
United Kingdom | 57 | 56.1 | 55.2 | 54.3 | 53.11 | 52.09
Albania | 19.04 | 19.1 | 19.2 | 19 | 18.87 | 18.97
Belarus | 37.06 | 36.2 | 35.4 | 34.4 | 33.36 | 32.36
Bulgaria | 35.61 | 35.8 | 35.3 | 34.6 | 34.34 | 33.9


Table 3.3 (cont.)

Deaths from lung cancer per 100,000 inhabitants | 2000 | 2001 | 2002 | 2003 | 2004 | 2005
Croatia | 56.55 | 55.6 | 55.4 | 55.1 | 54.68 | 54.05
Czech Republic | 55.74 | 57.4 | 58.9 | 60.3 | 62.02 | 63.39
Estonia | 50.15 | 49.6 | 50.4 | 51.1 | 50.91 | 51.61
Georgia | 18.57 | 19.1 | 19.4 | 19.8 | 20.17 | 20.33
Hungary | 78.04 | 78.1 | 77.9 | 77.7 | 76.81 | 76.31
Latvia | 43.78 | 44.9 | 45 | 45 | 45.37 | 45.71
Lithuania | 37.2 | 36.4 | 35.2 | 33.8 | 33.03 | 32.17
Macedonia | 29.55 | 29.2 | 29.2 | 29.2 | 28.79 | 28.43
Moldova | 20 | 18.1 | 16.3 | 14.1 | 12.5 | 11.1
Poland | 51.76 | 52.3 | 53.8 | 55 | 56.63 | 58.15
Romania | 37.92 | 38.3 | 39 | 39.7 | 40.68 | 41.58
Russia | 40.19 | 37.3 | 32.8 | 26.8 | 23.62 | 20.29
Serbia and Montenegro | 43.2 | 42.9 | 41 | 40.2 | 39.05 | 38.04
Slovakia | 41.64 | 41.2 | 41.2 | 41 | 40.63 | 40.07
Slovenia | 50.2 | 51.8 | 53.5 | 55.2 | 56.69 | 58.83
Ukraine | 38.18 | 37.3 | 36.4 | 35.3 | 34.25 | 33.29
Americas
Canada | 53.8 | 54.2 | 54.5 | 54.7 | 54.62 | 54.32
United States | 55.1 | 53.9 | 52.5 | 50.8 | 49.05 | 47.34
Argentina | 19.5 | 18.4 | 17.2 | 16 | 14.96 | 14.09
Belize | 6.6 | 6.7 | 6.7 | 6.6 | 6.53 | 6.43
Bolivia | 18.9 | 19.8 | 20 | 20.7 | 21.29 | 21.84
Brazil | 9.2 | 9.4 | 9.7 | 9.9 | 10.12 | 10.33
Chile | 12.3 | 12.3 | 12.4 | 12.4 | 12.37 | 12.52
Colombia | 6.5 | 6.3 | 6.1 | 5.9 | 5.74 | 5.59
Costa Rica | 6.34 | 7.1 | 7.8 | 8.4 | 9.16 | 10.01
Cuba | 33.03 | 33.3 | 33.4 | 33.4 | 33.66 | 33.51
Dominican Republic | 5.3 | 5.8 | 6.3 | 6.8 | 7.39 | 8.03
Ecuador | 3.56 | 3.4 | 3.3 | 3.2 | 3.12 | 3.06
El Salvador | 2.1 | 2.1 | 1.8 | 1.9 | 1.83 | 1.75
Mexico | 6.34 | 6.3 | 6.2 | 6.2 | 6.15 | 6.1
Nicaragua | 2.8 | 3.2 | 3.6 | 4 | 4.47 | 5
Panama | 6.46 | 6.4 | 6.3 | 6.2 | 6.12 | 6.02
Paraguay | 4.1 | 4.1 | 4.2 | 4.2 | 4.24 | 4.32
Peru | 4.35 | 4.2 | 6.2 | 7.8 | 9.49 | 12.42
Uruguay | 35.09 | 37.1 | 38.1 | 39.2 | 40.51 | 41.8
Venezuela | 8.71 | 9 | 9.1 | 9.3 | 9.54 | 9.76
Asia
Armenia | 22.22 | 23.2 | 24.3 | 25.4 | 26.37 | 27.47
Azerbaijan | 9.13 | 9.9 | 10.1 | 10.3 | 10.67 | 10.9
China | 40.7 | 40.8 | 41.5 | 41.6 | 42.32 | 43.06
Hong Kong, China | 20.7 | 21.8 | 21.6 | 22.3 | 22.93 | 23.46


Table 3.3 (cont.)

Deaths from lung cancer per 100,000 inhabitants | 2000 | 2001 | 2002 | 2003 | 2004 | 2005
India | 36.9 | 35.8 | 37.2 | 36.9 | 37.27 | 37.77
Indonesia | 18.8 | 20.2 | 20 | 20.9 | 21.59 | 22.27
Israel | 19.7 | 19.6 | 19.5 | 19.4 | 19.46 | 19.23
Japan | 42.9 | 44.1 | 45.4 | 46.7 | 48.28 | 49.86
Kazakhstan | 25.9 | 25.5 | 25.1 | 24.8 | 24.32 | 23.89
Kuwait | 3.33 | 3.1 | 2.9 | 2.7 | 2.52 | 2.35
Kyrgyzstan | 8.1 | 7.6 | 7.3 | 6.9 | 6.53 | 6.24
Malaysia | 26.4 | 26.2 | 25.9 | 25.6 | 25.54 | 25.38
Mauritius | 7.12 | 6.3 | 5.6 | 4.9 | 4.33 | 3.8
Pakistan | 26.6 | 25.7 | 25.7 | 25.1 | 24.4 | 23.86
Philippines | 26.9 | 28.3 | 27.5 | 28.1 | 28.71 | 28.76
Singapore | 27.52 | 26.4 | 25.2 | 23.9 | 22.87 | 21.74
South Korea | 24.42 | 26.4 | 28.6 | 30.9 | 33.49 | 36.44
Taiwan | 17.6 | 17.1 | 16.9 | 16.5 | 16.26 | 16.1
Tajikistan | 3.6 | 3.7 | 3.2 | 2.3 | 2 | 1.64
Thailand | 29.4 | 31 | 31 | 31.9 | 33.04 | 33.62
Turkmenistan | 2.9 | 2.3 | 2.2 | 2.5 | 2.36 | 2.38
Uzbekistan | 4.5 | 4.4 | 4.2 | 4 | 3.85 | 3.7
Vietnam | 22.6 | 20.6 | 20.3 | 18.9 | 17.86 | 17.18
Oceania
Australia | 35.6 | 35.3 | 35 | 34.7 | 34.27 | 33.69
New Zealand | 38.9 | 39.8 | 40.7 | 41.5 | 42.36 | 43.17
Africa
South Africa | 9.5 | 9.3 | 9 | 8.7 | 8.48 | 8.19

Table 3.4 Number of earthquakes, epidemics, floods, storms, and transport accidents, 2000–2010. (Source: www.who.org)

Country | Earthquakes (seismic activity) | Epidemics | Floods | Storms | Transport accidents
Albania | 0 | 0 | 4 | 8 | 31
Algeria | 2,279 | 0 | 1,312 | 27 | 444
American Samoa | 34 | 0 | 6 | 0 | 0
Angola | 0 | 3,695 | 328 | 0 | 595
Argentina | 0 | 6 | 73 | 57 | 154
Armenia | 0 | 0 | 1 | 0 | 0
Australia | 0 | 0 | 34 | 12 | 214
Austria | 0 | 0 | 14 | 7 | 155
Azerbaijan | 31 | 0 | 0 | 0 | 75
Bahamas | 0 | 0 | 0 | 14 | 54
Bahrain | 0 | 0 | 0 | 0 | 196
Bangladesh | 4 | 245 | 2,404 | 5,588 | 3,803


Table 3.4 (cont.)

Country | Earthquakes (seismic activity) | Epidemics | Floods | Storms | Transport accidents
Barbados | 0 | 0 | 0 | 1 | 0
Belarus | 0 | 0 | 0 | 0 | 0
Belgium | 0 | 0 | 2 | 7 | 11
Belize | 0 | 0 | 1 | 54 | 0
Benin | 0 | 819 | 35 | 0 | 291
Bermuda | 0 | 0 | 0 | 4 | 18
Bhutan | 11 | 0 | 200 | 12 | 0
Bolivia | 0 | 28 | 361 | 20 | 444
Bosnia-Herzegovina | 0 | 0 | 0 | 4 | 55
Botswana | 0 | 472 | 3 | 0 | 0
Brazil | 1 | 203 | 937 | 26 | 1,442
Bulgaria | 0 | 0 | 52 | 2 | 0
Burkina Faso | 0 | 7,711 | 67 | 0 | 210
Burundi | 3 | 401 | 36 | 0 | 299
Cambodia | 0 | 189 | 455 | 19 | 49
Cameroon | 0 | 149 | 52 | 0 | 692
Canada | 0 | 46 | 9 | 32 | 26
Canary Islands | 0 | 0 | 23 | 19 | 113
Central African Republic | 0 | 820 | 6 | 6 | 281
Chad | 0 | 1,736 | 173 | 14 | 73
Chile | 23 | 0 | 74 | 47 | 77
China | 2,947 | 87,947 | 423 | 5,765 | 3,180
Colombia | 14 | 0 | 1,009 | 5 | 587
Comoros | 0 | 29 | 2 | 0 | 339
Congo | 6 | 286 | 7 | 0 | 125
Cook Islands | 0 | 0 | 0 | 0 | 0
Costa Rica | 41 | 0 | 44 | 5 | 10
Croatia | 0 | 0 | 0 | 2 | 25
Cuba | 0 | 0 | 4 | 40 | 157
Cyprus | 0 | 0 | 0 | 0 | 31
Czech Rep | 0 | 0 | 38 | 10 | 29
Denmark | 0 | 0 | 0 | 5 | 0
Djibouti | 0 | 10 | 51 | 0 | 145
Dominica | 0 | 0 | 0 | 5 | 0
Dominican Republic | 3 | 25 | 723 | 211 | 136
East Timor | 0 | 22 | 4 | 0 | 0
Ecuador | 0 | 8 | 183 | 0 | 208
Egypt | 0 | 15 | 18 | 13 | 2,940
El Salvador | 1,160 | 341 | 83 | 345 | 72
Equatorial Guinea | 0 | 15 | 0 | 0 | 103
Eritrea | 0 | 0 | 0 | 0 | 56
Estonia | 0 | 0 | 0 | 0 | 0


Table 3.4 (cont.)

Country | Earthquakes (seismic activity) | Epidemics | Floods | Storms | Transport accidents
Ethiopia | 0 | 1,443 | 1,439 | 0 | 240
Fiji | 0 | 0 | 26 | 44 | 0
Finland | 0 | 0 | 0 | 0 | 24
France | 0 | 1 | 44 | 68 | 204
French Polynesia | 0 | 0 | 0 | 0 | 20
Gabon | 0 | 51 | 0 | 0 | 94
Gambia | 0 | 21 | 6 | 5 | 51
Georgia | 6 | 0 | 1 | 0 | 82
Germany | 0 | 0 | 29 | 85 | 127
Ghana | 0 | 52 | 112 | 0 | 408
Greece | 2 | 0 | 15 | 5 | 407
Grenada | 0 | 0 | 0 | 40 | 0
Guadeloupe | 1 | 0 | 0 | 0 | 20
Guam | 0 | 0 | 0 | 5 | 0
Guatemala | 6 | 1 | 77 | 1,540 | 348
Guinea | 0 | 640 | 11 | 4 | 415
Guinea Bissau | 0 | 623 | 3 | 0 | 114
Guyana | 0 | 0 | 34 | 0 | 0
Haiti | 0 | 40 | 2,910 | 3,663 | 270
Honduras | 7 | 15 | 106 | 75 | 45
Hong Kong | 0 | 299 | 0 | 6 | 54
Hungary | 0 | 0 | 1 | 16 | 94
Iceland | 0 | 0 | 0 | 0 | 0
India | 37,705 | 1,528 | 13,611 | 976 | 4,295
Indonesia | 174,843 | 1,190 | 2,810 | 4 | 3,976
Iran | 27,757 | 76 | 744 | 43 | 2,604
Iraq | 0 | 37 | 26 | 0 | 173
Israel | 0 | 12 | 0 | 3 | 31
Italy | 327 | 3 | 99 | 4 | 446
Ivory Coast | 0 | 429 | 6 | 0 | 316
Jamaica | 0 | 3 | 10 | 45 | 0
Japan | 81 | 0 | 113 | 534 | 204
Jordan | 0 | 0 | 0 | 14 | 121
Kazakhstan | 3 | 0 | 1 | 0 | 0
Kenya | 1 | 628 | 546 | 0 | 653
Kiribati | 0 | 0 | 0 | 0 | 0
Korea Rep | 0 | 6 | 198 | 423 | 251
Kuwait | 0 | 0 | 0 | 0 | 0
Kyrgyzstan | 74 | 0 | 3 | 4 | 124
Laos | 0 | 46 | 33 | 16 | 28
Lebanon | 0 | 0 | 0 | 0 | 68
Lesotho | 0 | 28 | 0 | 1 | 40


Table 3.4 (cont.)

Country | Earthquakes (seismic activity) | Epidemics | Floods | Storms | Transport accidents
Liberia | 0 | 42 | 3 | 0 | 60
Libya | 0 | 0 | 0 | 0 | 492
Lithuania | 0 | 0 | 0 | 0 | 0
Luxembourg | 0 | 0 | 0 | 0 | 20
Macau | 0 | 0 | 0 | 0 | 0
Macedonia FRY | 0 | 0 | 2 | 1 | 25
Madagascar | 0 | 691 | 52 | 958 | 92
Malawi | 4 | 1,563 | 91 | 11 | 236
Malaysia | 80 | 62 | 112 | 3 | 82
Maldives | 102 | 0 | 0 | 0 | 31
Mali | 0 | 201 | 63 | 0 | 237
Malta | 0 | 0 | 0 | 0 | 112
Mauritania | 0 | 55 | 45 | 0 | 142
Mauritius | 0 | 0 | 0 | 5 | 0
Mexico | 29 | 0 | 344 | 164 | 565
Micronesia Fed States | 0 | 19 | 0 | 48 | 0
Mongolia | 0 | 0 | 41 | 94 | 24
Montenegro | 0 | 0 | 0 | 0 | 0
Montserrat | 0 | 0 | 0 | 0 | 0
Morocco | 628 | 0 | 221 | 1 | 592
Mozambique | 4 | 482 | 1,012 | 65 | 332
Myanmar | 71 | 30 | 102 | 138,636 | 132
Namibia | 0 | 174 | 148 | 0 | 19
Nepal | 0 | 691 | 1,056 | 0 | 952
Netherlands | 0 | 0 | 0 | 11 | 0
New Zealand | 0 | 0 | 4 | 2 | 18
Nicaragua | 7 | 8 | 27 | 228 | 0
Niger | 0 | 1,876 | 34 | 4 | 115
Nigeria | 0 | 3,868 | 536 | 0 | 5,397
North Korea | 0 | 4 | 1,229 | 49 | 175
Norway | 0 | 0 | 0 | 0 | 37
Oman | 0 | 0 | 0 | 113 | 79
Pakistan | 73,576 | 163 | 2,265 | 369 | 1,302
Panama | 2 | 0 | 66 | 0 | 72
Papua New Guinea | 7 | 334 | 2 | 172 | 0
Paraguay | 0 | 33 | 0 | 0 | 37
Peru | 749 | 0 | 139 | 59 | 1,236
Philippines | 15 | 35 | 489 | 7,258 | 1,956
Poland | 0 | 0 | 30 | 38 | 32
Portugal | 0 | 0 | 9 | 4 | 40
Puerto Rico | 0 | 0 | 4 | 3 | 106
Romania | 0 | 0 | 214 | 29 | 58


Table 3.4 (cont.)

Country | Earthquakes (seismic activity) | Epidemics | Floods | Storms | Transport accidents
Russia | 18 | 0 | 339 | 34 | 1,302
Rwanda | 81 | 132 | 111 | 0 | 64
Samoa | 148 | 0 | 0 | 10 | 0
Saudi Arabia | 0 | 168 | 260 | 0 | 215
Senegal | 0 | 343 | 51 | 2 | 1,469
Serbia | 0 | 0 | 0 | 0 | 0
Serbia Montenegro | 1 | 0 | 2 | 0 | 68
Sierra Leone | 0 | 286 | 133 | 0 | 331
Singapore | 0 | 35 | 0 | 0 | 0
Slovakia | 0 | 0 | 4 | 2 | 23
Slovenia | 1 | 0 | 0 | 6 | 0
Somalia | 298 | 2,117 | 140 | 0 | 531
South Africa | 2 | 336 | 152 | 62 | 1,015
Spain | 0 | 2 | 35 | 39 | 529
Sri Lanka | 35,399 | 367 | 348 | 14 | 131
Sudan | 0 | 2,751 | 242 | 33 | 676
Suriname | 0 | 0 | 5 | 0 | 30
Swaziland | 0 | 32 | 0 | 1 | 52
Sweden | 0 | 0 | 0 | 8 | 0
Switzerland | 0 | 0 | 7 | 2 | 57
Syria | 0 | 0 | 6 | 32 | 208
Taiwan | 6 | 37 | 19 | 1,110 | 370
Tajikistan | 17 | 0 | 89 | 0 | 21
Tanzania | 15 | 278 | 162 | 0 | 1,035
Thailand | 8,345 | 112 | 959 | 27 | 437
Togo | 0 | 376 | 47 | 0 | 49
Tunisia | 0 | 0 | 45 | 0 | 501
Turkey | 252 | 24 | 208 | 39 | 1,046
Uganda | 0 | 712 | 128 | 0 | 795
Ukraine | 0 | 0 | 49 | 10 | 362
United Arab Emirates | 0 | 0 | 0 | 0 | 43
United Kingdom | 0 | 11 | 26 | 55 | 47
United States | 3 | 214 | 299 | 3,288 | 846
Uruguay | 0 | 0 | 14 | 9 | 11
Uzbekistan | 0 | 0 | 0 | 0 | 52
Venezuela | 0 | 0 | 160 | 5 | 417
Vietnam | 0 | 105 | 2,000 | 1,319 | 396
Yemen | 0 | 32 | 336 | 30 | 473
Zaire/Congo | 11 | 5,491 | 139 | 49 | 2,618
Zambia | 0 | 644 | 60 | 0 | 259
Zimbabwe | 0 | 4,641 | 112 | 8 | 369


Table 3.5 Latitude, ultraviolet A and B, ozone level, mortality rate. (Source: www.who.org)

Country | Latitude | UVA/square m2 | UVB/square m2 | Ozone level | Men mortality | Women mortality
Argentina | 34 | 51,300 | 1,409 | 335 | 0.93 | 0.96
Australia | 38 | 48,100 | 1,253 | 370 | 0.5 | 0.74
Austria | 48 | 38,800 | 850 | 275 | 0.76 | 0.84
Belgium | 51 | 35,800 | 732 | 275 | 0.86 | 0.9
Bulgaria | 43 | 43,600 | 1,052 | 310 | 0.93 | 0.96
Canada | 50 | 36,900 | 770 | 320 | 0.8 | 0.91
Chile | 34 | 51,300 | 1,409 | 300 | 0.93 | 0.92
Costa Rica | 10 | 63,600 | 2,064 | 275 | 0.89 | 0.99
Cuba | 23 | 58,500 | 1,783 | 275 | 0.96 | 0.98
Czech | 50 | 36,900 | 770 | 275 | 0.76 | 0.84
Denmark | 56 | 30,600 | 546 | 275 | 0.66 | 0.74
England | 52 | 34,800 | 694 | 275 | 0.84 | 0.86
Finland | 60 | 26,200 | 409 | 240 | 0.79 | 0.89
France | 49 | 37,900 | 810 | 275 | 0.87 | 0.9
Germany | 52 | 34,800 | 694 | 275 | 0.82 | 0.88
Greece | 37 | 48,900 | 1,293 | 325 | 0.95 | 0.96
Hong Kong | 22 | 59,000 | 1,812 | 260 | 0.98 | 0.99
Hungary | 48 | 38,800 | 850 | 320 | 0.76 | 0.86
Iceland | 68 | 17,800 | 197 | 270 | 0.91 | 0.97
Ireland | 53 | 33,800 | 656 | 275 | 0.9 | 0.89
Israel | 32 | 52,800 | 1,484 | 275 | 0.77 | 0.82
Italy | 43 | 43,600 | 1,052 | 300 | 0.85 | 0.89
Japan | 36 | 49,700 | 1,332 | 275 | 0.98 | 0.99
Korea | 37 | 48,900 | 1,293 | 275 | 0.99 | 0.99
Luxembourg | 50 | 36,900 | 770 | 275 | 0.96 | 0.8
Malta | 36 | 49,700 | 1,332 | 310 | 0.97 | 0.98
Mexico | 19 | 60,500 | 1,893 | 275 | 0.96 | 0.98
Netherlands | 53 | 33,800 | 656 | 275 | 0.82 | 0.84
New Zealand | 42 | 44,500 | 1,092 | 275 | 0.41 | 0.7
Norway | 60 | 26,200 | 409 | 245 | 0.82 | 0.78
Poland | 53 | 33,800 | 656 | 275 | 0.85 | 0.87
Portugal | 40 | 46,200 | 1,175 | 285 | 0.94 | 0.94
Puerto Rico | 18 | 60,900 | 1,916 | 275 | 0.93 | 0.97
Romania | 44 | 42,700 | 1,011 | 310 | 0.9 | 0.92
Scotland | 56 | 30,600 | 546 | 275 | 0.88 | 0.9
Singapore | 1 | 64,700 | 2,127 | 225 | 0.97 | 0.99
Spain | 41 | 45,500 | 1,133 | 275 | 0.9 | 0.94
Sweden | 58 | 28,400 | 478 | 275 | 0.74 | 0.83
Switzerland | 47 | 39,800 | 890 | 290 | 0.75 | 0.82
Trinidad | 10 | 63,600 | 2,064 | 275 | 0.97 | 0.98
United States | 37 | 48,900 | 1,293 | 300 | 0.74 | 0.87


Table 3.5 (cont.)

Country | Latitude | UVA/square m2 | UVB/square m2 | Ozone level | Men mortality | Women mortality
Uruguay | 35 | 50,600 | 1,372 | 240 | 0.87 | 0.95
Venezuela | 10 | 63,600 | 2,064 | 260 | 0.97 | 0.97
Yugoslavia | 45 | 41,800 | 970 | 300 | 0.85 | 0.89
Average | 40.5 | 43,979.5 | 1,115.02 | 282.7 | 0.85 | 0.89
Standard deviation | 15.2 | 11,514.5 | 510.35 | 26.14 | 0.12 | 0.07

Chapter 4

How to Collect Authentic Data

After studying the chapter, readers will be able to:
• Collect sample data for use in healthcare decision-making.
• Recognize and remedy length or size bias in collected data.
• Apply strategies to minimize bias, variance, and mean-squared error in healthcare decision-making.
• Practice these concepts and methods in other applications.

4.1 Motivation

In this chapter, methods to collect data from several reliable sources are articulated first. Then the importance of checking the authenticity of the data source is stated. Storing the collected data in Excel spreadsheets is vital. Refer to Hardin and Kotz (2020) for suggestions on improving data collection and amenability.

Surging in popularity, mobile health (mHealth) apps foster research, clinical regimens, and individual well-being. These apps encourage proactivity and ongoing accountability for healthcare.

Data collection is the process of gathering and assessing information on selected variables, within an established structure, in order to address pertinent healthcare inquiries and quantify health outcomes. It is an essential step in research in all fields, including healthcare. While methods vary across disciplines, the emphasis of all data collection should be accuracy. One objective of data collection is answering questions. Data collection is done in several steps. Sampled units are measured for information, often numeric but sometimes descriptive.

Data are a set of values of unpredictable qualitative or quantitative factors recorded for one or more individuals or items. The singular of data is datum. The term information is often used interchangeably with the term data. Data are used in healthcare research, hospital management, finance, and government. Data are assembled, organized, calculated, scrutinized, and communicated. They are visualized with charts, graphs, tables, or images.

Data are coded for better processing. Raw, unprocessed data are a collection of numbers that researchers have not yet “cleaned” or corrected for errors. At times, raw data need to be checked for outliers or data entry errors. Data processing is performed in several stages, and the processed data from one stage become the raw data for the next. Field data derive from an uncontrolled, unrestrained natural environment. Preliminary data are generated from empirical approaches including indirect monitoring and documentation.

Although each term has its unique role and meaning, data evidence, information, knowledge, and wisdom are comparable. Data yield information appropriate for determining healthcare decisions once the details have been processed, examined, and interpreted in a specified manner. The same data may be informative to some readers but less so to others. The information content of compiled data can be summarized by Shannon entropy. Refer to Shanmugam (2016) for a modified way of shaping up data information. Knowledge is cumulative understanding gained through experience with quantitative information. Data are the least hypothetical knowledge, information is the next least hypothetical, and knowledge the most hypothetical.

Every field involves data collection. Primary data are gathered first-hand by the investigator, who is the first person to obtain them. Any information contributed by a person or an entity other than the investigator is considered secondary data. Sources for secondary data include but are not limited to censuses and, sometimes, organizational records.

A related concept is data profiling, which is scrutinizing a data collection such as a repository for a research purpose. Researchers make informative summaries from the data.
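The kind of informative summary that data profiling produces can be illustrated with a short example. The following is a minimal Python sketch, not a prescribed procedure: it assumes that missing values are coded as `None`, it computes Shannon entropy from the empirical frequencies of the observed values, and both the function name `profile` and the length-of-stay figures are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median, stdev


def profile(values):
    """Summarize one variable: missing values, duplicates, order statistics,
    average, dispersion, and the Shannon entropy of the observed values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    # Shannon entropy (in bits) of the empirical frequency distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "n": len(values),
        "missing": len(values) - n,
        "duplicates": n - len(counts),
        "min": min(present),
        "median": median(present),
        "max": max(present),
        "mean": mean(present),
        "stdev": stdev(present),
        "entropy_bits": entropy,
    }


# Hypothetical lengths of stay (days); None marks a missing record.
length_of_stay = [3, 5, 3, None, 8, 5, 3, 21]
summary = profile(length_of_stay)
print(summary)
```

In this toy example the maximum of 21 days stands well apart from the median, which is the sort of signal that prompts a check for an outlier or a data entry error.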
The purposes of data collection are to determine:
(1) whether existing data satisfy the research purpose;
(2) whether there is competence in using the repository by keyword identification, abstracts, or categorization;
(3) whether the data quality can be checked for conformity in format, definition, and presentation;
(4) whether the risk can be assessed by integrating data in the applications;
(5) whether a discovery is possible from the metadata in the source database, including the value patterns and frequency distributions;
(6) whether the accuracy of the values in the data repository has been checked;
(7) whether discernment about data integrity is possible;
(8) whether an enterprise exists to view all databases and data management; and
(9) whether profiling is feasible using the repository to elucidate composition, subject matter, or connections.

Profiling enables a researcher to comprehend anomalies, evaluate data for quality, and manage metadata. Data profiling employs explanatory statistics including order statistics, average, dispersion, and metadata. Metadata discloses patterns, illegal or missing values, and duplicates. Distinctive analyses are performed at disparate constructional levels. Data profiling serves to improve data repositories and data quality; its benefits include a shorter implementation cycle and better user comprehension of the data.

Surveys are a conventional instrument for probability sampling. In the twenty-first century, internet surveys have begun to replace face-to-face, telephone, and mail surveys. The decisions resulting from surveys are influenced by cost, time, approach to asking questions, participants’ inclination to answer, and validity of responses. The statistics discipline offers a survey technique called randomized response for those who are reluctant to respond to sensitive questions. Refer to Shanmugam (2015) for details on the hidden untruthfulness in survey responses.

Online surveys are a common research instrument utilized across multiple fields. The advantages of the online survey to collect pertinent data for healthcare research are:
◦ Web surveys are cost-effective, straightforward, and quick, though they are vulnerable to errors.
◦ The data collection time frame is short.
◦ The collected data are easily converted to an Excel spreadsheet.
◦ The exchange between participant and data collector is direct.
◦ Online surveys do not seem imposing and rank higher in social desirability than direct methods.
◦ Pop-up instructions are provided based on individual questions where assistance is required.
◦ Online surveys are tailored for easy access (e.g., participants can save forms to complete at a later time and drop-down menus encourage participation).

The distinction between probability samples, in which every unit has a known inclusion probability, and non-probability samples is vital. Online probability sampling suffers from a type of non-coverage because not everyone has access to the Internet. Email is an expedient way to dispense online survey invitations, but the lack of email directories is an obstacle to constructing sampling frames, and the lack of sampling frames causes difficulties with online surveys. Nonresponse is also an issue: online survey response rates are inadequate. In addition, a phenomenon intrinsic to human nature often propels individuals to go along with statements, regardless of their content, rather than challenge them; such responses lower the validity and accuracy of the data.

The questionnaire design is therefore crucial. Even with diversity in question formats, images, and multimedia, simplicity should be reinforced to aid participants’ comprehension of the questions and their willingness to respond. Questionnaire design is key to avoiding measurement errors on the part of participants, such as those arising from lack of capability, computer illiteracy, disinclination, or confidentiality concerns. Face-to-face surveys are appropriate in environments where telephone or mail surveys are not feasible.

Integrating data enables healthcare practitioners to reduce waste and lower operating costs, but sharing data is challenging, in large part because the devices on which data are collected are administered by individual organizations. These and other administrative issues can be resolved using Amanuensis, which is a protected, amalgamated healthcare data system that seeks to attain better information structure. Data integrity and policies to protect confidentiality can be established. Data retrieval and calculations can bestow legitimate documentation.

Data are progressively more vital in science and society. For a discussion on how students gain significant data literacy, refer to Kjelvik and Schultheis (2019). First, the disciplines of quantitative reasoning, data science, and data literacy converge. Data literacy proceeds from quantitative reasoning and data science. Boddy et al. (2019) explain how to visualize data.
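The randomized response technique mentioned earlier in this section can be made concrete with a small calculation. The sketch below assumes Warner's classic design, which the text does not specify: each respondent privately draws, with known probability P, either the sensitive statement or its complement and answers only "yes" or "no". If λ is the proportion of "yes" answers and π the true proportion with the sensitive trait, then λ = Pπ + (1 − P)(1 − π), which can be solved for π. The function name and the survey counts are hypothetical.

```python
def warner_estimate(yes_count: int, n: int, p_sensitive: float):
    """Estimate the sensitive proportion under Warner's randomized response design.

    p_sensitive is the known probability that a respondent is asked the
    sensitive statement rather than its complement; it must differ from 0.5.
    """
    if p_sensitive == 0.5:
        raise ValueError("p_sensitive = 0.5 makes the proportion unidentifiable")
    lam = yes_count / n  # observed proportion of "yes" answers
    pi_hat = (lam - (1 - p_sensitive)) / (2 * p_sensitive - 1)
    # Plug-in variance of the estimator, based on the binomial variance of lam.
    var_hat = lam * (1 - lam) / (n * (2 * p_sensitive - 1) ** 2)
    return pi_hat, var_hat


# Hypothetical survey: 400 respondents, 160 "yes" answers, P = 0.7.
pi_hat, var_hat = warner_estimate(160, 400, 0.7)
print(round(pi_hat, 4), round(var_hat, 6))
```

Because the interviewer never learns which statement a respondent answered, individual answers stay private, while the aggregate proportion remains estimable. The price is a larger variance than a direct question would give.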
Dhillon and Singh (2019) mention promising research in clinical, omics, and sensor data sets. Clinical data include digital health documents. Omics data are high-dimensional data encompassing genome, transcriptome, and proteome information. Sensor data are amassed from assorted wearable mobile sensor devices. Raw data are arduous to process manually. Machine learning has therefore surfaced as an instrument for examining data. Machine learning applies diverse statistical methods and recent algorithms to predict healthcare outcomes. The algorithms fall into supervised, unsupervised, and reinforcement learning. Li et al. (2019) examine how machine-learning algorithms are applied to assorted healthcare data. This leads to a challenge in managing the quality of healthcare data. Distributed and synergic data management upheld by edge computing demonstrates substantial merits in refining


overall system performance. A digital medical record displays how healthcare data are secured. The function of big data in healthcare settings is immense. Extensive healthcare data can help healthcare providers optimize patient outcomes, predict epidemics, avoid preventable diseases, lower costs, and upgrade quality of life. See Abouelmehdi et al. (2018) for a discussion. Security and patient confidentiality must be considered when discerning acceptable uses of data. Recognizing the constraints of current solutions and proposing topics for future study are crucial to creating an ethical big data environment. Refer to Kumar and Singh (2018) for the argument that big data comprises the copious volumes of structured, unstructured, and semistructured data produced by miscellaneous institutions. Such heterogeneous data are big data, and the healthcare sector is controlled by big data. The theoretical structure of big data analytics hinges on digital health documentation, visual and textual content, and clinical decision support systems. See Purushotham et al. (2018) on the role of deep learning models. Deep learning in neural networks reshapes computer vision, natural language processing, and speech recognition. Benchmarking for clinical predictions involves mortality rate and length of stay. See Wiens and Shenoy (2018) on the increasing availability of electronic health data to attain both discovery and improvement in healthcare. Machine learning (ML) helps researchers identify patterns, such as epidemiology, in the healthcare sector. Public–private alliances, associations, and administrative leadership on the federal and global level are fundamental to assemble genomic, personal, and healthcare data. Refer to Deverka et al. (2017) for an exploration of explanations versus justifications. Stakeholders need to clarify goals, provide deep insight into areas of complexity, and address the perpetual policy concerns of confidentiality, data protection, and data ownership. 
Communication technologies play a significant role in healthcare establishments that gather copious amounts of healthcare information, including diagnoses, treatments, and patient demographics. Appropriate modeling of these diagnoses, treatments, and demographics is essential to make sense of big healthcare data before developing improved, data-directed decision-support systems. Regressions demonstrate the importance of prediction. Refer to Kostkova et al. (2016) on how medical data sets are shared in data-driven research; this and related publications attest to the fact that data can elucidate the prevalence of disease, results of treatment, and adverse reactions and side effects. “Data donors” are potential contributors to gathering knowledge about diseases and improving diagnostics. Concerns about maintaining privacy, confidentiality, and control of

anonymity of the patients are legitimate and should be honored. See Khatri and Shrivastava (2016) for strategies to preserve patients’ anonymity, and Jiang et al. (2016) for how healthcare data can be visualized in our challenging twenty-first century by integrating geospatial information, ephemeral particulars, subject matter, and heterogeneous health traits within routine observable conditions. A cyber-based healthcare data visualization needs to be done for the entire healthcare system. Two new visualization techniques are described in the literature: spatial textural visualization and information. Refer to Thara et al. (2016) for the advantages of network and cloud data centers. Data are periodically updated. For plotting and visualizing time series data, refer to Lu et al. (2016). Knowledge discovery from a database (KDD) depends on the development of appropriate methods for extracting data. A popular method is data mining, which is practiced in the healthcare industry. See Jothi and Husain (2015) for illustrations. The ultimate goal of data mining is the improvement of healthcare practices and the development of biomedical research. Shanmugam (2014a) demonstrates pertinent knowledge of data collection and analysis through a case study of the Ebola virus. Ebola is an often fatal communicable disease, a horror that even medical professionals could not avoid when treating patients with the disease. The outbreak began on August 26, 1976, and the virus was named Ebola because the outbreak occurred near an African river of the same name, in the village of Yambuku, within the Mongala district of Congo. The first patient who died from Ebola, a school principal, passed away on September 8, 1976. Researchers suspected that monkeys, pigs, or fruit bats were the source of Ebola; the virus is not airborne. Transmission is common where proper protective clothing such as gloves and masks is lacking. 
Another precautionary course of action involves swiftly discarding the fluids and tissues of Ebola patients. To prevent the epidemic from expanding, quarantining Ebola patients is the approach taken in some nations. Due to the lack of an effective vaccine accessible to humans, the disease retains a high mortality. Many fear the Ebola virus being used as a biological weapon. On July 31, 2014, human patients were clinically evaluated with ZMapp, formulated as a biopharmaceutical drug to treat the Ebola virus. The surviving patients endured lasting effects – painful joints, peeling skin, loss of vision, baldness, and inflammation of one or both testicles. The symptoms of Ebola start within two days of contracting the virus. They include fever, headache, muscle pain, vomiting, diarrhea, and hemorrhaging. Stemming from the Congo, the Ebola epidemic expanded to the three surrounding countries. Of the 2,127 cases recorded in these four nations, only 1,145


patients survived. The Ebola virus then proliferated globally. Approximately 194 nations have chosen to take precautions against it. Another case study to examine is menopause. Shanmugam (2014b) addresses the hospitalization of patients due to menopausal issues. Menopause is the ending of menstruation. Menopause applies to nonhuman primates as well. The menopause transition years, which take place before and after the last period, are referred to as perimenopause; during this time the hormone levels are still going up and down inconsistently. Adjustments in hormones produce menopausal symptoms. Menopause is not a disease, but potential symptoms include palpitations, depression, irritability, and lack of concentration. The menopausal transition can also reduce the desire for sex. Fluctuating levels of estrogen and progesterone increase the risk of cancer, heart disease, osteoporosis, bone loss, and mortality. The typical age of menopause is 51, and the transition lasts approximately four years. Surgical menopause results from a medical procedure: instead of declining organically over time, estrogens decline abruptly, with additional symptoms. A new bivariate distribution, named the bivariate menopause distribution, was introduced by Shanmugam (2014b) to model the duration of an evaluation in menopause data. Another important issue to examine through big data is wait time. The wait time in hospitals needs to be reduced, considering how it affects patients. Seeing a lengthy queue as a source of exasperation and financial loss, healthcare administrators seek to resolve this irritating problem expeditiously. Researchers have published many articles in refereed journals and edited volumes on how to deal efficiently with queuing, a field known as queuing theory. (See http://virtualqueueingwordpress.com for details on the perceived time people wait to receive medical attention.) These scholarly writings warn that patients who see a long wait upon their arrival might choose to forgo the medical attention they need. 
Customers wait for service in restaurants, airports, banks, traffic intersections, and in grocery stores, but if they must wait in hospitals or clinics, they may be waiting in intolerable pain and become impatient. Managers of hospitals or clinics sympathize with their patients and try to ensure medical services are given quickly. The competence of the operations strategy implemented by the hospital is measured by the patient flow. A queue indicates patient flow, a delay for already arrived patients. A reduction of the wait time for patients is a priority. Rearranging the services of medical practitioners, laboratory diagnosticians, insurance agents, and pharmacists should be done so as to complement patients’ arrival patterns and medical needs.


The wait time analysis asserts the need to blend productive and nonproductive services. A productive service may include recovery from a surgical procedure, which is a necessity. Immediate service should be provided. Resources and patient welfare cannot be compromised or be handled outside of the law. Hospital administrators are expected to consider ethical limitations such as patient flow in an emergency room. They assess patients’ service times in order to maintain the efficiency of hospitals and clinics. Sometimes patient services can be arranged via service providers, which can help healthcare providers anticipate patient demand. The queuing methodology explicitly helps providers calculate the optimal number of healthcare professionals so as to reduce patients’ length of stay. The public evaluates the efficiency of hospitals in terms of wait time. The implementation of queuing theories clarifies and refines the service time pattern as well. Queuing theory aids healthcare providers in drawing a conclusion after foreseeing the need for the capacity of a unit to expand. The uniform sequence for the patient in terms of arrival, queue formation, service reception, and departure is the optimal arrangement. One caveat is that reducing the wait time raises the level of satisfaction, which in turn raises demand. The wait time cost function reinforces this caveat. Nevertheless, a hospital can employ extra resources with the purpose of reducing wait time. If there is inactivity in a hospital, operational costs increase. Adding the idle cost function (due to insufficient patients) to the waiting cost function equals the cost function. Queuing ideas play an integral part in classifying and developing an exemplary procedure. More wait time inconveniences patients and causes frustration. The frustration level reflects what the patient perceives and expects regarding the services provided by the hospital. 
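The trade-off described above, a waiting cost that falls and an idle cost that rises as staff are added, can be sketched numerically. The following is a minimal illustration rather than the method of the cited references: it assumes an M/M/c queue (Poisson arrivals, exponential service times, c identical servers), a model choice the text does not specify, and the function names, rates, and cost figures are hypothetical.

```python
import math

def erlang_c(servers, load):
    """Probability that an arriving patient must wait (Erlang C formula).

    load = arrival_rate / service_rate; the queue is stable only if load < servers.
    """
    if load >= servers:
        return 1.0
    below = sum(load**k / math.factorial(k) for k in range(servers))
    tail = load**servers / math.factorial(servers) * servers / (servers - load)
    return tail / (below + tail)

def mean_queue_length(arrival_rate, service_rate, servers):
    """Expected number of patients waiting (not yet in service)."""
    load = arrival_rate / service_rate
    if load >= servers:
        return math.inf
    rho = load / servers  # utilization per server
    return erlang_c(servers, load) * rho / (1.0 - rho)

def total_cost(arrival_rate, service_rate, servers, wait_cost, idle_cost):
    """Waiting cost plus idle cost per time unit (TU), as in the text.

    wait_cost: assumed cost per waiting patient per TU.
    idle_cost: assumed cost per idle server per TU.
    """
    load = arrival_rate / service_rate
    waiting = wait_cost * mean_queue_length(arrival_rate, service_rate, servers)
    idle = idle_cost * max(servers - load, 0.0)  # expected number of idle servers
    return waiting + idle

def best_staffing(arrival_rate, service_rate, wait_cost, idle_cost, max_servers=50):
    """Number of servers that minimizes the total cost among stable options."""
    load = arrival_rate / service_rate
    smallest_stable = math.floor(load) + 1
    return min(
        range(smallest_stable, max_servers + 1),
        key=lambda c: total_cost(arrival_rate, service_rate, c, wait_cost, idle_cost),
    )

# Hypothetical clinic: 10 arrivals per hour, each provider serves 4 per hour.
print(best_staffing(10, 4, wait_cost=100, idle_cost=20))
```

Under these assumptions, raising the waiting cost relative to the idle cost pushes the minimizing staff level upward, which mirrors the caveat in the text that shorter waits are bought with extra resources.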
Frustration can be decreased when satisfactory queuing theories are practiced. Patient frustration reflects poorly on hospital management, which might lead to stockbrokers’ disinclination to continue their support. Agencies can withhold accreditation or licenses for a hospital. As an exemplary theory, the queuing methodology serves to determine the condition of the running operation of a hospital by placing the focus on patient satisfaction. This assessment determines the need for hiring medical professionals, adjusting their scheduling, creating a more favorable and productive working environment, and reducing patients’ wait time. Additionally, queuing methodologies for a pharmacy are used to determine the times for filling prescriptions and educating patients on prescription administration. Optimal strategies infused with a thorough and accurate


cognizance of the hospital system influence the hospital’s performance. In any discussion concerning the assessment of hospitals, an hour, a day, or any other span of time can be regarded as the time unit (TU). Outside the hospital, the term patients is replaced by customers. The queuing concepts call for designating a congruous model first. Refer to Shanmugam (2014c) for a demonstration of collecting healthcare data on wait times. Survey questions that touch on delicate subject matter might result in respondents’ understandable reluctance to answer, whereas surveys on non-sensitive matters can be administered in the standard way. Warner introduced the randomized response technique (RRT) to get around practical complexities in a survey (see Shanmugam, 2014d and 2014e for details). The respondent does not have to disclose the outcome of the randomization; therefore, nobody can identify which question a specific respondent answered. With this protection extended to each respondent, the RRT-based survey raises the likelihood of retrieving genuine answers. Using the RRT, one can examine whether certain levels of education are associated with bias toward foreigners, a phenomenon called xenophobia. The responses could be “honest yes,” “honest no,” or “cheating.” A low or inadequate education might be a reason for xenophobia. A pertinent framework can be built to estimate the proportions. Xenophobia data exist in open-domain web pages. Shanmugam (2014d) narrated problems in survey sampling and potential remedies. The data source for earthquakes and a discussion of the C(α) method are used here as case studies through which to examine overdispersion. The modeling becomes tedious when the data display either over- or underdispersion. The times in hours between an earthquake and its aftershocks in the years 1973 through 1995 were accumulated from the US Geological Survey database. 
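The estimation step behind the RRT mentioned above can be sketched for Warner's original two-statement design. In that design, a randomizing device directs each respondent, with known probability p, to the statement "I have the sensitive trait" and otherwise to its complement, and the interviewer records only "yes" or "no". The sketch below covers this basic design only; it does not model the three-category ("honest yes," "honest no," "cheating") framework the text refers to, and the function name and sample figures are hypothetical.

```python
import math

def warner_estimate(yes_count, n, p):
    """Estimate the proportion with a sensitive trait under Warner's design.

    yes_count: number of "yes" answers observed.
    n: number of respondents.
    p: known probability that the device selects the sensitive statement
       (must differ from 0.5, otherwise the answers carry no information).
    Returns (estimated proportion, estimated standard error).
    """
    if n <= 0 or not 0 <= yes_count <= n:
        raise ValueError("need 0 <= yes_count <= n with n > 0")
    if not 0.0 < p < 1.0 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")
    lam = yes_count / n  # observed share of "yes" answers
    # P(yes) = p*pi + (1 - p)*(1 - pi); solve for pi.
    pi_hat = (lam - (1.0 - p)) / (2.0 * p - 1.0)
    std_err = math.sqrt(lam * (1.0 - lam) / n) / abs(2.0 * p - 1.0)
    return pi_hat, std_err

# Hypothetical survey: 1,000 respondents, 400 "yes" answers, p = 0.7.
estimate, std_err = warner_estimate(400, 1000, 0.7)
print(round(estimate, 3), round(std_err, 4))
```

The closer p is to 0.5, the stronger the privacy protection but the larger the standard error, so the choice of p is a trade-off between respondent protection and precision.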
An earthquake taking place underwater precipitates a tsunami, generating further destruction. On December 26, 2004, an earthquake caused a tsunami that killed 280,000 humans and thousands of animals, and destroyed a massive number of properties in Indonesia, Thailand, the Maldives, Somalia, Sri Lanka, and India. The aftershocks following any major earthquake bring both instant and longer-term hardships that heavily burden governments and service agencies. Do the times exhibit an exponential probability pattern? Does more data variance represent higher volatility in earthquakes? Could the times between the main earthquake and its aftershocks be interpretable? Refer to Shanmugam (2014e) for pertinent answers. Data that guide healthcare decision-making in emergency wings of hospitals are important. The major threat in

the twenty-first century is terrorism. Bioterrorism stands out as the most horrific kind of terrorism. The General Accounting Office (2003) of the United States sent survey questionnaires to hospitals in all 50 states and Washington, DC. The response rate ranged from 44% to 100%. The higher percentage signifies hospitals’ preparedness. Refer to Shanmugam (2014f) for details on data collection, analysis, and interpretation of the results. Data can also be helpful in addressing another issue of concern: patient noncompliance. Using multiple drugs, having no health insurance, or forgetting to take medicine are some reasons for noncompliance. Patients with particular illnesses who do not take their medicines are hospitalized again, and patients who refuse to adhere to treatment are admitted to the hospital more frequently. How should the relevant data be collected and examined? A suitable methodology is necessary for lucid data analysis. If data on covariates are obtainable, a prediction model can be set up. To glean better insight about the new primary component methodology of non-adherent bivariate distribution controlling data, refer to Shanmugam (2014g). Shanmugam (2014h) demonstrates data envelopment analysis (DEA), which is used for reducing input and/or raising output. In DEA, the efficiency of each decision-making unit (DMU) is estimated as the ratio of the output produced to the input used. Efficiency is higher when more output is obtained from less input. The data sources in the globalized and highly competitive healthcare industry are not easy to track. Stochastic frontier analysis (SFA), an alternative to DEA based on a linear model, is a statistical method to estimate efficiency. There are two error terms in SFA, whereas the usual regression involves only one, the random noise. Stochastic frontier analysis is thus a generalized regression methodology. In SFA, the second error term represents technical inefficiency. 
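The output-to-input ratio that underlies DEA can be illustrated in its simplest form, with one input and one output per DMU. This is only a sketch under that simplifying assumption: a full DEA with several inputs and outputs solves a linear program for each DMU, which is not shown here, and the hospital names and figures below are hypothetical.

```python
def relative_efficiency(units):
    """Score each DMU by its output/input ratio relative to the best ratio.

    units: dict mapping a DMU name to an (input, output) pair, for example
           (staff hours, patients treated). Inputs must be positive.
    Returns a dict of scores in (0, 1]; the best-performing DMU scores 1.0.
    """
    ratios = {}
    for name, (inp, out) in units.items():
        if inp <= 0:
            raise ValueError(f"input for {name!r} must be positive")
        ratios[name] = out / inp
    best = max(ratios.values())
    if best <= 0:
        raise ValueError("at least one DMU must produce positive output")
    return {name: ratio / best for name, ratio in ratios.items()}

# Hypothetical hospitals: (input, output) pairs.
hospitals = {"A": (10, 50), "B": (20, 80), "C": (5, 30)}
for name, score in relative_efficiency(hospitals).items():
    print(name, round(score, 3))
```

A score below 1.0 indicates how far a DMU falls short of the best observed ratio, which is the deterministic frontier idea that SFA replaces with a statistical model containing noise and inefficiency terms.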
Random noise and technical inefficiency are not statistically independent. The projected values of the observable data constitute the efficiency score of the operation. Data envelopment analysis serves as a nonparametric methodology for performing SFA. The advantages versus the disadvantages of SFA and DEA are examined later in this chapter. An illustrative example that succinctly depicts the distinctions between DEA and SFA is the incidence of melanoma in various countries. The SFA methodology allows researchers to assess patients’ odds of survival without melanoma. Stochastic frontier analysis is a stochastic methodology that takes a parametric approach to analyze vital information while DEA is a deterministic methodology. Technical efficiency is the basis for comparing SFA and DEA. Stochastic frontier analysis is accomplished by adopting an implicit underlying statistical distribution for


the data. The data frequently follow a normal distribution. Stochastic frontier analysis and DEA have advantages as well as disadvantages. In the illustrative data analyzed via SFA, box plots are used to check that the data depict normality. Data for people with melanoma in the year 2003 are provided in Shanmugam (2014i). Data on bioterrorism are scarce. Governments, hospitals, and the public are haunted by the likelihood of bioterrorism in the twenty-first century. Some hospitals do drills to practice treating victims of bioterrorism. The US federal government emphasizes the importance of drills to successfully manage bioterrorism emergency cases in hospitals. In the event of an actual bioterrorism attack, the victims would be assigned quickly and randomly to nearby or easily accessible hospitals, whether or not the hospitals have conducted drills. The chance of successfully treating a victim of bioterrorism is higher in a hospital that has undergone drills to prepare for bioterrorism. There is clearly a need for an appropriate underlying model and a statistical methodology to examine such data. This need is fulfilled by a new probability pattern named the tweaked binomial distribution (TBD). A well-known bioterrorism agent is anthrax, a bacterial disease that attacks the lungs when its spores are inhaled. Healthcare administrators across 34 states indicated their hospitals have practiced treating anthrax patients. A number of phone calls from the public about the threat of bioterrorism were reported from October 8, 2001, through November 11, 2001. The Centers for Disease Control and Prevention (CDC) has reported 11,063 bioterrorism attacks in the United States since November 8, 2001. The US federal government and the CDC have highlighted the importance of drilling to tackle bioterrorism (see Shanmugam, 2014j). Shanmugam (2013g, 2013h) considers data from the General Accounting Office to assess hospitals’ preparedness to treat anthrax cases. 
Cyberattacks are another form of terrorism that threatens patients’ welfare. Since September 11, 2001, the United States and other nations have been vigilant in safeguarding against unseen dangers, especially cyberterrorism. A cyberattack is an unsolicited penetration of a malicious software virus into computers. Some viruses reside permanently in the host computer, while others enter the host periodically. During the year 2000, the Israeli parliament, along with Israel’s defense and foreign ministries, dealt with cyberattacks that were intended as retaliation after Israel attacked the homes of Palestinians living in Jerusalem. Cyber insecurity is a nightmare, especially for hospital administrators who maintain patients’ health records in computers. Hackers may be enemies, thrill seekers,


disgruntled employees, contrabandists, fraud artists, or programmers with malicious intent. Regardless of efforts made to ensure cyber protection, the cyber intrusion rate has increased. Vigilance involves periodic security risk analysis. Pertinent data on quantified cyber intrusions, the level of malicious cyberattacks, and the level of protection are necessary to devise a professional approach. Refer to Shanmugam (2013i) for data on cyber insecurity. Nurses can contract deadly viruses while making strenuous efforts to care for infected patients. Such calamitous situations cannot be avoided as long as the main priority of any nurse remains treating patients. In-hospital infection is a concern to hospital administrators. An example is severe acute respiratory syndrome (SARS). On November 27, 2002, the first SARS case emerged in southern China. Individuals infected with SARS must be isolated because antibiotics are ineffective against it. International travelers departing from China were instrumental in spreading SARS. China acknowledged with regret its failure to contain the SARS epidemic quickly. Quarantine measures were applied to international travelers who might have encountered SARS. In some cases, they were ushered from airports directly to hospitals. Toronto General Hospital, as a case in point, provided treatment for admitted SARS patients (see www.nc.cdc.gov and Shanmugam, 2014k). Collecting authentic data is also crucial in the field of organ transplantation. The United States has a shortage of organs available for transplant, a shortage that is especially severe for the kidney and pancreas. The first kidney transplant occurred in the United States on June 17, 1950. Diabetic patients sometimes require a pancreas transplant. In some cases, both organs need to be replaced. In 1966, the first simultaneous kidney-pancreas transplant was done at the University of Minnesota. Every country on earth has laws that regulate organ transplants. 
During the period 1988–2006, the global average reported organ procurement and transplant wait time was longer than six years. In the event of transplants involving two organs, the issue of finding matches for both complicates procurement, distribution, and surgery (see http://optn.transplant.hrsa.gov and Shanmugam, 2013b). Christiaan Barnard performed the first human heart transplant on December 3, 1967. Approximately 3,500 heart transplants take place annually. In 2004, 39 heart-lung transplants were performed in the United States. See Shanmugam (2013m) for an analysis of transplant data (also available at www.who.org). Healthcare researchers seek to determine whether the removal of one cancerous kidney significantly reduces a person’s lifespan. Several clinical trials were conducted so as to formulate guidelines on treating patients with malignant


cancer in the kidneys, termed hypernephroma. A combination of chemotherapy, immunotherapy, and nephrectomy can be used for treatment, but not all patients undergo nephrectomy. A surgeon recommends nephrectomy only when its chances of success are high enough. An appropriate model for statistical analysis here is the exponential distribution (ED), which ought to capture the outcome of nephrectomy in the treatment group. The University of Oklahoma Health Sciences Center (UOHSC) performed such an experiment. Any tweaks are further explained by Shanmugam (2013f) with the UOHSC data. Another public health topic is rape and the healthcare of rape victims. In addition to victims’ families and the healthcare professionals treating them, local and federal governments implement measures in an attempt to prevent rape. They also focus on repairing any devastation that has already taken place. The consequences of rape include a changed demeanor in victims, marked by hostility, hypersensitivity, moodiness, and social withdrawal. Rape crisis centers serve as a safe place for rape survivors. Evaluating the prevalence of rape is the driving force for governmental and healthcare agencies in working out their budgets. Keep in mind that victims deal with shame and loss of dignity, which is the primary reason many rapes go unreported (see www.unodc.org for the prevalence of rape and Shanmugam, 2013c, on difficulties in collecting data on rape incidences and how to resolve them). Alzheimer’s disease is another health concern for which data are crucial in attempts to resolve it. The incidence of Alzheimer’s disease in the United States is alarming. Alzheimer’s disease is named after Alois Alzheimer, the German psychiatrist who first described it in 1906. Alzheimer’s patients include several notables such as President Ronald Reagan and Prime Minister Harold Wilson. Alzheimer’s damages brain cells, triggering severe memory loss. No effective treatment exists. 
An estimated 100,000 Alzheimer cases exist worldwide. Alzheimer’s reduces life expectancy by seven years. The severity of the disease is assessed by the words an Alzheimer’s patient cannot remember. Refer to Shanmugam (2013d) on how statistical modeling and data analytic results help capture improvement of Alzheimer’s due to medical treatment. Gynecologists treat women who cannot conceive a child. Conceiving a child is connected to menses and its biorhythm. How does the field of medicine define the menstrual cycle? The menstrual cycle is a routine period of changes in the woman’s uterus and ovary that signals bodily readiness for reproduction. The number of days per menstrual cycle on the average is 28. Even with hormonal changes, the cycle is regulated by the endocrine system. Healthcare researchers ascertain whether smoking can

impede hormonal changes, causing a delay in pregnancy (see Shanmugam, 2013e). Longer service duration indicates the inefficient operation of a hospital, according to healthcare consultants. A statistical analysis in Shanmugam (2013k) attests to the impact of deferred reporting of acquired immune deficiency syndrome (AIDS) cases. In the year 1981, deaths due to AIDS across the world were approximately 45 million people. Meanwhile, an estimated 75 million people suffered globally due to a delay in diagnosis. According to the World Health Organization (WHO), approximately 33.4 million people suffer with AIDS. Approximately 2 million people died in 2009, including approximately 330,000 children. The spread of AIDS is erroneously assessed. A reason for this is perhaps that the healthcare community periodically changes the definition of clinical symptoms as discoveries are made (see Centers for Disease Control, 1985). The US federal government modified its policy before reporting AIDS. The modification refers to no more mandatory inclusion of laboratory evidence in the report. In this context, note 42,670 AIDS cases were diagnosed as of March 31, 1987, whereas only 33,350 cases were reported. The reporting delays lead to underestimating the number of AIDS cases. Reporting delays occur in other illnesses and impact insurance claims as well. Refer to Shanmugam (2013j) for a methodology to make corrections for this underestimation. Before heart attacks became the leading cause of death in the United States, cancer was the number one cause of death. In treating cancer patients, the medical community deliberates this query: could medical treatment delay cancer recurrence? To an extent, this question is answered with an appropriate model, applicable statistical methodology, and extracted data. Traditionally, ED, discussed earlier in this chapter, has been chosen as an implicit model for survival time data. 
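To make the exponential distribution (ED) concrete as a survival-time model, the short sketch below evaluates its survival function S(t) = exp(-λt) and compares the probability of surviving a further t time units, given survival up to time s, with the unconditional probability of surviving t time units. The rate λ and the time points are hypothetical values chosen only for illustration.

```python
import math

def survival(t, rate):
    """P(T > t) for an exponential survival time with the given rate."""
    return math.exp(-rate * t)

def conditional_survival(s, t, rate):
    """P(T > s + t | T > s): chance of surviving t more units after surviving s."""
    return survival(s + t, rate) / survival(s, rate)

rate = 0.3  # hypothetical recurrence rate per year
for elapsed in (0, 1, 5, 10):
    # Under the ED, the conditional value does not depend on the elapsed time.
    print(elapsed, round(conditional_survival(elapsed, 2, rate), 6))
print(round(survival(2, rate), 6))
```

Under these assumptions every printed conditional probability matches S(2), so the time already survived carries no information about the time remaining.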
Exponential distribution is known to hold no memory – what happened in the past is irrelevant to current or future health status. Such a concept is utterly unrealistic, especially with respect to cancer recurrence. That is, no memory contradicts the medical belief that a medication would, at the least, postpone cancer recurrence. Furthermore, ED is apt in describing medication’s impact on the survival time for cancer patients. See Shanmugam (2013l) for details about cancer recurrence, along with data analytics from the source and ED. The kidney, brain, lung, or heart are damaged in the presence of cardiovascular disease (CVD). The factors leading to CVD include hypertension, high cholesterol levels, excessive alcohol consumption, history of genetic problems, obesity, inadequate or no physical activity, neurodevelopmental factors, diabetes mellitus, and smoking.


In the year 2010, approximately 279,098 deaths in the United States occurred due to heart disease. Someone experiences a stroke every 40 seconds in the United States, while a person dies from a heart attack every four minutes. See Ramanathan et al. (2013) for practical difficulties in the collection of big data in studies on bio-surveillance. Privacy, legal issues surrounding underage participants, and reporting norms cannot be ignored in data collection, whether of big or small data. The model parameters are estimated via the Gibbs sampling method, which makes efficient use of the Metropolis-Hastings algorithm. See Park and Ghosh (2012) for details about collecting data concerning healthcare operations. Big data analytics is the most promising topic in healthcare studies. The loss of healthcare expenditures leads to probing of healthcare fraud, biohazard disposal, and abuse of medical waste. Healthcare programs such as Medicaid and Medicare, with their service fees, participate in investigations of fraud in the healthcare industry. A healthcare environment is judged by its information richness. Even with a plethora of data, the generated knowledge might be insufficient. Effective analysis tools may be lacking to discover concealed relationships and visible patterns in the data. Decision trees (DT), naive Bayes, and artificial neural networks (ANNs) are popular data-mining techniques. Refer to Srinivas et al. (2010) for warnings about data quality versus data mining. See Mohammed et al. (2009) on sharing healthcare data without losing patients’ privacy, especially in large data sets. Liu et al. (2006) cautioned that data-mining approaches could lead to misunderstandings in healthcare. Most healthcare data sets contain missing values. The DT and naive Bayesian classifiers have to be exercised carefully so as to estimate inpatients’ length of stay. The risk of bioterrorism is data mined in Bravata et al. (2004). 
See Fazli and Behboodian (2002) for a statistical approach based on symmetric functions and distance functions. Shanmugam (1996) explains the reasons for a lack of healthcare research. Healthcare data are often subject to length bias. Shanmugam formulated an asymptotic test to scrutinize the homogeneity in length-biased sample data from the Mean Exponential Family (MEF). The MEF is an innovative class of probability distributions for healthcare data analysis. See Shanmugam (1989) for the definition and properties of the MEF. This approach utilized specificity and sensitivity in the data with length bias, along with their intrinsic relations. For more on this topic, refer to Consoli et al. (2019), Jones and Groom (2011), Oral (2019), Ponce et al. (2009), Tracy (2019), Weiner (2020), and Zamani Forooshani (2020).

98 https://doi.org/10.1017/9781009212021.006 Published online by Cambridge University Press

4.2 Concepts

When everyone in a population is selected for observation, the collected data are termed a census. Sampling is the process of observing a segment of a population. The advantages of sampling include lower costs and faster data collection. When each member of a population has an equal chance of being selected, that is random sampling. When a selected person is removed from the pool, that is sampling without replacement. When selection is carried out by skipping a fixed number of members before the next selection, that is a systematic sample. A systematic sample can be biased, for example when the ordering of the population has a periodic pattern. In some instances, a random sample of groups is selected first and then some members are randomly chosen within the selected groups. This is called cluster sampling. Stratified sampling and cluster sampling are both forms of probability sampling. Because of the lack of confidentiality, questions on sensitive matters go unanswered at times. To remedy this difficulty, data collection is done using a randomized response technique (RRT). Snowball sampling is not considered scientific because selection proceeds through personal acquaintance, such as friendship with the data collector, rather than by chance. For a variety of reasons, the sampled population might be different from the target population. When sampling fills quotas set in proportion to group size, that is quota sampling.

In many healthcare decisions, it is worthwhile to do an interim analysis for ethical, economic, and administrative reasons. Healthcare professionals and patients should have access to the results. This encourages healthcare administrators to continue as is if they desire data-guided, efficient healthcare decisions. Otherwise, healthcare management should alter data collection for the healthcare system.

When healthcare administrators collect data, they carefully determine the sample size, n, based on a specified α (which is Pr(accepting a false new opinion)), a specified β (which is Pr(not accepting a true new opinion)), and the formula

$$n = \frac{2\,(z_{\alpha/2} + z_{1-\beta})^{2}}{(\delta/\sigma)^{2}},$$

where $\delta = \mu_{\text{null}} - \mu_{\text{research}}$ is called the contiguity (drift between status quo and claim) and σ is the heterogeneity of the frequency trend. The formula is interpreted as follows. When the sampled population is more heterogeneous (larger σ), researchers need a larger sample. Likewise, when the centers of the population frequency trend under the status quo and under the claimed version are too close (smaller δ), the sample size has to be larger in order to make a clear decision. When α is smaller, the sample size needs to be larger. When one wants to attain more statistical power (i.e., a higher 1 − β), the sample size needs to be larger. For example, suppose α = 0.05, 1 − β = 0.90, δ = 0.3, and σ = 0.71. Then z_{α/2} = 1.960 and z_{1−β} = 1.282, so n = 2(1.960 + 1.282)²/(0.3/0.71)² ≈ 117.7, that is, a sample size of about n = 117 (118 if rounded up).
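The worked example above can be checked with a few lines of Python. This is only a minimal sketch of the formula as stated in this section; the function name `sample_size` and its argument names are illustrative choices, not notation from the text, and the standard normal quantiles come from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def sample_size(alpha, power, delta, sigma):
    """Sample size n = 2 (z_{alpha/2} + z_{1-beta})^2 / (delta / sigma)^2.

    alpha : type I error probability
    power : statistical power, 1 - beta
    delta : contiguity (difference between null and research means)
    sigma : population heterogeneity (standard deviation)
    """
    std_normal = NormalDist()                      # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    z_power = std_normal.inv_cdf(power)            # z_{1-beta}
    return 2 * (z_alpha + z_power) ** 2 / (delta / sigma) ** 2


# Worked example from the text: alpha = 0.05, power = 0.90, delta = 0.3, sigma = 0.71
n = sample_size(alpha=0.05, power=0.90, delta=0.3, sigma=0.71)
print(f"required sample size: {n:.1f}")
```

Because a sample size must be a whole number, the fractional result is usually rounded up in practice so that the target power is not undershot.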


4.3 Illustration

This section concentrates on the application of the concepts and methods examined so far in this chapter. Suppose we are trying to determine the appropriate sample size for a study. The sample size depends on how large a probability of type I and type II errors can be tolerated. Let H0 and H1 be the null and research hypotheses. Note that the null and research hypotheses are complementary and mutually exclusive: they cannot be true at the same time. The error probabilities α and β, however, are not additive in the sense that they do not have to sum to one, because they are computed under different scenarios. That is, α = Pr(rejecting a true null hypothesis) and β = Pr(not accepting a true research hypothesis).

When the null hypothesis is not rejected, the data are considered not supportive of the research hypothesis. Note that the probability 1 − α is recognized as the confidence level and 1 − β is recognized as the statistical power. Usually, α = 0.05 if a 95% confidence level is desired. When an author tightens α to 0.001, he or she desires to attain a 99.9% confidence level. Every study seeks a high statistical power, which can be attained by having a larger sample for a fixed contiguity δ = |μ0 − μ1| and for a fixed heterogeneity σ in the population. See Table 4.1 before reading its interpretation. As easily seen in Table 4.1, the sample size (n) increases when the confidence level (1 − α), the statistical power (1 − β), or the population heterogeneity (σ) increases and/or when the contiguity (δ) decreases.
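The relation between sample size and power can also be read in the other direction. Solving the sample-size formula of Section 4.2 for z_{1−β} gives z_{1−β} = √(n/2)·δ/σ − z_{α/2}, so the power attained by a given n is the standard normal probability below that value. The sketch below is a rough illustration under that algebra; `achieved_power` is a hypothetical helper name, and the inputs reuse the worked example (α = 0.05, δ = 0.3, σ = 0.71).

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(n, alpha, delta, sigma):
    """Power 1 - beta implied by n = 2 (z_{alpha/2} + z_{1-beta})^2 / (delta/sigma)^2."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)         # z_{alpha/2}
    z_power = sqrt(n / 2) * abs(delta) / sigma - z_alpha  # solved for z_{1-beta}
    return std_normal.cdf(z_power)                      # 1 - beta


# Power grows with n for fixed alpha, delta, and sigma.
for n in (50, 118, 200):
    print(n, round(achieved_power(n, alpha=0.05, delta=0.3, sigma=0.71), 3))

# Tightening alpha lowers the power attained by the same n.
print(round(achieved_power(118, alpha=0.001, delta=0.3, sigma=0.71), 3))
```

The loop makes the trade-off explicit: with the contiguity and heterogeneity held fixed, a stricter α or a smaller n both reduce the power that the study can attain.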

Table 4.1 Sample size depends on confidence level, power, contiguity, and heterogeneity. (Source: Shanmugam generated from Excel calculations)

| α | z_{α/2} | 1 − β | z_{1−β} | δ | σ | Sample size |
|---|---|---|---|---|---|---|
| 0.001 | 3.290527 | 0.99 | 2.326348 | 0.05 | 0.3 | 1,319.249 |
| 0.001 | 3.290527 | 0.99 | 2.326348 | 0.05 | 3 | 131,924.9 |
| 0.001 | 3.290527 | 0.99 | 2.326348 | 0.05 | 30 | 13,192,495 |
| 0.001 | 3.290527 | 0.99 | 2.326348 | 5 | 0.3 | 0.131925 |
| 0.001 | 3.290527 | 0.99 | 2.326348 | 5 | 3 | 13.19249 |
| 0.001 | 3.290527 | 0.99 | 2.326348 | 5 | 30 | 1,319.249 |
| 0.001 | 3.290527 | 0.9 | 1.281552 | 0.05 | 0.3 | 1,264.357 |
| 0.001 | 3.290527 | 0.9 | 1.281552 | 0.05 | 3 | 126,435.7 |
| 0.001 | 3.290527 | 0.9 | 1.281552 | 0.05 | 30 | 12,643,570 |
| 0.001 | 3.290527 | 0.9 | 1.281552 | 5 | 0.3 | 0.126436 |
| 0.001 | 3.290527 | 0.9 | 1.281552 | 5 | 3 | 12.64357 |
| 0.001 | 3.290527 | 0.9 | 1.281552 | 5 | 30 | 1,264.357 |
| 0.05 | 1.959964 | 0.99 | 2.326348 | 0.05 | 0.3 | 626.5647 |
| 0.05 | 1.959964 | 0.99 | 2.326348 | 0.05 | 3 | 62,656.47 |
| 0.05 | 1.959964 | 0.99 | 2.326348 | 0.05 | 30 | 6,265,647 |
| 0.05 | 1.959964 | 0.99 | 2.326348 | 5 | 0.3 | 0.062656 |
| 0.05 | 1.959964 | 0.99 | 2.326348 | 5 | 3 | 6.265647 |
| 0.05 | 1.959964 | 0.99 | 2.326348 | 5 | 30 | 626.5647 |
| 0.05 | 1.959964 | 0.9 | 1.281552 | 0.05 | 0.3 | 588.9164 |
| 0.05 | 1.959964 | 0.9 | 1.281552 | 0.05 | 3 | 58,891.64 |
| 0.05 | 1.959964 | 0.9 | 1.281552 | 0.05 | 30 | 5,889,164 |
| 0.05 | 1.959964 | 0.9 | 1.281552 | 5 | 0.3 | 0.058892 |
| 0.05 | 1.959964 | 0.9 | 1.281552 | 5 | 3 | 5.889164 |
| 0.05 | 1.959964 | 0.9 | 1.281552 | 5 | 30 | 588.9164 |
| 0.1 | 1.644854 | 0.99 | 2.326348 | 0.05 | 0.3 | 499.8567 |
| 0.1 | 1.644854 | 0.99 | 2.326348 | 0.05 | 3 | 49,985.67 |
| 0.1 | 1.644854 | 0.99 | 2.326348 | 0.05 | 30 | 4,998,567 |
| 0.1 | 1.644854 | 0.99 | 2.326348 | 5 | 0.3 | 0.049986 |
| 0.1 | 1.644854 | 0.99 | 2.326348 | 5 | 3 | 4.998567 |
| 0.1 | 1.644854 | 0.99 | 2.326348 | 5 | 30 | 499.8567 |
| 0.1 | 1.644854 | 0.9 | 1.281552 | 0.05 | 0.3 | 466.2922 |
| 0.1 | 1.644854 | 0.9 | 1.281552 | 0.05 | 3 | 46,629.22 |
| 0.1 | 1.644854 | 0.9 | 1.281552 | 0.05 | 30 | 4,662,922 |
| 0.1 | 1.644854 | 0.9 | 1.281552 | 5 | 0.3 | 0.046629 |
| 0.1 | 1.644854 | 0.9 | 1.281552 | 5 | 3 | 4.662922 |
| 0.1 | 1.644854 | 0.9 | 1.281552 | 5 | 30 | 466.2922 |
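The grid of Table 4.1 can be regenerated from the sample-size formula of Section 4.2. The sketch below is an illustration only; `sample_size` is an assumed helper name, and the grid values are the α, 1 − β, δ, and σ levels listed in the table. Readers comparing outputs should note that the formula as stated gives roughly 2,271.5 for the first row, whereas the printed entry 1,319.249 appears to be reproduced when 1 − β itself is used where the formula has z_{1−β}, since 2(3.290527 + 0.99)²(0.3/0.05)² ≈ 1,319.25. The sketch follows the formula as stated; the qualitative trends described in the text hold either way.

```python
from itertools import product
from statistics import NormalDist


def sample_size(alpha, power, delta, sigma):
    """n = 2 (z_{alpha/2} + z_{1-beta})^2 / (delta / sigma)^2, as stated in Section 4.2."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 / (delta / sigma) ** 2


# Levels used in Table 4.1, in the same nesting order as the table rows.
alphas = (0.001, 0.05, 0.1)
powers = (0.99, 0.9)
deltas = (0.05, 5)
sigmas = (0.3, 3, 30)

rows = [
    (a, p, d, s, sample_size(a, p, d, s))
    for a, p, d, s in product(alphas, powers, deltas, sigmas)
]

print(f"{'alpha':<7}{'power':<7}{'delta':<7}{'sigma':<7}{'n':>16}")
for a, p, d, s, n in rows:
    print(f"{a:<7}{p:<7}{d:<7}{s:<7}{n:>16,.3f}")
```

Scanning the printed rows shows the same pattern the text reads off Table 4.1: n scales with (σ/δ)², and it grows as α shrinks or as the required power rises.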

4.4 Summary

The chapter has introduced and explained data sources for particular topics. It has also illustrated how various types of data have been collected, analyzed, and applied to healthcare issues. The chapter has presented a formula that determines the sample size from the confidence level (1 − α), statistical power (1 − β), population heterogeneity level (σ), and contiguity (δ).

4.5 Exercises

1. Using the data presented in Table 4.2, formulate a null and research hypothesis for age, waist size, pulse, systolic, diastolic, cholesterol, and BMI. Specify and justify the confidence level, statistical power, population heterogeneity level, and contiguity level for the data you wish to consider. Is the sample size adequate for testing the hypothesis?

2. Using the data presented in Table 4.3, formulate a null and research hypothesis for the data variable of the number of seizures before and after treatment. Specify and justify the confidence level, statistical power, population heterogeneity level, and contiguity level for the data you wish to consider. Is the sample size adequate for testing the hypothesis?

3. Using the data presented in Table 4.4, formulate a null and research hypothesis for the selected variables. Specify and justify the confidence level, statistical power, population heterogeneity level, and contiguity level for the data you wish to consider. Is the sample size adequate for testing the hypothesis?

Selected References

Abouelmehdi, K., Beni-Hessane, A., & Khaloufi, H. (2018). Big healthcare data: preserving security and privacy. Journal of Big Data, 5(1), 1–18.

Boddy, A., Hurst, W., Mackay, M. et al. (2019). An investigation into healthcare-data patterns. Future Internet, 11(2), 30.

Bravata, D. M., McDonald, K. M., Smith, W. M. et al. (2004). Systematic review: surveillance systems for early detection of bioterrorism-related diseases. Annals of Internal Medicine, 140(11), 910–922.

Centers for Disease Control (CDC). (1985). Revision of the case definition of acquired immunodeficiency syndrome for national reporting: United States. Morbidity and Mortality Weekly Report, 34(25), 373–375.

Consoli, S., Recupero, D. R., & Petković, M. (eds.). (2019). Data Science for Healthcare: Methodologies and Applications. New York: Springer.

Deverka, P. A., Majumder, M. A., Villanueva, A. G. et al. (2017). Creating a data resource: what will it take to build a medical information commons? Genome Medicine, 9(1), 1–5.

Dhillon, A., & Singh, A. (2019). Machine learning in healthcare data analysis: a survey. Journal of Biology and Today's World, 8(6), 1–10.

Fazli, K., & Behboodian, J. (2002). A construction method for measures of central tendency and dispersion. International Journal of Mathematical Education in Science and Technology, 33(2), 299–302.

General, United States (2001). Fiscal Year 2003 Budget Request.

Hardin, T., & Kotz, D. (2020). Amanuensis: information provenance for health-data systems. Information Processing & Management, 58(2), 102460.

Jiang, S., Fang, S., Bloomquist, S. et al. (2016). Healthcare data visualization: geospatial and temporal integration. In Proceedings of the 11th International Conference on Computer Vision, Imaging


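Table 4.2 Biological data on age, waist, pulse, systolic, diastolic, cholesterol, and body mass index. (Source: US Department of Health and Human Services)

| Male | Age | Waist | Pulse | Systolic | Diastolic | Cholesterol | BMI |
|---|---|---|---|---|---|---|---|
| 1 | 58 | 90.6 | 68 | 125 | 78 | 522 | 23.8 |
| 2 | 22 | 78.1 | 64 | 107 | 54 | 127 | 23.2 |
| 3 | 32 | 96.5 | 88 | 126 | 81 | 740 | 24.6 |
| 4 | 31 | 87.7 | 72 | 110 | 68 | 49 | 26.2 |
| 5 | 28 | 87.1 | 64 | 110 | 66 | 230 | 23.5 |
| 6 | 46 | 92.4 | 72 | 107 | 83 | 316 | 24.5 |
| 7 | 41 | 78.8 | 60 | 113 | 71 | 590 | 21.5 |
| 8 | 56 | 103.3 | 88 | 126 | 72 | 466 | 31.4 |
| 9 | 20 | 89.1 | 76 | 137 | 85 | 121 | 26.4 |
| 10 | 54 | 82.5 | 60 | 110 | 71 | 578 | 22.7 |
| 11 | 17 | 86.7 | 96 | 109 | 65 | 78 | 27.8 |
| 12 | 73 | 103.3 | 72 | 153 | 87 | 265 | 28.1 |
| 13 | 52 | 91.8 | 56 | 112 | 77 | 250 | 25.2 |
| 14 | 25 | 75.6 | 64 | 119 | 81 | 265 | 23.3 |
| 15 | 29 | 105.5 | 60 | 113 | 82 | 273 | 31.9 |
| 16 | 17 | 108.7 | 64 | 125 | 76 | 272 | 33.1 |
| 17 | 41 | 104.0 | 84 | 131 | 80 | 972 | 33.2 |
| 18 | 52 | 103.0 | 76 | 121 | 75 | 75 | 26.7 |
| 19 | 32 | 91.3 | 84 | 132 | 81 | 138 | 26.6 |
| 20 | 20 | 75.2 | 88 | 112 | 44 | 139 | 19.9 |
| 21 | 20 | 87.7 | 72 | 121 | 65 | 638 | 27.1 |
| 22 | 29 | 77.0 | 56 | 116 | 64 | 613 | 23.4 |
| 23 | 18 | 85.0 | 68 | 95 | 58 | 762 | 27.0 |
| 24 | 26 | 79.6 | 64 | 110 | 70 | 303 | 21.6 |
| 25 | 33 | 103.8 | 60 | 110 | 66 | 690 | 30.9 |
| 26 | 55 | 103.0 | 68 | 125 | 82 | 31 | 28.3 |
| 27 | 53 | 97.1 | 60 | 124 | 79 | 189 | 25.5 |
| 28 | 28 | 86.9 | 60 | 131 | 69 | 957 | 24.6 |
| 29 | 28 | 88.0 | 56 | 109 | 64 | 339 | 23.8 |
| 30 | 37 | 91.5 | 84 | 112 | 79 | 416 | 27.4 |
| 31 | 40 | 102.9 | 72 | 127 | 72 | 120 | 28.7 |
| 32 | 33 | 93.1 | 84 | 132 | 74 | 702 | 26.2 |
| 33 | 26 | 98.9 | 88 | 116 | 81 | 1,252 | 26.4 |
| 34 | 53 | 107.5 | 56 | 125 | 84 | 288 | 32.1 |
| 35 | 36 | 81.6 | 64 | 112 | 77 | 176 | 19.6 |
| 36 | 34 | 75.7 | 56 | 125 | 77 | 277 | 20.7 |
| 37 | 42 | 95.0 | 56 | 120 | 83 | 649 | 26.3 |
| 38 | 18 | 91.1 | 60 | 118 | 68 | 113 | 26.9 |
| 39 | 44 | 94.9 | 64 | 115 | 75 | 656 | 25.6 |
| 40 | 20 | 79.9 | 72 | 115 | 65 | 172 | 24.2 |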
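As a starting point for Exercise 1, the adequacy question can be sketched for a single variable. The example below takes the pulse column of Table 4.2, estimates the heterogeneity σ by the sample standard deviation, and applies the chapter's sample-size formula. The contiguity of 5 beats per minute, the choices α = 0.05 and 1 − β = 0.90, and the helper name `sample_size` are all assumptions made purely for illustration; the exercise asks the reader to choose and justify these values.

```python
from math import ceil
from statistics import NormalDist, mean, stdev

# Pulse column of Table 4.2 (40 males), transcribed row by row.
pulse = [
    68, 64, 88, 72, 64, 72, 60, 88, 76, 60,
    96, 72, 56, 64, 60, 64, 84, 76, 84, 88,
    72, 56, 68, 64, 60, 68, 60, 60, 56, 84,
    72, 84, 88, 56, 64, 56, 56, 60, 64, 72,
]


def sample_size(alpha, power, delta, sigma):
    """n = 2 (z_{alpha/2} + z_{1-beta})^2 / (delta / sigma)^2 (formula of Section 4.2)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 / (delta / sigma) ** 2


sigma_hat = stdev(pulse)   # sample standard deviation as an estimate of sigma
delta = 5.0                # hypothetical contiguity, in beats per minute
required = ceil(sample_size(alpha=0.05, power=0.90, delta=delta, sigma=sigma_hat))
adequate = len(pulse) >= required

print(f"mean pulse = {mean(pulse):.1f}, estimated sigma = {sigma_hat:.2f}")
print(f"required n = {required}, available n = {len(pulse)}, adequate: {adequate}")
```

Changing `delta` shows how strongly the answer depends on the contiguity the analyst is willing to specify, which is why the exercise asks for that choice to be justified.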


Table 4.3 Seizure data of placebo and treatment groups. (Source: Thall and Vail, 1990)

| ID | Year 1 seizure | Year 2 seizure | Year 3 seizure | Year 4 seizure | Group | Before treatment seizure | Age |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 3 | 3 | 3 | placebo | 11 | 31 |
| 2 | 3 | 5 | 3 | 3 | placebo | 11 | 30 |
| 3 | 2 | 4 | 0 | 5 | placebo | 6 | 25 |
| 4 | 4 | 4 | 1 | 4 | placebo | 8 | 36 |
| 5 | 7 | 18 | 9 | 21 | placebo | 66 | 22 |
| 6 | 5 | 2 | 8 | 7 | placebo | 27 | 29 |
| 7 | 6 | 4 | 0 | 2 | placebo | 12 | 31 |
| 8 | 40 | 20 | 23 | 12 | placebo | 52 | 42 |
| 9 | 5 | 6 | 6 | 5 | placebo | 23 | 37 |
| 10 | 14 | 13 | 6 | 0 | placebo | 10 | 28 |
| 11 | 26 | 12 | 6 | 22 | placebo | 52 | 36 |
| 12 | 12 | 6 | 8 | 4 | placebo | 33 | 24 |
| 13 | 4 | 4 | 6 | 2 | placebo | 18 | 23 |
| 14 | 7 | 9 | 12 | 14 | placebo | 42 | 36 |
| 15 | 16 | 24 | 10 | 9 | placebo | 87 | 26 |
| 16 | 11 | 0 | 0 | 5 | placebo | 50 | 26 |
| 17 | 0 | 0 | 3 | 3 | placebo | 18 | 28 |
| 18 | 37 | 29 | 28 | 29 | placebo | 111 | 31 |
| 19 | 3 | 5 | 2 | 5 | placebo | 18 | 32 |
| 20 | 3 | 0 | 6 | 7 | placebo | 20 | 21 |
| 21 | 3 | 4 | 3 | 4 | placebo | 12 | 29 |
| 22 | 2 | 3 | 3 | 5 | placebo | 9 | 21 |
| 23 | 8 | 12 | 2 | 8 | placebo | 17 | 32 |
| 24 | 18 | 24 | 76 | 25 | placebo | 28 | 25 |
| 25 | 2 | 1 | 2 | 1 | placebo | 55 | 30 |
| 26 | 3 | 1 | 4 | 2 | placebo | 9 | 40 |
| 27 | 13 | 15 | 13 | 12 | placebo | 17 | 32 |
| 28 | 11 | 14 | 9 | 8 | progabide | 76 | 18 |
| 29 | 8 | 7 | 9 | 4 | progabide | 38 | 32 |
| 30 | 0 | 4 | 3 | 0 | progabide | 19 | 20 |
| 31 | 3 | 6 | 1 | 3 | progabide | 10 | 30 |
| 32 | 2 | 6 | 7 | 4 | progabide | 19 | 18 |
| 33 | 4 | 3 | 1 | 3 | progabide | 24 | 24 |
| 34 | 22 | 17 | 19 | 16 | progabide | 31 | 30 |
| 35 | 5 | 4 | 7 | 4 | progabide | 14 | 35 |
| 36 | 2 | 4 | 0 | 4 | progabide | 11 | 27 |
| 37 | 3 | 7 | 7 | 7 | progabide | 67 | 20 |
| 38 | 4 | 18 | 2 | 5 | progabide | 41 | 22 |
| 39 | 2 | 1 | 1 | 0 | progabide | 7 | 28 |
| 40 | 0 | 2 | 4 | 0 | progabide | 22 | 23 |
| 41 | 5 | 4 | 0 | 3 | progabide | 13 | 40 |
| 42 | 11 | 14 | 25 | 15 | progabide | 46 | 33 |
| 43 | 10 | 5 | 3 | 8 | progabide | 36 | 21 |
| 44 | 19 | 7 | 6 | 7 | progabide | 38 | 35 |
| 45 | 1 | 1 | 2 | 3 | progabide | 7 | 25 |
| 46 | 6 | 10 | 8 | 8 | progabide | 36 | 26 |
| 47 | 2 | 1 | 0 | 0 | progabide | 11 | 25 |
| 48 | 102 | 65 | 72 | 63 | progabide | 151 | 22 |
| 49 | 4 | 3 | 2 | 4 | progabide | 22 | 32 |
| 50 | 8 | 6 | 5 | 7 | progabide | 41 | 25 |
| 51 | 1 | 3 | 1 | 5 | progabide | 32 | 35 |
| 52 | 18 | 11 | 28 | 13 | progabide | 56 | 21 |
| 53 | 6 | 3 | 4 | 0 | progabide | 24 | 41 |
| 54 | 3 | 5 | 4 | 3 | progabide | 16 | 32 |
| 55 | 1 | 23 | 19 | 8 | progabide | 22 | 26 |
| 56 | 2 | 3 | 0 | 1 | progabide | 25 | 21 |
| 57 | 0 | 0 | 0 | 0 | progabide | 13 | 36 |
| 58 | 1 | 4 | 2 | 2 | progabide | 12 | 37 |

Table 4.4 List of web pages for data (Source: http://en.wikipedia.org)

www.aacn.nche.edu
www.aacp.org
www.aama-ntl.org
www.aame.org/data/facts
www.aameda.org
www.aanp.org
www.aapa.org
www.aapcnatl.org
www.aare.org
www.aarp.org
www.abms.org
www.acatoday.org
www.acefitness.org
www.acgme.org
www.ache.org
www.acp-onlline.org
www.ada.org
www.adha.org
www.advanced.org
www.ahdionline.org
www.ahima.org
www.ama-assn.org
www.amerchiro.org
www.aoa.gov/AoARoot/Aging_Statistics/Profile/2009
www.aoa.org/AoARoot/Aging_statistics
www.aoanet.org
www.aota.org
www.apma.org
www.apta.org
www.ardms.org
www.asecho.org
www.asha.org
www.aspe.hhs.gov
www.asrt.org
www.ast.org
www.atra-tr.org
www.audiology.org
www.bcbs.com
www.bioethics.od.nih.gov
www.bls.org
www.bt.cdc.gov
www.cancer.gov
www.cancer.net
www.cc.nih.gov
www.cdc.gov
www.cdc.gov/nchs/fastats
www.chiropractic.org
www.cia.gov
www.cit.nih.gov
www.cms.hhs.gov
www.csr.nih.gov
www.dentalassistant.org
www.dh.gov.uk
www.diabetes.niddk.nih.gov
www.drugwatch.com
www.eatright.org
www.ete-online.com
www.familydoctor.org
www.fic.nih.gov
www.flu.gov
www.genome.gov
www.ghr.nlm.nih.gov
www.hc-sc.gc.ca
www.health.nih.gov//category
www.healthcare.gov
www.healthreform.gov
www.healthypeople.gov
www.helpguide.org
www.history.nih.gov
www.hrsa.gov
www.ibef.org
www.ihs.gov
www.india.gov.in
www.indianhealthcare.in
www.jointcommission.org
www.kff.org
www.kidshealth.org
www.liu.se/tema/inhph
www.ll.georgetown.edu
www.mayoclinic.com
www.mcdonalds.com
www.medicalreservecorps.gov
www.medicaltravel.com
www.medicare.gov
www.mgma.com
www.mypyramid.gov
www.n4a.org
www.nacds.org
www.nadl.org
www.naemt.org
www.nasa.gov
www.nationalgeographic.org
www.nccam.nih.gov
www.ncmhd.nih.gov
www.ncoa.org
www.ncsbn.org
www.nei.nih.gov
www.nhcoa.org
www.nhlhi.hih.gov
www.nhs.uk
www.nia.nih.gov
www.nia.nih.gov/alzheimers
www.niaaa.nih.gov
www.niaid.nih.gov
www.niams.nih.gov
www.nibib2.nih.gov
www.nichd.nih.gov
www.nidcd.nih.gov
www.niddk.nih.gov
www.nids.nih.gov
www.niehs.nih.gov
www.nigms.gov
www.nih.gov/medlineplus.gov
www.nimh.nih.gov
www.ninr.nih.gov
www.nlm.nih.gov
www.nln.org
www.norr.nih.gov
www.npr.org
www.nursingworld.org
www.oaa.org
www.oecd.org
www.oeed.org
www.ornl.gov
www.owl-national.org
www.pharma.org
www.pharmacist.com
www.pharmacytechnician.com
www.sdms.org
www.statelocalgov.net
www.stemcells.nih.gov
www.svunet.org
www.thehastingscenter.org
www.tricare.osd.mil
www.unaids.org
www.understanding-medicaltourism.com
www.understanding-medicaltourism.com/medical-tourismstatistics.php
www.unesco.org
www.unicef.org
www.unicef.org/infobycountry
www.urban.org/health_policy/long-term_care
www.usdoj.gov/atr
www.usdoj.gov/atr/public
www.va.gov/health
www.whitehouse.gov
www.who.int
www.who.int/en
www.worldbank.org
www.worldchiropracticalliance.org
www.yourdiseaserisk.wustl.edu

and Computer Graphics Theory and Applications (Vol. 2) (pp. 214–221), edited by N. Magnenat-Thalmann, P. Richard, L. Linsen et al. https://doi.org/10.5220/0005714002120219.

Jones, S., & Groom, F. M. (eds.). (2011). Information and Communication Technologies in Healthcare. Boca Raton, FL: CRC Press.

Jothi, N., & Husain, W. (2015). Data mining in healthcare: a review. Procedia Computer Science, 72, 306–313.

Khatri, I., & Shrivastava, V. K. (2016). A survey of big data in healthcare industry. In Advanced Computing and Communication Technologies (Vol. 1), edited by J. K. Mandal, D. Bhattacharyya, & N. Auluc (pp. 245–257). Singapore: Springer.

Kjelvik, M. K., & Schultheis, E. H. (2019). Getting messy with authentic data: exploring the potential of using data from scientific research to support student data literacy. CBE: Life Sciences Education, 18(2), es2.

Kostkova, P., Brewer, H., de Lusignan, S. et al. (2016). Who owns the data? Open data for healthcare. Frontiers in Public Health, 4(7).

Kumar, S., & Singh, M. (2018). Big data analytics for healthcare industry: impact, applications, and tools. Big Data Mining and Analytics, 2(1), 48–57.

Lee, E. T., & Wang, J. W. (2003). Statistical Methods for Survival Data Analysis. 3rd edition. Hoboken, NJ: Wiley.

Li, X., Huang, X., Li, C., Yu, R., & Shu, L. (2019). EdgeCare: leveraging edge computing for collaborative data management in mobile healthcare systems. IEEE Access, 7, 22011–22025.

Lindsey, K. (1997). Modelling Frequency and Count Data. 1st edition. Oxford: Oxford University Press.

Liu, J., Bier, E., Wilson, A. et al. (2016). Graph analysis for detecting fraud, waste, and abuse in healthcare data. AI Magazine, 37(2), 33–46.

Liu, P., Lei, L., Yin, J. et al. (2006). Healthcare data mining: prediction inpatient length of stay. In 2006 3rd International IEEE Conference Intelligent Systems (pp. 832–837). London: Institute of Electrical and Electronics Engineers.

Lockwood, K. J., Harding, K. E., Boyd, J. N., & Taylor, N. F. (2020). Home visits by occupational therapists improve adherence to recommendations: process evaluation of a randomized controlled trial. Australian Occupational Therapy Journal, 67(4), 287–296.

Lu, H. M., Wei, C. P., & Hsiao, F. Y. (2016). Modeling healthcare data using multiple-channel latent Dirichlet allocation. Journal of Biomedical Informatics, 60, 210–223.

Mohammed, N., Fung, B. C., Hung, P. C., & Lee, C. K. (2009). Anonymizing healthcare data: a case study on the blood transfusion service. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1285–1294). New York: Association for Computing Machinery.

Oral, E. (2019). Surveying sensitive topics with indirect questioning. In Statistical Methodologies. London: Intech Open.

Park, Y., & Ghosh, J. (2012). A probabilistic imputation framework for predictive analysis using variably aggregated, multi-source healthcare data. In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium (pp. 445–454). New York: Association for Computing Machinery.

Ponce, J., Hernández, A., Ochoa, A. et al. (2009). Data mining in Web applications. International Journal of Mathematical Education in Science and Technology, 32(6), 873–886.

Purushotham, S., Meng, C., Che, Z., & Liu, Y. (2018). Benchmarking deep learning models on large healthcare datasets. Journal of Biomedical Informatics, 83, 112–134.

Ramanathan, A., Pullum, L. L., Steed, C. A. et al. (2013). Integrating heterogeneous healthcare datasets and visual analytics for disease bio-surveillance and dynamics. In 3rd IEEE Workshop on Visual Text Analytics. London: Institute of Electrical and Electronics Engineers.

Shanmugam, R. (1989). Asymptotic homogeneity tests for mean exponential family distributions. Journal of Statistical Planning and Inference, 23, 227–241.

Shanmugam, R. (1996). Effective sample size in length biased data. Applied Statistical Science, 1, 89–100.

Shanmugam, R. (2013a). Mosaic masonries to interpret diagnostic test results. American Medical Journal, 4(1), 12–20.

Shanmugam, R. (2013b). Shortage level of matching kidney and pancreas organs for implant is estimated. International Journal of Research in Nursing, 4(2), 40–46.

Shanmugam, R. (2013c). Informatics about fear to report rapes using bumped-up Poisson model. American Journal of Biostatistics, 3(1), 17–29.

Shanmugam, R. (2013d). Alzheimer's disease prognosis is captured by a down-upsized incidence Poisson distribution. American Medical Journal, 4(2), 150–159.

Shanmugam, R. (2013e). Does smoking delay pregnancy? Data analysis by a tweaked geometric distribution answer. International Journal of Research in Medical Sciences, 1(4), 343–348.

Shanmugam, R. (2013f). Tweaking exponential distribution to estimate the chance for more survival time if a cancerous kidney is removed. International Journal of Research in Nursing, 4(1), 29–33.

Shanmugam, R. (2013g). Unified survival functions are derived and illustrated using hospitals' preparedness data to treat anthrax cases. International Journal of Statistics and Economics, 12(3), 82–95.

Shanmugam, R. (2013h). Probabilistic health-informatics and bioterrorism. International Journal of Communication and Computer, 10, 28–32.

Shanmugam, R. (2013i). Hacking-Vigilance distribution with application to assess cyber insecurity level. International Journal of Information and Education Technology, 3(3), 300–303.

Shanmugam, R. (2013j). Alternate to traditional goodness of fit test with illustration using service duration to patients in hospitals. International Journal of Statistics and Economics, 11(2), 31–43.

Shanmugam, R. (2013k). Odds to quicken reporting already delayed cases: AIDS incidences are illustrated. International Journal of Nursing in Research, 4(1), 1–13.

Shanmugam, R. (2013l). Is cancer recurrence postponed by a treatment? A new model answers. American Medical Journal, 4(1), 43–62.

Shanmugam, R. (2013m). Does over or under dispersion in inverse binomial data suggest anything? A case in point is the waiting time for both heart-lung transplants. American Journal of Biostatistics, 3(2), 30–37.

Shanmugam, R. (2014a). Health broken woven Poisson spheres to manage deadly Ebola incidences. American Journal of Infectious Diseases, 10, 143–154.

Shanmugam, R. (2014b). "Bivariate distribution" for infrastructures among operative, natural, and no menopauses. American Journal of Biostatistics, 4, 34–44.

Shanmugam, R. (2014c). How do queuing concepts and tools help to effectively manage hospitals when the patients are impatient? A demonstration. International Journal of Research in Medical Sciences, 2, 1076–1084.

Shanmugam, R. (2014d). A bivariate probability model to identify "honesty" versus "cheating" in economic surveys: xenophobia is illustrated. American Journal of Economics and Business Administration, 6, 42–48.

Shanmugam, R. (2014e). C (∝) method to check daunting over/under variances to understand times to aftershocks since a major earthquake. Computer, Electronics, Electrical, and Communication, 59, 190–193.

Shanmugam, R. (2014f). Data guided public healthcare decision making. In Encyclopedia of Business Analytics and Optimization (Vol. 2), edited by J. Wang (pp. 30–43). New York: IGI Global.

Shanmugam, R. (2014g). Probing non-adherence to prescribed medicines? A bivariate distribution with information nucleus clarifies. American Medical Journal, 5, 54–60.

Shanmugam, R. (2014h). Data envelopment analysis for operational efficiency. In Encyclopedia of Business Analytics and Optimization (Vol. 2), edited by J. Wang (pp. 18–28). New York: IGI Global.

Shanmugam, R. (2014i). Stochastic frontier analysis and cancer survivability. In Encyclopedia of Business Analytics and Optimization (Vol. 5), edited by J. Wang (pp. 18–26). New York: IGI Global.

Shanmugam, R. (2014j). Tweaked binomial distribution to capture the impact of drilling to cure bioterror victims in hospitals. International Journal of Statistics and Economics, 13(1), 40–45.

Shanmugam, R. (2014k). An assessment of nurses' sufficient immunity when treating infectious patients using bumped-up binomial model. International Journal of Research in Medical Sciences, 2(1), 132–138.

Shanmugam, R. (2015). Refined randomized response model for suspicious answers: illicit drug users in U.S.A. are illustrated. International Journal of Ecological Economics & Statistics, 36, 15–27.

Shanmugam, R. (2016). Entropy in Nucleus to tab data information and its illustration with Wolfram syndrome cases. International Journal of Ecological Economics and Statistics, 37(3), 44–63.

Srinivas, K., Rani, B. K., & Govrdhan, A. (2010). Applications of data mining techniques in healthcare and prediction of heart attacks. International Journal on Computer Science and Engineering (IJCSE), 2(2), 250–255.

Thall, P. F., & Vail, S. C. (1990). Some covariance models for longitudinal count data with over dispersion. Biometrics, 46, 657–671.

Thara, D. K., Premasudha, B. G., Ram, V. R., & Suma, R. (2016). Impact of big data in healthcare: a survey. In 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) (pp. 729–735). New York: Institute of Electrical and Electronics Engineers.

Tracy, S. J. (2019). Qualitative Research Methods: Collecting Evidence, Crafting Analysis, Communicating Impact. Hoboken, NJ: Wiley.

Weiner, J. (2020). Why AI/data science projects fail: how to avoid project pitfalls. Synthesis Lectures on Computation and Analytics, 1(1), i–77.

Wiens, J., & Shenoy, E. S. (2018). Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clinical Infectious Diseases, 66(1), 149–153.

Xu, H. D., & Basu, R. (2020). How the United States flunked the COVID-19 test: some observations and several lessons. American Review of Public Administration, 50(6–7), 568–576.

Zamani Forooshani, M. (2020). A Tool for Integrating Dynamic Healthcare Data Sources (Master's thesis, Universitat Politècnica de Catalunya).

Zenuni, X., Raufi, B., Ismaili, F., & Ajdari, J. (2015). State of the art of semantic web for healthcare. Procedia: Social and Behavioral Sciences, 195, 1990–1998.


Chapter 5: Uncertainties and Their Impact on Healthcare Decisions

After studying the chapter, readers will be able to:
• Assess the level of uncertainty when making healthcare decisions.
• Adjust for uncertainty in order to predict better healthcare outcomes.
• Quantify and utilize measures of uncertainty in healthcare outcomes.
• Determine the impact of data on the probability of a specific outcome.
• Demonstrate the Bayesian approach to updating knowledge in light of new data.
• Practice these concepts in similar healthcare applications.

5.1 Motivation First, let us examine the role of uncertainty in scientific inquiries in general and in healthcare decision-making in particular. Weurlander (2020) warns physicians to be careful to deal with uncertainty before making decisions to treat patients. Koffman et al. (2020) discuss reasons for involving uncertainty in healthcare, especially with respect to the COVID-19 pandemic. Uncertainty is not easily defined because of inadequate, incomplete, and ambiguous information. Many occurrences in personal and professional life exhibit patterns of complete unpredictability – climate, disease outbreaks, financial volatility, natural disasters. Especially in healthcare, a specified outcome might be seen or missing. This vagueness is framed as uncertainty and raises fundamental challenges. Understanding how uncertainties appear is perhaps the beginning of solving this issue. Like an atom can be decomposed to its constituent parts of electrons, neutrons, and protons, the probability of uncertainty can be decomposed to its axioms. Refer to Camio et al. (2019) and Scoones (2019) for more discussion of how uncertainty is identified and illustrated. For example, in a hospital with a full intensive care unit (ICU), physicians must decide whether to turn away a new patient who seeks critical care or create a vacancy by

prematurely discharging a current occupant. This dilemma is a consequence of having a full ICU and healthcare professionals should calculate ahead of time the probability that the ICU will have a vacancy. This is feasible by analyzing patterns of service time and patient inflow and outflow rates. This phenomenon is discussed in the medical literature, which examines the factors that cause heavier inflow and/or slower outflow of inpatients in the ICU. A review of the medical literature could help in procuring the models that govern patient discharge decisions. In the absence of mathematical models, a simulation framework can be devised as an alternate approach. The simulation might capture the rate of transfer from an ICU to a general ward, the time it takes to prepare an ICU bed for an incoming patient, patient length of stay (LoS), and so forth. No matter how robust medical practice is, public health is vulnerable. Communities are inevitably exposed to viruses – both known and unknown – and remain susceptible to disease. Currently known diseases serve to remind us of the frailties inherent in the human condition. That fragility is connected to uncertainty. Both patients and physicians desire to manage fragility through medical diagnoses and treatment. However, diagnosis and prognosis are subjected to uncertainty. In this process, patients benefit from the principles of uncertainty. Those who are healthy benefit through preventative measures. See Rogers and Walker (2016) for a criticism of methods and remedial actions that can be taken to handle uncertainty in healthcare. One way to deal with the uncertainty that hides in healthcare data is consider it in terms of illnesses that occur never, once, and repeatedly. The evidence is translated to knowledge discovery. At times, the evidence is overwhelming and the process becomes too tedious. Why not seek a simpler approach? 
If the treatment of any illness is successful, it is confirmed in the data-mining results, in patterns of no, one, or repeated episodes. Refer to Shanmugam (2015) for a demonstration. Chapter 4 discussed the consequences of reporting delays for healthcare decision-making. Some reporting


delays might not be intentional but rather are due to ongoing changes in federal regulations and/or medical definitions. In such a process, the reporting probability in a current period is chained to the previous period’s “odds of quickening.” See Shanmugam (2013b) for a discussion of delays in reporting AIDS cases due to the difficulty in estimating and/or confirming the number of persons with AIDS. Misconceptions exist about the role of uncertainty. For clarification, see Kachapova and Kachapov (2012). The role of uncertainty in hospital site infection should be clearly understood for the welfare of both patients and healthcare professionals. For details, see Shanmugam (2012). Whether working on a surgical site or an outpatient wing, physicians, nurses, and administrators need to be protected from viruses spread by infected patients. Hospitals cater to two types of patients. The first type of patient is fully disinfected. The second type comprises patients who are part of a heavy influx of patients to a hospital and are not sufficiently disinfected – a potential source of hospital site infection. To reduce infection from this second type of patient, hospital management advises the medical team to undergo compulsory procedures. For example, 32 nurses who treated SARS patients in a Toronto hospital were infected by their patients. (See the pertinent data in www.nc.cdc.gov. See Jerak-Zuiderent (2012) for an illustration of necessary safety precautions.) A source for evaluating uncertainty (especially the sensitivity and specificity of a pandemic) in diagnostic data is borderline patients who later transition from the healthy group to the ill group or vice versa. Illnesses like dementia or blood pressure are cases in point. Analogous situations arise in signal processing. Engineers encounter data with clear signals as well as blurred, “hazy” signals with removable noise. In medical image processing, blurred signals are common. 
Borderline patients might be in an embryonic (progressive) stage of the illness during the test but could later turn out to be among the diseased cases (without proper medical treatment) or the healthy cases (medically treated). Refer to Shanmugam (2010) for an illustration of how borderline cases affect sensitivity and specificity during an epidemic. Uncertainty in diagnostic medical testing has been tracked since the time of Aristotle (fourth century BC). Healthcare researchers characterize the data in two ways. The first is deductive reasoning. To make healthcare decisions, relevant tests are performed on urine, blood, and tissue, and an EKG, X-ray, or sonogram may be carried out. To comprehend uncertainty, let $\pi = \Pr(D)$ represent prevalence, the proportion of diseased ($D$) patients. The principle of double anchoring is based on connecting the outcomes of two time periods. Sensitivity is a proportion


$Se = \Pr(+ \mid D)$. Among diseased patients, the proportion $Se$ is likely to get a positive test result (+) in the pretest period. The positive predictive value (PPV), $PPV = \Pr(D \mid +)$, is the proportion of diseased patients among those who receive a positive test result. If both events, $D$ and +, are (stochastically) independent, they are irrelevant to each other. What is their relevance to each other, and what is their reciprocity? Shanmugam (2008) answers this question with the concept of authentication. Notice the duality between $Se$ and $PPV$. Both are necessary for a complete diagnostic scenario. What is speculated in one is evidence of the outcome in the other. Shanmugam (2008) calls such a duality a double-anchored syllogism. Parity in the reciprocal syllogism in one brings understanding of the syllogism in the other. Likewise, specificity $Sp = \Pr(- \mid \bar{D})$ is the proportion of healthy ($\bar{D}$) participants likely to obtain a negative test result (−). The negative predictive value (NPV), $NPV = \Pr(\bar{D} \mid -)$, is the healthy proportion of those who receive a negative test result. Both $Sp$ and $NPV$ have dual meanings as well. How important is parity? How are syllogisms connected to it? The double-anchored relationship is implicit in the equations $p_{-}(1 - p_{-})J = \pi(1 - \pi)I$ and $p_{-}(1 - NPV) + (1 - p_{-})PPV = \pi$, where $p_{-} = \Pr(\text{negative result})$, $\pi = \Pr(D)$, the famous Youden index is $I = Se + Sp - 1$, and the new, complementary version is the Shanmugam index $J = PPV + NPV - 1$. For a diagnostic to be superior, the Youden index $I > 0$ from the physician's point of view should be a high positive value and the Shanmugam index $J > 0$ from the patient's point of view should also hold a high positive value. See the related discussions and illustrations in Shanmugam (2008). A measure that is less popular outside clinical trials is Cohen's (unweighted) kappa. In general, sensitivity, specificity, and kappa are three useful but interrelated statistical measures of agreement. 
For example, when both sensitivity and specificity are below 7/8, a kappa of 3/4 or greater is considered excellent agreement. Feuerman and Miller (2005) narrate how uncertainty plays a key role in connecting these three measures. Conditional probability, or the scenario of statistical independence between two outcomes, can be better explained with joint data on both variables in their categorical tables. Refer to Willows et al. (2003) and Joarder and al-Sabah (2002) for approaches to analyzing risk based on uncertainty. See Anderson (1988) for an interesting explanation of the role uncertainty plays in Venn diagrams. The pictorial representations in Venn diagrams can clarify uncertainty. To learn more about this line of thinking about uncertainty and probability, refer to Agresti (2003), Alemi and Gustafson (2007), Ben-Naim (2008),


Congdon (2005), Dimitrakakis and Ortner (2019), Gökalp (2017), Good (1983), Marukatat (2009), Wang and Park (2020), and Zhang (2019).
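The double-anchored identities linking the Youden index $I$ and the Shanmugam index $J$ can be checked numerically. The following is a minimal sketch, not the author's own procedure: the prevalence, sensitivity, and specificity values are hypothetical, and the function name `predictive_values` is introduced here only for illustration.

```python
def predictive_values(prevalence, se, sp):
    """Return (Pr(-), PPV, NPV) from prevalence, sensitivity, and specificity."""
    p_pos = prevalence * se + (1 - prevalence) * (1 - sp)  # Pr(+)
    p_neg = 1 - p_pos                                      # Pr(-)
    ppv = prevalence * se / p_pos                          # Pr(D | +)
    npv = (1 - prevalence) * sp / p_neg                    # Pr(not D | -)
    return p_neg, ppv, npv


# Hypothetical inputs, chosen only for illustration.
prevalence, se, sp = 0.10, 0.90, 0.80
p_neg, ppv, npv = predictive_values(prevalence, se, sp)

youden = se + sp - 1        # I, the physician-side index
shanmugam = ppv + npv - 1   # J, the patient-side index

# First identity: Pr(-) * (1 - Pr(-)) * J should equal pi * (1 - pi) * I.
lhs = p_neg * (1 - p_neg) * shanmugam
rhs = prevalence * (1 - prevalence) * youden

# Second identity: Pr(-) * (1 - NPV) + (1 - Pr(-)) * PPV should equal pi.
recovered_prevalence = p_neg * (1 - npv) + (1 - p_neg) * ppv

print(lhs, rhs, recovered_prevalence)
```

Both sides of the first identity agree up to floating-point rounding, and the second expression recovers the prevalence that was supplied.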

5.2 Concepts

Now, we ought to capture the conceptual meaning of uncertainty, which is not only intuitive but also quite practical in almost all walks of personal and professional life, including healthcare. The genius Albert Einstein (1879–1955) noted that nature itself has chaotic elements and that its chaotic system exhibits regularity. Scholars have wondered why nature is not deterministic. Naturalists, philosophers, scientists, and so forth might answer that life would have been monotonous, boring, and much more susceptible to destruction without uncertainty. For a variety of reasons, outcomes do not appear very predictably. As the founding authority of the statistics profession, Karl Pearson, wrote in The Grammar of Science, all of science (e.g., engineering or medicine) revolves around the uncertainty principle. Van der Bles et al. (2019) articulate well the importance of communicating uncertainty. The concept of uncertainty is an integral part of decision-making. Uncertainty has three ingredients – numbers, logic, and science. The science is at two levels. That is, it may be direct or indirect. Uncertainty is pervasive, with scope to predict the likelihood of a future outcome. This process might be ambiguous, suspect, or uneasy. The risk of encountering an outcome is discussed in an investigation of uncertainty. A description of uncertainty is confounded with the selected model for analyzing the data. The eminent statistician George Box announced, “all models are wrong, but some are useful” (Skogen et al., 2021). Scientific hypotheses are usually cast in terms of parameters in the model that are not directly observed. Doubt concerning a hypothesis is connected to the variability within the sample population, data measurement error, computational or systematic inadequacies of the measured variable, limited knowledge of the underlying processes, or experts’ disagreement. The magnitude of uncertainty is vital in all of these situations. 
Direct uncertainty concerns the non-repeatability of the data. Indirect uncertainty reflects the credibility of the data. Communicating uncertainty is done step by step: an explicit underlying model with its properties is selected first for the data, followed by an assessment of uncertainty, its verbal interpretation, alternate scenarios, and an explicit denial of uncertainty if the evidence negates the interpretation. What is probability? Probability is a number that quantifies uncertainty. Probability is a speculative number applied to future rather than past incidence. Estimating probability does not eliminate or reduce uncertainty. In a

sense, nature plays a game the outcomes of which are not totally predictable for a variety of reasons. Probability is like a magnifying glass that examines the mystery behind an uncertain outcome. Probability does not alter uncertainty but rather helps us understand its existence and function. Objective as well as subjective sources of data unravel the uncertainty associated with a specified outcome. The Bayes theorem explains the process of updating prior knowledge by incorporating new data. Among many outcomes, two might be independent. Two independent outcomes might become conditionally dependent due to the presence of a third outcome. The frequency-based approach suggests that the probability of an outcome is an empirical proportion: the number of times the outcome occurs over the number of possible observations. Probability is a number between zero and one that narrates the likelihood of an uncertain outcome. The first axiom is $0 \le \Pr(A) \le 1$ for an outcome $A$, where the notation $\Pr(\cdot)$ is an abbreviation for probability. When the probability is zero, the outcome is impossible. When the probability is one, the outcome is certain. If healthy and ill are regarded as two equally likely possibilities, the probability of a patient being healthy is one-half. A person cannot be judged both healthy and ill, as health and illness are mutually exclusive. The collection of all possibilities is the sample space $\Omega$. The second axiom is $\Pr(\Omega) = 1.0$, meaning data collection has to observe one or the other. Another example of the use of probability in healthcare is blood type. The sample space is the entire collection $\Omega = \{A, B, AB, O\}$. The statement $\Pr(\Omega) = 1.0$ implies blood type is always one possibility among four, with each of the four blood types being mutually exclusive of the other three. In other words, $\Pr(\Omega) = \Pr(A) + \Pr(B) + \Pr(AB) + \Pr(O)$, where $\Pr(A)$, $\Pr(B)$, $\Pr(AB)$, $\Pr(O)$ are marginal proportions for the blood types A, B, AB, O, respectively. 
Only when these proportions are exactly 25% are the blood types considered equally likely. The probability statement $\Pr(\bar{O})$ denotes the proportion of patients with a blood type other than O, which suggests it could be any one blood type among the remaining three. The complementary outcome $\bar{O}$ refers to any outcome other than O. In other words, $\Pr(\bar{O}) = 1 - \Pr(O) = \Pr(A) + \Pr(B) + \Pr(AB)$ in this scenario. In general, $\Pr(\Omega) = 1$ and $0 \le \Pr(A) \le 1$. The union $O \cup A$ refers to a blood type of either O or A, and these two blood types are mutually exclusive. In other words, $\Pr(O \cap A) = \Pr(\emptyset) = 0$ and $\Pr(O \cup A) = \Pr(O) + \Pr(A)$, where $\cap$ and $\emptyset$ symbolize intersection and an empty set, respectively. What if this is not the outcome of interest? Two rules


Figure 5.1 (a) Complementary. (b) Disjoint outcomes. (c) Overlap outcomes.

connect complementary outcomes with union (equivalently, either-or counting) and intersection (togetherness), according to probabilist Augustus DeMorgan (1806–1871). If the outcomes $A$ and $B$ overlap in a Venn diagram (meaning they can be seen together), then $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$. When there is no overlap (meaning they are mutually exclusive, with a statement $A \cap B = \emptyset$, an empty space in the Venn diagram), then $\Pr(A \cap B) = 0$. Consequently, $\Pr(A \cup B) = \Pr(A) + \Pr(B)$. When two outcomes are observed as independent of each other, the conditional probability of outcome $A$ occurring in the presence of outcome $B$ is exactly the same as the marginal probability of outcome $A$. The conditional probability of outcome $A$ occurring in the absence of outcome $B$ is also exactly the same as the marginal probability of outcome $A$. This means $\Pr(A \mid B) = \Pr(A) = \Pr(A \mid \bar{B})$. Equivalently, the probability of the simultaneous occurrence of outcomes $A$ and $B$ is the product of their marginal probabilities. That is, $\Pr(A \cap B) = \Pr(A)\Pr(B)$. To see this result, let us start with the definition of independence and expand. The conditional probability statement simplifies as $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \Pr(A)$. That is, $\Pr(A \cap B) = \Pr(A)\Pr(B)$ if $A$ and $B$ are independent. Also, it is easy to prove that, if $A$ and $B$ are independent, then any combination (with the presence of $A$ or its absence $\bar{A}$, or with the presence of $B$ or its absence $\bar{B}$) is independent. In the probability framework, the correlation between $A$ and $B$ is $|\mathrm{corr}(A, B)| = \frac{\Pr(A)\Pr(B)}{\Pr(A \cap B)}$. When $A$ and $B$ are independent, this measure is exactly one. As an example of conditional probability, consider two aspects about a COVID-19 patient entering a hospital during the pandemic. The two aspects are having health insurance ($A$) and seeking admission ($B$) to the ICU. The conditional probability of $A$, given $B$, is $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$, if $\Pr(B) \ne 0$. 
If the hospital does not have an ICU or no one has sought admission into the ICU in the past, then $\Pr(B) = 0$ and conditional probability is not applicable. Otherwise, conditional probability is meaningful and estimable. To understand independence, suppose 60% of all patients sought admission to the ICU. Then $\Pr(B) = 0.60$


and conditional probability fits. From archived hospital records, the analyst has noticed that 40% of all patients both sought ICU admission and had health insurance, which means $\Pr(A \cap B) = 0.40$. The conditional probability of $A$ given $B$ is $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{0.40}{0.60} = \frac{2}{3} \approx 0.67$. Assume 50% of patients entering a hospital have health insurance, whether or not they sought admission to the ICU. In other words, $\Pr(A) = 0.50$ without any additional knowledge. The presence of $B$ has increased the probability of $A$ from 0.50 to 0.67, implying their dependence. In other words, $\Pr(A \mid B) \ne \Pr(A)$ or, equivalently, $\Pr(A \cap B) \ne \Pr(A)\Pr(B)$. The concept of conditional probability is the foundation of all science, according to Pearson. Furthermore, a concept exists called conditional independence that is symbolically stated as $\Pr(A \mid B, C) = \Pr(A \mid C)$, meaning that, once $C$ is known, the presence of outcome $B$ does not add new information on the likelihood of outcome $A$. Such conditional independence expands to a generalization $\Pr(A \cap B \mid C) = \Pr(A \mid C)\Pr(B \mid C)$, meaning that, in the population possessing the characteristic $C$, the probability of both $A$ and $B$ occurring is the product of the conditional probability of $A$ given $C$ times the conditional probability of $B$ given $C$. Conditional independence is the foundation of what is called partial correlation. What is the correlation $\rho_{xy}$ between the quantifiable variables $x$ and $y$? The correlation is symmetric in the sense $\rho_{xy} = \rho_{yx}$. The correlation is a standardized measure of the covariance between $x$ and $y$ in the sense that the correlation is always between −1 and 1. When the correlation is −1, the relation between them is a perfect inverse. When the correlation is 1, the relation between them is such that, if one variable increases (decreases) at a certain rate, then the other variable increases (decreases) at the same rate. When the correlation is zero, there is no visible linear relation between them. An interpretation of zero correlation has to be done cautiously. 
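The health insurance and ICU example above can be reproduced in a few lines. This is a minimal sketch: the function name `conditional_probability` is hypothetical, and the value 0.40 is treated as the joint probability $\Pr(A \cap B)$, which is how the calculation in the text uses it.

```python
def conditional_probability(p_joint, p_condition):
    """Pr(A | B) = Pr(A and B) / Pr(B); undefined when Pr(B) is zero."""
    if p_condition == 0:
        raise ValueError("Pr(B) = 0, so Pr(A | B) is not defined")
    return p_joint / p_condition


p_b = 0.60        # Pr(B): patient sought ICU admission
p_a_and_b = 0.40  # Pr(A and B): insured and sought ICU admission
p_a = 0.50        # Pr(A): patient has health insurance

p_a_given_b = conditional_probability(p_a_and_b, p_b)

# Independence would require Pr(A and B) to equal Pr(A) * Pr(B).
independent = abs(p_a_and_b - p_a * p_b) < 1e-12

print(round(p_a_given_b, 2), independent)
```

The conditional probability rounds to 0.67, and the independence check fails because 0.40 differs from the product 0.50 × 0.60 = 0.30.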
If two measured variables are independent of each other with no physical connection, then their correlation coefficient should be zero. The converse is not always true. The fact that the correlation coefficient between the two variables is zero does not mean the two variables are


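Table 5.1 Data on age $(x)$, blood pressure $(y)$, and body mass index $(z)$

| Patient ID | Age in years | Blood pressure in mmHg | BMI in kg/m² |
|---|---|---|---|
| 1 | 52 | 103 | 15 |
| 2 | 34 | 169 | 22 |
| 3 | 99 | 198 | 30 |
| 4 | 46 | 93 | 14 |
| 5 | 52 | 104 | 16 |
| 6 | 93 | 186 | 28 |
| 7 | 47 | 93 | 14 |
| 8 | 75 | 151 | 23 |
| 9 | 41 | 81 | 21 |
| 10 | 54 | 107 | 26 |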
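The pairwise correlations among the three variables in Table 5.1 can be recomputed directly from the ten rows. The sketch below is illustrative only: the helper names are hypothetical, and the partial correlation uses the standard first-order formula $\rho_{xy|z} = (\rho_{xy} - \rho_{xz}\rho_{zy}) / \sqrt{(1 - \rho_{xz}^2)(1 - \rho_{zy}^2)}$, which is an assumption here because the text does not spell the formula out.

```python
from math import sqrt

# Columns of Table 5.1, patients 1 to 10.
age = [52, 34, 99, 46, 52, 93, 47, 75, 41, 54]
blood_pressure = [103, 169, 198, 93, 104, 186, 93, 151, 81, 107]
bmi = [15, 22, 30, 14, 16, 28, 14, 23, 21, 26]


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_zy):
    """First-order partial correlation of x and y with z held fixed."""
    return (r_xy - r_xz * r_zy) / sqrt((1 - r_xz ** 2) * (1 - r_zy ** 2))


r_age_bp = pearson(age, blood_pressure)
r_age_bmi = pearson(age, bmi)
r_bmi_bp = pearson(bmi, blood_pressure)

# Vanishing partial correlation would require r_age_bp == r_age_bmi * r_bmi_bp.
product = r_age_bmi * r_bmi_bp
r_age_bp_given_bmi = partial_correlation(r_age_bp, r_age_bmi, r_bmi_bp)

print(round(r_age_bp, 2), round(r_age_bmi, 2), round(r_bmi_bp, 2))
print(round(product, 2), r_age_bp_given_bmi)
```

The numerator of the partial correlation is exactly the gap between the direct correlation and the product of the other two, so a nonzero gap means the partial correlation does not vanish.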

necessarily independent. As an example, age and body weight are positively correlated. As another example, the number of hours an infant sleeps and the infant's age are negatively correlated. The partial correlation is a quantifiable, standardized measure of the relation between the two variables $x$ and $y$ when a third intervening variable $z$ is fixed. The partial correlation is indicated by $\rho_{xy|z}$ and has the property $-1 \le \rho_{xy|z} \le 1$. The two variables are conditionally independent of each other if and only if their partial correlation $\rho_{xy|z}$ is zero. This is called vanishing partial correlation, according to which two variables $x$ and $y$ are conditionally independent of each other if $\rho_{xy} = \rho_{xz}\rho_{zy}$. To comprehend these statements, consider the data in Table 5.1. Their correlations are $\rho_{\text{age},\text{bloodPressure}} = 0.73$, $\rho_{\text{age},\text{BMI}} = 0.70$, and $\rho_{\text{BMI},\text{bloodPressure}} = 0.77$, meaning $\rho_{\text{age},\text{bloodPressure}} = 0.73$ is not equal to the product $\rho_{\text{age},\text{BMI}}\,\rho_{\text{BMI},\text{bloodPressure}} = 0.54$. There is no vanishing partial correlation and hence no conditional independence. The Bayesian approach to unraveling uncertainty is interesting in the sense that it updates prior opinion to posterior opinion after blending old and new information via the likelihood function. Today’s posterior opinion becomes tomorrow’s prior opinion. The Bayesian approach is a continual process. The Bayesian approach is now accepted and applied in many disciplines, including healthcare. In the past, the Bayesian approach was controversial. Reverend Thomas Bayes was a clergyman and mathematician. Bayes summarized his conditional probability concept in terms of what is now popularly called the Bayes theorem in his classic 1763 publication, An Essay towards Solving a Problem in the Doctrine of Chances. The Bayes theorem is a foundation for artificial intelligence, and it

expresses a degree of belief as a probability. Bayesian inference is fundamental in data analysis. The rationale behind Bayesian thought is the following. Start with the conditional probability $\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}$. Outcome $B$ can occur with or without the presence of outcome $A$. That means $\Pr(B) = \Pr(B \cap A) + \Pr(B \cap \bar{A})$. The simultaneous (compound) outcomes of both $A$ and $B$ are indicated by $B \cap A$. The notation $B \cap \bar{A}$ refers to the simultaneous occurrence of $B$ and the absence of $A$. Outcomes $A$ and $\bar{A}$ are mutually exclusive. Now, $\Pr(B \cap A) = \Pr(A)\Pr(B \mid A)$ and $\Pr(B \cap \bar{A}) = \Pr(\bar{A})\Pr(B \mid \bar{A})$. Upon substitution, Bayes discovered that

$$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)} = \frac{\Pr(A \cap B)}{\Pr(A \cap B) + \Pr(\bar{A} \cap B)} = \frac{\Pr(A)\Pr(B \mid A)}{\Pr(A)\Pr(B \mid A) + \Pr(\bar{A})\Pr(B \mid \bar{A})}, \quad \text{if } \Pr(B) \ne 0.$$

In the context of using data $y$ and the unknown parameter $\theta$ of the chance-oriented healthcare mechanism, Bayesian thought transforms, with $A = \theta$ and $B = y$, to $\Pr(\theta \mid y) = \frac{\Pr(\theta)\Pr(y \mid \theta)}{\Pr(\theta)\Pr(y \mid \theta) + \Pr(\bar{\theta})\Pr(y \mid \bar{\theta})}$ if $\Pr(y) \ne 0$. Note that $\Pr(\theta)$, $\Pr(\theta \mid y)$, $\Pr(y \mid \theta)$, and the denominator $\Pr(y)$ are the prior, posterior, likelihood, and marginal probability functions, respectively. The ratio of the probability of the presence of an outcome over the probability of its absence is called the odds of the outcome. The odds and the probability are equivalent in the sense that one can be computed if the other is known. For example, when a cancer researcher mentions that the odds of getting lung cancer in a location are 1 to 10,000, that means that, for every 10,000 persons without lung cancer, there is just 1 person with it. A statistician translates this information to a probability using the conversion $\text{probability} = \frac{\text{Odds}}{1 + \text{Odds}}$. The probability of lung cancer is slim: $\frac{1/10{,}000}{1 + (1/10{,}000)} \approx 0.0001$. DeMorgan’s law 1 states that the complement of the union of two outcomes $A$ and $B$ is the intersection of their complementary outcomes $\bar{A}$ and $\bar{B}$. Symbolically stated, $\overline{A \cup B} = \bar{A} \cap \bar{B}$. DeMorgan’s law 2 states that the complement of the intersection of $A$ and $B$ is the union of $\bar{A}$ and $\bar{B}$. Symbolically stated, $\overline{A \cap B} = \bar{A} \cup \bar{B}$. Note that $D = \Pr(A \cap B)\Pr(\bar{A} \cap \bar{B}) - \Pr(A \cap \bar{B})\Pr(\bar{A} \cap B)$ is called the probability determinant in a matrix arrangement $M = \begin{pmatrix} \Pr(A \cap B) & \Pr(A \cap \bar{B}) \\ \Pr(\bar{A} \cap B) & \Pr(\bar{A} \cap \bar{B}) \end{pmatrix}$ of intersection probabilities. The determinant is nonzero only if $\Pr(A) \ne 0$, $\Pr(A) \ne 1$, and $\mathrm{Like}(B) \ne \mathrm{Like}(\bar{B})$, where $\mathrm{Like}(B) = \frac{\Pr(B \mid A)}{\Pr(B \mid \bar{A})}$ and $\mathrm{Like}(\bar{B}) = \frac{\Pr(\bar{B} \mid A)}{\Pr(\bar{B} \mid \bar{A})}$ are the likelihood


of the presence of $B$ and the absence of $B$ given $A$. The matrix is invertible in the sense that $M M^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = M^{-1} M$, where

$$M^{-1} = \frac{1}{\Pr(B \mid A)\Pr(\bar{B} \mid \bar{A}) - \Pr(\bar{B} \mid A)\Pr(B \mid \bar{A})} \begin{pmatrix} \Pr(\bar{B} \mid \bar{A})/\Pr(A) & -\Pr(\bar{B} \mid A)/\Pr(\bar{A}) \\ -\Pr(B \mid \bar{A})/\Pr(A) & \Pr(B \mid A)/\Pr(\bar{A}) \end{pmatrix}$$

provided $\Pr(B \mid A) \ne \Pr(B \mid \bar{A})$ and $\Pr(\bar{B} \mid A) \ne \Pr(\bar{B} \mid \bar{A})$. Known matrix results are utilized in the following steps. Given a matrix $M = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, the characteristic equation is $\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0$. If $(a_{11} - a_{22})^2 + 4a_{12}a_{21} > 0$, then there are two distinct and real roots. If $(a_{11} - a_{22})^2 + 4a_{12}a_{21} = 0$, then those roots are the same. If $(a_{11} - a_{22})^2 + 4a_{12}a_{21} < 0$, then there are two complex conjugate roots. In our context, $a_{11} = \Pr(A \cap B)$, $a_{12} = \Pr(A \cap \bar{B})$, $a_{21} = \Pr(\bar{A} \cap B)$, and $a_{22} = \Pr(\bar{A} \cap \bar{B})$. The factor is then $(a_{11} - a_{22})^2 + 4a_{12}a_{21} = \{\Pr(A \cap B) - \Pr(\bar{A} \cap \bar{B})\}^2 + 4\Pr(A \cap \bar{B})\Pr(\bar{A} \cap B)$. In trying to clearly understand uncertainty, professionals have defined the following. The precision of an outcome $B$ is enhanced by an amount $\frac{|A \cap B|}{|A|}$ due to an outcome $A$. Is it reciprocal? The answer is $\frac{|A \cap B|}{|B|}$, if $A$ and $B$ denote retrieved and relevant outcomes. The similarity in $A$ and $B$ is $\mathrm{Similarity}(A, B) = \frac{|A \cap B|}{\alpha|A| + (1 - \alpha)|B|}$, where $0 \le \alpha \le 1$ is an arbitrary value. When the value $\alpha = 0$ is selected, outcome $|A|$ is discounted with the full involvement of outcome $|B|$. When the value $\alpha = 1$ is selected, outcome $|B|$ is discounted with the full involvement of outcome $|A|$. Otherwise (i.e., $0 < \alpha < 1$), outcomes $|B|$ and $|A|$ are proportionally involved. The measure of overlap between outcomes $|B|$ and $|A|$ is $\mathrm{Overlap}(A, B) = \frac{|A \cap B|}{\min\{|A|, |B|\}}$. The relevance between outcomes $|B|$ and $|A|$ is $\mathrm{Relevance}(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where $\cup$ denotes either $A$ or $B$, meaning at least one of them, and $\cap$ denotes togetherness. The angle of the one-to-one relation between outcomes $|B|$ and $|A|$ is $\mathrm{Angle}(A, B) = \frac{|A \cap B|}{\sqrt{|A||B|}}$. Outcome $A$ might

enhance outcome $B$ by a probability level $\mathrm{Enhancement}(A \uparrow B) = \frac{\Pr(A \mid B)}{\Pr(A)\Pr(B)}$, where the notation $\Pr(A \mid B)$ denotes the conditional probability of outcome $A$ appearing when outcome $B$ has already occurred. Likewise, outcome $A$ might dampen outcome $B$ by a probability level $\mathrm{Dampens}(A \downarrow B) = \frac{\Pr(A \mid B)}{\Pr(A)\Pr(B)}$.
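The set-based measures defined above (precision and its reciprocal, similarity, overlap, relevance, and angle) are simple counts, which the following sketch illustrates with Python sets. The patient identifiers, the function name `set_measures`, and the label `recall` for the reciprocal ratio $|A \cap B| / |B|$ are assumptions made for illustration; they do not come from the text.

```python
from math import sqrt


def set_measures(a, b, alpha=0.5):
    """Count-based measures between a retrieved set a and a relevant set b."""
    both = len(a & b)  # |A intersect B|
    return {
        "precision": both / len(a),
        "recall": both / len(b),  # the reciprocal view, |A and B| / |B|
        "similarity": both / (alpha * len(a) + (1 - alpha) * len(b)),
        "overlap": both / min(len(a), len(b)),
        "relevance": both / len(a | b),
        "angle": both / sqrt(len(a) * len(b)),
    }


# Hypothetical patient identifiers.
retrieved = {"p1", "p2", "p3", "p4"}
relevant = {"p3", "p4", "p5", "p6", "p7", "p8"}

measures = set_measures(retrieved, relevant)
for name, value in measures.items():
    print(name, round(value, 3))
```

Setting `alpha` to 0 or 1 reduces the similarity to the reciprocal ratio or to the precision, respectively, which matches the discounting behaviour described for $\alpha$.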


The treatment of patients relies heavily on diagnostic tests. Notable diagnostic tests include sugar level in blood, white blood count, calcium, urea in urine, and biopsy, among many others. The diagnostic analysis is often performed before physicians/nurses prescribe medications. The outcome in every diagnostic test is either positive ($+$) or negative ($-$), whether the participant is diseased ($D$) or healthy ($\bar{D}$). Outcomes $D$ and $\bar{D}$ are mutually exclusive, as are positive and negative outcomes. Mutually exclusive indicates that it is impossible to observe the two outcomes simultaneously, which is indicated by $\Pr(D \cap \bar{D}) = 0 = \Pr(+ \cap -)$, where $\cap$ is again the notation for intersection. Their cross-combinations $+ \cap D$, $- \cap D$, $+ \cap \bar{D}$, $- \cap \bar{D}$ are possible but do not necessarily have equal probabilities. The conditional probabilities of positive and negative test results are, respectively, sensitivity $Se = \Pr(+ \mid D)$ and specificity $Sp = \Pr(- \mid \bar{D})$. From a physician’s point of view, sensitivity and specificity need to be high. When sensitivity is higher, the diagnostic is considered accurate among the diseased. Likewise, when specificity is higher, the diagnostic is considered accurate among the healthy. Physicians select a diagnostic test with higher sensitivity as well as higher specificity. To check whether a diagnostic is inferior or superior, Youden devised a combined index, which is $YI = Se + Sp - 1$. When the Youden index is positive, the diagnostic is judged superior. When the Youden index is negative, the diagnostic is inferior. The receiver operating characteristic (ROC) curve is the graph of sensitivity in terms of one minus the specificity. The maximum vertical distance between the ROC curve and the positive diagonal line connecting the corner points (0, 0) and (1, 1) of the unit square is the Youden index. However, the proportion $\pi = \Pr(D)$ of participants taking the diagnostic test who are diseased remains unknown. 
Note $1 - \pi = \Pr(\bar{D})$, the proportion of healthy participants, is also an unknown (see Table 5.2). Using the aforementioned probability concepts, the probability of a person being ill and receiving a positive test result is $\Pr(D)\Pr(+ \mid D) = \pi Se$. Likewise, the probability of a person being healthy and receiving a negative test result is $\Pr(\bar{D})\Pr(- \mid \bar{D}) = (1 - \pi)Sp$. The probability of a person being ill and receiving a negative test result is $\Pr(D)\Pr(- \mid D) = \pi(1 - Se)$. The probability of a person being healthy and receiving a positive test result is $\Pr(\bar{D})\Pr(+ \mid \bar{D}) = (1 - \pi)(1 - Sp)$. The marginal probability of receiving a positive test result is $\Pr(+) = (1 - Sp) + \pi YI$, which is a line in $\pi$ that starts at the initial value $(1 - Sp)$ and rises with slope $YI$ when $YI > 0$. As an example, consider the incidence of COVID-19. When a pandemic such as the recent COVID-19 pandemic intensifies, the prevalence rate $\pi$ of the disease increases and


consequently the chance of a positive test result increases for any citizen tested for it. Healthcare professionals ought to attribute the higher proportion of positive test results to either or both of two factors: the intensifying pandemic and improved testing availability. This situation can be simplified with the help of a statistician via an experiment designed to collect pertinent data. The possibilities are summarized in Tables 5.2 and 5.3. The marginal probability of receiving a negative test result is $\Pr(-) = 1 - \Pr(+) = Sp - \pi YI$, which is a line in $\pi$ that starts at the initial value $Sp$ and falls with slope $-YI$ when $YI > 0$. The proportion receiving a negative test result needs to be higher initially, which is achieved by improving the test. This situation also requires simplification.

Table 5.2 Intersection of test results and state of the participants

| Result ↓ / State → | Illness $(D)$ | Healthy $(\bar{D})$ | Total |
|---|---|---|---|
| $+$ | $\pi Se$ | $(1 - \pi)(1 - Sp)$ | $(1 - Sp) + \pi YI$ |
| $-$ | $\pi(1 - Se)$ | $(1 - \pi)Sp$ | $Sp - \pi YI$ |
| Total | $\pi$ | $(1 - \pi)$ | $1$ |

Healthcare is an interactive enterprise involving patients as much as healthcare professionals. So far, this book has explored how healthcare professionals provide the best care for patients. Now, this chapter adopts the point of view of patients, diagnostic test results, and patients’ true state with respect to an illness. Not all participants who receive a positive test result have the illness, and not all participants who obtain a negative test result are healthy (i.e., immune from the illness). To explain these conflicting data, Shanmugam (2008) constructed Table 5.3. Recall the conditional probabilities $PPV = \Pr(D \mid +)$ and $NPV = \Pr(\bar{D} \mid -)$ are termed the positive predictive value (PPV) and the negative predictive value (NPV), respectively. From patients’ point of view, the PPV and NPV need to be high. When the PPV is higher, the diagnostic is considered accurate for those with positive test results in the sense that they are mostly (not necessarily all) vulnerable to the illness. Likewise, when the NPV is higher, the diagnostic is considered accurate for those with negative test results, meaning they are mostly (not necessarily all) immune to the illness. The diagnostic might be inferior or superior. In this unclear scenario, the Shanmugam index $Sh = PPV + NPV - 1$ might be appropriate to address patients’ concern. When the Shanmugam index is positive, the diagnostic is superior from patients’ point of view. When the Shanmugam index is negative, the diagnostic is inferior from patients’ point of view. The corresponding ROC is the graph of PPV in terms of one minus the NPV. See Table 5.3 for additional categorizations. A tendency to overestimate the probability of an outcome is called conjunctive. A tendency to underestimate the probability of an outcome is called disjunctive. Whether the prior odds are conjunctive or disjunctive, the likelihood ratio (LR) will moderate the prior wrong guess and rectify it for better statistical power when extracting and utilizing data (see Figure 5.2). The probability of receiving a positive test result is tied to both sensitivity and specificity from their intrinsic relation: $\Pr(+) = \Pr(D)YI + 1 - Sp$. In the diagnostic literature, the ratios $LR(+) = \frac{Se}{1 - Sp}$ and $LR(-) = \frac{1 - Se}{Sp}$ play a significant role in expressing the total power of the test. A discussion of these ratios is beyond the scope of this book. The following queries arise in diagnostic discussions. How to distinguish an abnormal test result from a normal one? How to appraise a test’s ability to reveal a patient’s true state? What is the optimal level of a diagnostic test? What are the pitfalls of the predictive values? How to identify the sources of any bias? Is the bias avoidable? How to construct a diagnostic test when the measurement of a variable falls on a continuous scale? How to integrate data from several diagnostic tests? These questions are answered utilizing the ROC. The ROC is a two-dimensional graph of sensitivity on the vertical axis and one minus the specificity on the horizontal axis. The cut-off value of a diagnostic test is the locus on the ROC curve at which the slope of the tangent
The ROC is a two-dimensional graph of sensitivity on the vertical axis and one minus the specificity on the horizontal axis. The cut-off value of a diagnostic test is the locus on the ROC curve at which the slope of the tangent

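Table 5.3(a) Intersection of test results and state of the participants

| Result ↓ / State → | Illness $(D)$ | Healthy $(\bar{D})$ | Total |
|---|---|---|---|
| $+$ | $[(1 - Sp) + \pi YI]\,PPV$ | $[(1 - Sp) + \pi YI][1 - NPV]$ | $(1 + PPV - NPV)(1 - Sp + \pi YI)$ |
| $-$ | $[Sp - \pi YI][1 - PPV]$ | $[Sp - \pi YI]\,NPV$ | $(NPV + 1 - PPV)(Sp - \pi YI)$ |
| Total | $PPV(1 - Sp) + \pi YI\{PPV - (1 - PPV)\}$ | $\{(1 - NPV)(1 - Sp) + (NPV)Sp\} + \pi YI\{(1 - NPV) - NPV\}$ | $1 + (PPV - NPV)\{1 - 2(Sp - \pi YI)\}$ |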
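The cell probabilities of Table 5.2, the marginal probability of a positive result, and the way a likelihood ratio moves prior odds to posterior odds can be traced with a short calculation. This is a sketch under stated assumptions: the prevalence, sensitivity, and specificity values are hypothetical, and the function names are introduced here only for illustration.

```python
def joint_table(prevalence, se, sp):
    """Cells of Table 5.2: joint probabilities of test result and true state."""
    return {
        ("+", "D"): prevalence * se,
        ("+", "not D"): (1 - prevalence) * (1 - sp),
        ("-", "D"): prevalence * (1 - se),
        ("-", "not D"): (1 - prevalence) * sp,
    }


def odds_to_probability(odds):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)


# Hypothetical inputs, chosen only for illustration.
prevalence, se, sp = 0.05, 0.95, 0.90
cells = joint_table(prevalence, se, sp)

youden = se + sp - 1
p_pos = cells[("+", "D")] + cells[("+", "not D")]  # row total for "+"
p_pos_formula = (1 - sp) + prevalence * youden     # (1 - Sp) + pi * YI

ppv = cells[("+", "D")] / p_pos                    # Pr(D | +) by Bayes theorem

# Odds form of the same update: posterior odds = prior odds * LR(+).
lr_pos = se / (1 - sp)
prior_odds = prevalence / (1 - prevalence)
posterior_odds = prior_odds * lr_pos
ppv_from_odds = odds_to_probability(posterior_odds)

# The lung cancer odds of 1 to 10,000 from the text, converted to a probability.
lung_cancer_probability = odds_to_probability(1 / 10_000)

print(p_pos, p_pos_formula, ppv, ppv_from_odds, lung_cancer_probability)
```

The PPV obtained from the table cells and the PPV obtained by multiplying the prior odds by $LR(+)$ coincide, which is the sense in which the likelihood ratio corrects a prior guess.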


Figure 5.2 (a) Evidence in favor of the outcome. (b) Evidence against the outcome. [Figure labels: prior odds; posterior odds (LR > 1); posterior odds (LR < 1); scale from 0 to 1.]

satisfies the relation $\text{slope of ROC} = \frac{H}{B} \cdot \frac{\Pr(\bar{D})}{\Pr(D)}$, where $H$ is the net harm and $B$ is the net benefit. The net harm is due to a false positive result. The net benefit is due to a true positive result. When data are time oriented, Markov models are appropriate. In general, the Markov model describes how test results change over time. The Markovian concept was initiated by Russian mathematician Andrey Markov. A vast literature covers novel applications of conditional probabilities and dependencies between two healthcare outcomes $A$ and $B$. Another measure of data is indicated by a dilution level of independence. The originator of this highly appreciated concept is Shogenji (1999). According to his definition, it is $Sh(A, B) = \frac{\Pr(A \cap B)}{\Pr(A)\Pr(B)}$. This idea is related to an overlap index, $\mathrm{Overlap}(A, B) = \frac{\Pr(A \cap B)}{\Pr(A \cup B)}$, and factual support, which, by definition, is $\mathrm{FactualSupport}(A, B) = \frac{\Pr(A \mid B) - \Pr(A \mid \bar{B})}{\Pr(A \mid B) + \Pr(A \mid \bar{B})}$.
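The three measures just defined can be compared on the earlier health insurance and ICU numbers. The sketch below is illustrative only. It reads factual support as the difference between $\Pr(A \mid B)$ and $\Pr(A \mid \bar{B})$ divided by their sum, which is an assumption about the printed formula, and the function name `dependency_measures` is hypothetical.

```python
def dependency_measures(p_a, p_b, p_ab):
    """Shogenji ratio, overlap index, and factual support for outcomes A and B."""
    p_a_given_b = p_ab / p_b
    p_a_given_not_b = (p_a - p_ab) / (1 - p_b)  # Pr(A and not B) / Pr(not B)
    return {
        "shogenji": p_ab / (p_a * p_b),
        "overlap": p_ab / (p_a + p_b - p_ab),   # Pr(A and B) / Pr(A or B)
        "factual_support": (p_a_given_b - p_a_given_not_b)
        / (p_a_given_b + p_a_given_not_b),
    }


# Dependent case: Pr(A) = 0.50, Pr(B) = 0.60, Pr(A and B) = 0.40.
dependent = dependency_measures(0.50, 0.60, 0.40)

# Independent reference case: Pr(A and B) = Pr(A) * Pr(B).
independent = dependency_measures(0.50, 0.60, 0.50 * 0.60)

print(dependent)
print(independent)
```

Under independence the Shogenji ratio equals one and the factual support equals zero, so departures from those reference values indicate dependence.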

Another favorite measure of dependency is defined as $\mathrm{Favorite} = \Pr(A \mid B) - \Pr(A)$. The relevance measure of dependency is $\mathrm{Relevance} = \frac{\Pr(A \mid B)}{\Pr(A)}$. The odds of mutual dependency are $\mathrm{Odds} = \frac{\Pr(A \mid B)}{\Pr(A \mid \bar{B})}$. Schippers (2014) introduced a measure called coherence to capture a level of dependency:

$$\mathrm{Schippers}(A, B) = \begin{cases} \dfrac{\Pr(A \mid B) - \Pr(A \mid \bar{B})}{1 - \Pr(A \mid \bar{B})} & \text{if } \Pr(A \mid B) \ge \Pr(A) \\[2ex] \dfrac{\Pr(A \mid B) - \Pr(A \mid \bar{B})}{\Pr(A \mid \bar{B})} & \text{otherwise} \end{cases}$$

by his definition. Carnap (1962) defines a relevance measure of an outcome’s dependency: $\Pr(A \cap B) - \Pr(A)\Pr(B)$. He also introduced a counterfactual measure of an outcome’s dependency: $\Pr(A \mid B) - \Pr(A \mid \bar{B})$. A corroboration measure of dependency is $\mathrm{Corroboration}(A, B) = \left\{\frac{\Pr(A \mid B) - \Pr(A)}{\Pr(A \mid B) + \Pr(A)}\right\}\{1 + \Pr(B)\Pr(B \mid A)\}$. From these equations, mutual support for the evidence is $\left\{\frac{\Pr(A \mid B) - \Pr(A)}{1 - \Pr(A)}\right\}\Pr(B)$. The z-measure of such support is defined as

$$\begin{cases} \dfrac{\Pr(A \mid B) - \Pr(A)}{1 - \Pr(A)} & \text{if } \Pr(A \mid B) \ge \Pr(A) \\[2ex] \dfrac{\Pr(A \mid B) - \Pr(A)}{\Pr(A)} & \text{otherwise.} \end{cases}$$

An ingredient of the measure of coherence is $1 - \frac{\Pr(A)}{\Pr(A \mid B)}$ or its complement $\frac{\Pr(B \mid A)}{\Pr(B)}$. A measure of justification is $\frac{\ln \Pr(A \mid B) - \ln \Pr(A)}{-\ln \Pr(A)}$. A comparison of the effectiveness of all of these measures of dependency has not yet been performed. The similarity between two groups $i$ and $j$ is $\mathrm{Similarity}(i, j) = 1 - \mathrm{distance}(i, j)$. The distance could be $\frac{(p - m)}{p}$, where $m$ is the number of mismatches between the two groups and $p$ is the total number of attributes in each group. In diagnostic testing, the distance could be $d(Se, Sp) = \frac{\pi Se + (1 - \pi)Sp}{\pi(1 - Se)}$ or $d(Se, Sp) = \frac{\pi Se + (1 - \pi)Sp}{(1 - \pi)(1 - Sp)}$, due to the Jaccard coefficient. There are several variations of Paul Jaccard’s index, which quantifies the closeness between two related healthcare outcomes. The general definition of the Jaccard index is $J(A, B) = \frac{\Pr(A \cap B)}{\Pr(A \cup B)} = \left[\frac{\Pr(A)}{\Pr(A \cap B)} + \frac{\Pr(B)}{\Pr(A \cap B)} - 1\right]^{-1}$, which falls in the interval [0, 1]. Equivalently, the Jaccard index is $J(A, B) = [\{\Pr(B \mid A)\}^{-1} + \{\Pr(A \mid B)\}^{-1} - 1]^{-1}$, which simplifies to $J(A, B) = [\{\Pr(B)\}^{-1} + \{\Pr(A)\}^{-1} - 1]^{-1}$ when

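Table 5.3(b) Closeness between diagnostic test results and reality of being with or without disease

Test result ↓ / State →   Illness ($D$)                         Healthy ($\bar{D}$)
Positive (+)              $[\{\pi Se\}^{-1} - 1]^{-1}$          $[\{(1 - \pi)(1 - Sp)\}^{-1} - 1]^{-1}$
Negative (−)              $[\{\pi(1 - Se)\}^{-1} - 1]^{-1}$     $[\{(1 - \pi)Sp\}^{-1} - 1]^{-1}$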


A and B are independent. When A and B are empty, the Jaccard index is one. A variation of Jaccard's concept yields $J(D, +) = \frac{\pi Se}{\pi Se + \pi(1 - Se) + (1 - \pi)(1 - Sp)} = [\{\pi Se\}^{-1} - 1]^{-1}$ as a measure of closeness between being with the disease ($D$) and receiving a positive test result (+). Likewise, the Jaccard index $J(D, -) = \frac{\pi(1 - Se)}{\pi(1 - Se) + \pi Se + (1 - \pi)Sp} = [\{\pi(1 - Se)\}^{-1} - 1]^{-1}$ is a measure of the closeness between being with the disease ($D$) and receiving a negative test result (−). The Jaccard index $J(\bar{D}, +) = \frac{(1 - \pi)(1 - Sp)}{(1 - \pi)(1 - Sp) + \pi Se + (1 - \pi)Sp} = [\{(1 - \pi)(1 - Sp)\}^{-1} - 1]^{-1}$ exhibits the closeness between being without the disease ($\bar{D}$) and receiving a positive test result (+). The Jaccard index $J(\bar{D}, -) = \frac{(1 - \pi)Sp}{(1 - \pi)Sp + \pi(1 - Se) + (1 - \pi)(1 - Sp)} = [\{(1 - \pi)Sp\}^{-1} - 1]^{-1}$ identifies the closeness between being without the disease ($\bar{D}$) and receiving a negative test result (−). See Table 5.3 for a summary.

The literature defines the minimum conditional confidence as $CI_{\min} = \min\{\Pr(A \mid B), \Pr(B \mid A)\}$. The maximum conditional confidence is $CI_{\max} = \max\{\Pr(A \mid B), \Pr(B \mid A)\}$. Kulczynski developed a compromise measure between these two conditional confidences (see Albatineh et al., 2006 for details), which is $Kulczynski(A, B) = (1/2)\{\Pr(A \mid B) + \Pr(B \mid A)\}$. The harmonized lift measure (HLM) is $HLM(A, B) = \sqrt{\Pr(A \mid B)\Pr(B \mid A)}$. The imbalance measure is $imbalance(A, B) = \frac{\Pr(A) - \Pr(B)}{\Pr(A) + \Pr(B) - \Pr(A \cup B)} = \frac{\Pr(A) - \Pr(B)}{\Pr(A \cap B)}$. Carnap (1962) suggests $Carnap_1(A, B) = \Pr(A \mid B) - \Pr(A)$, if $\Pr(B) > 0$, and $Carnap_2(A, B) = \Pr(A \cap B) - \Pr(A)\Pr(B)$, if $\Pr(B) > 0$, to portray the impact of outcome $B$ on outcome $A$. Christensen (1999) introduces a combined measure of the presence and absence of outcome $B$ on outcome $A$: $Christensen(A, B) = \Pr(A \mid B) - \Pr(A \mid \bar{B})$, if $1 > \Pr(B) > 0$. Crupi et al. (2009) introduce a measure of the influence of outcome $B$ on outcome $A$:

$$Crupi_{\min}(A, B) = \begin{cases} \dfrac{CI_{\min}(A, B)}{1 - \Pr(A)} & \text{if } \Pr(A \mid B) \ge \Pr(A), \\[2ex] \dfrac{CI_{\min}(A, B)}{1 - \Pr(\bar{A})} & \text{if } \Pr(A \mid B) < \Pr(A), \\[2ex] 1 & \text{if } \Pr(A) = 0, \end{cases}$$

or

$$Crupi_{\max}(A, B) = \begin{cases} \dfrac{CI_{\max}(A, B)}{1 - \Pr(A)} & \text{if } \Pr(A \mid B) \ge \Pr(A), \\[2ex] \dfrac{CI_{\max}(A, B)}{1 - \Pr(\bar{A})} & \text{if } \Pr(A \mid B) < \Pr(A), \\[2ex] 1 & \text{if } \Pr(A) = 0. \end{cases}$$

Good (1960) and Fitelson (2001) separately apply a measure, $GoodFitelson(A, B) = \ln\left\{ \frac{\Pr(A \mid B)}{\Pr(A \mid \bar{B})} \right\}$, if $\Pr(A \mid B) > 0$, $\Pr(A \mid \bar{B}) > 0$, to describe the combined influence of the presence and absence of outcome $B$ on outcome $A$. Kemeny and Oppenheim (1952) promote a measure,

$$KemenyOppenheim(A, B) = \frac{\Pr(A \mid B) - \Pr(A \mid \bar{B})}{\Pr(A \mid B) + \Pr(A \mid \bar{B})}, \quad \text{if } \Pr(A \mid B) > 0, \ \Pr(A \mid \bar{B}) > 0,$$

as an influence of the presence and absence of outcome $B$ on outcome $A$. Milne (1996) offered a measure, $Milne(A, B) = \ln\left\{ \frac{\Pr(A \mid B)}{\Pr(A)} \right\}$, if $\Pr(A \mid B) > 0$, to capture the influence of outcome $B$ on outcome $A$. Mortimer (1988) introduced a different measure, $Mortimer(A, B) = \Pr(A \mid B) - \Pr(A)$, if $\Pr(B) > 0$, to quantify the influence of outcome $B$ on outcome $A$. Nozick (1981) introduced a countermeasure, $Nozick(A, B) = \Pr(B \mid A) - \Pr(B \mid \bar{A})$, if $\Pr(A) > 0$, to capture the influence of outcome $B$ on outcome $A$.

In the context of a diagnostic test, the precision of data can be worked out using $\pi Se / (\pi Se + (1 - \pi)(1 - Sp))$. The recall measure is estimated using the formula $\pi Se / (\pi Se + \pi(1 - Se))$. The Bayesian posterior probability, $f(\theta \mid y)$, is a refinement over the prior probability, $f(\theta)$, due to new data. This becomes feasible with the involvement of the likelihood function $L(y \mid \theta)$ and the marginal (predictive) density $m(y)$. That is, $f(\theta \mid y) = f(\theta)L(y \mid \theta)/m(y)$. This process can be seen graphically in a Venn diagram (see Figure 5.1).

In this era of abundant data, measures of distance help explain the dependency between healthcare outcomes. The Euclidean distance between two vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ is $d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}$. The cosine similarity between the two vectors is $cosine(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$, which equals $corr(x, y)$ when both vectors are mean-centered. Or one can use the Tanimoto index, $Tanimoto(x, y) = \frac{x \cdot y}{\|x\|^2 + \|y\|^2 - x \cdot y}$, to capture the closeness between the two observed vectors. However a researcher calculates the distance between vectors $x$ and $y$, he or she should also consider three major correlations between them. The vectors might be characteristics of patients or healthcare services they received. Famous correlations include Pearson's correlation, Spearman's correlation, and Matthews's correlation. All three require the values of the quantitative variables to have a symmetric, bell-shaped (Gaussian or normal) frequency pattern and fall within three standard deviations from the mean. Pearson's correlation is a parametric version in the sense that the data values themselves enter the correlation's final value, so it presumes the data have no outliers.
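The three vector measures just described can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names and the two sample vectors are hypothetical, and the Tanimoto index is written in its standard form with squared norms in the denominator.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def tanimoto(x, y):
    """Tanimoto index (standard form, assumed here): x.y / (|x|^2 + |y|^2 - x.y)."""
    return dot(x, y) / (dot(x, x) + dot(y, y) - dot(x, y))

# Hypothetical vectors: y is exactly twice x, so the angle between them is zero.
x = [1.0, 2.0, 2.0]
y = [2.0, 4.0, 4.0]

d = euclidean(x, y)          # sqrt(1 + 4 + 4)
c = cosine_similarity(x, y)  # same direction
t = tanimoto(x, y)           # 18 / (9 + 36 - 18)
```

Note that the cosine similarity ignores the difference in length between the two vectors, whereas the Euclidean distance and the Tanimoto index do not.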


Table 5.4 Smoking versus malignant lung cancer. (Source: Shanmugam, 2008)

Smoking ↓ / Cancer →             Malignant lung cancer ($L$)   Benign lung cancer ($\bar{L}$)   Total
Chain smoker ($S$)               60                            20                               80
Non-chain smoker ($\bar{S}$)     10                            10                               20
Total                            70                            30                               100

An outlier is an uneven impact or a distortion. In the presence of one or more outliers, Spearman's nonparametric correlation is appropriate. To calculate Spearman's correlation, the values of both vectors are ranked separately and the correlation of their ranks in the modified vectors is calculated. Matthews's correlation is devised for an analysis of 2 × 2 categorical data. Matthews's correlation falls in the closed domain [−1, 1]. The term closed here refers to the possible inclusion of the endpoints −1 and +1. The correlation value is indicative of a linear (i.e., upward, downward, or horizontal) trend, but never of a curved relationship. When the correlation is zero, it does not imply the two characteristics are independent unless the data on both variables are drawn from a bivariate Gaussian population. However, the correlation based on collected data of two independent characteristics ought to be near zero. Their computing formulas are:

$$\rho_{Pearson} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n - 1);$$

$$\rho_{Spearman} = \sum_{i=1}^{n} (d_i - \bar{d})^2 / (n - 1);$$

$$\rho_{Matthews} = \frac{Se + Sp - 1}{\sqrt{\{Se + 1 - Sp\}\{Sp + 1 - Se\}}},$$

where $n$, $\bar{x}$, and $\bar{y}$ are the sample size, the average of one variable $x$, and the average of another variable $y$. The factor $(n - 1)$ is labeled degrees of freedom (df). In this context, some healthcare professionals use the Hopkins statistic $\bar{y}/(\bar{y} + \bar{x})$ as a correlation measure.
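A short Python sketch may help compare the three correlations. It rests on assumptions that go beyond the formulas printed above: Pearson's coefficient is computed in its usual standardized form (the sample covariance divided by the product of the two sample standard deviations), Spearman's coefficient is computed as Pearson's coefficient applied to the ranks, and the Matthews-type coefficient follows the Se and Sp expression given in the text. All function names and sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Pearson's correlation applied to the separately ranked vectors."""
    return pearson(ranks(x), ranks(y))

def matthews_from_se_sp(se, sp):
    """Expression from the text: (Se + Sp - 1) / sqrt((Se + 1 - Sp)(Sp + 1 - Se))."""
    return (se + sp - 1) / math.sqrt((se + 1 - sp) * (sp + 1 - se))

x = [1, 2, 3, 4, 5]
y_linear = [2, 4, 6, 8, 10]   # exactly linear in x
y_curved = [1, 4, 9, 16, 25]  # monotone but curved in x

r_linear = pearson(x, y_linear)
r_curved = pearson(x, y_curved)
rho_curved = spearman(x, y_curved)
mcc = matthews_from_se_sp(0.86, 0.33)
```

In this sketch the curved but monotone relationship gives a Spearman coefficient of one while Pearson's coefficient falls below one, which illustrates why a correlation speaks to a linear trend only.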

5.3 Illustration

To comprehend the concepts in the previous section, let us consider the example of chain smoking versus the incidence of malignant lung cancer. Consider the following data connecting smoking ($S$) and malignant lung cancer ($L$). From the hospital records, researchers can collect and classify the data as in Table 5.4.


Note $\Pr(L) = \frac{70}{100} = 1 - \Pr(\bar{L})$, $\Pr(S) = \frac{80}{100} = 1 - \Pr(\bar{S})$, $\Pr(S \mid L) = \frac{60}{70} = 1 - \Pr(\bar{S} \mid L)$, and $\Pr(S \mid \bar{L}) = \frac{20}{30} = 1 - \Pr(\bar{S} \mid \bar{L})$. The odds of smoking are $odds(S) = \frac{\Pr(S)}{\Pr(\bar{S})} = 4$. The odds of getting malignant lung cancer are $odds(L) = \frac{\Pr(L)}{\Pr(\bar{L})} = \frac{0.70}{0.30} \approx 2.33$. This means for every non-chain smoker there are four chain smokers. For every three benign lung cancer cases, there are seven malignant lung cancer cases.

The following assessments are based on these data. The probability of a person being a chain smoker is $\Pr(S) = 0.80$. The probability of any person having malignant lung cancer is $\Pr(L) = 0.70$. The probability of anyone being a chain smoker as well as having malignant lung cancer is $\Pr(S \cap L) = 0.60$. The probability of a randomly chosen person being a chain smoker or having malignant lung cancer is:

$$\Pr(S \cup L) = \Pr(S) + \Pr(L) - \Pr(S \cap L) = 0.80 + 0.70 - 0.60 = 0.90.$$

Are chain smoking and malignant lung cancer independent of each other? The answer depends on the conditional probability of an individual getting malignant lung cancer if he or she is a chain smoker. That is, $\Pr(L \mid S) = \frac{\Pr(L \cap S)}{\Pr(S)} = \frac{0.60}{0.80} = 0.75$, which is different from the unconditional probability of getting malignant lung cancer of 0.70, meaning the presence of chain smoking has inflated the chance of getting malignant lung cancer. Hence chain smoking and getting malignant lung cancer are not independent of each other. The correlation between two outcomes $L$ and $S$ is $|corr(S, L)| = \frac{\Pr(S)\Pr(L)}{\Pr(S \cap L)} = \frac{(0.80)(0.70)}{0.60} = \frac{0.56}{0.60} = 0.93$. Any one in the collection $\{S, \bar{S}\}$ is not independent of any one in the collection $\{L, \bar{L}\}$.

Now, let us examine the validity of De Morgan's laws 1 and 2. The probability for the complement of the union is $\Pr(\overline{S \cup L}) = 1 - \Pr(S \cup L) = 1 - 0.90 = 0.10 = \Pr(\bar{S} \cap \bar{L})$, and the probability for the complement of the intersection is $\Pr(\overline{S \cap L}) = 1 - \Pr(S \cap L) = 0.40 = \Pr(\bar{S}) - \Pr(\overline{S \cup L}) + \Pr(\bar{L})$.


Given that a person has malignant lung cancer, how probable is it that he or she might have been a chain smoker? The answer is found by substitution in the Bayes formula:

$$\Pr(S \mid L) = \frac{\Pr(S)\Pr(L \mid S)}{\Pr(S)\Pr(L \mid S) + \Pr(\bar{S})\Pr(L \mid \bar{S})} = \frac{0.8(0.75)}{0.8(0.75) + 0.2(0.50)} \approx 0.86, \quad \text{if } \Pr(L) \ne 0.$$
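The probabilities above can be reproduced directly from the four cell counts of Table 5.4. The following Python sketch is only an illustration; the dictionary keys and variable names are hypothetical labels chosen for readability.

```python
# Cell counts from Table 5.4; the key labels are illustrative only.
counts = {
    ("S", "L"): 60,        # chain smoker, malignant
    ("S", "notL"): 20,     # chain smoker, benign
    ("notS", "L"): 10,     # non-chain smoker, malignant
    ("notS", "notL"): 10,  # non-chain smoker, benign
}
n = sum(counts.values())

pr_s = (counts[("S", "L")] + counts[("S", "notL")]) / n   # Pr(S)
pr_l = (counts[("S", "L")] + counts[("notS", "L")]) / n   # Pr(L)
pr_s_and_l = counts[("S", "L")] / n                       # Pr(S and L)
pr_s_or_l = pr_s + pr_l - pr_s_and_l                      # addition rule

# Conditional probabilities of malignancy given smoking status.
pr_l_given_s = pr_s_and_l / pr_s
pr_l_given_not_s = (counts[("notS", "L")] / n) / (1 - pr_s)

# Bayes formula: probability of being a chain smoker given malignancy.
numerator = pr_s * pr_l_given_s
pr_s_given_l = numerator / (numerator + (1 - pr_s) * pr_l_given_not_s)

# Independence would require Pr(L | S) to equal Pr(L).
independent = abs(pr_l_given_s - pr_l) < 1e-12
```

The Bayes result agrees with the direct ratio 60/70 read from the malignant column of the table, which is a useful cross-check on the substitution.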

Note $D = \Pr(S \cap L)\Pr(\bar{S} \cap \bar{L}) - \Pr(\bar{S} \cap L)\Pr(S \cap \bar{L}) = 0.6(0.1) - 0.1(0.2) = 0.04$ is the determinant of the matrix arrangement

$$M = \begin{pmatrix} \Pr(S \cap L) & \Pr(S \cap \bar{L}) \\ \Pr(\bar{S} \cap L) & \Pr(\bar{S} \cap \bar{L}) \end{pmatrix} = \begin{pmatrix} 0.6 & 0.2 \\ 0.1 & 0.1 \end{pmatrix}$$

of intersection probabilities. The determinant is not zero. Also, $\Pr(S) \ne 0$ and $\Pr(S) \ne 1$ (or $Like(L) \ne Like(\bar{L})$), where $Like(L) = \frac{\Pr(L \mid S)}{\Pr(L \mid \bar{S})} = \frac{0.75}{0.50} = 1.5$ and $Like(\bar{L}) = \frac{\Pr(\bar{L} \mid S)}{\Pr(\bar{L} \mid \bar{S})} = \frac{0.25}{0.50} = 0.50$ are the likelihoods of the presence of $L$ and the absence of $L$. The evidence in $S$ is one and a half times the evidence in $\bar{S}$ in favor of an outcome $L$. With respect to the absence of $L$, the evidence in $\bar{S}$ is twice that in $S$. The matrix is invertible in the sense that $M \cdot M^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = M^{-1} \cdot M$, where

$$M^{-1} = \frac{1}{\Pr(L \mid S)\Pr(\bar{L} \mid \bar{S}) - \Pr(\bar{L} \mid S)\Pr(L \mid \bar{S})} \begin{pmatrix} \Pr(\bar{L} \mid \bar{S})/\Pr(S) & -\Pr(\bar{L} \mid S)/\Pr(\bar{S}) \\ -\Pr(L \mid \bar{S})/\Pr(S) & \Pr(L \mid S)/\Pr(\bar{S}) \end{pmatrix} = \begin{pmatrix} 2.5 & -5 \\ -2.5 & 15 \end{pmatrix},$$

provided $\{\Pr(L \mid S)\Pr(\bar{L} \mid \bar{S}) - \Pr(\bar{L} \mid S)\Pr(L \mid \bar{S})\} = 0.25$. Given the matrix $M = \begin{pmatrix} 0.6 & 0.2 \\ 0.1 & 0.1 \end{pmatrix}$, the determinant equation is $\lambda^2 - 0.7\lambda + 0.04 = 0$. Because $(a_{11} - a_{22})^2 + 4a_{12}a_{21} = 0.33 > 0$, there are two distinct and real roots, approximately 0.063 and 0.637.

The precision in outcome $L$ is enhanced by an amount $\frac{|S \cap L|}{|S|} = \frac{0.6}{0.8} = 0.75$ due to outcome $S$. Is it reciprocated by outcome $L$? The answer is yes but by a different amount, $\frac{|L \cap S|}{|L|} = \frac{0.6}{0.7} \approx 0.86$, where $S$ and $L$ denote chain smoking and malignant lung cancer. The similarity in outcomes $S$ and $L$ is $Similarity(S, L) = \frac{|S \cap L|}{\alpha|S| + (1 - \alpha)|L|} = \frac{60}{80\alpha + 70(1 - \alpha)} = 6[7 + \alpha]^{-1}$, where $0 \le \alpha \le 1$ is an arbitrary value. When the value $\alpha = 0$ is selected, outcome $|S|$ is discounted with the full involvement of outcome $|L|$. When the value $\alpha = 1$ is selected, outcome $|L|$ is discounted with the full involvement of outcome $|S|$. Otherwise (i.e., $0 < \alpha < 1$), outcomes $|L|$ and $|S|$ are proportionally involved. The overlap measure between the outcomes $|L|$ and $|S|$ is $Overlap(L, S) = \frac{|S \cap L|}{\min\{|S|, |L|\}} = \frac{60}{\min\{80, 70\}} \approx 0.86$. The relevance between outcomes $|L|$ and $|S|$ is $Relevance(S, L) = \frac{|S \cap L|}{|S \cup L|} = \frac{60}{60 + 10 + 20} \approx 0.67$, where the notation $\cup$ denotes either $S$ or $L$, meaning at least one of them, and the notation $\cap$ denotes togetherness. The angle of the one-to-one relation between outcomes $|L|$ and $|S|$ is $Angle(S, L) = \frac{|S \cap L|}{\sqrt{|S||L|}} = \frac{6}{\sqrt{8(7)}} \approx 0.80$.

Outcome $S$ might enhance outcome $L$ by a probability level $Enhancement(S \uparrow L) = \frac{\Pr(S \mid L)}{\Pr(S)\Pr(L)} = \frac{0.86}{(0.80)(0.70)} \approx 1.53$, where $\Pr(S \mid L)$ denotes the conditional probability of outcome $S$ appearing when outcome $L$ is seen. Likewise, outcome $S$ might dampen outcome $L$ by a probability level $Dampens(S \downarrow L) = \frac{\Pr(S \mid \bar{L})}{\Pr(S)\Pr(\bar{L})} = \frac{0.67}{0.8(0.3)} \approx 2.79$.

Is chain smoking diagnostic of malignant lung cancer later? The sensitivity is then $Se = \Pr(S \mid L) = 0.86$ and the specificity is $Sp = \Pr(\bar{S} \mid \bar{L}) = 0.33$. From a physician's point of view, the sensitivity is acceptable but the specificity needs improvement. When the sensitivity is high, the diagnostic is considered accurate among malignant lung cancer cases. Likewise, when the specificity is low, the diagnostic is considered inaccurate among nonmalignant cancer cases. The physician needs to consider other symptoms to rule out malignant cancer. Is the diagnostic inferior? To check whether the diagnostic is inferior or superior, a combined Youden index can be used: $YI = Se + Sp - 1 = 0.19$. Chain smoking is a factor but not the strongest cause of malignant lung cancer. When the Youden index is positive, the diagnostic is reasonably superior. A proportion $\pi = \Pr(L) = 0.70$ of the participants have malignant lung cancer, which is quite high. Note $1 - \pi = \Pr(\bar{L}) = 0.30$ denotes an estimate of the unknown proportion of healthy participants. The probability of a person having malignant lung cancer and chain smoking is $\Pr(L)\Pr(S \mid L) = \pi Se = 0.70(0.86) = 0.60$. Likewise, the probability of a person being without malignant lung cancer and not chain smoking is $\Pr(\bar{L})\Pr(\bar{S} \mid \bar{L}) = (1 - \pi)Sp = 0.3(0.33) = 0.099$. The probability of a person having malignant lung cancer and not chain smoking is $\Pr(L)\Pr(\bar{S} \mid L) = \pi(1 - Se) = 0.7(1 - 0.86) = 0.098$. The probability of a person being without malignant lung cancer and chain smoking is $\Pr(\bar{L})\Pr(S \mid \bar{L}) = (1 - \pi)(1 - Sp) = (1 - 0.70)(1 - 0.33) = 0.201$. The last two probabilities portray the conflict between chain smoking and the incidence of malignant lung cancer.
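The sensitivity, specificity, Youden index, and the four joint probabilities in this paragraph can be sketched in Python as follows. The sketch treats chain smoking as the "test" and malignancy as the "disease", uses the exact counts of Table 5.4 rather than the two-decimal roundings in the text (so the results differ slightly in the third decimal), and uses hypothetical variable names.

```python
# Counts from Table 5.4, with chain smoking playing the role of a positive test.
true_pos = 60   # chain smoker and malignant
false_neg = 10  # non-chain smoker and malignant
false_pos = 20  # chain smoker and benign
true_neg = 10   # non-chain smoker and benign
n = true_pos + false_neg + false_pos + true_neg

prevalence = (true_pos + false_neg) / n          # pi = Pr(L)
sensitivity = true_pos / (true_pos + false_neg)  # Se = Pr(S | L)
specificity = true_neg / (true_neg + false_pos)  # Sp = Pr(not S | not L)
youden = sensitivity + specificity - 1           # YI = Se + Sp - 1

# Joint probabilities written as products of prevalence and Se or Sp.
joint = {
    "L and S": prevalence * sensitivity,
    "L and not S": prevalence * (1 - sensitivity),
    "not L and S": (1 - prevalence) * (1 - specificity),
    "not L and not S": (1 - prevalence) * specificity,
}

for label, value in joint.items():
    print(f"Pr({label}) = {value:.3f}")
print(f"Se = {sensitivity:.3f}, Sp = {specificity:.3f}, YI = {youden:.3f}")
```

The four joint probabilities sum to one, which is a quick consistency check on any choice of prevalence, sensitivity, and specificity.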


Table 5.5 Intersection of chain smoking and getting malignant lung cancer according to participants

Result ↓ / State →   Malignant lung cancer ($L$)                  No malignant lung cancer ($\bar{L}$)                   Total
$S$                  $\Pr(S \cap L) = 0.50 + 0.14\pi$             $\Pr(S \cap \bar{L}) = 0.335 + 0.195\pi$               $\Pr(S) = 0.835 + 0.335\pi$
$\bar{S}$            $\Pr(\bar{S} \cap L) = 0.082 - 0.047\pi$     $\Pr(\bar{S} \cap \bar{L}) = 0.165 - 0.095\pi$         $\Pr(\bar{S}) = 0.247 - 0.142\pi$
Total                $\Pr(L) = 0.582 - 0.093\pi$                  $\Pr(\bar{L}) = 0.500 + 0.100\pi$

Figure 5.3 Multiple probabilities

The marginal probability of being a chain smoker is $\Pr(S) = (1 - Sp) + \pi \, YI = (1 - 0.33) + 0.19\pi$, which is an upward line with an initial value $(1 - Sp) = 0.67$ and a positive slope $YI = 0.19$ in terms of the malignant lung cancer rate. Writing it another way as $\pi = -3.52 + 5.26\Pr(S)$, the proportion $\pi$ with malignant lung cancer is zero when the proportion $\Pr(S)$ of chain smokers is 0.67. The proportion with malignant lung cancer increases when the proportion of chain smokers is greater than 0.67. The marginal probability of not being a chain smoker is $\Pr(\bar{S}) = 1 - \Pr(S) = Sp - \pi \, YI = 0.33 - 0.19\pi$, which is a downward line with initial value $Sp = 0.33$ and a negative slope $-YI = -0.19$ in terms of the malignant lung cancer rate. Writing it another way as $\pi = 1.74 - 5.26\Pr(\bar{S})$, the proportion $\pi$ of malignant lung cancer cases in the community decreases when the proportion $\Pr(\bar{S})$ of non-chain smokers increases. Not all participants who are chain smokers have malignant lung cancer. Not all participants who are not chain smokers are immune to malignant lung cancer. The PPV of a chain smoker getting malignant lung cancer is $PPV = \Pr(L \mid S) = 0.75$. The NPV of a non-chain smoker not getting malignant lung cancer is $NPV = \Pr(\bar{L} \mid \bar{S}) = 0.50$. From patients' point of view, the PPV is not high enough, and the NPV needs a lot of improvement.
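The straight-line relation between the prevalence and the marginal probability of chain smoking, together with the two predictive values, can be sketched in Python. The function names are hypothetical, and the exact values Se = 6/7 and Sp = 1/3 from Table 5.4 are used in place of the rounded 0.86 and 0.33, so the intercept and slope differ slightly from the rounded figures in the text.

```python
SE = 60 / 70  # sensitivity Pr(S | L) from Table 5.4
SP = 10 / 30  # specificity Pr(not S | not L) from Table 5.4
YI = SE + SP - 1

def marginal_smoking(prevalence):
    """Pr(S) = (1 - Sp) + pi * YI: an upward line in the prevalence pi."""
    return (1 - SP) + prevalence * YI

def prevalence_from_marginal(pr_s):
    """Inverse of the line above: pi = (Pr(S) - (1 - Sp)) / YI."""
    return (pr_s - (1 - SP)) / YI

def ppv(prevalence):
    """Pr(L | S) by the Bayes formula."""
    return prevalence * SE / (prevalence * SE + (1 - prevalence) * (1 - SP))

def npv(prevalence):
    """Pr(not L | not S) by the Bayes formula."""
    return (1 - prevalence) * SP / ((1 - prevalence) * SP + prevalence * (1 - SE))

pi = 0.70
pr_s = marginal_smoking(pi)
pi_back = prevalence_from_marginal(pr_s)
ppv_value = ppv(pi)
npv_value = npv(pi)
```

At the observed prevalence of 0.70 the line returns the observed share of chain smokers, 0.80, and the predictive values match the 0.75 and 0.50 quoted in the text.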


To judge whether the diagnostic is inferior or superior from patients' point of view, the Shanmugam index is helpful: $Sh = PPV + NPV - 1 = 0.25$. When the Shanmugam index is positive, the diagnostic is superior from patients' point of view. Patients' belief is stronger than that of health professionals in terms of connecting chain smoking and getting malignant lung cancer since $Sh$ is higher than $YI$. Please see Figures 5.3, 5.4, and 5.5 and Table 5.5. Note the probability of chain smoking in a diagnostic is tied to an increase in malignant lung cancer: $\Pr(S) = 0.835 + 0.335\pi$. In the diagnostic literature, the supportive and discouraging LRs for chain smoking are $LR_{+} = \frac{Se}{1 - Sp} = 1.28$ and $LR_{-} = \frac{1 - Se}{Sp} = 0.42$. This means support for chain smoking is 3.04 times more than discouragement of chain smoking. The ROC is also of interest:

$$\text{slope of ROC} = \frac{H}{B} \cdot \frac{\Pr(\bar{L})}{\Pr(L)} = D \, \frac{0.500 + 0.100\pi}{0.582 - 0.093\pi}.$$

The harm is due to incorrectly attributing malignant lung cancer to chain smoking. The net benefit $B$ is due to correctly naming chain smoking as a cause of malignant lung cancer. See Figure 5.6 for details. Notice the lines in Figure 5.6 are for increasing $\Pr(S \cap L)$; decreasing $\Pr(\bar{S} \cap L)$; increasing $\Pr(S \cap \bar{L})$; and decreasing $\Pr(\bar{S} \cap \bar{L})$ as the

Figure 5.4 Likelihood ratios

Figure 5.5 Harms versus benefits

prevalence $\pi$ of malignant lung cancer increases. The probability of the presence of chain smoking with the presence (or absence) of malignant lung cancer increases as the prevalence $\pi$ of malignant lung cancer increases. The probability of the absence of chain smoking with the presence (or absence) of malignant lung cancer decreases as the prevalence $\pi$ of malignant lung cancer increases. Another example is the average time a patient spends in a health state, which is $1/\{1 - \text{survival probability}\}$. Markov models are built based on their independence.

Shogenji (1999) measures the dilution level of independence: $Sh(S, L) = \frac{\Pr(S \cap L)}{\Pr(S)\Pr(L)} = 1.07$. A measure of relative overlap is $Overlap(S, L) = \frac{\Pr(S \cap L)}{\Pr(S \cup L)} = 0.67$. The factual support is $FactualSupport(S, L) = \frac{\Pr(S \mid L) - \Pr(S \mid \bar{L})}{\Pr(S \mid L) + \Pr(S \mid \bar{L})} = 0.124$. The favorite measure is $Favorite = \Pr(S \mid L) - \Pr(S) = 0.06$. The relevance measure is $Relevance = \frac{\Pr(S \mid L)}{\Pr(S)} = 1.075$. The odds measure is $Odds = \frac{\Pr(S \mid L)}{\Pr(S \mid \bar{L})} \approx 1.28$. Schippers's measure of coherence is $Schippers(S, L) = \frac{\Pr(S \mid L) - \Pr(S \mid \bar{L})}{1 - \Pr(S \mid \bar{L})} = 0.575$, because $\Pr(S \mid L) \ge \Pr(S)$. Carnap's relevance measure is $\Pr(S \cap L) - \Pr(S)\Pr(L) = 0.04$. A counterfactual measure is $\Pr(S \mid L) - \Pr(S \mid \bar{L}) = 0.19$. The corroboration measure is

$$Corroboration(S, L) = \left\{ \frac{\Pr(S \mid L) - \Pr(S)}{\Pr(S \mid L) + \Pr(S)} \right\} \{1 + \Pr(L)\Pr(L \mid S)\} = 0.0549.$$


Figure 5.6 Receiver operating characteristic curve

Figure 5.7 Lab diagnostic times (in minutes) for patients. [Series shown: Hem 8, Hem 18, Apter, AMY, Ca, Glucose, Chem 7, K, HCG, ALP, ALT, B, AST, BBSP.]

The evidential support is $\left\{ \frac{\Pr(S \mid L) - \Pr(S)}{1 - \Pr(S)} \right\} \Pr(L) = 0.21$. The z-measure of the evidential support is $\frac{\Pr(S \mid L) - \Pr(S)}{1 - \Pr(S)} = 0.3$ because $\Pr(S \mid L) \ge \Pr(S)$. An ingredient of the measure of coherence is $\frac{\Pr(\bar{S})}{\Pr(\bar{S} \mid L)} = \frac{0.2}{0.14} = 1.428$ or its complement $1 - \frac{\Pr(\bar{L} \mid S)}{\Pr(\bar{L})} = 0.166$. A measure of justification is $\frac{\ln \Pr(S \mid L) - \ln \Pr(S)}{-\ln \Pr(S)} = 0.324$.

In a diagnostic test, the distance could be $d_1(Se, Sp) = \frac{\pi Se + (1 - \pi)Sp}{\pi(1 - Se)} = 3.78 + 2.357(1/\pi)$ or $d_2(Se, Sp) = \frac{\pi Se + (1 - \pi)Sp}{(1 - \pi)(1 - Sp)} = 0.5 + 1.283\{odds(\pi)\}$, due to the Jaccard coefficient. See Figure 5.7 for the distances $d_1(Se, Sp)$ and $d_2(Se, Sp)$ in terms of the prevalence $\pi$ of malignant lung cancer.

The Jaccard index is $J(S, L) = [\{\Pr(L \mid S)\}^{-1} + \{\Pr(S \mid L)\}^{-1} - 1]^{-1} = 0.668$, which would be $J(S, L) = [\{\Pr(L)\}^{-1} + \{\Pr(S)\}^{-1} - 1]^{-1} = 0.595$ when $S$ and $L$ are independent. The Jaccard index changed by 0.073 because of their dependency. When $S$ and $L$ are empty (which is not the case in this illustration), the Jaccard index is one. The Jaccard index is a measure of closeness. Applying it, $J(L, S) = \frac{0.86\pi}{0.86\pi + 0.14\pi + 0.67(1 - \pi)} = [\{0.86\pi\}^{-1} - 1]^{-1}$ is a measure (blue) of the closeness between getting malignant lung cancer $L$ and chain smoking $S$. $J(L, \bar{S}) = \frac{0.14\pi}{0.14\pi + 0.86\pi + 0.33(1 - \pi)} = [\{0.14\pi\}^{-1} - 1]^{-1}$ is a measure (green) of the closeness between getting malignant lung cancer $L$ and not chain smoking $\bar{S}$. $J(\bar{L}, S) = \frac{0.67(1 - \pi)}{0.67(1 - \pi) + 0.86\pi + 0.33(1 - \pi)} = [\{0.67(1 - \pi)\}^{-1} - 1]^{-1}$ is a measure (pink) of the closeness between not getting malignant lung cancer $\bar{L}$ and chain smoking $S$. $J(\bar{L}, \bar{S}) = \frac{0.33(1 - \pi)}{0.33(1 - \pi) + 0.14\pi + 0.67(1 - \pi)} = [\{0.33(1 - \pi)\}^{-1} - 1]^{-1}$ is a measure (gray) of the closeness between not getting malignant lung cancer $\bar{L}$ and not chain smoking $\bar{S}$ (see Figure 5.7).

We can now apply the aforementioned measures of the impact of these outcomes on one another. The minimum conditional confidence is $CI_{\min} = \min\{\Pr(S \mid L), \Pr(L \mid S)\} = 0.75$. The maximum conditional confidence is $CI_{\max} = \max\{\Pr(S \mid L), \Pr(L \mid S)\} = 0.86$. Kulczynski's measure is $Kulczynski(S, L) = (1/2)\{\Pr(S \mid L) + \Pr(L \mid S)\} = 0.805$. The HLM is $HLM(S, L) = \sqrt{\Pr(S \mid L)\Pr(L \mid S)} = 0.803$.

The imbalance measure is $imbalance(S, L) = \frac{\Pr(S) - \Pr(L)}{\Pr(S) + \Pr(L) - \Pr(S \cup L)} = \frac{0.10}{0.60} \approx 0.167$. Carnap (1962) suggests $Carnap_1(S, L) = \Pr(S \mid L) - \Pr(S) = 0.06$, if $\Pr(L) > 0$, and another measure, $Carnap_2(S, L) = \Pr(S \cap L) - \Pr(S)\Pr(L) = 0.04$, if $\Pr(L) > 0$, to portray the impact of outcome $L$ on outcome $S$. Christensen's (1999) combined measure is $Christensen(S, L) = \Pr(S \mid L) - \Pr(S \mid \bar{L}) = 0.19$, if $1 > \Pr(L) > 0$. Crupi et al.'s (2009) measure is $Crupi_{\min}(S, L) = \frac{CI_{\min}(S, L)}{1 - \Pr(S)} = 3.75$ or $Crupi_{\max}(S, L) = \frac{CI_{\max}(S, L)}{1 - \Pr(S)} = 4.3$. Good (1960) and Fitelson's (2001) measure is $GoodFitelson(S, L) = \ln\left\{ \frac{\Pr(S \mid L)}{\Pr(S \mid \bar{L})} \right\} = 0.249$. Kemeny and Oppenheim's (1952) measure is $KemenyOppenheim(S, L) = \frac{\Pr(S \mid L) - \Pr(S \mid \bar{L})}{\Pr(S \mid L) + \Pr(S \mid \bar{L})} = 0.124$. Milne's (1996) measure is $Milne(S, L) = \ln\left\{ \frac{\Pr(S \mid L)}{\Pr(S)} \right\} = 0.072$. Mortimer's (1988) measure is $Mortimer(S, L) = \Pr(S \mid L) - \Pr(S) = 0.06$. Nozick's (1981) countermeasure is $Nozick(S, L) = \Pr(L \mid S) - \Pr(L \mid \bar{S}) = 0.25$. The precision is $\pi Se / (\pi Se + (1 - \pi)(1 - Sp)) = [0.220 + 0.779\pi^{-1}]^{-1}$.

Suppose the storage department of a hospital desires to procure and store items on site. Most of the items (e.g., drugs, vaccinations) used in the healthcare field have expiration dates after which they cannot be used. The storage department may be tempted to overstock, incurring additional costs for storage, inspection, ordering, maintenance, and so forth. Such costs might exceed the original purchase cost. Running out of supplies might lead to the death of patients in addition to inconvenience to healthcare professionals and patients alike. Not all items from any vendor offer the required quality. Before selecting a vendor and placing an order, the manager of the storage department asks the vendor or its medical agent for information about the outgoing quality level of the vendor's products. Suppose there are two

vendors A and B. Let $G$ and $\bar{G}$ denote good and defective items, respectively. From past experience, the manager recognizes that, if an item meets the required quality, there is an 80% probability vendor A supplied it and a 20% probability vendor B supplied it. Likewise, if an item is defective, there is a 10% probability vendor A supplied it and a 90% probability vendor B supplied it. To be on the safe side, the storage department manager desires for 60% of the stocked items to be of good quality. The manager has to decide whether to contact vendor A or vendor B for the next round of procurement. The manager chooses to exercise Bayesian probability concepts. For this purpose, he or she needs to cast the information in a probability framework as follows. The marginal probabilities are $\Pr(G) = 0.60 = 1 - \Pr(\bar{G})$. The conditional probabilities are $\Pr(A \mid G) = 0.80 = 1 - \Pr(B \mid G)$ and $\Pr(A \mid \bar{G}) = 0.10 = 1 - \Pr(B \mid \bar{G})$. The next task is to compute and compare the conditional probability of receiving a quality item from vendor A with the counter-conditional probability of receiving a quality item from vendor B. These are

$$\Pr(G \mid A) = \frac{\Pr(G \cap A)}{\Pr(A)} = \frac{\Pr(G)\Pr(A \mid G)}{\Pr(G)\Pr(A \mid G) + \Pr(\bar{G})\Pr(A \mid \bar{G})} = \frac{(0.60)0.80}{(0.60)0.80 + (0.40)0.10} = 0.923$$

and

$$\Pr(G \mid B) = \frac{\Pr(G \cap B)}{\Pr(B)} = \frac{\Pr(G)\Pr(B \mid G)}{\Pr(G)\Pr(B \mid G) + \Pr(\bar{G})\Pr(B \mid \bar{G})} = \frac{(0.60)0.20}{(0.60)0.20 + (0.40)0.90} = 0.250.$$

Note $\Pr(\bar{G} \mid A) = 1 - \Pr(G \mid A) = 0.077$ and $\Pr(\bar{G} \mid B) = 1 - \Pr(G \mid B) = 0.750$. An interpretation of these conditional probabilities would ease decision-making. If the manager decides to purchase 1,000 items from vendor A, about 923 items are expected to be of good quality. If the manager decides to purchase 1,000 items from vendor B, about 250 items are expected to be of good quality. Consequently, vendor A is preferable.

Another example of comparing performance is the time (in minutes) for patients to receive test results (see Figure 5.8 and Tables 5.6 and 5.7). The test to read a patient's glucose level consumes more time than the test to read a patient's vitamin K level. Of course there are outliers in the data. The reading of some biochemicals is correlated with the reading of other biochemicals in the lab tests. The significant correlations are highlighted in Table 5.7. The reading of Hem 8 is significantly correlated with AMY and Ca. The reading of Hem 18 is significantly correlated with Ca and HCG. The reading of Apter is significantly correlated with Chem 7 and ALP. The reading of AMY is significantly correlated with Ca. The reading of glucose is significantly correlated with ALT.
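Returning to the vendor comparison earlier in this passage, the two posterior probabilities can be checked with a short Python sketch. The function and argument names are hypothetical; the inputs are the prior and conditional probabilities stated in the example.

```python
def posterior_good(prior_good, pr_vendor_given_good, pr_vendor_given_defective):
    """Bayes formula: probability an item is good given the vendor that supplied it."""
    good_path = prior_good * pr_vendor_given_good
    defective_path = (1 - prior_good) * pr_vendor_given_defective
    return good_path / (good_path + defective_path)

PRIOR_GOOD = 0.60  # desired share of good-quality items in stock

# Pr(A | G) = 0.80 and Pr(A | defective) = 0.10 for vendor A.
pr_good_given_a = posterior_good(PRIOR_GOOD, 0.80, 0.10)
# Pr(B | G) = 0.20 and Pr(B | defective) = 0.90 for vendor B.
pr_good_given_b = posterior_good(PRIOR_GOOD, 0.20, 0.90)

# Expected number of good items in an order of 1,000 from each vendor.
order_size = 1000
expected_good_a = order_size * pr_good_given_a
expected_good_b = order_size * pr_good_given_b

preferred = "A" if pr_good_given_a > pr_good_given_b else "B"
print(f"Pr(G | A) = {pr_good_given_a:.3f}, Pr(G | B) = {pr_good_given_b:.3f}")
print(f"Preferred vendor: {preferred}")
```

The counts of good items are expected values rather than guarantees, which is why the comparison is stated in terms of the posterior probabilities.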


Table 5.6 Lab diagnostic time (in minutes) for 15 patients. (Source: Shanmugam, 2013a)

Patient  Hem 8  Hem 18  Apter  AMY  Ca  Glucose  Chem 7  K   HCG  ALP  ALT  B   AST  BBSP
P1       28     29      27     28   38  52       28      12  18   29   29   29  29   26
P2       34     38      36     37   44  54       37      25  29   39   39   38  39   32
P3       29     24      35     29   33  49       27      18  18   30   30   36  28   18
P4       33     39      35     27   34  43       35      11  20   32   32   23  31   26
P5       21     27      36     28   21  51       33      19  23   32   29   25  29   39
P6       18     26      27     29   23  56       30      27  15   34   36   28  33   28
P7       26     20      29     27   20  60       31      11  15   32   23   32  28   31
P8       23     23      33     26   28  37       27      19  14   34   25   34  32   29
P9       30     27      29     25   23  39       25      14  16   32   28   32  34   26
P10      24     26      33     24   27  40       33      15  18   34   28   29  32   28
P11      23     28      34     25   22  29       32      14  19   34   3    38  34   31
P12      24     28      40     28   27  43       34      15  22   38   29   36  32   29
P13      20     34      36     22   27  44       29      18  18   36   33   23  33   19
P14      27     37      32     31   29  50       25      16  18   33   32   25  29   28
P15      8      29      28     22   25  43       25      12  21   33   33   28  32   32

Figure 5.8 Quantified rating by two experts

The reading of Chem 7 is significantly correlated with HCG.

An organized way of applying the Bayes theorem in a healthcare setting is the following. The steps are illustrated with the example of getting malignant lung cancer due to smoking or due to another cause. For this purpose, consider the following randomly selected data from hospital archives.

1. Select the event to target. The analyst should check that the enumerated events are mutually exclusive and exhaustive (the probabilities of the considered events sum to 1). If these events are indicated by $E_1, E_2, \ldots, E_n$, then $\Pr(E_i \cap E_j) = 0$ for $i \ne j$ denotes mutual exclusion. That the events are exhaustive is denoted by $\sum_{i=1}^{n} \Pr(E_i) = 1$. Recall the implicit relation between the odds and the probability of an event $E_i$. Knowing the odds amounts to knowing the probability and vice versa because

$$Odds_{E_i} = \frac{\Pr(E_i)}{\Pr(\bar{E}_i)} = \frac{\Pr(E_i)}{1 - \Pr(E_i)} = \frac{\text{Probability of presence of } E_i}{\text{Probability of absence of } E_i}$$

and

$$\Pr(E_i) = \frac{Odds_{E_i}}{1 + Odds_{E_i}}.$$

In this example of getting malignant lung cancer due to smoking, let $M$ be the targeted event of getting malignant lung cancer. Let $\bar{M}$ be the event of not getting malignant lung cancer. Note $M$ and $\bar{M}$ are mutually exclusive since $\Pr(M \cap \bar{M}) = 0$. Exposure to smoking occurs in two ways. Let $S$ and $\bar{S}$ denote smoking and not smoking, respectively. Both $S$ and $\bar{S}$ are mutually exclusive and independent. Also, $\Pr(S \cap \bar{S}) = 0$ and $\Pr(M) + \Pr(\bar{M}) = 1$ because there are only two outcomes (getting malignant lung cancer or not


Table 5.7 Lab diagnostic times (in minutes) for patients

ρrow;column

Hem 8

Hem 18

Apter AMY Ca

Hem 8

1

Hem 18

0.3

1

Apter

0.3

0.3

1

AMY

0.6

0.3

0.2

1

Ca

0.6

0.6

0.2

0.6

1

Glucose

0.1

Chem 7

0.4

−0

−0

K HCG

0.2

Glucose Chem 7 K

−0.3

0.5

0.2

1

0.3

0.6

0.4

0.3

0.1

1

0.1

0.1

0.5

0.2

0.3

0.2

HCG

ALP

ALT

0.6

0.5

0.5

0.5

0.1

0.6

0.2

1

−0

0.3

0.6

0.3

0.1

−0.1

0.5

0.5

0.5

1

ALT

0

0.4

0.4

0.5

0.6

0.4

0.3

0.2

1

−0

AST

B

0.2

−0.4

0.2

0.3

0.2

−0.2

0.2

0.2

0.2

0.3

−0.4

AST

0.1

0.4

0.2

0.2

0.3

−0.3

0.4

0.5

0.5

0.7

0.1

0.4

1

BBSP

−0.2

−0.1

0.2

−0.3

0.1

0.3

0.1

0.4

0.2

−0.2

0.1

0.1

−0

Table 5.8 Smoking versus malignant lung cancer

Malignant cancer ↓ / Smoking →   Yes ($S$)   No ($\bar{S}$)   Total
Yes ($M$)                        60          20               80
No ($\bar{M}$)                   10          10               20
Total                            70          30               100

getting malignant lung cancer). The odds of getting malignant lung cancer are $Odds_M = \frac{\Pr(M)}{\Pr(\bar{M})} = \frac{0.80}{0.20} = 4.0$.

2. In this step, the evidence in the clues is collected and summarized to update the prior odds, using the concept of divide and conquer. The formula is

$$\left\{ \frac{\Pr(E_i \mid C_1, C_2, \ldots, C_n)}{\Pr(\bar{E}_i \mid C_1, C_2, \ldots, C_n)} \right\} = \left\{ \frac{\Pr(C_1, C_2, \ldots, C_n \mid E_i)}{\Pr(C_1, C_2, \ldots, C_n \mid \bar{E}_i)} \right\} \left\{ \frac{\Pr(E_i)}{\Pr(\bar{E}_i)} \right\},$$

where $\left\{ \frac{\Pr(E_i)}{\Pr(\bar{E}_i)} \right\}$ is the prior odds, $\left\{ \frac{\Pr(C_1, C_2, \ldots, C_n \mid E_i)}{\Pr(C_1, C_2, \ldots, C_n \mid \bar{E}_i)} \right\}$ is the LR, the clues $C_1, C_2, \ldots, C_n$ offer data support to obtain the posterior odds, and $\frac{\Pr(E_i \mid C_1, C_2, \ldots, C_n)}{\Pr(\bar{E}_i \mid C_1, C_2, \ldots, C_n)}$ is the posterior odds of the event $E_i$ of interest. In this example, the relation between the prior odds, LR, and posterior odds is

$$\left\{ \frac{\Pr(M \mid S, \bar{S})}{\Pr(\bar{M} \mid S, \bar{S})} \right\} = \left\{ \frac{\Pr(S \mid M)}{\Pr(S \mid \bar{M})} \right\} \left\{ \frac{\Pr(\bar{S} \mid M)}{\Pr(\bar{S} \mid \bar{M})} \right\} \left\{ \frac{\Pr(M)}{\Pr(\bar{M})} \right\},$$

where $\left\{ \frac{\Pr(M)}{\Pr(\bar{M})} \right\}$, $\left\{ \frac{\Pr(S \mid M)}{\Pr(S \mid \bar{M})} \right\}$, $\left\{ \frac{\Pr(\bar{S} \mid M)}{\Pr(\bar{S} \mid \bar{M})} \right\}$, and $\left\{ \frac{\Pr(M \mid S, \bar{S})}{\Pr(\bar{M} \mid S, \bar{S})} \right\}$ denote the (subjective) prior odds, the data-based LR due to smoking, the LR due to not smoking, and the (data-moderated) posterior odds of getting malignant lung cancer, respectively.

3. In this step, the independent clues should be identified carefully and observed for data. The clues should be mutually exclusive. If the clues are not statistically independent of each other, divide and conquer will not work. Care should be exercised to select independent clues using correlation. In this example there are two independent clues: smoking ($S$) and not smoking ($\bar{S}$).

4. In this step, the researcher should decide the levels of each clue. The greater the number of levels, the more complicated the data analysis. Two or three levels might suffice. Usually, the analyst helps the decision maker select the levels. If the clue is quantitative, a low and a high level could be chosen. The clue could also be considered at low, medium, and high levels. In this example, the analyst may suggest keeping the variable of smoking at the simplest two levels (smokers versus nonsmokers).

5. If the decision maker or the analyst suspects the clues might be correlated (not independent), he or she should use probabilities to verify it. If a clue $C_i$ and a clue $C_j$ are (statistically) independent of each other, then the statement

$$\left\{ \frac{\Pr(C_i \cap C_j \mid E_i)}{\Pr(C_i \cap C_j \mid \bar{E}_i)} \right\} = \left\{ \frac{\Pr(C_i \mid E_i)}{\Pr(C_i \mid \bar{E}_i)} \right\} \left\{ \frac{\Pr(C_j \mid E_i)}{\Pr(C_j \mid \bar{E}_i)} \right\}, \quad i \ne j,$$

is valid. When this statement is not valid, the two clues are not independent of each other and hence are not to be considered together. Either one clue or the other alone should be selected and a new clue sought for

123 https://doi.org/10.1017/9781009212021.007 Published online by Cambridge University Press

Data-Guided Healthcare Decision Making

consideration. In this example, we assume $S$ and $\bar{S}$ are independent because each person independently decides to be a smoker or a nonsmoker.

6. Estimate the likelihood ratio $\Pr(C_i \mid E_i)/\Pr(C_i \mid \bar{E}_i)$; $i = 1, 2, \ldots, n$, for each independent clue. When the LR is more than one, the clue is supportive of the targeted event $E_i$. When the LR is less than one, the clue is not supportive of the targeted event $E_i$. When the LR is one, the clue is neutral for the targeted event $E_i$. In this example, the LRs are

$$\frac{\Pr(S \mid M)}{\Pr(S \mid \bar{M})} = \frac{60/80}{10/20} = 1.5 \quad\text{and}\quad \frac{\Pr(\bar{S} \mid M)}{\Pr(\bar{S} \mid \bar{M})} = \frac{20/80}{10/20} = 0.5.$$

The product of these two LRs is (1.5)(0.5) = 0.75. An interpretation of these LRs is the following. Recall that, when the LR is more than one, the data are supportive of the targeted event. Smoking is supportive of getting malignant lung cancer. Recall also that, when the LR is less than one, the data are against the targeted event. Not smoking is not supportive of getting malignant lung cancer.

7. The decision maker and/or analyst should come up with an estimate (subjective or otherwise) of the prior odds, $\mathrm{Prior}(\mathrm{Odds}_{E_i}) = \Pr(E_i)/\Pr(\bar{E}_i)$, for the targeted event. If the prior estimate is incorrect, that is not a concern because the evidence will likely moderate it so a correct posterior estimate can be generated. In this example, suppose the (subjective) prior estimate of the odds of getting malignant lung cancer is $\Pr(M)/\Pr(\bar{M}) = 0.25 = 1/4$. An interpretation of the prior estimate is that, for every four persons without malignant lung cancer, there is one person with it.

8. The decision maker ought to use scenarios to forecast alternative outcomes. The scenarios should be coherent and consistent for the targeted event. In this example, the (objective) posterior estimate of the odds of getting malignant lung cancer is the prior estimate of the odds of getting malignant lung cancer times the LRs, which is $\Pr(M \mid S, \bar{S})/\Pr(\bar{M} \mid S, \bar{S}) = 0.75(0.25) = 3/16 = 0.1875$ because the prior odds of getting malignant lung cancer are $\Pr(M)/\Pr(\bar{M}) = 0.25$. An interpretation of this posterior estimate is that, for every 16 persons without malignant lung cancer, there are 3 persons with it.

9. The researcher should validate this model before utilizing it. One way of validating it is to consult two different experts. The correlation between their quantified ratings can be calculated. If the correlation is less than 0.7, then the model is not likely to predict effectively. A correlation of less than zero indicates the two experts disagree. When the correlation is zero, the two experts neither agree nor disagree. Alternatively, the experts' ratings can be graphed in a two-dimensional display (see Figure 5.8).

10. In this step, the decision maker and/or the analyst presents a forecast in terms of the posterior odds or probability. In this example, the posterior probability of getting malignant lung cancer is $\Pr(M \mid S, \bar{S}) = \frac{\mathrm{Odds}_M}{1 + \mathrm{Odds}_M} = \frac{3/16}{1 + 3/16} = \frac{3}{19} \approx 0.16$, meaning that, out of every 19 persons under consideration, about 3 get malignant lung cancer, in comparison to the prior estimate of 0.20. The data moderated the prior estimate, and that is the very purpose of using divide and conquer.
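The arithmetic in steps 6 through 10 can be retraced with a short Python sketch. It uses the counts of Table 5.8 and the subjective prior odds of 1/4 from step 7; the variable and function names are illustrative choices, not notation from the text, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

# Counts from Table 5.8: first key is cancer status, second key is smoking status.
counts = {
    ("M", "S"): 60, ("M", "not_S"): 20,
    ("not_M", "S"): 10, ("not_M", "not_S"): 10,
}

def likelihood_ratio(clue):
    """Estimate Pr(clue | M) / Pr(clue | not M) from the table counts."""
    total_m = counts[("M", "S")] + counts[("M", "not_S")]              # 80
    total_not_m = counts[("not_M", "S")] + counts[("not_M", "not_S")]  # 20
    pr_given_m = Fraction(counts[("M", clue)], total_m)
    pr_given_not_m = Fraction(counts[("not_M", clue)], total_not_m)
    return pr_given_m / pr_given_not_m

lr_smoking = likelihood_ratio("S")          # (60/80) / (10/20)
lr_not_smoking = likelihood_ratio("not_S")  # (20/80) / (10/20)

prior_odds = Fraction(1, 4)                 # subjective prior odds from step 7
posterior_odds = prior_odds * lr_smoking * lr_not_smoking
posterior_prob = posterior_odds / (1 + posterior_odds)

print("LR smoking:", lr_smoking)
print("LR not smoking:", lr_not_smoking)
print("posterior odds:", posterior_odds)
print("posterior probability:", posterior_prob, float(posterior_prob))
```

Converting odds to a probability with odds/(1 + odds) is the only step beyond multiplication, which is why the posterior probability is 3/19 rather than 3/16.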

5.4 Summary

This chapter has presented basic definitions and formulas to quantify uncertainty in healthcare data. Readers can solidify their understanding of these concepts via the following exercises. On some occasions, the events are not unique or their frequency is unavailable due to time and budget constraints. Still, the judgments of experts can serve as an acceptable basis for forecasts. When researchers consider many more clues than necessary, the situation may become unnecessarily complicated. The Bayes theorem, based on the concept of divide and conquer, is an approach used to aggregate variables’ impact.

5.5 Exercises

1. What is the probability of running out of stock tomorrow if the hospital’s pharmacy runs out of anesthesia once every two years (assuming 365 days per year)?
2. What is the probability of running out of stock tomorrow if the hospital’s pharmacy runs out of anesthesia once a year (assuming 365 days per year)?
3. What is the probability of running out of stock tomorrow if the hospital’s pharmacy runs out of anesthesia once in the period January 1 through March 31? What assumption have you made in your calculation?
4. Consider the following data in Table 5.9.
   ◦ What is the probability of hospitalization?
   ◦ What is the probability of hospitalization given the person is male?
   ◦ What is the probability of hospitalization given the person is female?
   ◦ What is the probability of hospitalization given the person is younger than 60?


Table 5.9 Hospital data, 2020. (Source: www.kff.org)

Patient ID    Gender    Age    Hospitalized    Insured
1             Male      60+    Yes             Medicaid
2             Male

260

14

1,594

≤260

4

11

>260

1

124

< < y ¼ 1; 2; …:; ∞ jyj! if y¼0 > : efð1ρÞ=θþρθg > y ¼ 1; 2; 3; …; ∞ : y ρθ ðρθÞ e =y! Patients with severe acute respiratory syndrome (SARS) are quarantined in hospitals. Hospital employees risk infection. Their immunity level is insufficient to protect them, resulting in hospital site infection. The first SARS case was recorded in China. Fever, sore throat, and so forth are symptoms of SARS. No antibiotic cures SARS. No known vaccine exists against it. In airports, travelers were taken to hospitals, as mentioned in Shanmugam (2014a). The data of SARS patients in a Toronto hospital were analyzed using a bumped-up binomial distribution (BBD). In the Toronto hospital, 32 nurses provided services to 16 patients. The nurses were cautious to avoid hospital site infection. Still, some were infected during their service to SARS patients. According to Shanmugam, the estimate of infection is ^ Immunity ¼ 0:25 with a p-value of 0.001 (Shanmugam, 2014a). The statistical power for accepting the hypothesis H1 : immunity ¼ 0:50 is estimated to be 0.948. That is, Prðrejecting H1 : immunity ¼ 0:50jdata evidenceÞ ¼ 1  Statistical Power ¼ 1  Prðaccepting H1 : immunity ¼ 0:50jdata evidenceÞ ¼ 1  0:948 ¼ 0:052 After an attack of bioterrorism, the victims are dispatched to nearby hospitals for treatment. The chance a bioterrorism victim will be successfully cured in a hospital whose staff members have conducted bioterrorism drills ought to be higher than the chance in a hospital whose staff members did not. To make an educated healthcare decision, the analyst should capture the impact of drilling. Shanmugam (2014) creates a tweaked binomial distribution (TBS) for this purpose. Using anthrax incidence data from five regions of the United States, Shanmugam explains the methodology based on TBD as follows:   n! PrðYjn; ; πÞ ¼ ð1  πÞn y!ðn  yÞ!  y 

n 1 π π ; 1þ = 1þ 1  f þ πg 1π 1  ð þ πÞ y ¼ 0; 1; 2; …; n; 0 < π < 1;  < 1  π with mean

$\mu = E(Y \mid n, \phi, \pi) \approx [1 + (1 - \pi)(1 + \phi)]\,n\pi$ and variance $\sigma^2 = \mathrm{Var}(Y \mid n, \phi, \pi) \approx [1 - \phi + (1 - \pi)]\,n\pi(1 - \pi)^2$.

The deadly Ebola virus emerged in Africa, with the first recorded death due to Ebola occurring on September 8, 1976. The Ebola virus was never airborne, and gloves and masks protect humans against it. In the period from July 27 to August 13, 2014, mortality due to Ebola was higher. No effective medication existed at the time. Ebola patients’ bodily fluids or tissues transmit the virus. A person with symptoms of Ebola was classified as suspected, probable, or confirmed. Shanmugam (2014d) applies the Poisson distribution to the Ebola data. See Shanmugam (2013g) for a list of diseases.

A donor’s organ should match the recipient’s biosystem, as pointed out in Shanmugam (2013). Organ transplantation should obey legal, ethical, medical, and administrative laws. When a recipient needs organs, finding them becomes harder and takes longer. The number of patients waiting for multiple organs follows a negative binomial distribution with $(1 - p)^r$ denoting the probability of finding $r$ matching organs and $p^y$ denoting the probability $y$ patients will find $r$ organs. The shortage level for organs is $0 \le \phi < 1$. The probability of not finding $r$ matching organs changes by a factor of $1 - \phi$. Shanmugam (2013) applies a tweaked negative binomial distribution (TNBD) to pancreas and kidney transplantation data. The probability mass function of TNBD is

$$\Pr(Y \mid r, p, \phi) = \frac{(r + y - 1)!}{y!\,(r - 1)!}\left(\frac{p}{1 - \phi}\right)^{y}\left(\frac{1 - \phi - p}{1 - \phi}\right)^{r}; \quad 0 \le \phi < 1;\ 0 < p < 1;\ y = 0, 1, 2, \ldots$$

The mean and variance of the TNBD are

$$\mu_{\phi,p,r} = E(Y \mid \phi, p, r) = r\left(\frac{p}{1 - \phi - p}\right) \quad\text{and}\quad \sigma^2_{\phi,p,r} = \mathrm{Var}(Y \mid \phi, p, r) = \frac{\mu(\mu + r)}{r}.$$

A similar tweaked exponential (TE) model is necessary to estimate the survival time if one cancerous kidney is removed, as explained in Shanmugam (2013). The TE model fits better in this case than does an exponential model. The probability function of the TE is $f(y \mid \theta, \phi) = \frac{(1 - \phi)}{\theta}\, e^{-\frac{(1 - \phi)y}{\theta}}$; $y \ge 0$; $\theta > 0$; $0 \le \phi < 1$, with mean

159 https://doi.org/10.1017/9781009212021.008 Published online by Cambridge University Press


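$E(y \mid \theta, \phi) = \theta + \left(\frac{\phi}{1 - \phi}\right)\theta$ and variance $\sigma^2 = \mathrm{Var}(y \mid \theta, \phi) = \theta^2 + \left[1 + \frac{1}{1 - \phi}\right]\left(\frac{\phi}{1 - \phi}\right)\theta^2$.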

Along with hospital site infection, repeated abortions are another concern of healthcare professionals. One might question whether repeated abortions trigger an obsession. An answer to this requires spiral binomial probability distribution (SBPD), as demonstrated in Shanmugam (2012b). The SBPD for y number of pregnancies is   xy y x! ; 0 < θ < 1; PrðYjx; θ; ρÞ ¼ ð1þρyÞ y!ðxyÞ! θ ð1  θÞ 1þρxθ ρ ≥ 0; y ¼ 0; 1; 2; …; x; with mean μx;θ;ρ ¼ EðYjx; θ; ρÞ ¼

patient should continue the same medicine or switch to another. A methodology is needed to analyze patient data. For this purpose, let Y > 0 be a random time during which a tumor reoccurrence is noted. Given the data on recurrence times, could a period of length τ occur without tumor? With parameters θ, β, and ν > 0, the model to use is   β β β  1 eθ1 =ðy þ   vÞθ ; y > v; 0 < θ < β f ðyjθ; β; vÞ ¼ θ with mean and variance, respectively, μθ;β;v ¼ Eðyjθ; β; vÞ ¼

 1þ

 ρð1  θÞ xθ ð1 þ xθρÞ

and variance σ 2x;θ;ρ ¼ VarðYjx; θ; ρÞ ¼ f1 þ ρ½1 þ ðx  2Þθxθð1  θÞ; as shown in Shanmugam and Radhakrishnan (2011). Patients and healthcare professionals alike desire efficient hospital operation. Is it happening? The variance is valuable in healthcare studies. Expressing the variance σ2 as a function of the mean service level, one can devise a method to assess the attained efficiency, as in Shanmugam (2011). Specifically, the curvature of the function and its shifting angle are utilized to interpret the efficiency of healthcare services. The curvature and the shifting angle vary, as shown in Shanmugam (2011). In healthcare operations, the rate of services plays a crucial role in the comprehension of service efficiency, as demonstrated in Shanmugam and Radhakrishnan (2011). Odds ratios are different versions of this rate. In epidemiology applications, the incidence rate is confused with the prevalence rate. The prevalence rate is indicative of existing and new cases in a specified time duration. Additionally, statisticians utilize the hazard rate to explain the curve of death. Other notable rates include the trivial vital rates. The incidence jump rate adds an additional interpretation of the collected data. Data analysis can also be used to evaluate whether a medication is successful in reducing tumor recurrence. The answer to this question starts as a model for analyzing data, as explained in Shanmugam (2011). Data analytic results formulate an early warning system to alert healthcare professionals to tumor recurrence. The collected data are at times length biased. A statistical check on the existence of length bias should be made as a caution. Expressions for length-biased situations need to be developed first. Then the expressions can be compared with their counterparts in the non-length-biased scenario. The early warning system helps establish whether a


 ðβθ

 2Þ

þv

and σ 2θ;β;v ¼ Varðyjθ; β; vÞ ¼

ð þ μÞ 2 μ: ð  μÞ

Healthcare professionals, including physicians, are interested in providing the best medical care to patients with epilepsy. To achieve such medical care, Shanmugam (2011) develops and utilizes a model of epileptic patients’ data so as to identify patterns. Some patients are prone to a large number of seizures. The proneness is a nonmeasurable parameter. The Poisson model does not work in this case. Profiling assists in finding another treatment for the patient. The patient’s proneness is identified using principal components. Let $X$ and $Y$ indicate susceptibility to and immunity from seizures. For a healthy person, $Y$ exceeds $X$. When $Y$ exceeds $X$, the patient’s safety is certain. The correlation coefficient $\rho_{x,y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}$ is scale free, where $\sigma_x$ and $\sigma_y$ are standard deviations. Note the variance is $\mathrm{Var}[Y] = E[\mathrm{Var}(Y \mid X = x)] + \mathrm{Var}[E(Y \mid X = x)]$. A linear regression is $Y = \beta_0 + \beta_1 x + \varepsilon$ and it predicts $Y$ at a specified level $X = x$, where the notation $\varepsilon$ denotes the noise. The regression coefficient (slope) is $\beta_1 = \rho_{x,y}\,\sigma_y/\sigma_x$.

An alternate method is the Bayesian approach. In the Bayesian approach, the nonobservable $\theta$ is a random variable with a prior distribution $p(\theta)$. The prior distribution is updated by blending it with the likelihood function $l(y \mid \theta)$ of the data and the marginal distribution $m(y)$, that is, $p(\theta \mid y) = p(\theta)\,l(y \mid \theta)/m(y)$, as in Shanmugam (2006a, 2006b). The updated version is the posterior distribution.

An anesthesiologist sedates patients before surgery. The length of a patient’s unconsciousness should exceed the surgery time. Insurance law requires a patient to recover consciousness after surgery. When a patient does not recover completely, the anesthesiologist injects an anti-sedation drug. The process needs scrutiny. An intervened exponential distribution (IED) helps in this case, as demonstrated in

Why Models Are Important in Healthcare

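Shanmugam et al. (2002). The probability density function of IED is

$$f(y \mid \theta, \tau) = \begin{cases} \dfrac{e^{-(y - \tau)/(\rho\theta)} - e^{-(y - \tau)/\theta}}{\theta(\rho - 1)} & \text{if } \rho \neq 1 \\[2ex] \dfrac{(y - \tau)\, e^{-(y - \tau)/\theta}}{\theta^2} & \text{if } \rho = 1. \end{cases}$$

The mean and variance of the IED are, respectively, $\mu_{\theta,\tau} = E(y \mid \theta, \tau) = (1 + \rho)\theta + \tau$ and $\sigma^2_{\theta,\tau} = \mathrm{Var}(y \mid \theta, \tau) = \frac{(1 + \rho^2)(\mu - \tau)^2}{(1 + \rho)^2} = \theta^2(1 + \rho^2)$.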

The geometric distribution illustrates the probability pattern of the first success in repeated trials. The outcome of each trial can be success with probability 0 < ð1  θÞ < 1 or failure with probability 0 < θ < 1. The relation between the variance and the mean of the geometric distribution is VarðYÞ ¼ EðYÞf1  EðYÞg. The geometric distribution does not match several chance-oriented healthcare mechanisms. One instance is seen in the operating room. An intervened geometric distribution (IGD) interprets cardiovascular and related respiratory diseases, as shown in Bartolucci et al. (2001). The probability density function of the IGD is 8 y < ρ  1 y1 θ ð1  θÞð1  ρθÞ PrðYjθ; ρÞ ¼ ρ  1 : θy1 yð1  θÞ2  ρ≠1 if ; y ¼ 1; 2; 3; …:; ∞; θ > 0 ρ¼1

Shanmugam (2001). With a posterior estimate of , the posterior probability of attaining a successful prevention is Se ð1ÞðSe þSp 1Þþð1Sp Þ and the posterior probability of failure to ð1ÞS

p prevent an epidemic is ð1ÞðSe þSp 1Þþð1S , where Se is the eÞ

probability a zero-excluded Poisson distribution will match the pattern of epidemic incidences under a failed prevention and Sp is the probability a zero-excluded Poisson distribution will match the pattern of epidemic incidences under a successful prevention, as illustrated in Shanmugam (2001). The equation ðSe þ Sp  1Þ is recognized as the Youden index, I. Unless I ¼ 0 ¼ ðSe þ Sp  1Þ, the posterior probability of attaining a successful prevention is not one minus the posterior probability of failure. The graph of sensitivity in terms of one minus specificity is known as the receiver operating characteristic (ROC) curve in the medical decision-making literature. The locus of I ¼ 0 ¼ ðSe þ Sp  1Þ in the ROC is an upper inclining diagonal line, and it is synonymous with an ideal rather than an actual situation. A generalized but more realistic index to portray the success or failure of epidemic prevention efforts is the Shanmugam index, Sh ¼ ð1  ÞSp þ Se  1, as described in Shanmugam (2001). Another related model is the intervened Poisson distribution (IPD), as illustrated in Shanmugam (1985, 1992). To illustrate the concept of intervention in the data collection process, let 0 < θ < ∞ and 0 < ρ < ∞ denote the incidence rate and the effect of a medical intervention. After the intervention, the incidence rate is ρθ. When ρ ¼ 0, the incidence is wiped out. When ρ is in the domain (1, ∞), the incidence rate is higher and the intervention was counterproductive. The probability density function of the IPD is PrðY ¼ yjθ; ρÞ ¼ ½ð1 þ ρÞy  ρy θy =y!½eρθ ðeθ  1Þ; θ; ρ > 0; y ¼ 1; 2; 3; …; ∞; ;

with mean μ ¼ EðYjθ; ρÞ ¼ ð1  θÞ1 þ ρθð1  ρθÞ1

where

and variance

σ 2ρ;θ

σ 2 ¼ VarðYjθ; ρÞ ¼ ρθð1  ρθÞ1 f1 þ ρθð1  ρθÞ1 g þð1  θÞ1 f1þð1  θÞ1 gþfð1  1=θÞθg=ð1  θÞ2 : Another interesting application of this concept in healthcare is in preventing epidemics. Not all interventions are effective in preventing epidemics, as explained in Shanmugam (2001). He illustrates the idea using plague data in Indian villages. Let ϕ be an unknown probability that a medical intervention will be a success. When no data are available, the best prior guess is ϕ = 1/2. After pertinent data are collected, a better posterior estimate of  is feasible once a statistical procedure is constructed, as in

μρ;θ ¼ EðYjρ; θÞ ¼ ½1 þ ρ þ ðeθ  1Þ1 θ

and

ðeθ θ1Þ2 eθ

¼ VarðYjρ; θÞ ¼ EðYjρ; θÞ  are the mean and variance, respectively, as in Shanmugam (1985). The IPD is useful in reliability analysis, congestion, queueing studies, and so forth. In a dynamic healthcare system, patients’ health status might change periodically and an understanding of these changes is easier with a Markov chain, as illustrated in Voskoglou (1994) and Perdikaris (1994). Mathematical skill, especially in matrix and probability theory, is a prerequisite to comprehend finite Markov chain theory, as illustrated in Uche (1987). Many models are used in healthcare studies, as Kapur (1984) points out. Plunkett (1983) outlines how mathematics clarifies complex interactions between people and activities of the human mind. For more on modeling, refer


to Alemi and Gustafson (2007), Chattamvelli and Shanmugam (2020, 2021), Rosenfeld and Kraus (2018), Schoenbach and Rosamond (2000), and Walck (2007).
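As a numerical check on the intervened Poisson distribution (IPD) described above, the sketch below reads its probability function as $\Pr(Y = y \mid \theta, \rho) = [(1 + \rho)^y - \rho^y]\,\theta^y / \{y!\,e^{\rho\theta}(e^{\theta} - 1)\}$ for $y = 1, 2, 3, \ldots$, with mean $[1 + \rho + (e^{\theta} - 1)^{-1}]\theta$ and variance equal to the mean minus $\theta^2 e^{\theta}/(e^{\theta} - 1)^2$. That reading is an assumption recovered from the surrounding formulas, and the parameter values are hypothetical.

```python
import math

def ipd_pmf(y, theta, rho):
    """Intervened Poisson probability for y = 1, 2, 3, ... (formula as read from the text)."""
    numerator = ((1 + rho) ** y - rho ** y) * theta ** y
    denominator = math.factorial(y) * math.exp(rho * theta) * (math.exp(theta) - 1)
    return numerator / denominator

theta, rho = 1.5, 0.4      # hypothetical incidence rate and intervention effect
support = range(1, 80)     # the tail beyond 80 is numerically negligible here

probs = [ipd_pmf(y, theta, rho) for y in support]
total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# Closed forms quoted in the lead-in.
mean_formula = (1 + rho + 1 / (math.exp(theta) - 1)) * theta
var_formula = mean_formula - theta ** 2 * math.exp(theta) / (math.exp(theta) - 1) ** 2

print(f"total probability = {total:.12f}")
print(f"mean: numeric {mean:.6f}, formula {mean_formula:.6f}")
print(f"variance: numeric {var:.6f}, formula {var_formula:.6f}")
```

Setting `rho` close to zero makes the distribution approach a zero-excluded Poisson, which matches the statement that the intervention wipes out the post-intervention incidence when $\rho = 0$.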

6.2 Concepts

A model is a description of the chance-oriented mechanism that generates data. The data are drawn randomly from the population of the chance-oriented mechanism. Let the random sample be $y_1, y_2, y_3, \ldots, y_n$. For reasons of practicality, the sampled population is different from the intended population. For example, consider survey data on patients’ length of stay in a hospital. In this scenario, the longer a patient stays in the hospital, the higher that patient’s chance of being selected to participate in the survey. This phenomenon is called observational bias. The intended population is $f(y \mid \theta)/N(\theta)$, where $y$, $\theta$, $f(y \mid \theta)$, and $N(\theta)$ are, respectively, the observed random variable, the nonobservable unknown parameter of the healthcare system, the frequency pattern of the observations, and the normalizing constant for all $y$. All the normalizing constant does is shrink or expand the frequency pattern appropriately to make the function $f(y \mid \theta)$ deliver probabilities. When the random variable takes values only in a countable set of integers, the frequency pattern is classified as a probability mass function. Otherwise, the frequency pattern is called the probability density function. The sampled population is indicated by $w(y) f(y \mid \theta)/N_w(\theta)$, where $w(y)$ and $N_w(\theta)$ are called the bias function and a new normalizing constant for all $y$. Consequently, the intended population mean and variance are different from the mean $E(Y \mid \theta)$ and variance $\mathrm{Var}(Y \mid \theta)$ of the sampled population.

Consider two examples, one for a discrete case and another for a continuous case. Suppose the number of persons with positive COVID-19 test results among a random sample of $n$ persons who took the COVID-19 test is $Y$. Note $Y_i = 1$ is a code when the $i$th person gets a positive test result with an unknown probability $0 < \pi < 1$ and $Y_i = 0$ otherwise (i.e., a negative test result) with an unknown probability $0 < 1 - \pi < 1$, where $i = 1, 2, \ldots, n$. The dichotomous nature of the probability structure for the $i$th person is called the Bernoulli distribution. Realize $Y = \sum_{i=1}^{n} Y_i$, which follows a binomial probability distribution. In other words, the binomial model is

$$f_{binomial}\left(y \,\Big|\, \theta = \frac{\pi}{1 - \pi}\right) = \frac{n!}{y!\,(n - y)!}\left(\frac{\pi}{1 - \pi}\right)^{y}; \quad 0 < \pi < 1;\ y = 0, 1, 2, \ldots, n; \quad N_{binomial}(\theta) = (1 - \pi)^{-n},$$

where the parameter $\theta = \frac{\pi}{1 - \pi}$ is the odds of a positive test result. Note $E(Y \mid \theta) = n\frac{\theta}{(1 + \theta)}$ and $\mathrm{Var}(Y \mid \theta) = n\frac{\theta}{(1 + \theta)^2}$. In other words, $E(Y \mid \pi) = n\pi$ and $\mathrm{Var}(Y \mid \pi) = n\pi(1 - \pi)$. Their relationship is $\mathrm{Var}(Y \mid \pi) = E(Y \mid \pi)\left[1 - \frac{E(Y \mid \pi)}{n}\right]$, which is depicted in Figure 6.1. To sketch it, code $z = \mathrm{Var}(Y \mid \pi)$, $x = E(Y \mid \pi)$, $y = n$ and then use Microsoft Math Solver to witness the relationship. As the mean increases from zero, the variance increases, reaching its maximum when the mean is $n/2$. The variance is indicative of dilution of homogeneity. In other words, when the number of persons getting positive test results increases from a low level (i.e., in an increasing COVID-19 virality), there will be more heterogeneity in the incidences of COVID-19 (see Figure 6.2).

When observational bias occurs in collecting the data on the number $Y$ of persons who get a positive test result, the model is a sampling biased binomial probability distribution. In other words, the sampling biased binomial model is

$$f^{w}_{binomial}\left(y \,\Big|\, \theta = \frac{\pi}{1 - \pi}\right) = y\,\frac{n!}{y!\,(n - y)!}\left(\frac{\pi}{1 - \pi}\right)^{y}; \quad y = 0, 1, 2, \ldots, n;\ 0 < \pi < 1; \quad N^{w}_{binomial}\left(\theta = \frac{\pi}{1 - \pi}\right) = n\pi/(1 - \pi)^{n}.$$

The sampling biased binomial model in terms of probability implies $E_w(Y \mid \pi) = n\pi[1 + (n - 1)\pi]$ and a corresponding variance $\mathrm{Var}_w(Y \mid \pi)$. The mean $E_w(Y \mid \pi)$ under sampling bias is inflated by an amount $n(n - 1)\pi^2$ from the mean $E(Y \mid \pi)$ under a regular sampling situation. A distortion occurs. One has to be careful about making data-based inferences. The underlying model for data collection is crucial.

Consider a continuous case. Suppose the incubation time between contracting COVID-19 and exhibiting symptoms is likely to follow a gamma probability pattern. The incubation time $Y > 0$ is a random variable (RV) with frequency pattern $f(y \mid \theta, \alpha) = e^{-\theta y}\theta^{\alpha} y^{\alpha - 1}$; $y > 0$; $\theta > 0$, and normalizing constant $N(\theta, \alpha) = \Gamma(\alpha)$, where $\theta$ is the contagion rate and $\alpha > 0$ is the collective impact of following preventive measures such as masking and practicing social distancing. The mean is $E(Y \mid \theta, \alpha) = \alpha/\theta$ and the variance is $\mathrm{Var}(Y \mid \theta, \alpha) = \alpha/\theta^2$. Note the relation between the mean and variance in the stochastic incubation times. The variance is a measure of heterogeneity. That is, $\mathrm{Var}(Y \mid \theta, \alpha) = \frac{E(Y \mid \theta, \alpha)}{\theta}$, and its interpretation reveals heterogeneity increases with increased incubation time but $\theta$ moderates heterogeneity’s volatility.
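The mean and variance relations quoted for the binomial and gamma models can be verified numerically. The following sketch sums the binomial probabilities directly and checks $\mathrm{Var}(Y \mid \pi) = E(Y \mid \pi)[1 - E(Y \mid \pi)/n]$; the sample size, the probabilities, and the gamma parameters are hypothetical values chosen only for illustration.

```python
import math

def binomial_pmf(y, n, pi):
    """Pr(Y = y) for a binomial count with n trials and success probability pi."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def binomial_moments(n, pi):
    """Mean and variance obtained by direct summation over y = 0, ..., n."""
    mean = sum(y * binomial_pmf(y, n, pi) for y in range(n + 1))
    var = sum((y - mean) ** 2 * binomial_pmf(y, n, pi) for y in range(n + 1))
    return mean, var

def gamma_moments(theta, alpha):
    """Mean alpha/theta and variance alpha/theta^2 of the gamma model with rate theta."""
    return alpha / theta, alpha / theta ** 2

n = 20  # hypothetical number of persons tested
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    mean, var = binomial_moments(n, pi)
    # The variance is a quadratic function of the mean.
    assert math.isclose(var, mean * (1 - mean / n), rel_tol=1e-9)
    print(f"pi={pi:.1f}  mean={mean:.2f}  variance={var:.2f}")

g_mean, g_var = gamma_moments(theta=0.5, alpha=3.0)  # hypothetical parameters
print(f"gamma: mean={g_mean}, variance={g_var}, mean/theta={g_mean / 0.5}")
```

The printed rows show the variance rising until the mean reaches $n/2$ and falling afterward, which is the shape of the surface in Figure 6.1 for a fixed $n$.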


Figure 6.1 Relation between the mean and the variance (three-dimensional plot with axes x, y, and z)

6.3 Illustration

Shanmugam (2014) downloads data on hospital site infection from www.nc.cdc.gov (see Table 6.3). Hospital site infection is nosocomial. The word nosos means disease. Any hospital site infection scares healthcare professionals. Infection spreads to susceptible patients. Contaminated equipment, bed linens, air droplets, and so forth could also spread infection. Sometimes the source of infection is not clear. The Centers for Disease Control points out that there are approximately 1.7 million infections from microorganisms like bacteria or fungi. In the United States, hospital site infections lead to 99,000 deaths a year. These numbers highlight the importance of hospital surveys. Nosocomial infections cause pneumonia, urinary illness, bloodstream difficulties, or antimicrobial resistance. These and other hospital site infections complicate medical treatment (see Figure 6.3).

The underlying model for the data in Table 6.3 is BBD, whose probability mass function is

$$\Pr(Y = y \mid \pi, \phi) = \binom{n}{y}\left(\frac{\pi}{1 - \phi}\right)^{y}\left(1 - \frac{\pi}{1 - \phi}\right)^{n - y}; \quad 0 < \pi < 1 - \phi;\ 0 \le \phi < 1;\ y = 0, 1, 2, \ldots, n,$$

where $y$ denotes the number of infected nurses, $\pi$ is an unknown probability a nurse will receive an infection from a patient, and $\phi$ is an unknown probability a nurse will have sufficient immunity to resist the virus. Note $E(Y \mid \pi, \phi) = n\left(\frac{\pi}{1 - \phi}\right)$ and $\mathrm{Var}(Y \mid \pi, \phi) = E(Y \mid \pi, \phi)\left\{1 - \frac{E(Y \mid \pi, \phi)}{n}\right\}$. The probability a nurse will have insufficient immunity is $\left(1 - \frac{\pi}{1 - \phi}\right)$. The maximum likelihood estimates (MLE) are

$$\hat{\phi} = \frac{\left|s_y^2 - \bar{y}\left(1 - \frac{\bar{y}}{n}\right)\right|}{s_y^2 + \bar{y}\left(1 - \frac{\bar{y}}{n}\right)} \quad\text{and}\quad \hat{\pi} = \frac{|\bar{y}(1 - \hat{\phi})|}{n}.$$


Figure 6.2 Heterogeneity versus average incubation time

Using the data in Table 6.3, the parameters could be estimated and interpreted in terms of the nurses’ immunity, as done in Shanmugam (2014). The interpretations are valuable medical knowledge for hospital administrators who make judicious assignments for nurses to treat infected patients. Using the data in Table 6.3, the MLEs are $\hat{\phi}_{mle} = 0.25$ and $\hat{\pi}_{mle,\hat{\phi}} = 0.10$. When none of the nurses have sufficient immunity, the infection rate is $\hat{\pi}_{mle,\phi=0} = 0.13$. The null hypothesis $H_0: \phi = 0$ means a negligible proportion of nurses has sufficient immunity. When half of the nurses have sufficient immunity, $H_1: \phi_1 = 0.5$. The probability of rejecting the true null hypothesis is 0.001, which is the p-value. The statistical power of accepting a true specific research hypothesis $H_1: \phi_1 = 0.5$ is 0.948. The odds of zero infection are 0.009 with $\phi = 0$ and 0.034 with $\phi \neq 0$ (see Figure 6.4).
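The reported estimates can be retraced from the sixteen counts in the Y column of Table 6.3. The sketch below assumes the estimators read as $\hat{\phi} = |s_y^2 - \bar{y}(1 - \bar{y}/n)| / [s_y^2 + \bar{y}(1 - \bar{y}/n)]$ and $\hat{\pi} = \bar{y}(1 - \hat{\phi})/n$, with $s_y^2$ taken as the sample variance; that reading is an assumption recovered from the garbled display, and the variable names are illustrative.

```python
from statistics import mean, variance

# Y column of Table 6.3: number infected among n = 32 nurses, one count per service.
infected = [5, 6, 6, 7, 3, 3, 3, 3, 3, 7, 5, 3, 4, 4, 4, 3]
n = 32

y_bar = mean(infected)        # sample mean of the counts
s2_y = variance(infected)     # sample variance (divisor: number of services minus one)

# Variance an ordinary binomial would imply at this mean.
binomial_var = y_bar * (1 - y_bar / n)

phi_hat = abs(s2_y - binomial_var) / (s2_y + binomial_var)  # immunity estimate
pi_hat = y_bar * (1 - phi_hat) / n                          # infection probability
pi_hat_no_immunity = y_bar / n                              # same estimate with phi = 0

print(f"y_bar = {y_bar:.3f}, s2_y = {s2_y:.3f}")
print(f"phi_hat = {phi_hat:.3f}")
print(f"pi_hat = {pi_hat:.3f}, pi_hat (phi = 0) = {pi_hat_no_immunity:.3f}")
```

Under these assumptions the computed values agree with the table entries to three decimals, which suggests the estimators behave like moment-matching formulas: the gap between the observed variance and the binomial variance drives the immunity estimate.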


^

π = 0.033 refers to the probability no The estimate ð1^ Þ^ nurse will be infected. The odds ratio for zero infection is π eOddsðπÞ ðeπe1Þ= 10.13. In other words, for every situation with  ¼ 0, there are 10 with  ≠ 0 to have zero infection. The probability of having a specified number of infected nurses is consistently more when  ¼ 0 than when ^ ¼ 0:252. With BBD, it became possible to compute the  informatics shown in Table 6.4. Another example of modeling data is the number of ambulance services per day (see Table 6.5a). If the numbers were smaller and portrayed rarer events, the Poisson distribution could have been the underlying model. Because of the law of large numbers, as shown in Shanmugam and Chattamvelli (2015), the normal distribution is selected for the data. The number Y of ambulance services per day


Table 6.3 Number infected among n = 32 nurses who provided care to SARS patients. (Source: Shanmugam, 2014)

Service                         Y
Medication given                5
Venipuncture                    6
Patient assessed                6
Patient transferred             7
Aspirate endotracheal           3
Peripheral insertion            3
Intubation                      3
Mask manipulated                3
Commodes manipulated            3
Oxygen mask manipulated         7
Mouth/dental care               5
Nebulizer treatment             3
Electrocardiogram performed     4
Radiology set up                4
Suctioning after intubation     4
Suctioning before intubation    3

Summary                                                                                              Value
$\bar{y}$                                                                                            4.313
$s_y^2$                                                                                              2.229
$\hat{\phi}_{mle}$                                                                                   0.252
$\hat{\pi}_{mle,\hat{\phi}}$                                                                         0.101
$\hat{\pi}_{mle,\phi=0}$                                                                             0.135
p value                                                                                              0.001
Statistical power to accept true $\phi = 0.5$                                                        0.948
Odds $\Pr(Y = 0 \mid \pi, \phi \neq 0)/\Pr(Y > 0 \mid \pi, \phi \neq 0)$ for zero infection when $\phi \neq 0$    0.009
Odds $\Pr(Y = 0 \mid \pi, \phi = 0)/\Pr(Y > 0 \mid \pi, \phi = 0)$ for zero infection when $\phi = 0$            0.034
Odds ratio                                                                                           10.13
$\widehat{\Pr}$(no nurse is infected)                                                                0.033

Figure 6.3 Comparison of survival functions with $\hat{\phi} = 0.252$ and $\phi = 0$

follows a Gaussian (normal) probability distribution. The model is $f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2}$; $-\infty < y < \infty$. Figure 6.5 compares ambulance services for Monday through Friday. It is interesting that Friday has low occurrences compared to Monday through Thursday. Only Monday is evenly correlated with Tuesday through Thursday (see Table 6.5b).

An example of the time a patient takes in each required activity at a patient care unit follows (see Table 6.6). The correlations of activities (in minutes) in a patient healthcare unit are computed and displayed in Table 6.7. Time is of the essence in healthcare. Table 6.6 surveys how nurses caring for random patients (P1, P2, P3, P4, P5, and P6) spent their time (in minutes) on the activities of providing medication, collecting blood/lab results, feeding patients, responding to patient calls, and transporting patients. The exponential probability distribution $f(y \mid \lambda) = \frac{1}{\lambda} e^{-\left(\frac{y}{\lambda}\right)}$; $y > 0$; $\lambda > 0$ is appropriate, with mean $\mu_y = E(y \mid \lambda) = \lambda$ and variance $\sigma_y^2 = \mathrm{Var}(y \mid \lambda) = \mu_y^2$. According to Table 6.8, the Kolmogorov-Smirnov (KS) score and its p-value suggest the time spent indeed follows an exponential probability distribution. The KS test is a powerful nonparametric technique. The p-value is the probability of rejecting the exponential probability


Table 6.4 A statistical comparison of the exponentiality of the times taken by 11 patients

Patient    95% Confidence interval for the exponential rate    Kolmogorov-Smirnov    p-Value
number     Lower         Upper                                  statistic
1          0.05          0.18                                   0.29                  0.32
2          0.04          0.16                                   0.39                  0.07
3          0.03          0.14                                   0.26                  0.41
4          0.04          0.15                                   0.33                  0.19
5          0.04          0.17                                   0.27                  0.41
6          0.04          0.16                                   0.32                  0.20
7          0.05          0.18                                   0.29                  0.32
8          0.03          0.14                                   0.35                  0.13
9          0.04          0.16                                   0.29                  0.29
10         0.03          0.13                                   0.28                  0.35
11         0.04          0.15                                   0.32                  0.21
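A Kolmogorov-Smirnov statistic of the kind tabulated above measures the largest gap between the empirical distribution of the observed times and the fitted exponential distribution. The sketch below computes that distance in plain Python; the list of times is hypothetical (it is not taken from the book's tables), and the function name is an illustrative choice. Because the exponential mean is estimated from the same data, standard KS p-value tables are only approximate in this setting, so the sketch reports the distance alone.

```python
import math

def ks_statistic_exponential(times):
    """Largest distance between the empirical CDF and an exponential CDF
    whose mean is set to the sample mean of the data."""
    data = sorted(times)
    n = len(data)
    lam = sum(data) / n  # estimated exponential mean
    distance = 0.0
    for i, t in enumerate(data, start=1):
        cdf = 1 - math.exp(-t / lam)
        # Compare the fitted CDF with the empirical CDF just after and just before t.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return lam, distance

# Hypothetical activity times in minutes for one patient.
times = [2, 3, 5, 8, 12, 4, 20, 7, 1, 15]
lam, d = ks_statistic_exponential(times)
print(f"estimated mean = {lam:.2f} minutes, KS distance = {d:.3f}")
```

A small distance is consistent with the exponential model, while a large one argues against it; the threshold depends on the sample size.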

Figure 6.4 How patients compare in the time they spend on activities in the hospital. (Source: Shanmugam, 2014)

Figure 6.5 Comparison of number of ambulance services per day

Why Models Are Important in Healthcare

distribution as a model for the data. Table 6.8 shows the underlying model for the time spent is exponential. See Figure 6.6 for a comparison of the activities. Feeding the patient takes the most time. The medication is negatively correlated with responding to patients and transporting patients. These five variables are grouped into two principal components (PC1 and PC2). PC1 consists of feeding patients and responding to patient calls. PC2

consists of feeding patients, responding to patient calls, collecting blood/lab results, and transporting patients (see Figure 6.7). How much of a nation’s gross domestic product (GDP) is spent on healthcare? This gives an impression about the importance of healthcare. The GDP is used for international comparisons. The United States, Canada, Australia, India, and China had high spending on healthcare in 2000 (see Figure 6.8). Let us consider China, Germany, India, and the United States. These countries split into two principal components (PC). PC1 consists of India and the United States. PC2 consists of China and Germany (see Figures 6.9 and 6.10). Let the inverse binomial distribution (IBD) be a model for the number of words recognized by Alzheimer’s patients. Alzheimer’s disease (AD) causes dementia. In early stages, AD creates difficulty with memory. When AD advances, it

Table 6.5(a) Number of ambulance services per day. (Source: Shanmugam, 2011)

| Month | Mon | Tue | Wed | Thurs | Fri |
|---|---|---|---|---|---|
| Apr | 2,356 | 2,245 | 2,213 | 2,215 | 1,542 |
| May | 2,427 | 2,312 | 2,279 | 2,281 | 1,588 |
| June | 2,309 | 2,200 | 2,169 | 2,171 | 1,511 |
| July | 2,299 | 2,191 | 2,160 | 2,162 | 1,505 |
| Aug | 2,328 | 2,218 | 2,186 | 2,188 | 1,523 |
| Sep | 3,391 | 2,279 | 2,246 | 2,248 | 1,565 |
| Oct | 2,396 | 2,283 | 2,251 | 2,253 | 1,568 |
| Nov | 2,388 | 2,275 | 2,243 | 2,245 | 1,563 |
| Dec | 2,302 | 2,193 | 2,162 | 2,164 | 1,507 |
| Jan | 2,402 | 2,289 | 2,256 | 2,258 | 1,572 |
| Feb | 2,372 | 2,261 | 2,228 | 2,231 | 1,553 |
| Mar | 2,382 | 2,270 | 2,237 | 2,239 | 1,559 |

Table 6.5(b) Correlation

| | Mon | Tue | Wed | Thurs | Fri |
|---|---|---|---|---|---|
| Mon | 1 | | | | |
| Tue | 0.3468 | 1 | | | |
| Wed | 0.3441 | 0.9999 | 1 | | |
| Thurs | 0.3434 | 1 | 1 | 1 | |
| Fri | 0.3438 | 0.9999 | 0.9999 | 1 | 1 |

Table 6.6 Time spent on patient activity in a hospital. (Source: Ozcan, 2005)

Patient care unit activities (in minutes)

| Patient | Medication | Collecting blood/lab | Feeding patients | Responding to patients calls | Transporting patients |
|---|---|---|---|---|---|
| P1 | 4 | 8 | 18 | 4 | 11 |
| P2 | 3 | 7 | 21 | 5 | 11 |
| P3 | 4 | 6 | 20 | 4 | 12 |
| P4 | 4 | 9 | 21 | 7 | 12 |
| P5 | 5 | 10 | 21 | 6 | 9 |
| P6 | 3 | 7 | 20 | 8 | 10 |

Table 6.7 Correlation of activities (in minutes) in a patient healthcare unit

| Correlation | Medication | Collecting blood/lab | Feeding patients | Responding to patients calls | Transporting patients |
|---|---|---|---|---|---|
| Medication | 1 | | | | |
| Collecting blood/lab | 0.69 | 1 | | | |
| Feeding patients | 0.04 | 0.25 | 1 | | |
| Responding to patients calls | −0.2 | 0.31 | 0.45 | 1 | |
| Transporting patients | −0.3 | −0.5 | −0.1 | −0.3 | 1 |
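The correlations in Table 6.7 can be recomputed from the six rows of Table 6.6. The following is a minimal sketch, assuming the ordinary Pearson correlation coefficient was used; the function name `pearson` and the dictionary `times` are illustrative choices, not names taken from the text.

```python
import math

# Minutes per activity for patients P1..P6, as listed in Table 6.6.
times = {
    "Medication": [4, 3, 4, 4, 5, 3],
    "Collecting blood/lab": [8, 7, 6, 9, 10, 7],
    "Feeding patients": [18, 21, 20, 21, 21, 20],
    "Responding to patients calls": [4, 5, 4, 7, 6, 8],
    "Transporting patients": [11, 11, 12, 12, 9, 10],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


names = list(times)
corr = {(a, b): pearson(times[a], times[b]) for a in names for b in names}

# Print the lower triangle in the same layout as Table 6.7.
for i, a in enumerate(names):
    row = "  ".join(f"{corr[(a, b)]:5.2f}" for b in names[: i + 1])
    print(f"{a:30s} {row}")
```

With only six patients per activity, each coefficient rests on very few observations, so the signs are more informative than the exact magnitudes.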


Table 6.8 Are the activity times in patient healthcare units part of an exponential probability pattern?

| Activity | 95% Lower est. λ | 95% Upper est. λ | Kolmogorov-Smirnov score | p-Value |
|---|---|---|---|---|
| Medication | 0.05 | 0.47 | 0.54 | 0.05 |
| Collect blood/lab specimen | 0.03 | 0.23 | 0.53 | 0.06 |
| Feeding patients | 0.01 | 0.09 | 0.59 | 0.03 |
| Responding to patient calls | 0.03 | 0.31 | 0.51 | 0.09 |
| Transporting patients | 0.02 | 0.16 | 0.56 | 0.04 |
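The Kolmogorov-Smirnov scores in Table 6.8 can be approximated directly from the Table 6.6 data. The sketch below assumes that the exponential mean is estimated by the sample mean and that the score is the usual two-sided distance between the empirical and the fitted distribution functions; the function name `ks_exponential` is hypothetical. It does not compute p-values, which are only approximate when the parameter is estimated from the same data.

```python
import math


def ks_exponential(sample):
    """Largest gap between the empirical CDF of `sample` and the CDF of an
    exponential distribution whose mean equals the sample mean."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        fitted = 1.0 - math.exp(-x / mean)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


# Minutes per activity for patients P1..P6, as listed in Table 6.6.
activities = {
    "Medication": [4, 3, 4, 4, 5, 3],
    "Collecting blood/lab": [8, 7, 6, 9, 10, 7],
    "Feeding patients": [18, 21, 20, 21, 21, 20],
    "Responding to patients calls": [4, 5, 4, 7, 6, 8],
    "Transporting patients": [11, 11, 12, 12, 9, 10],
}

scores = {name: ks_exponential(values) for name, values in activities.items()}
for name, score in scores.items():
    print(f"{name:30s} KS = {score:.2f}")
```

In each activity the largest gap occurs at the smallest observation, because no recorded time is close to zero while the exponential model places substantial probability there.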

Figure 6.6 Comparison of time patients spend in a care unit [Chart title: Time spent by patients in healthcare unit; series: Medication, Collecting blood/lab, Responding to patients calls, Transporting patients, Feeding patients; vertical axis 0 to 25.]

Figure 6.7 Path diagram of the activities (in minutes) in the patient healthcare unit [Nodes: RC1, RC2, Medication, Collecting blood/lab specimen, Feeding patients/collecting trays, Responding to patients calls, Transporting patients.]

Figure 6.8 Developed countries’ GDP spent on healthcare, 2000 [Chart title: Percent of GDP spent on health in 2005; series: 2005; data labels: 9.8, 5.2, 15.2, 4.7, 2.1, 5, 8.8.]

results in disorientation. Patients with AD withdraw from society. Approximately 29.8 million people have AD. Treating AD costs $200 billion per year in the United States. Let \( Y \) be the random number of words recognized by an AD patient. An underlying model for \( Y \) is the Poisson distribution \( \Pr(y) = e^{-\lambda} \lambda^{y} / y! \), \( y = 0, 1, 2, \ldots \), \( \lambda > 0 \), where the parameter \( \lambda > 0 \) denotes the rate of remembering words. The rate varies stochastically from one person to another according to a gamma probability pattern \( f(\lambda \mid r, p) = \lambda^{r-1} e^{-\lambda(1-p)/p} \big/ \left[ \left( \frac{p}{1-p} \right)^{r} \Gamma(r) \right] \), \( r > 1 \), \( 0 < p < 1 \), where \( r \) and \( p \) are hyperparameters. The unconditional probability mass function for \( Y \) is the IBD \( \Pr(y \mid r, p) = \frac{\Gamma(r+y)}{y!\,\Gamma(r)}\, p^{y} (1-p)^{r} \), \( r \ge 1 \), \( 0 < p < 1 \), \( y = 0, 1, 2, \ldots \). The mean and variance of the IBD are \( \mu_y = E(y \mid r, p) = r\,\frac{p}{1-p} \) and \( \sigma_y^2 = \mathrm{Var}(y \mid r, p) = \frac{\mu_y}{1-p} \). Note that \( \frac{p}{1-p} \) is the odds of recognizing a word. The variance of the IBD is not less than its mean. The mean and variance are equal in the Poisson model. The variance is less than the mean in the binomial model. The chi-squared scores and p-values show that the data are inverse binomially distributed, as seen in Table 6.11. The 95% confidence interval for the incidence rate is shown in Table 6.11. The corresponding numbers of words remembered by Alzheimer’s patients in a treatment group are summarized in Table 6.12, with the correlations of the data variables in Table 6.13. The number of words recognized prior to the period and the number recognized in each year are positively correlated in both the control and the treatment groups, as suggested by Figures 6.11 and 6.12. The chi-squared scores and p-values show that the treatment group data are inverse binomially distributed, according to Table 6.14. See the 95% confidence intervals for the incidence rate in the treatment group in Table 6.14. The number of words recognized prior to the experimental period and the number of words recognized in each year are positively correlated. This positive correlation is indicative of an improvement pattern in the treatment group.
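The stated mean and variance of the IBD can be checked numerically. The following is a minimal sketch, assuming the probability mass function is written as Γ(r + y) / (y! Γ(r)) · p^y (1 − p)^r, which is the form consistent with the mean r p / (1 − p) given above; the values r = 5 and p = 0.6 and the truncation point `y_max` are illustrative assumptions only.

```python
import math


def ibd_pmf(y, r, p):
    """Pr(Y = y | r, p) = Gamma(r + y) / (y! Gamma(r)) * p**y * (1 - p)**r."""
    log_pmf = (
        math.lgamma(r + y) - math.lgamma(y + 1) - math.lgamma(r)
        + y * math.log(p) + r * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def ibd_moments(r, p, y_max=2000):
    """Total probability, mean, and variance from the pmf summed over 0..y_max."""
    probs = [ibd_pmf(y, r, p) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return total, mean, var


r, p = 5.0, 0.6  # illustrative hyperparameter values, not estimates from the data
total, mean, var = ibd_moments(r, p)

print(f"total probability = {total:.6f}")
print(f"mean     = {mean:.4f}  (formula r p/(1-p)   = {r * p / (1 - p):.4f})")
print(f"variance = {var:.4f}  (formula mean/(1-p) = {r * p / (1 - p) ** 2:.4f})")
```

Because 0 < 1 − p < 1, the variance mean/(1 − p) always exceeds the mean, which is the overdispersion property contrasted with the Poisson and binomial models.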

Figure 6.9 Comparison of countries’ GDP spent on healthcare [Chart title: Comparison of a few countries; series: USA, Germany, India, China; vertical axis 0 to 16.]

Figure 6.10 How a few countries spend their GDP on healthcare

The seizure data in the control and treatment groups are in Tables 6.15 and 6.18, respectively. Epileptic seizures are documented as early as 2000 BC in Akkadian texts and in ancient Greek medical literature. Epilepsy is a Greek word meaning to seize or afflict. Greeks performed surgery in order to cure epilepsy. Provoked seizures are due to low blood sugar or alcohol. The data variables are age (in years), the initial number of seizures \( Y_0 \), and the numbers of seizures \( Y_1, Y_2, Y_3, Y_4 \) in years 1 through 4.

Figures 6.13 and 6.15 compare \( Y_0, Y_1, Y_2, Y_3, Y_4 \) in a control group versus a treatment group. The initial number \( Y_0 \) in the treatment group is significantly higher than its counterpart in the control group, according to Figures 6.14 and 6.16. The Poisson distribution requires an equal mean and variance. Because the variance and mean are unequal in the seizure data, the random variables \( Y_0, Y_1, Y_2, Y_3, Y_4 \) are assumed to follow an incidence rate restricted Poisson (IRRP) distribution. The probability mass function of the IRRP distribution is

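Table 6.9 Number of words remembered by Alzheimer’s patients in a control group. (Source: Shanmugam, 2013)

| Control group | Y0 | Y1 | Y2 | Y4 | Y6 |
|---|---|---|---|---|---|
| 1 | 20 | 19 | 20 | 20 | 18 |
| 2 | 14 | 15 | 16 | 9 | 6 |
| 3 | 7 | 5 | 8 | 8 | 5 |
| 4 | 6 | 10 | 9 | 10 | 10 |
| 5 | 9 | 7 | 9 | 5 | 6 |
| 6 | 9 | 10 | 9 | 11 | 11 |
| 7 | 7 | 3 | 7 | 6 | 3 |
| 8 | 18 | 20 | 20 | 23 | 21 |
| 9 | 6 | 10 | 10 | 13 | 14 |
| 10 | 10 | 15 | 15 | 15 | 14 |
| 11 | 5 | 8 | 7 | 3 | |
| 12 | 12 | 11 | 11 | 8 | 10 |
| 13 | 10 | 2 | 9 | | |
| 14 | 17 | 12 | 14 | | |
| 15 | 16 | 15 | 13 | | |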

y y1   λ y λe½β =y!eλ ; y ¼ 0; 1; 2; …; ∞; 1þ β λ > 0; 1 < β < 1;

PrðYjλ; βÞ ¼

as described in Shanmugam (1991). The MLEs are ^λ ¼ ysy

3=2

^ ¼ 0, the incidence rate is ^ ¼  ^λ . When β and β qffiffiffi 1

y s2y

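unrestricted and the IRRP model becomes a regular

Table 6.9 (cont.)

| Control group | Y0 | Y1 | Y2 | Y4 | Y6 |
|---|---|---|---|---|---|
| 16 | 7 | 10 | 4 | 10 | 5 |
| 17 | 5 | 0 | 5 | 0 | 0 |
| 18 | 16 | 7 | 7 | 6 | 10 |
| 19 | 5 | 6 | 9 | 5 | 6 |
| 20 | 2 | 1 | 1 | 2 | 2 |
| 21 | 7 | 11 | 7 | 5 | 11 |
| 22 | 9 | 10 | 17 | 10 | 6 |
| 23 | 2 | 5 | 6 | 7 | |
| 24 | 7 | 3 | 5 | | |
| 25 | 10 | 13 | | | |
| 26 | 7 | 5 | | | |

Unassigned Table 6.9 entries, in extraction order: 9, 3, 2, 15, 13, 9, 7, 6, 5, 5.

Table 6.10 Correlation of the number of words remembered by Alzheimer’s patients in a control group

| Year | Y0 | Y1 | Y2 | Y4 | Y6 |
|---|---|---|---|---|---|
| Y0 | 1 | | | | |
| Y1 | 0.73 | 1 | | | |
| Y2 | 0.72 | 0.81 | 1 | | |
| Y4 | 0.62 | 0.83 | 0.8 | 1 | |
| Y6 | 0.58 | 0.83 | 0.71 | 0.85 | 1 |

Table 6.11 Do the number of words remembered by Alzheimer’s patients in a control group follow a Poisson probability pattern?

| Year | Lower estimate with 95% CI | Upper estimate with 95% CI | Chi-squared score | p-Value |
|---|---|---|---|---|
| Y0 | 8.13 | 10.48 | 65.28 | |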

[…] \( t \mid \text{illness 2}) = e^{-\mu t} \), where the parameter \( \mu > 0 \) denotes the inverse of the average survival time after curing the second illness. This survival function is a decreasing function. The reliability of a treatment for a patient having two non-independent illnesses is the product of their reliabilities if the first illness is treated before the second illness is treated. That is, \( S_{\text{patient}}(t) = 1 - \{1 - e^{-\lambda t}\}\{1 - e^{-\mu t}\} \). If the two illnesses can be treated simultaneously, then the reliability of the patient having a quality life is \( S_{\text{patient}}(t) = e^{-\lambda t} e^{-\mu t} = e^{-(\lambda + \mu) t} \).
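The two reliability expressions can be compared numerically. The sketch below evaluates both formulas exactly as written above; the function names and the rate values for λ and μ are illustrative assumptions, not values from the text.

```python
import math


def survival_sequential(t, lam, mu):
    """S(t) = 1 - (1 - exp(-lam t)) (1 - exp(-mu t)): first illness treated before the second."""
    return 1.0 - (1.0 - math.exp(-lam * t)) * (1.0 - math.exp(-mu * t))


def survival_simultaneous(t, lam, mu):
    """S(t) = exp(-(lam + mu) t): both illnesses treated at the same time."""
    return math.exp(-(lam + mu) * t)


lam, mu = 0.3, 0.5  # illustrative rates (inverse average survival times)
table = [
    (t, survival_sequential(t, lam, mu), survival_simultaneous(t, lam, mu))
    for t in (0.0, 1.0, 2.0, 5.0)
]
for t, seq, sim in table:
    print(f"t = {t:4.1f}   sequential = {seq:.4f}   simultaneous = {sim:.4f}")
```

Both functions start at 1 when t = 0 and decrease with t; for every t > 0 the sequential expression is the larger of the two.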

13.4 Summary

In this chapter, the similarities and differences between the Six Sigma methods and lean management principles were exposed and explained with healthcare examples. The genesis of both concepts was provided so as to motivate readers for further development. Waste in healthcare services was emphasized. The value refers to all activities (both value-added and

389 https://doi.org/10.1017/9781009212021.015 Published online by Cambridge University Press

Figure 13.64 Comparison of the wait times to be seen by physician/surgeon [Comparative box plots by clinic of WaitingTime Patient1; 1 = Morning, 2 = Evening.]

Figure 13.65 Average chart for patient 1 [Vertical axis: minutes, 0 to 100.]

non-value-added activities) with the purpose of improving the quality of healthcare services and providing a quality life for patients. The complexity of these issues was eased. The efficient use of labor, time, and resources was shown to be within reach using Six Sigma methods and lean management principles, which can be used to address overmedication, storage of drugs, lack of energy/knowledge of healthcare professionals, inefficient healthcare service and use of service time, transportation of patients and physicians/surgeons/nurses, and patient safety.


13.5 Exercises

1. Identify a scenario in a healthcare setting in which treatments are offered for diabetes.
2. Itemize the steps that help increase efficiency in treatment for diabetic patients.
3. Construct a Six Sigma plan for storage of blood in a hospital.

Six Sigma and Lean Management in Healthcare Sectors

Figure 13.66 Average chart for patient 2

Figure 13.67 Standard deviation chart for patient 1

4. Write a proposal to install an efficient information-processing center for healthcare professionals, patients, and insurance agents.
5. Make an operational plan to utilize wireless technology for the best communication between patients and doctors/nurses/pharmacists.
6. Construct a Six Sigma study to reduce (if not totally eliminate) adverse events in a hospital/clinic.
7. How would you use Six Sigma methods or lean management concepts to protect patient privacy?

8. Describe a Six Sigma plan to reduce hospital-acquired infections.
9. Elaborate a lean management approach to reduce wait time and/or healthcare service time in a hospital.
10. How would a Six Sigma or lean management plan help control COVID-19 infections within a hospital/clinic?
11. Elaborate a plan using lean management and/or Six Sigma concepts to offer the best treatment for COVID-19 during a pandemic.


Figure 13.68 Standard deviation chart for patient 2

Figure 13.69 Range chart for patient 1

Figure 13.70 Range chart for patient 2

Table 13.14 Wait time of two patients in the morning and in the evening at 10 clinics (Source: www.kff.org)

| Clinic | Time (1 = morning, 2 = evening) | Wait time of first patient | Wait time of second patient |
|---|---|---|---|
| 1 | 1 | 87 | 27 |
| 1 | 2 | 47 | 37 |
| 2 | 1 | 82 | 42 |
| 2 | 2 | 71 | 23 |
| 3 | 1 | 24 | 35 |
| 3 | 2 | 49 | 26 |
| 4 | 1 | 73 | 34 |
| 4 | 2 | 94 | 65 |
| 5 | 1 | 10 | 23 |
| 5 | 2 | 35 | 34 |
| 6 | 1 | 49 | 26 |
| 6 | 2 | 78 | 35 |
| 7 | 1 | 74 | 53 |
| 7 | 2 | 28 | 31 |
| 8 | 1 | 17 | 25 |
| 8 | 2 | 42 | 25 |
| 9 | 1 | 94 | 56 |
| 9 | 2 | 37 | 28 |
| 10 | 1 | 49 | 45 |
| 10 | 2 | 28 | 37 |
| Average \( (\bar{x}) \) | | 53.4 | 35.35 |
| Standard deviation \( (s) \) | | 26.32 | 11.69 |

Table 13.15 Two-way analysis of variance (AOV) with interaction

| Source | df | SS | MS | F | P-value |
|---|---|---|---|---|---|
| Clinic | 9 | 6,492.12 | 721.47 | 1.71 | 0.13 |
| Morning/Evening | 1 | 180.62 | 180.62 | 0.42 | 0.52 |
| Interaction | 29 | 56,769.49 | 422.15 | - | - |

(df = degrees of freedom, SS = sum of squares, MS = average sum of squares, F = F-value, P = p-value)
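The average and standard deviation rows of Table 13.14 can be reproduced with a few lines of code. This is a minimal sketch using the 20 clinic-by-time rows of the table and assuming the sample (n − 1) standard deviation; the variable names are illustrative.

```python
from statistics import mean, stdev

# Wait times (minutes) from Table 13.14, ordered clinic 1 morning, clinic 1 evening,
# clinic 2 morning, ..., clinic 10 evening.
first_patient = [87, 47, 82, 71, 24, 49, 73, 94, 10, 35,
                 49, 78, 74, 28, 17, 42, 94, 37, 49, 28]
second_patient = [27, 37, 42, 23, 35, 26, 34, 65, 23, 34,
                  26, 35, 53, 31, 25, 25, 56, 28, 45, 37]

summary = {
    "first patient": (mean(first_patient), stdev(first_patient)),
    "second patient": (mean(second_patient), stdev(second_patient)),
}

for label, (avg, sd) in summary.items():
    print(f"{label:15s} average = {avg:.2f}   standard deviation = {sd:.2f}")
```

The first patient waits longer on average and with more variability than the second, which is the contrast the box plots and control charts in this section display.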

Figure 13.71 Box plot for wait times in morning versus evening

Figure 13.72 Box plot for wait times in clinics

12. What is the confounding (contextual) scenario in devising the best treatment for COVID-19?
13. Collect, analyze, and interpret data on the top 10 causes of death in US states. How would lean management or Six Sigma methods help reduce the mortality rates from these causes?
14. Develop a strategic public health policy to control the COVID-19 pandemic in US states using lean management and/or Six Sigma methods. Then construct a SWOT analysis.
15. Do lean management or Six Sigma methods help increase competition in the delivery of healthcare? Select scenarios to illustrate your main ideas.
16. Identify the wastes in the treatment of COVID-19 and offer remedies to fix them using lean management and/or Six Sigma methods.
17. Explain with data, scenarios, or cases how healthcare disparities changed during the COVID-19 pandemic.
18. Write a bioassay on the communication barriers between COVID-19 patients, medical professionals, and insurance agents using the Six Sigma methods.
19. Itemize the issues in the emergency wing of a hospital. As a health administrator, how would you remedy each issue using lean management and/or Six Sigma methods?
20. Do you visualize the benefits or importance of simulation to make the treatment of COVID-19 efficient? How do lean management or Six Sigma methods help in the simulation approach?
21. Use lean management and/or Six Sigma concepts to reduce patient wait time.
22. Is there any waste or inefficiency in the operation of ambulances? If so, how would lean management and/or Six Sigma concepts help resolve them?
23. Download the COVID-19 deaths data from the webpage https://ourworldindata.org/coronavirus and construct control charts for the average using range and standard deviation. Analyze the proportion of days in each month on which the number of deaths exceeded the average for the month.
24. Construct, compare, and interpret control charts for the proportions in each group for the data in Table 13.16.
25. Construct, compare, and interpret control charts for the proportions in each group for the data presented in Table 13.17.
26. Construct, compare, and interpret control charts for the proportions in each group for the data presented in Table 13.18.
27. Construct, compare, and interpret control charts for the proportions in each group for the data presented in Table 13.19.
28. Construct, compare, and interpret control charts for the proportions in each group for the data presented in Table 13.20.

Table 13.16 Centers for Disease Control and Prevention, 2019–2020 Influenza Season Vaccination Coverage Dashboard (Source: www.cdc.gov/flu/fluvaxview/reportshtml/reporti1920/reportii/index.html). Adult Flu Vaccination Rates by Age (Source: KFF)

| Location | All adults | Aged 18–49 years | Aged 50–64 years | Aged 65+ |
|---|---|---|---|---|
| Alabama | 0.458 | 0.344 | 0.481 | 0.688 |
| Alaska | 0.421 | 0.384 | 0.421 | 0.544 |
| Arizona | 0.432 | 0.335 | 0.414 | 0.653 |
| Arkansas | 0.517 | 0.425 | 0.536 | 0.699 |
| California | 0.475 | 0.377 | 0.534 | 0.662 |
| Colorado | 0.514 | 0.413 | 0.554 | 0.762 |
| Connecticut | 0.563 | 0.436 | 0.604 | 0.780 |
| Delaware | 0.516 | 0.391 | 0.590 | 0.680 |
| Florida | 0.418 | 0.304 | 0.373 | 0.66 |
| Georgia | 0.43 | 0.321 | 0.460 | 0.696 |
| Hawaii | 0.468 | 0.378 | 0.485 | 0.644 |
| Idaho | 0.414 | 0.327 | 0.427 | 0.611 |
| Illinois | 0.498 | 0.403 | 0.530 | 0.694 |
| Indiana | 0.48 | 0.384 | 0.489 | 0.705 |
| Iowa | 0.538 | 0.443 | 0.567 | 0.724 |
| Kansas | 0.509 | 0.407 | 0.529 | 0.736 |
| Kentucky | 0.484 | 0.416 | 0.489 | 0.631 |
| Louisiana | 0.440 | 0.329 | 0.494 | 0.655 |
| Maine | 0.530 | 0.407 | 0.552 | 0.718 |
| Maryland | 0.531 | 0.422 | 0.574 | 0.752 |
| Massachusetts | 0.568 | 0.464 | 0.595 | 0.776 |
| Michigan | 0.483 | 0.374 | 0.454 | 0.751 |
| Minnesota | 0.534 | 0.455 | 0.538 | 0.725 |
| Mississippi | 0.441 | 0.323 | 0.488 | 0.679 |
| Missouri | 0.475 | 0.374 | 0.486 | 0.700 |
| Montana | 0.477 | 0.401 | 0.445 | 0.666 |
| Nebraska | 0.553 | 0.461 | 0.584 | 0.743 |
| Nevada | 0.423 | 0.313 | 0.422 | 0.678 |
| New Hampshire | 0.542 | 0.443 | 0.551 | 0.726 |
| New Jersey | 0.452 | 0.346 | 0.483 | 0.661 |
| New Mexico | 0.488 | 0.379 | 0.495 | 0.717 |
| New York | 0.486 | 0.397 | 0.485 | 0.689 |
| North Carolina | 0.535 | 0.414 | 0.564 | 0.789 |
| North Dakota | 0.529 | 0.440 | 0.542 | 0.725 |
| Ohio | 0.486 | 0.371 | 0.521 | 0.698 |
| Oklahoma | 0.523 | 0.440 | 0.515 | 0.733 |
| Oregon | 0.482 | 0.385 | 0.498 | 0.682 |
| Pennsylvania | 0.530 | 0.420 | 0.563 | 0.715 |
| Rhode Island | 0.568 | 0.468 | 0.618 | 0.751 |
| South Carolina | 0.477 | 0.337 | 0.527 | 0.720 |
| South Dakota | 0.551 | 0.477 | 0.542 | 0.732 |
| Tennessee | 0.451 | 0.357 | 0.476 | 0.636 |
| Texas | 0.422 | 0.337 | 0.452 | 0.647 |
| Utah | 0.487 | 0.421 | 0.518 | 0.698 |
| Vermont | 0.545 | 0.452 | 0.545 | 0.731 |
| Virginia | 0.557 | 0.475 | 0.585 | 0.735 |
| Washington | 0.534 | 0.439 | 0.547 | 0.744 |
| Washington, DC | 0.493 | 0.431 | 0.522 | 0.685 |
| West Virginia | 0.509 | 0.385 | 0.526 | 0.714 |
| Wisconsin | 0.566 | 0.458 | 0.579 | 0.762 |
| Wyoming | 0.438 | 0.332 | 0.473 | 0.626 |

Table 13.17 Adults aged 65+ who report having a pneumonia vaccine by gender (Source: www.kff.org)

| Location | All adults | Male | Female |
|---|---|---|---|
| Alabama | 0.718 | 0.698 | 0.734 |
| Alaska | 0.641 | 0.580 | 0.697 |
| Arizona | 0.724 | 0.698 | 0.746 |
| Arkansas | 0.714 | 0.680 | 0.740 |
| California | 0.683 | 0.652 | 0.708 |
| Colorado | 0.783 | 0.747 | 0.813 |
| Connecticut | 0.745 | 0.720 | 0.764 |
| Delaware | 0.753 | 0.740 | 0.765 |
| Florida | 0.668 | 0.650 | 0.684 |
| Georgia | 0.697 | 0.667 | 0.721 |
| Hawaii | 0.663 | 0.604 | 0.709 |
| Idaho | 0.691 | 0.643 | 0.734 |
| Illinois | 0.690 | 0.654 | 0.717 |
| Indiana | 0.722 | 0.682 | 0.754 |
| Iowa | 0.738 | 0.689 | 0.777 |
| Kansas | 0.736 | 0.699 | 0.765 |
| Kentucky | 0.734 | 0.721 | 0.745 |
| Louisiana | 0.695 | 0.643 | 0.737 |
| Maine | 0.745 | 0.728 | 0.760 |
| Maryland | 0.766 | 0.733 | 0.791 |
| Massachusetts | 0.750 | 0.700 | 0.789 |
| Michigan | 0.726 | 0.694 | 0.751 |
| Minnesota | 0.715 | 0.660 | 0.760 |
| Mississippi | 0.666 | 0.636 | 0.689 |

Table 13.17 (cont.)

| Location | All adults | Male | Female |
|---|---|---|---|
| Missouri | 0.758 | 0.724 | 0.786 |
| Montana | 0.728 | 0.714 | 0.741 |
| Nebraska | 0.761 | 0.719 | 0.794 |
| Nevada | 0.670 | 0.597 | 0.734 |
| New Hampshire | 0.766 | 0.732 | 0.796 |
| New Mexico | 0.716 | 0.685 | 0.743 |
| New York | 0.650 | 0.613 | 0.677 |
| North Carolina | 0.759 | 0.733 | 0.778 |
| North Dakota | 0.741 | 0.709 | 0.768 |
| Ohio | 0.747 | 0.727 | 0.763 |
| Oklahoma | 0.765 | 0.732 | 0.791 |
| Oregon | 0.742 | 0.686 | 0.788 |
| Pennsylvania | 0.744 | 0.729 | 0.755 |
| Rhode Island | 0.744 | 0.705 | 0.773 |
| South Carolina | 0.738 | 0.704 | 0.765 |
| South Dakota | 0.731 | 0.731 | 0.732 |
| Tennessee | 0.730 | 0.707 | 0.747 |
| Texas | 0.711 | 0.666 | 0.748 |
| Utah | 0.761 | 0.750 | 0.770 |
| Vermont | 0.717 | 0.650 | 0.773 |
| Virginia | 0.756 | 0.722 | 0.783 |
| Washington | 0.765 | 0.710 | 0.811 |
| Washington, DC | 0.705 | 0.664 | 0.733 |
| West Virginia | 0.736 | 0.715 | 0.754 |
| Wisconsin | 0.772 | 0.753 | 0.786 |
| Wyoming | 0.691 | 0.657 | 0.722 |
| Guam | 0.424 | 0.394 | 0.450 |
| Puerto Rico | 0.337 | 0.336 | 0.339 |

Table 13.18 Adults who report not seeing a doctor in the past 12 months because of cost by gender (Source: www.kff.org)

| Location | All adults | Male | Female |
|---|---|---|---|
| Alabama | 0.181 | 0.158 | 0.203 |
| Alaska | 0.135 | 0.118 | 0.154 |
| Arizona | 0.139 | 0.125 | 0.152 |
| Arkansas | 0.157 | 0.139 | 0.174 |
| California | 0.119 | 0.108 | 0.129 |
| Colorado | 0.121 | 0.103 | 0.139 |
| Connecticut | 0.099 | 0.099 | 0.099 |
| Delaware | 0.106 | 0.095 | 0.116 |
| Florida | 0.16 | 0.158 | 0.162 |
| Georgia | 0.178 | 0.147 | 0.206 |
| Hawaii | 0.082 | 0.077 | 0.086 |
| Idaho | 0.145 | 0.119 | 0.171 |
| Illinois | 0.133 | 0.120 | 0.145 |
| Indiana | 0.126 | 0.115 | 0.137 |
| Iowa | 0.085 | 0.088 | 0.082 |
| Kansas | 0.131 | 0.109 | 0.154 |
| Kentucky | 0.121 | 0.119 | 0.123 |
| Louisiana | 0.148 | 0.144 | 0.152 |
| Maine | 0.123 | 0.116 | 0.130 |
| Maryland | 0.109 | 0.105 | 0.113 |
| Massachusetts | 0.087 | 0.080 | 0.092 |
| Michigan | 0.117 | 0.109 | 0.124 |
| Minnesota | 0.100 | 0.086 | 0.114 |
| Mississippi | 0.172 | 0.152 | 0.191 |
| Missouri | 0.143 | 0.126 | 0.159 |
| Montana | 0.103 | 0.088 | 0.118 |
| Nebraska | 0.126 | 0.114 | 0.137 |
| Nevada | 0.151 | 0.142 | 0.161 |
| New Hampshire | 0.114 | 0.098 | 0.130 |
| New Mexico | 0.139 | 0.132 | 0.146 |
| New York | 0.115 | 0.119 | 0.112 |
| North Carolina | 0.159 | 0.139 | 0.179 |
| North Dakota | 0.092 | 0.075 | 0.110 |
| Ohio | 0.121 | 0.116 | 0.125 |
| Oklahoma | 0.162 | 0.140 | 0.183 |
| Oregon | 0.135 | 0.119 | 0.150 |
| Pennsylvania | 0.100 | 0.093 | 0.106 |
| Rhode Island | 0.086 | 0.084 | 0.089 |
| South Carolina | 0.149 | 0.127 | 0.169 |
| South Dakota | 0.098 | 0.091 | 0.105 |
| Tennessee | 0.149 | 0.151 | 0.147 |
| Texas | 0.188 | 0.145 | 0.229 |
| Utah | 0.143 | 0.125 | 0.162 |
| Vermont | 0.093 | 0.099 | 0.087 |
| Virginia | 0.120 | 0.104 | 0.135 |
| Washington | 0.115 | 0.105 | 0.125 |
| Washington, DC | 0.104 | 0.097 | 0.100 |
| West Virginia | 0.134 | 0.133 | 0.135 |
| Wisconsin | 0.106 | 0.105 | 0.107 |
| Wyoming | 0.135 | 0.121 | 0.149 |
| Guam | 0.206 | 0.181 | 0.231 |
| Puerto Rico | 0.147 | 0.148 | 0.146 |

Table 13.19 Adults reporting symptoms of anxiety or depressive disorder during the COVID-19 pandemic by household job loss (Source: www.kff.org)

| Location | All adults | Household job loss in the past 4 weeks | No household job loss in the past 4 weeks |
|---|---|---|---|
| Alabama | 0.359 | 0.636 | 0.292 |
| Alaska | 0.274 | 0.453 | 0.237 |
| Arizona | 0.334 | 0.571 | 0.273 |
| Arkansas | 0.264 | 0.615 | 0.215 |
| California | 0.303 | 0.497 | 0.248 |
| Colorado | 0.279 | 0.495 | 0.244 |
| Connecticut | 0.268 | 0.495 | 0.237 |
| Delaware | 0.258 | 0.527 | 0.215 |
| Florida | 0.266 | 0.420 | |
| Georgia | 0.305 | | |
| Hawaii | 0.329 | | |
| Idaho | 0.289 | 0.549 | 0.261 |
| Illinois | 0.240 | 0.410 | 0.214 |
| Indiana | 0.288 | 0.513 | 0.250 |
| Iowa | 0.253 | 0.516 | 0.224 |
| Kansas | 0.275 | 0.454 | 0.247 |
| Kentucky | 0.343 | 0.573 | 0.314 |
| Louisiana | 0.298 | 0.496 | 0.229 |
| Maine | 0.237 | 0.533 | 0.208 |
| Maryland | 0.304 | 0.513 | 0.254 |
| Massachusetts | 0.265 | 0.495 | 0.228 |
| Michigan | 0.290 | 0.607 | 0.236 |
| Minnesota | 0.216 | 0.484 | 0.184 |
| Mississippi | 0.330 | 0.456 | 0.288 |
| Missouri | 0.300 | 0.605 | 0.239 |
| Montana | 0.269 | 0.444 | 0.251 |
| Nebraska | 0.255 | 0.608 | 0.207 |
| Nevada | 0.356 | 0.625 | 0.281 |
| New Hampshire | 0.299 | 0.593 | 0.249 |
| New Jersey | 0.219 | 0.424 | 0.182 |
| New Mexico | 0.366 | 0.589 | 0.294 |
| New York | 0.267 | 0.377 | 0.242 |
| North Carolina | 0.312 | 0.559 | 0.275 |
| North Dakota | 0.271 | 0.537 | 0.218 |
| Ohio | 0.312 | 0.509 | 0.275 |
| Oklahoma | 0.375 | 0.756 | 0.293 |
| Oregon | 0.360 | 0.656 | 0.295 |
| Pennsylvania | 0.258 | 0.557 | 0.210 |
| Rhode Island | 0.262 | 0.596 | 0.209 |
| South Carolina | 0.320 | 0.655 | 0.238 |
| South Dakota | 0.252 | 0.639 | 0.199 |
| Tennessee | 0.298 | 0.627 | 0.245 |
| Texas | 0.316 | 0.546 | 0.246 |
| Utah | 0.281 | 0.545 | 0.237 |
| Vermont | 0.255 | 0.696 | 0.193 |
| Virginia | 0.259 | 0.627 | 0.204 |
| Washington | 0.283 | 0.500 | 0.238 |
| Washington, DC | | 0.581 | 0.237 |
| West Virginia | 0.341 | 0.624 | 0.287 |
| Wisconsin | 0.248 | 0.555 | 0.206 |
| Wyoming | 0.275 | 0.615 | 0.242 |

Unassigned Table 13.19 entries, in extraction order: 0.280, 0.471, 0.244, 0.229, 0.455, 0.302.

Table 13.20 Adults reporting symptoms of anxiety or depressive disorder during the COVID-19 pandemic by age (Source: www.kff.org)

| Location | All adults | Ages 18–29 | Ages 30–49 | Ages 50–64 | Ages 65+ |
|---|---|---|---|---|---|
| Alabama | 0.355 | 0.562 | 0.400 | 0.300 | 0.235 |
| Alaska | 0.318 | 0.466 | 0.324 | 0.315 | 0.139 |
| Arizona | 0.344 | 0.494 | 0.387 | 0.325 | 0.223 |
| Arkansas | 0.348 | 0.507 | 0.450 | 0.247 | 0.191 |
| California | 0.373 | 0.572 | 0.390 | 0.316 | 0.240 |
| Colorado | 0.315 | 0.553 | 0.359 | 0.194 | 0.167 |
| Connecticut | 0.315 | 0.528 | 0.441 | 0.236 | 0.164 |
| Delaware | 0.247 | 0.482 | 0.283 | 0.220 | 0.154 |
| Florida | 0.350 | 0.623 | 0.397 | 0.277 | 0.202 |
| Georgia | 0.336 | 0.469 | 0.350 | 0.322 | 0.203 |
| Hawaii | 0.285 | 0.520 | 0.349 | 0.288 | 0.119 |
| Idaho | 0.308 | 0.627 | 0.331 | 0.218 | NSD |
| Illinois | 0.297 | 0.415 | 0.338 | 0.248 | 0.201 |
| Indiana | 0.332 | 0.479 | 0.404 | 0.229 | 0.173 |
| Iowa | 0.294 | 0.451 | 0.336 | 0.237 | 0.161 |
| Kansas | 0.287 | 0.488 | 0.307 | 0.270 | 0.162 |
| Kentucky | 0.348 | 0.734 | 0.379 | 0.242 | 0.227 |
| Louisiana | 0.344 | 0.562 | 0.338 | 0.338 | NSD |
| Maine | 0.257 | NSD | 0.360 | 0.198 | 0.142 |
| Maryland | 0.325 | 0.568 | 0.382 | 0.238 | 0.137 |
| Massachusetts | 0.316 | 0.553 | 0.378 | 0.231 | 0.150 |
| Michigan | 0.335 | 0.525 | 0.374 | 0.310 | 0.178 |

Table 13.20 (cont.)

| Location | All adults | Ages 18–29 | Ages 30–49 | Ages 50–64 | Ages 65+ |
|---|---|---|---|---|---|
| Minnesota | 0.247 | 0.371 | 0.297 | 0.200 | 0.130 |
| Mississippi | 0.309 | NSD | 0.428 | 0.379 | NSD |
| Missouri | 0.327 | 0.556 | 0.355 | 0.307 | 0.135 |
| Montana | 0.361 | 0.653 | 0.412 | 0.223 | 0.268 |
| Nebraska | 0.312 | 0.368 | 0.422 | 0.253 | 0.152 |
| Nevada | 0.326 | 0.475 | 0.379 | 0.282 | 0.207 |
| New Hampshire | 0.273 | 0.543 | 0.354 | 0.221 | 0.135 |
| New Jersey | 0.327 | 0.348 | 0.362 | 0.335 | 0.242 |
| New Mexico | 0.308 | 0.419 | 0.341 | 0.260 | 0.254 |
| New York | 0.312 | 0.453 | 0.371 | 0.239 | 0.180 |
| North Carolina | 0.312 | 0.424 | 0.393 | 0.311 | 0.142 |
| North Dakota | 0.314 | 0.625 | 0.325 | 0.255 | NSD |
| Ohio | 0.271 | 0.546 | 0.300 | 0.242 | 0.098 |
| Oklahoma | 0.291 | 0.498 | 0.350 | 0.233 | 0.148 |
| Oregon | 0.333 | 0.541 | 0.422 | 0.240 | 0.169 |
| Pennsylvania | 0.302 | 0.453 | 0.407 | 0.294 | 0.116 |
| Rhode Island | 0.265 | NSD | 0.305 | 0.291 | NSD |
| South Carolina | 0.298 | 0.508 | 0.331 | 0.243 | 0.235 |
| South Dakota | 0.266 | 0.556 | 0.293 | 0.184 | NSD |
| Tennessee | 0.319 | 0.400 | 0.402 | 0.233 | 0.214 |
| Texas | 0.346 | 0.608 | 0.313 | 0.243 | 0.269 |
| Utah | 0.251 | 0.406 | 0.256 | 0.169 | 0.142 |
| Vermont | 0.314 | 0.571 | 0.307 | 0.274 | 0.248 |
| Virginia | 0.258 | 0.401 | 0.251 | 0.288 | 0.133 |
| Washington | 0.297 | 0.463 | 0.358 | 0.242 | 0.130 |
| Washington, DC | 0.374 | 0.585 | 0.365 | 0.446 | 0.108 |
| West Virginia | 0.421 | 0.646 | 0.532 | 0.332 | NSD |
| Wisconsin | 0.260 | 0.455 | 0.295 | 0.229 | 0.132 |
| Wyoming | 0.220 | NSD | 0.288 | 0.163 | 0.089 |

Table 13.21 Average annual deductible per enrolled employee in employer-based health insurance for single and family coverage (Source: www.kff.org)

| Location | Average family deductible | Average single deductible |
|---|---|---|
| Alabama | $3,029 | $1,616 |
| Alaska | $3,626 | $1,869 |
| Arizona | $4,017 | $2,418 |
| Arkansas | $3,586 | $1,839 |
| California | $3,329 | $1,675 |
| Colorado | $3,469 | $1,907 |
| Connecticut | $4,199 | $2,289 |
| Delaware | $3,002 | $1,703 |
| Florida | $3,632 | $1,993 |
| Georgia | $3,659 | $1,914 |
| Hawaii | $2,619 | $1,264 |
| Idaho | $3,499 | $1,933 |
| Illinois | $3,849 | $1,876 |
| Indiana | $3,937 | $2,122 |
| Iowa | $4,064 | $2,202 |
| Kansas | $3,607 | $1,904 |
| Kentucky | $3,798 | $2,101 |
| Louisiana | $4,299 | $2,037 |
| Maine | $3,994 | $2,303 |
| Maryland | $3,009 | $1,673 |
| Massachusetts | $3,151 | $1,593 |
| Michigan | $2,856 | $1,579 |
| Minnesota | $4,160 | $2,272 |
| Mississippi | $3,468 | $1,587 |
| Missouri | $4,222 | $2,160 |
| Montana | $3,842 | $2,521 |
| Nebraska | $3,799 | $2,042 |
| Nevada | $3,100 | $1,810 |
| New Hampshire | $4,379 | $2,386 |
| New Jersey | $3,456 | $1,713 |
| New Mexico | $3,992 | $2,011 |
| New York | $2,899 | $1,655 |
| North Carolina | $4,005 | $2,281 |
| North Dakota | $3,980 | $1,950 |
| Ohio | $4,132 | $2,101 |
| Oklahoma | $4,053 | $2,165 |
| Oregon | $3,634 | $1,958 |
| Pennsylvania | $2,981 | $1,646 |
| Rhode Island | $4,031 | $1,983 |
| South Carolina | $4,155 | $2,151 |

29. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.21.
30. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.22.
31. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.23.

Table 13.21 (cont.)

| Location | Average family deductible | Average single deductible |
|---|---|---|
| South Dakota | $4,222 | $2,408 |
| Tennessee | $4,615 | $2,334 |
| Texas | $4,174 | $2,155 |
| Utah | $3,842 | $1,781 |
| Vermont | $3,330 | $1,935 |
| Virginia | $3,313 | $1,688 |
| Washington | $3,435 | $1,793 |
| Washington, DC | $2,679 | $1,306 |
| West Virginia | $3,645 | $1,959 |
| Wisconsin | $3,904 | $2,061 |
| Wyoming | $3,579 | $1,895 |

Table 13.22 Average Marketplace deductible (Source: www.kff.org)

| Location | Average exchange deductible for ACA marketplace plans |
|---|---|
| Alabama | $1,902 |
| Alaska | $3,525 |
| Arizona | $4,334 |
| Arkansas | $3,906 |
| Delaware | $3,023 |
| Florida | $2,734 |
| Georgia | $2,237 |
| Hawaii | $2,226 |
| Illinois | $3,965 |
| Indiana | $4,474 |
| Iowa | $3,344 |
| Kansas | $3,017 |
| Kentucky | $4,450 |
| Louisiana | $2,956 |
| Maine | $4,549 |
| Michigan | $4,495 |
| Mississippi | $1,946 |
| Missouri | $3,148 |
| Montana | $4,705 |
| Nebraska | $3,597 |
| New Hampshire | $3,930 |
| New Mexico | $3,810 |
| North Carolina | $3,193 |
| North Dakota | $3,495 |
| Ohio | $4,786 |
| Oklahoma | $3,120 |
| Oregon | $4,363 |
| South Carolina | $4,386 |
| South Dakota | $3,970 |
| Tennessee | $3,706 |
| Texas | $2,977 |
| Utah | $4,263 |
| Virginia | $3,473 |
| West Virginia | $3,618 |
| Wisconsin | $4,167 |
| Wyoming | $2,234 |

Table 13.23 Child flu vaccination rates by age (Source: www.kff.org)

| Location | All children | 6 months to 4 years | 5–12 years | 13–17 years |
|---|---|---|---|---|
| Alabama | 0.578 | 0.664 | 0.614 | 0.458 |
| Alaska | 0.571 | 0.728 | 0.580 | 0.423 |
| Arizona | 0.581 | 0.703 | 0.616 | 0.430 |
| Arkansas | 0.659 | 0.684 | 0.708 | 0.572 |
| California | 0.650 | 0.807 | 0.647 | 0.505 |
| Colorado | 0.709 | 0.832 | 0.718 | 0.600 |
| Connecticut | 0.780 | 0.900 | 0.814 | 0.641 |
| Delaware | 0.681 | 0.756 | 0.707 | 0.596 |
| Florida | 0.558 | 0.644 | 0.546 | 0.504 |
| Georgia | 0.556 | 0.660 | 0.558 | 0.469 |
| Hawaii | 0.670 | 0.697 | 0.685 | 0.612 |
| Idaho | 0.555 | 0.674 | 0.577 | 0.458 |
| Illinois | 0.605 | 0.711 | 0.617 | 0.505 |
| Indiana | 0.602 | 0.722 | 0.608 | 0.476 |
| Iowa | 0.663 | 0.762 | 0.680 | 0.563 |
| Kansas | 0.661 | 0.790 | 0.677 | 0.527 |
| Kentucky | 0.597 | 0.676 | 0.607 | 0.515 |
| Louisiana | 0.593 | 0.673 | 0.619 | 0.483 |
| Maine | 0.675 | 0.790 | 0.664 | 0.611 |
| Maryland | 0.748 | 0.763 | 0.757 | 0.718 |
| Massachusetts | 0.766 | 0.841 | 0.759 | 0.719 |
| Michigan | 0.549 | 0.601 | 0.582 | 0.460 |


Table 13.23 (cont.)

| Location | All children | 6 months to 4 years | 5–12 years | 13–17 years |
|---|---|---|---|---|
| Minnesota | 0.664 | 0.749 | 0.699 | 0.528 |
| Mississippi | 0.519 | 0.678 | 0.519 | 0.411 |
| Missouri | 0.597 | 0.741 | 0.598 | 0.484 |
| Montana | 0.570 | 0.758 | 0.536 | 0.488 |
| Nebraska | 0.680 | 0.823 | 0.685 | 0.573 |
| Nevada | 0.520 | 0.669 | 0.530 | 0.398 |
| New Hampshire | 0.726 | 0.839 | 0.715 | 0.685 |
| New Jersey | 0.723 | 0.873 | 0.739 | 0.575 |
| New Mexico | 0.687 | 0.779 | 0.684 | 0.605 |
| New York | 0.696 | 0.788 | 0.704 | 0.596 |
| North Carolina | 0.644 | 0.778 | 0.630 | 0.558 |
| North Dakota | 0.690 | 0.779 | 0.718 | 0.572 |
| Ohio | 0.597 | 0.698 | 0.621 | 0.484 |
| Oklahoma | 0.595 | 0.714 | 0.606 | 0.459 |
| Oregon | 0.641 | 0.789 | 0.633 | 0.529 |
| Pennsylvania | 0.686 | 0.798 | 0.673 | 0.608 |
| Rhode Island | 0.783 | 0.853 | 0.820 | 0.692 |
| South Carolina | 0.621 | 0.685 | 0.636 | 0.546 |
| South Dakota | 0.703 | 0.759 | 0.736 | 0.619 |
| Tennessee | 0.637 | 0.729 | 0.681 | 0.502 |
| Texas | 0.628 | 0.772 | 0.621 | 0.530 |
| Utah | 0.600 | 0.748 | 0.594 | 0.515 |
| Vermont | 0.682 | 0.798 | 0.686 | 0.601 |
| Virginia | 0.703 | 0.819 | 0.722 | 0.591 |
| Washington | 0.669 | 0.771 | 0.688 | 0.570 |
| Washington, DC | 0.747 | 0.854 | 0.741 | 0.661 |
| West Virginia | 0.577 | 0.678 | 0.607 | 0.467 |
| Wisconsin | 0.646 | 0.770 | 0.676 | 0.510 |
| Wyoming | 0.590 | 0.767 | 0.577 | 0.435 |

32. Construct, compare, and interpret control charts for the averages in each group for the data presented in Table 13.24 (2020–2021) (Source: www.kff.org).
33. Construct, compare, and interpret control charts for the averages in each group for the data presented in Table 13.25 (2020–2021).
34. Construct, compare, and interpret control charts for the averages in each group for the data presented in Table 13.26 (2019).
35. Construct, compare, and interpret control charts for the averages in each group for the data presented in Table 13.27 (2020).
36. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.28 (2019).
37. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.29 (2019).
38. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.30 (2019–2020).
39. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.31 (2019–2020).
40. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.32 (2019–2020).
41. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.33 (2019).
42. Construct, compare, and interpret control charts for the rate in each group for the data presented in Table 13.34 (2019).
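One possible starting point for the "averages" exercises is an individuals (X) chart per group, sketched below in Python. This is only an illustration under stated assumptions, not the chapter's prescribed procedure: the function names are hypothetical, the limits use the common moving-range estimate (center ± 2.66 × average moving range), and the alphabetical row order of the table is treated as the plotting sequence even though the data are cross-sectional. The sample values are the doses received per million for ten low-income recipients listed in Table 13.24.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (lcl, center, ucl) for an individuals (X) chart.

    Limits are center +/- 2.66 * (average moving range), where
    2.66 = 3 / 1.128 and 1.128 is the d2 constant for ranges of two points.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar


def out_of_control(labels, values):
    """List the labels whose values fall outside the chart limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [lab for lab, v in zip(labels, values) if v < lcl or v > ucl]


# Illustrative subset of Table 13.24 (income group: low income).
# Row order follows the table; treating it as a sequence is an assumption.
low_income = {
    "Afghanistan": 85080.69,
    "Burkina Faso": 14466.63,
    "Congo – Kinshasa": 2794.95,
    "Ethiopia": 14475.45,
    "Gambia": 125131.20,
    "Guinea-Bissau": 153658.70,
    "Liberia": 59790.30,
    "Madagascar": 10933.15,
    "Malawi": 15909.60,
    "Mali": 7466.36,
}

if __name__ == "__main__":
    names = list(low_income)
    doses = list(low_income.values())
    lcl, center, ucl = individuals_chart_limits(doses)
    print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
    print("outside limits:", out_of_control(names, doses))
```

Repeating the calculation for each income group (or each region) gives one set of limits per group, which can then be compared side by side. A negative lower limit on a quantity that cannot be negative is usually replaced by zero when the chart is drawn.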

Table 13.24 Doses received per million

| Recipient | Region | Income | Doses Received | Doses Received per Million |
|---|---|---|---|---|
| Afghanistan | South and Central Asia | Low income | 3,312,050 | 85,080.69 |
| Algeria | Middle East and North Africa | Lower middle income | 604,800 | 13,792.15 |
| Angola | Sub-Saharan Africa | Lower middle income | 1,182,870 | 35,990.40 |
| Antigua & Barbuda | Western Hemisphere | High income | 17,550 | 179,213.30 |
| Argentina | Western Hemisphere | Upper middle income | 3,500,000 | 77,440.86 |
| Bangladesh | South and Central Asia | Lower middle income | 6,503,920 | 39,492.04 |
| Barbados | Western Hemisphere | High income | 70,200 | 244,283.50 |


Table 13.24 (cont.)

| Recipient | Region | Income | Doses Received | Doses Received per Million |
|---|---|---|---|---|
| Belize | Western Hemisphere | Lower middle income | 111,150 | 279,537.50 |
| Benin | Sub-Saharan Africa | Lower middle income | 302,400 | 24,943.910 |
| Bhutan | South and Central Asia | Lower middle income | 500,000 | 647,994.10 |
| Bolivia | Western Hemisphere | Lower middle income | 1,008,000 | 86,352.91 |
| Botswana | Sub-Saharan Africa | Upper middle income | 81,000 | 34,444.27 |
| Brazil | Western Hemisphere | Upper middle income | 3,000,000 | 14,113.70 |
| Burkina Faso | Sub-Saharan Africa | Low income | 302,400 | 14,466.63 |
| Cambodia | East Asia and the Pacific | Lower middle income | 1,061,100 | 63,466.82 |
| Cameroon | Sub-Saharan Africa | Lower middle income | 303,400 | 11,429.28 |
| Canada | Western Hemisphere | High income | 1,000,000 | 26,495.57 |
| Central African Republic | Sub-Saharan Africa | Low income | 302,400 | 62,611.75 |
| Colombia | Western Hemisphere | Upper middle income | 6,000,000 | 117,917.80 |
| Congo – Brazzaville | Sub-Saharan Africa | Lower middle income | 302,400 | 54,801.55 |
| Congo – Kinshasa | Sub-Saharan Africa | Low income | 250,320 | 2,794.95 |
| Costa Rica | Western Hemisphere | Upper middle income | 500,000 | 98,152.50 |
| Cote d’Ivoire | Sub-Saharan Africa | Lower middle income | 1,556,230 | 58,996.66 |
| Djibouti | Sub-Saharan Africa | Lower middle income | 151,200 | 153,036.10 |
| Ecuador | Western Hemisphere | Upper middle income | 1,000,000 | 56,679.51 |
| El Salvador | Western Hemisphere | Lower middle income | 3,188,370 | 491,562.00 |
| Eswatini | Sub-Saharan Africa | Lower middle income | 302,000 | 260,308.00 |
| Ethiopia | Sub-Saharan Africa | Low income | 1,664,150 | 14,475.45 |
| Fiji | East Asia and the Pacific | Upper middle income | 150,080 | 167,417.00 |
| Gambia | Sub-Saharan Africa | Low income | 302,400 | 125,131.20 |
| Georgia | Europe and Eurasia | Upper middle income | 500,000 | 125,339.20 |
| Ghana | Sub-Saharan Africa | Lower middle income | 1,229,620 | 39,572.05 |
| Grenada | Western Hemisphere | Upper middle income | 29,250 | 259,956.10 |
| Guatemala | Western Hemisphere | Upper middle income | 4,500,000 | 251,178.20 |
| Guinea-Bissau | Sub-Saharan Africa | Low income | 302,400 | 153,658.70 |
| Guyana | Western Hemisphere | Upper middle income | 146,250 | 185,936.50 |
| Haiti | Western Hemisphere | Lower middle income | 500,000 | 43,849.91 |
| Honduras | Western Hemisphere | Lower middle income | 3,106,470 | 313,638.90 |
| Indonesia | East Asia and the Pacific | Lower middle income | 4,500,160 | 16,452.55 |
| Iraq | Middle East and North Africa | Upper middle income | 500,000 | 12,430.85 |
| Jamaica | Western Hemisphere | Upper middle income | 208,260 | 70,330.52 |
| Jordan | Middle East and North Africa | Upper middle income | 500,000 | 49,004.52 |
| Kenya | Sub-Saharan Africa | Lower middle income | 2,556,380 | 47,541.72 |
| Kosovo | Europe and Eurasia | Upper middle income | 538,200 | 278,102.30 |
| Laos | East Asia and the Pacific | Lower middle income | 1,080,000 | 148,442.30 |
| Lesotho | Sub-Saharan Africa | Lower middle income | 302,400 | 141,159.90 |
| Liberia | Sub-Saharan Africa | Low income | 302,400 | 59,790.30 |
| Madagascar | Sub-Saharan Africa | Low income | 302,750 | 10,933.15 |
| Malawi | Sub-Saharan Africa | Low income | 304,350 | 15,909.60 |
| Malaysia | East Asia and the Pacific | Upper middle income | 1,000,000 | 30,896.62 |


Table 13.24 (cont.)

| Recipient | Region | Income | Doses Received | Doses Received per Million |
|---|---|---|---|---|
| Maldives | South and Central Asia | Upper middle income | 128,700 | 238,094.40 |
| Mali | Sub-Saharan Africa | Low income | 151,200 | 7,466.36 |
| Mauritania | Sub-Saharan Africa | Lower middle income | 302,400 | 65,037.01 |
| Mexico | Western Hemisphere | Upper middle income | 1,300,000 | 10,082.78 |
| Moldova | Europe and Eurasia | Upper middle income | 301,000 | 74,616.45 |
| Mongolia | East Asia and the Pacific | Lower middle income | 188,370 | 57,459.80 |
| Morocco | Middle East and North Africa | Lower middle income | 302,400 | 8,192.778 |
| Mozambique | Sub-Saharan Africa | Low income | 302,400 | 9,675.12 |
| Nepal | South and Central Asia | Lower middle income | 1,534,850 | 52,677.36 |
| Niger | Sub-Saharan Africa | Low income | 302,400 | 12,492.44 |
| Nigeria | Sub-Saharan Africa | Lower middle income | 4,000,080 | 19,404.72 |
| Pakistan | South and Central Asia | Lower middle income | 15,800,060 | 71,528.33 |
| Panama | Western Hemisphere | Upper middle income | 500,000 | 115,881.10 |
| Papua New Guinea | East Asia and the Pacific | Lower middle income | 302,400 | 33,798.94 |
| Paraguay | Western Hemisphere | Upper middle income | 2,000,000 | 280,405.40 |
| Peru | Western Hemisphere | Upper middle income | 2,000,000 | 60,657.81 |
| Philippines | East Asia and the Pacific | Lower middle income | 6,429,220 | 58,670.89 |
| Rwanda | Sub-Saharan Africa | Low income | 489,060 | 37,758.81 |
| Senegal | Sub-Saharan Africa | Lower middle income | 302,400 | 18,060.28 |
| Sierra Leone | Sub-Saharan Africa | Low income | 113,490 | 14,227.18 |
| Somalia | Sub-Saharan Africa | Low income | 302,400 | 19,026.98 |
| South Africa | Sub-Saharan Africa | Upper middle income | 5,660,460 | 95,440.65 |
| South Korea | East Asia and the Pacific | High income | 1,400,000 | 27,306.85 |
| South Sudan | Sub-Saharan Africa | Low income | 152,950 | 13,663.90 |
| Sri Lanka | South and Central Asia | Lower middle income | 1,600,100 | 74,724.76 |
| St. Kitts & Nevis | Western Hemisphere | High income | 11,700 | 219,957.90 |
| St. Lucia | Western Hemisphere | Upper middle income | 52,650 | 286,719.40 |
| St. Vincent & Grenadines | Western Hemisphere | Upper middle income | 35,100 | 316,367.30 |
| Sudan | Sub-Saharan Africa | Low income | 606,700 | 13,836.03 |
| Suriname | Western Hemisphere | Upper middle income | 140,400 | 239,331.50 |
| Taiwan | East Asia and the Pacific | High income | 2,500,000 | 104,968.00 |
| Tajikistan | South and Central Asia | Lower middle income | 1,500,100 | 157,282.10 |
| Tanzania | Sub-Saharan Africa | Lower middle income | 1,058,400 | 17,718.49 |
| Thailand | East Asia and the Pacific | Upper middle income | 1,500,000 | 21,489.98 |
| Togo | Sub-Saharan Africa | Low income | 305,370 | 36,886.06 |
| Tunisia | Middle East and North Africa | Lower middle income | 1,687,960 | 142,822.10 |
| Uganda | Sub-Saharan Africa | Low income | 117,000 | 2,557.88 |
| Ukraine | Europe and Eurasia | Lower middle income | 2,188,000 | 50,030.00 |
| Uruguay | Western Hemisphere | High income | 500,000 | 143,937.60 |
| Uzbekistan | South and Central Asia | Lower middle income | 4,214,460 | 125,920.60 |
| Vietnam | East Asia and the Pacific | Lower middle income | 5,000,100 | 51,368.12 |
| Yemen | Middle East and North Africa | Low income | 151,200 | 5,069.41 |
| Zambia | Sub-Saharan Africa | Lower middle income | 302,400 | 16,449.13 |


Table 13.25 Daily COVID-19 cases and deaths (Source: www.kff.org)

Location

Daily Cases Per Million Population

Daily Deaths (7-Day Rolling Average)

Daily Deaths Per Million Population

2,864

582

135

27

917

NA

3

4

Arizona

2,443

329

52

7

Arkansas

1,275

421

22

7

California

8,323

211

102

3

Colorado

1,757

303

15

3

Connecticut

638

179

5

2

Delaware

467

473

4

4

Florida

8,434

388

328

15

Georgia

4,816

450

125

12

444

316

8

6

Idaho

1,223

669

17

9

Illinois

3,155

251

36

3

Indiana

3,198

473

42

6

Iowa

1,729

547

12

4

Kansas

1,173

403

17

6

Kentucky

3,646

814

37

8

Louisiana

1,438

309

48

10

Alabama Alaska

Hawaii

Maine

Daily Cases (7-Day Rolling Average)

466

345

4

3

Maryland

1,214

201

16

3

Massachusetts

1,733

251

15

2

Michigan

3,230

324

30

3

Minnesota

2,192

387

11

2

Mississippi

1,472

496

33

11

Missouri

1,917

312

40

6

Montana

930

860

9

9

Nebraska

596

308

2

1

Nevada

956

305

23

7

New Hampshire

401

293

2

2

2,185

246

20

2

588

279

10

5

New York

5,054

261

37

2

North Carolina

5,764

544

70

7

470

615

2

2

Ohio

6,523

558

18

2

Oklahoma

1,887

474

39

10

Oregon

1,592

375

16

4

Pennsylvania

4,672

366

37

3

Rhode Island

511

484

1

1

3,404

652

61

12

417

467

3

3

4,601

668

65

9

New Jersey New Mexico

North Dakota

South Carolina South Dakota Tennessee


Table 13.25 (cont.)

Location

Daily Cases (7-Day Rolling Average)

Daily Cases Per Million Population

Daily Deaths (7-Day Rolling Average)

Texas

12,322

420

293

10

Utah

1,399

430

12

4

207

331

1

2

Virginia

3,428

399

37

4

Washington

2,928

381

41

5

196

274

1

1

West Virginia

1,695

950

26

14

Wisconsin

3,179

545

13

2

Wyoming

530

910

5

9

Vermont

Washington, DC

American Samoa Guam Northern Mariana Islands Puerto Rico Virgin Islands

Daily Deaths Per Million Population

0

N/A

0

N/A

136

N/A

1

N/A

1

N/A

0

N/A

239

76

8

2

31

N/A

0

N/A

Table 13.26 Deaths caused by influenza and pneumonia (Source: www.kff.org)

Table 13.26 (cont.)

Location

Influenza and Pneumonia Deaths

Influenza Deaths

Pneumonia Deaths

Location

Influenza and Pneumonia Deaths

Influenza Deaths

Pneumonia Deaths

Alabama

16.6

1.2

15.3

Maryland

Alaska Arizona

7.1 10.3

NSD 1.0

11.4

1.2

10.2

4.7

Massachusetts 13.2

1.5

11.7

9.3

Michigan

12.8

1.5

11.3

7.5

1.2

6.3

Arkansas

16.6

2.6

14.0

Minnesota

California

12.5

1.3

11.3

Mississippi

22.6

1.1

21.5

6.0

Missouri

13.1

1.4

11.6

10.5

2.0

8.5

Colorado

7.8

1.8

Connecticut

11.0

1.7

9.4

Montana

Delaware

10.5

NSD

9.0

Nebraska

14.5

2.7

11.8

7.3

Nevada

13.3

1.3

12.0

New Hampshire

10.3

1.9

8.4

New Jersey

10.9

1.2

9.7

New Mexico

13.2

2.2

11.1

New York

15.1

Florida

8.2

0.9

Georgia

11.8

1.0

10.8

Hawaii

16.8

1.8

15.0

Idaho

11.4

2.8

8.7

Illinois

13.4

1.0

12.4

Indiana

11.6

1.9

9.7

Iowa

13.2

2.1

11.0

Kansas

14.0

2.1

11.8

Kentucky

15.7

2.3

13.4

Louisiana

12.1

1.1

11.0

Maine

14.7

2.2

12.5


16.5

1.4

North Carolina 13.8

1.8

12.0

North Dakota

14.3

NSD

12.8

Ohio

12.6

1.5

11.2

Oklahoma

13.7

1.8

11.9

9.0

2.9

6.1

13.4

1.6

11.9

Oregon Pennsylvania


Table 13.26 (cont.)

| Location | Influenza and Pneumonia Deaths | Influenza Deaths | Pneumonia Deaths |
|---|---|---|---|
| Rhode Island | 11.7 | 2.1 | 9.6 |
| South Carolina | 10.8 | 1.6 | 9.3 |
| South Dakota | 15.9 | 2.6 | 13.2 |
| Tennessee | 16.2 | 2.0 | 14.2 |
| Texas | 11.3 | 1.4 | 9.9 |
| Utah | 11.2 | 1.9 | 9.2 |
| Vermont | 5.8 | 3.1 | 2.7 |
| Virginia | 11.0 | 1.5 | 9.5 |
| Washington | 10.0 | 2.7 | 7.3 |
| Washington, DC | 10.8 | N/A | 9.8 |
| West Virginia | 16.1 | 2.0 | 14.1 |
| Wisconsin | 10.0 | 1.4 | 8.5 |
| Wyoming | 14.6 | NSD | 11.8 |

43. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.35 (2018).
44. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.36 (2020).
45. Construct, compare, and interpret control charts for the average in each group for the data presented in Table 13.37 (2020).
46. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.38 (July 2020–June 2021).
47. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.39 (2019).
48. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.40 (2019).
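Several of the tables used in these exercises print entries such as N/A, NA, or NSD, and dollar amounts or counts with thousands separators, so the cells need cleaning before any limits are computed. The Python sketch below shows one way this might be done and then compares two columns with a simple three-sigma screen (mean ± 3 sample standard deviations). The helper names are hypothetical, dropping missing entries is an assumption rather than a rule stated in the text, and the three-sigma screen is a simplification that stands in for a formal control chart. The sample cells are the influenza and pneumonia columns of Table 13.26 (cont.) as printed.

```python
from statistics import mean, stdev

# Markers treated as missing; this set is an assumption based on the printed tables.
MISSING = {"N/A", "NA", "NSD", ""}


def parse_cell(text):
    """Convert a printed cell such as '$4,222' or '0.578' to a float.

    Returns None for cells that carry a missing-value marker.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    if cleaned.upper() in MISSING:
        return None
    return float(cleaned)


def three_sigma_summary(cells):
    """Summarize one column: count, center, and mean +/- 3 sample SD limits."""
    values = [v for v in map(parse_cell, cells) if v is not None]
    if len(values) < 2:
        raise ValueError("need at least two non-missing values")
    center = mean(values)
    sd = stdev(values)
    return {"n": len(values), "center": center,
            "lcl": center - 3 * sd, "ucl": center + 3 * sd}


# Columns of Table 13.26 (cont.), Rhode Island through Wyoming, as printed.
columns = {
    "Influenza deaths": ["2.1", "1.6", "2.6", "2.0", "1.4", "1.9", "3.1",
                         "1.5", "2.7", "N/A", "2.0", "1.4", "NSD"],
    "Pneumonia deaths": ["9.6", "9.3", "13.2", "14.2", "9.9", "9.2", "2.7",
                         "9.5", "7.3", "9.8", "14.1", "8.5", "11.8"],
}

if __name__ == "__main__":
    for name, cells in columns.items():
        s = three_sigma_summary(cells)
        print(f"{name}: n={s['n']}  center={s['center']:.2f}  "
              f"LCL={s['lcl']:.2f}  UCL={s['ucl']:.2f}")
```

Reporting the count of usable values alongside the limits makes it visible when one group has lost several locations to missing entries, which matters when the groups are compared.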

Table 13.27 Community health center delivery sites and patient visits (Source: www.kff.org)

Location Alabama

Total Community Health Centers 17

Service Delivery Sites 167

Clinical Visits

Virtual Visits

816,696

107,191 96,936

Alaska

27

201

448,175

Arizona

23

205

2,161,084

973,631

Arkansas

12

168

901,358

74,411

California

175

2,019

16,709,746

7,726,902

Colorado

19

235

1,996,085

636,632

Connecticut

16

349

1,048,929

1,012,904

3

15

89,659

59,377

Florida

Delaware

47

642

4,687,276

748,157

Georgia

35

323

1,728,299

171,371

Hawaii

14

83

530,725

125,613

Idaho

14

149

743,768

110,296

Illinois

45

447

3,630,847

1,373,889

Indiana

27

245

1,517,300

331,077

Iowa

14

98

678,272

110,639

Kansas

19

122

722,475

71,657

Kentucky

25

421

1,740,750

296,459

Louisiana

36

360

1,194,398

426,846

Maine

18

166

675,357

130,159

Maryland

17

130

972,622

364,513

Massachusetts

37

265

2,090,451

1,586,298


Table 13.27 (cont.)

Location

Total Community Health Centers

Service Delivery Sites

Clinical Visits

Virtual Visits

Michigan

39

371

1,732,285

558,150

Minnesota

16

103

417,046

131,409

Mississippi

20

250

822,121

85,351

Missouri

28

339

1,718,794

382,734

Montana

14

104

347,212

79,949

Nebraska

7

71

282,735

54,008

8

51

242,661

119,162

New Hampshire

Nevada

10

54

279,376

108,688

New Jersey

23

136

1,495,960

277,310

New Mexico

16

230

962,413

568,393

New York

63

828

6,772,412

2,416,924

North Carolina

39

356

1,734,665

342,715

104,209

14,005

4

25

Ohio

North Dakota

51

377

2,288,624

984,202

Oklahoma

21

125

837,200

89,290

Oregon

30

238

1

10

Pennsylvania

42

355

Rhode Island

8

57

South Carolina

23

226

South Dakota

4

47

Tennessee

29

Texas Utah Vermont

Palau

1,055,011

560,354

N/A

N/A

2,014,308

798,671

433,021

381,313

1,542,360

115,817

220,642

10,512

232

1,252,699

248,906

72

608

4,548,282

1,109,642

13

60

455,168

54,420

11

84

503,850

161,078

Virginia

26

179

1,050,016

195,870

Washington

27

391

3,272,295

929,685

8

81

555,200

350,842

Washington, DC West Virginia

28

397

1,352,236

313,962

Wisconsin

16

184

854,553

158,894

Wyoming

6

16

86,691

13,751

American Samoa

1

6

Federated States of Micronesia

4

14

Guam

1

2

N/A

N/A

63,330

0

N/A

N/A

Marshall Islands

1

5

N/A

N/A

Northern Mariana Islands

1

2

N/A

N/A

22

125

1,126,937

372,940

2

6

N/A

N/A

Puerto Rico US Virgin Islands


Table 13.28 Distribution of the nonelderly uninsured by federal poverty level (FPL) (Source: www.kff.org)

Location

Under 100%

100– 199%

200– 399%

400%+

Alabama

0.299

0.292

0.283

0.126

Alaska

0.108

0.216

0.344

0.332

Arizona

0.191

0.285

0.369

0.154

Arkansas

0.230

0.346

0.335

0.088

California

0.175

0.259

0.367

0.199

Colorado

0.149

0.243

0.386

0.222

Connecticut

0.154

0.245

0.348

0.254

Delaware

0.193

0.258

0.332

Table 13.28 (cont.)

Location

Under 100%

100– 199%

200– 399%

400%+

South Dakota

0.257

0.301

0.302

0.140

Tennessee

0.265

0.298

0.304

0.133

Texas

0.226

0.299

0.324

0.151

Utah

0.224

0.257

0.357

0.162

Vermont

0.154

0.195

0.416

0.235

Virginia

0.178

0.265

0.358

0.199

Washington

0.163

0.243

0.392

0.202

0.218

0.272

0.292

0.217

0.217

Washington, DC

Florida

0.211

0.277

0.350

0.162

West Virginia

0.198

0.304

0.338

0.161

Georgia

0.252

0.291

0.319

0.138

Wisconsin

0.162

0.300

0.362

0.176

Hawaii

0.199

0.173

0.300

0.328

Wyoming

0.137

0.297

0.366

0.201

Puerto Rico

0.425

0.367

0.166

0.042

Idaho

0.204

0.326

0.289

0.182

Illinois

0.192

0.276

0.326

0.206

Indiana

0.213

0.281

0.350

0.156

Iowa

0.198

0.278

0.351

0.173

Kansas

0.249

0.309

0.292

0.149

Kentucky

0.229

0.287

0.323

0.160

Louisiana

0.244

0.260

0.334

0.163

Maine

0.185

0.296

0.339

0.181

Maryland

0.170

Massachusetts 0.169

0.236 0.196

0.326 0.331

Location

State/Local Government

Nonprofit ForProfit

Alabama

234

116

158

Alaska

30

324

97

Arizona

23

234

74

Arkansas

39

397

79

California

53

234

44

Colorado

60

252

53

Connecticut

10

454

13

Delaware

N/A

491

N/A

Florida

69

267

138

Georgia

28

365

43

Hawaii

21

330

N/A

Idaho

62

236

87

0.239

Illinois

28

391

25

0.267 0.304

Michigan

0.205

0.277

0.351

0.168

Minnesota

0.171

0.224

0.398

0.207

Mississippi

0.316

0.289

0.281

0.114

Missouri

0.272

0.314

0.288

0.126

Montana

0.146

0.302

0.313

0.239

Nebraska

0.264

0.306

0.302

0.128

Nevada

0.208

0.270

0.355

0.167

New Hampshire

0.134

0.224

0.343

0.300

New Jersey

0.172

0.235

0.354

Table 13.29 Hospital emergency room visits per 1,000 population by ownership type (Source: www.kff.org)

New Mexico

0.241

0.254

0.347

0.158

Indiana

88

359

55

New York

0.184

0.200

0.346

0.271

Iowa

113

298

6

North Carolina 0.243

0.291

0.328

0.138

Kansas

81

232

98

North Dakota

0.193

0.292

0.242

0.274

Kentucky

56

470

77

Ohio

0.195

0.280

0.365

0.160

Louisiana

174

355

75

Oklahoma

0.295

0.267

0.299

0.139

Maine

18

518

N/A

Oregon

0.176

0.272

0.340

0.212

Maryland

N/A

365

N/A

Pennsylvania

0.221

0.252

0.320

0.207

Massachusetts 18

372

101

Rhode Island

0.161

0.267

0.350

0.222

Michigan

14

441

46

South Carolina

0.257

0.289

0.305

0.150

Minnesota

50

310

N/A

Mississippi

240

193

141


Table 13.29 (cont.)

Table 13.30 (cont.)

Location

State/Local Government

Nonprofit ForProfit

Missouri

68

358

51

Montana

5

391

28

Nebraska

42

415

20

Nevada

42

110

158

New Hampshire

N/A

389

134

Location

All Adults Ages 18–64

Adults Ages 18–64 At High Risk

Adults Ages 18– 64 Not at High Risk

Connecticut

0.498

0.611

0.464

Delaware

0.460

0.565

0.422

Florida

0.328

0.375

0.320

Georgia

0.364

0.493

0.325

0.413

0.501

0.390

6

378

24

Hawaii

New Mexico

67

227

125

Idaho

0.357

0.447

0.334

New York

85

366

N/A

Illinois

0.444

0.550

0.417

24

Indiana

0.418

0.507

0.390

New Jersey

North Carolina 147

253

North Dakota

N/A

418

N/A

Iowa

0.484

0.577

0.457

Ohio

41

520

29

Kansas

0.445

0.541

0.417

Oklahoma

99

280

112

Kentucky

0.440

0.477

0.428

Oregon

26

329

18

Louisiana

0.382

0.483

0.346

Pennsylvania

N/A

443

40

Maine

0.463

0.546

0.434

Rhode Island

N/A

499

N/A

Maryland

0.474

0.579

0.445

South Carolina

138

220

109

Massachusetts

0.509

0.581

0.492

Michigan

0.402

0.453

0.386

South Dakota

21

358

27

Minnesota

0.482

0.567

0.462

Tennessee

96

250

151

Mississippi

0.375

0.505

0.336

Texas

63

203

156

Missouri

0.410

0.544

0.367

Utah

23

158

88

Montana

0.416

0.529

0.383

Vermont

N/A

509

N/A

Nebraska

0.499

0.565

0.484

Virginia

22

314

89

Nevada

0.348

0.465

0.314

Washington

81

267

35

0.485

0.580

0.454

Washington, DC

N/A

475

117

New Hampshire New Jersey

0.393

0.460

0.376

West Virginia

48

470

65

New Mexico

0.417

0.555

0.372

Wisconsin

N/A

383

7

New York

0.426

0.520

0.401

Wyoming

266

79

41

North Carolina

0.462

0.532

0.443

North Dakota

0.473

0.577

0.446

Ohio

0.423

0.520

0.390

Oklahoma

0.463

0.538

0.438

Oregon

0.420

0.531

0.390

Pennsylvania

0.469

0.589

0.436

Table 13.30 Flu vaccination rates for high-risk adults (Source: www.kff.org)

Location

All Adults Ages 18–64

Adults Ages 18–64 At High Risk

Adults Ages 18– 64 Not at High Risk

Alabama

0.390

0.477

0.360

Alaska

0.395

0.428

0.388

Arizona Arkansas

0.359 0.460

0.424 0.516

0.343 0.437

California

0.426

0.535

0.401

Colorado

0.456

0.573

0.430


Rhode Island

0.518

0.575

0.503

South Carolina

0.401

0.473

0.375

South Dakota

0.497

0.565

0.480

Tennessee

0.397

0.456

0.381

Texas

0.371

0.475

0.349

Utah

0.445

0.551

0.422

Vermont

0.486

0.606

0.448


49. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.41 (2019).

Table 13.30 (cont.)

| Location | All Adults Ages 18–64 | Adults Ages 18–64 At High Risk | Adults Ages 18–64 Not at High Risk |
|---|---|---|---|
| Virginia | 0.511 | 0.563 | 0.496 |
| Washington | 0.474 | 0.558 | 0.453 |
| Washington, DC | 0.455 | 0.550 | 0.438 |
| West Virginia | 0.436 | 0.502 | 0.405 |
| Wisconsin | 0.504 | 0.579 | 0.483 |
| Wyoming | 0.379 | 0.462 | 0.358 |

50. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.42 (2019).
51. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.43 (2019).
52. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.44 (2019).
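For the "proportion" exercises, a p-chart is a natural candidate. The tables report proportions only, so the Python sketch below has to assume a common subgroup size; `ASSUMED_N` is a hypothetical value chosen for illustration and is not taken from the source. The function name is likewise hypothetical. The limits follow the usual form, p-bar ± 3·sqrt(p-bar(1 − p-bar)/n), clipped to the interval from 0 to 1, and the sample values are the "All Adults Ages 18–64" column of Table 13.30 (cont.).

```python
from math import sqrt


def p_chart_limits(proportions, n):
    """Return (lcl, p_bar, ucl) for a p-chart with a common subgroup size n.

    p_bar is the unweighted mean of the proportions; it equals the pooled
    proportion only because every subgroup is assumed to have the same size.
    """
    if n <= 0 or not proportions:
        raise ValueError("need a positive n and at least one proportion")
    p_bar = sum(proportions) / len(proportions)
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)


# Table 13.30 (cont.): flu vaccination rate, all adults ages 18–64.
rates = {
    "Virginia": 0.511,
    "Washington": 0.474,
    "Washington, DC": 0.455,
    "West Virginia": 0.436,
    "Wisconsin": 0.504,
    "Wyoming": 0.379,
}

# Hypothetical subgroup size; the table does not report survey sample sizes.
ASSUMED_N = 1000

if __name__ == "__main__":
    lcl, p_bar, ucl = p_chart_limits(list(rates.values()), ASSUMED_N)
    print(f"LCL={lcl:.3f}  p-bar={p_bar:.3f}  UCL={ucl:.3f}")
    for location, p in rates.items():
        status = "outside" if (p < lcl or p > ucl) else "within"
        print(f"{location}: {p:.3f} ({status} limits)")
```

Because the width of the limits shrinks as n grows, the choice of `ASSUMED_N` largely determines which locations are flagged; any interpretation should state the sample size that was used and where it came from.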

Table 13.31 Health insurance coverage of the total population, multiple sources of coverage (Source: www.kff.org)

| Location | Employer Only | Non-group Only | Medicaid Only | Medicare Only | Military | Uninsured |
|---|---|---|---|---|---|---|
| Alabama | 0.472 | 0.055 | 0.135 | 0.067 | 0.021 | 0.097 |
| Alaska | 0.484 | 0.035 | 0.151 | 0.038 | 0.053 | 0.115 |
| Arizona | 0.451 | 0.052 | 0.159 | 0.078 | 0.015 | 0.111 |
| Arkansas | 0.420 | 0.054 | 0.199 | 0.078 | 0.014 | 0.091 |
| California | 0.480 | 0.067 | 0.195 | 0.056 | 0.009 | 0.078 |
| Colorado | 0.534 | 0.069 | 0.128 | 0.059 | 0.023 | 0.078 |
| Connecticut | 0.529 | 0.048 | 0.159 | 0.059 | 0.007 | 0.059 |
| Delaware | 0.497 | 0.041 | 0.144 | 0.057 | 0.018 | 0.066 |
| Florida | 0.403 | 0.095 | 0.118 | 0.092 | 0.017 | 0.131 |
| Georgia | 0.489 | 0.056 | 0.123 | 0.059 | 0.022 | 0.134 |
| Hawaii | 0.543 | 0.041 | 0.125 | 0.049 | 0.040 | 0.041 |
| Idaho | 0.490 | 0.086 | 0.109 | 0.061 | 0.014 | 0.105 |
| Illinois | 0.546 | 0.052 | 0.144 | 0.060 | 0.007 | 0.073 |
| Indiana | 0.533 | 0.044 | 0.131 | 0.064 | 0.010 | 0.088 |
| Iowa | 0.544 | 0.049 | 0.144 | 0.057 | 0.009 | 0.047 |
| Kansas | 0.543 | 0.056 | 0.096 | 0.054 | 0.020 | 0.092 |
| Kentucky | 0.470 | 0.039 | 0.192 | 0.070 | 0.014 | 0.064 |
| Louisiana | 0.418 | 0.049 | 0.229 | 0.071 | 0.014 | 0.089 |
| Maine | 0.465 | 0.057 | 0.118 | 0.077 | 0.015 | 0.081 |
| Maryland | 0.547 | 0.054 | 0.140 | 0.043 | 0.019 | 0.059 |
| Massachusetts | 0.559 | 0.054 | 0.152 | 0.046 | 0.005 | 0.030 |
| Michigan | 0.509 | 0.052 | 0.156 | 0.048 | 0.006 | 0.058 |
| Minnesota | 0.578 | 0.052 | 0.127 | 0.046 | 0.007 | 0.048 |
| Mississippi | 0.422 | 0.047 | 0.173 | 0.070 | 0.018 | 0.129 |
| Missouri | 0.520 | 0.057 | 0.103 | 0.074 | 0.013 | 0.101 |
| Montana | 0.430 | 0.080 | 0.154 | 0.078 | 0.018 | 0.083 |
| Nebraska | 0.568 | 0.069 | 0.083 | 0.059 | 0.016 | 0.079 |
| Nevada | 0.495 | 0.055 | 0.136 | 0.076 | 0.017 | 0.115 |
| New Hampshire | 0.562 | 0.053 | 0.100 | 0.066 | 0.012 | 0.064 |


Table 13.31 (cont.)

| Location | Employer Only | Non-group Only | Medicaid Only | Medicare Only | Military | Uninsured |
|---|---|---|---|---|---|---|
| New Jersey | 0.557 | 0.054 | 0.124 | 0.059 | 0.005 | 0.079 |
| New Mexico | 0.366 | 0.040 | 0.247 | 0.070 | 0.018 | 0.098 |
| New York | 0.498 | 0.058 | 0.186 | 0.053 | 0.004 | 0.053 |
| North Carolina | 0.463 | 0.067 | 0.130 | 0.065 | 0.024 | 0.114 |
| North Dakota | 0.557 | 0.090 | 0.078 | 0.043 | 0.021 | 0.074 |
| Ohio | 0.526 | 0.040 | 0.154 | 0.067 | 0.008 | 0.067 |
| Oklahoma | 0.455 | 0.055 | 0.126 | 0.066 | 0.020 | 0.149 |
| Oregon | 0.493 | 0.057 | 0.152 | 0.069 | 0.009 | 0.071 |
| Pennsylvania | 0.518 | 0.051 | 0.142 | 0.058 | 0.008 | 0.057 |
| Rhode Island | 0.540 | 0.059 | 0.128 | 0.064 | 0.008 | 0.043 |
| South Carolina | 0.454 | 0.061 | 0.134 | 0.069 | 0.022 | 0.108 |
| South Dakota | 0.515 | 0.083 | 0.089 | 0.071 | 0.018 | 0.096 |
| Tennessee | 0.478 | 0.056 | 0.142 | 0.069 | 0.018 | 0.102 |
| Texas | 0.476 | 0.057 | 0.120 | 0.056 | 0.016 | 0.184 |
| Utah | 0.605 | 0.093 | 0.061 | 0.043 | 0.012 | 0.096 |
| Vermont | 0.484 | 0.048 | 0.170 | 0.063 | 0.010 | 0.044 |
| Virginia | 0.541 | 0.052 | 0.101 | 0.055 | 0.044 | 0.080 |
| Washington | 0.529 | 0.050 | 0.149 | 0.054 | 0.018 | 0.066 |
| Washington, DC | 0.549 | 0.065 | 0.186 | 0.027 | 0.013 | 0.036 |
| West Virginia | 0.440 | 0.025 | 0.195 | 0.073 | 0.013 | 0.066 |
| Wisconsin | 0.565 | 0.053 | 0.110 | 0.060 | 0.008 | 0.058 |
| Wyoming | 0.511 | 0.071 | 0.081 | 0.065 | 0.018 | 0.123 |
| Puerto Rico | 0.236 | 0.079 | 0.337 | 0.098 | 0.005 | 0.078 |

Table 13.32 Hospital admissions per 1,000 population by ownership type (Source: www.kff.org)

Location

State/Local Government

Nonprofit ForProfit

Alabama

59

34

37

Alaska

4

49

16

Arizona

3

68

19

Arkansas

10

89

21

California

13

58

14

Colorado

11

51

16

Connecticut

3

99

3

Delaware

N/A

105

1

Florida

16

65

42

Georgia

4

85

9

Hawaii

7

73

N/A


Table 13.32 (cont.)

Location

State/Local Government

Nonprofit ForProfit

Idaho

13

49

16

Illinois

4

97

4

Indiana

13

83

14

Iowa

24

73

1

Kansas

24

53

31

Kentucky

15

97

13

Louisiana

25

78

20

Maine

2

92

1

Maryland

N/A

90

0

Massachusetts 2

95

16

Michigan

102

10

2


Table 13.32 (cont.)

Location

Table 13.33 (cont.)

State/Local Government

Nonprofit ForProfit

Location

Total Hospital Beds

Beds per 1,000 Population

Minnesota

8

90

N/A

Mississippi

49

47

20

Connecticut

7,246

2.03

Delaware

2,112

Missouri

15

97

2.17

14

Florida

55,733

2.59

Montana

0

Nebraska

5

86

7

Georgia

24,896

2.34

97

3

Hawaii

2,731

Nevada

1.93

8

27

56

Idaho

3,468

1.94

New Hampshire

N/A

76

13

Illinois

31,262

2.47

Indiana

18,237

2.71

New Jersey

2

111

7

Iowa

9,459

3.00

New Mexico

17

45

32

Kansas

9,544

3.28

New York

17

98

N/A

Kentucky

14,151

3.17

North Carolina 30

64

4

Louisiana

15,226

3.28

North Dakota

N/A

120

2

Maine

3,482

2.59

Ohio

10

107

5

Maryland

10,894

1.80

Oklahoma

17

64

28

Massachusetts 15,561

2.26

Oregon

9

70

3

Michigan

25,550

2.56

Pennsylvania

0

112

8

Minnesota

13,901

2.46

Rhode Island

N/A

110

N/A

Mississippi

10,883

3.66

South Carolina

31

50

24

Missouri

18,455

3.01

South Dakota

2

112

9

Montana

3,614

3.38

Nebraska

6,038

3.12

Nevada

6,309

2.05

New Hampshire

2,804

2.06

New Jersey

20,355

2.29

New Mexico

3,765

1.80

New York

52,084

2.68

North Carolina 21,673

2.07

North Dakota

3,323

4.36

Ohio

33,303

2.85

Oklahoma

11,305

2.86

Oregon

6,975

1.65

Pennsylvania

35,448

2.77

Table 13.33 Total hospital beds (Source: www.kff.org)

Rhode Island

2,191

2.07

Location

Total Hospital Beds

Beds per 1,000 Population

South Carolina

12,055

2.34

South Dakota

4,222

4.77

Alabama

15,248

3.11

Tennessee

18,366

2.69

Alaska

1,606

2.20

Texas

65,187

2.25

Arizona

14,038

1.93

Utah

5,664

1.77

Arkansas

9,145

3.03

Vermont

1,275

2.04

California

73,109

1.85

Virginia

18,143

2.13

Colorado

10,665

1.85

Washington

12,613

1.66

Tennessee

21

67

30

Texas

13

44

40

Utah

10

45

22

Vermont

N/A

82

N/A

Virginia

9

69

18

Washington

14

58

6

Washington, DC

N/A

141

35

West Virginia

6

112

16

Wisconsin Wyoming

0 46

84 18

3 8


Table 13.33 (cont.)

| Location | Total Hospital Beds | Beds per 1,000 Population |
| --- | --- | --- |
| Washington, DC | 3,186 | 4.51 |
| West Virginia | 6,385 | 3.56 |
| Wisconsin | 11,894 | 2.04 |
| Wyoming | 1,990 | 3.44 |

Table 13.34 Hospital beds per 1,000 population by ownership type (Source: www.kff.org)

| Location | State/Local Government | Nonprofit | For-Profit |
| --- | --- | --- | --- |
| Alabama | 1.42 | 0.81 | 0.87 |
| Alaska | 0.29 | 1.55 | 0.36 |
| Arizona | 0.11 | 1.36 | 0.47 |
| Arkansas | 0.26 | 2.14 | 0.63 |
| California | 0.33 | 1.21 | 0.32 |
| Colorado | 0.38 | 1.10 | 0.38 |
| Connecticut | 0.05 | 1.93 | 0.00 |
| Delaware | N/A | 2.13 | 0.04 |
| Florida | 0.41 | 1.34 | 0.85 |
| Georgia | 0.21 | 1.90 | 0.24 |
| Hawaii | 0.21 | 1.72 | N/A |
| Idaho | 0.46 | 1.06 | 0.42 |
| Illinois | 0.12 | 2.22 | 0.13 |
| Indiana | 0.40 | 1.93 | 0.38 |
| Iowa | 0.90 | 2.04 | 0.06 |
| Kansas | 1.07 | 1.39 | 0.81 |
| Kentucky | 0.39 | 2.38 | 0.40 |
| Louisiana | 0.74 | 1.80 | 0.73 |
| Maine | 0.06 | 2.47 | 0.07 |
| Maryland | N/A | 1.79 | 0.01 |
| Massachusetts | 0.05 | 1.84 | 0.37 |
| Michigan | 0.07 | 2.23 | 0.26 |
| Minnesota | 0.34 | 2.13 | N/A |
| Mississippi | 1.64 | 1.23 | 0.79 |
| Missouri | 0.43 | 2.18 | 0.40 |
| Montana | 0.12 | 3.04 | 0.22 |
| Nebraska | N/A | 2.99 | 0.13 |
| Nevada | 0.21 | 0.58 | 1.26 |
| New Hampshire | N/A | 1.67 | 0.39 |
| New Jersey | N/A | 2.07 | 0.22 |
| New Mexico | 0.36 | 0.75 | 0.69 |
| New York | 0.46 | 2.22 | N/A |
| North Carolina | 0.63 | 1.31 | 0.13 |
| North Dakota | N/A | 4.27 | 0.09 |
| Ohio | 0.26 | 2.38 | 0.21 |
| Oklahoma | 0.54 | 1.55 | 0.77 |
| Oregon | 0.22 | 1.38 | 0.06 |
| Pennsylvania | 0.00 | 2.49 | 0.28 |
| Rhode Island | N/A | 2.07 | N/A |
| South Carolina | 0.76 | 1.06 | 0.53 |
| South Dakota | 0.24 | 4.30 | 0.23 |
| Tennessee | 0.49 | 1.44 | 0.76 |
| Texas | 0.32 | 0.97 | 0.96 |
| Utah | 0.26 | 0.95 | 0.55 |
| Vermont | N/A | 2.04 | N/A |
| Virginia | 0.20 | 1.50 | 0.42 |
| Washington | 0.38 | 1.18 | 0.10 |
| Washington, DC | N/A | 3.53 | 0.98 |
| West Virginia | 0.24 | 2.78 | 0.54 |
| Wisconsin | 0.00 | 1.96 | 0.08 |
| Wyoming | 2.54 | 0.54 | 0.36 |


53. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.45 (2019).

54. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.46 (2021).

Table 13.35 Intensive Care Unit (ICU) beds (Source: www.kff.org)

| Location | ICU Beds | ICU Beds per 10,000 Population |
| --- | --- | --- |
| Alabama | 1,870 | 3.9 |
| Alaska | 130 | 1.8 |
| Arizona | 1,742 | 2.5 |
| Arkansas | 856 | 2.9 |
| California | 8,131 | 2.1 |
| Colorado | 1,770 | 3.2 |
| Connecticut | 731 | 2.1 |
| Delaware | 249 | 2.7 |
| Florida | 6,226 | 3.0 |
| Georgia | 2,703 | 2.6 |
| Hawaii | 219 | 1.6 |
| Idaho | 333 | 1.9 |
| Illinois | 3,426 | 2.8 |
| Indiana | 2,358 | 3.6 |
| Iowa | 622 | 2.0 |
| Kansas | 878 | 3.1 |
| Kentucky | 1,447 | 3.3 |
| Louisiana | 1,518 | 3.4 |
| Maine | 288 | 2.2 |
| Maryland | 1,227 | 2.1 |
| Massachusetts | 1,555 | 2.3 |
| Michigan | 2,749 | 2.8 |
| Minnesota | 1,277 | 2.3 |
| Mississippi | 931 | 3.2 |
| Missouri | 2,092 | 3.5 |
| Montana | 248 | 2.4 |
| Nebraska | 548 | 2.9 |
| Nevada | 1,118 | 3.7 |
| New Hampshire | 252 | 1.9 |
| New Jersey | 1,882 | 2.2 |
| New Mexico | 460 | 2.2 |
| New York | 4,420 | 2.3 |
| North Carolina | 3,168 | 3.2 |
| North Dakota | 278 | 3.8 |
| Ohio | 3,622 | 3.2 |
| Oklahoma | 1,164 | 3.1 |
| Oregon | 837 | 2.0 |
| Pennsylvania | 3,643 | 2.9 |
| Rhode Island | 279 | 2.8 |
| South Carolina | 1,459 | 3.0 |
| South Dakota | 150 | 1.8 |
| Tennessee | 2,309 | 3.5 |
| Texas | 7,149 | 2.6 |
| Utah | 687 | 2.2 |
| Vermont | 94 | 1.6 |
| Virginia | 2,007 | 2.5 |
| Washington | 1,493 | 2.0 |
| Washington, DC | 401 | 6.0 |
| West Virginia | 643 | 3.7 |
| Wisconsin | 1,506 | 2.7 |
| Wyoming | 102 | 1.8 |

Table 13.36 Total number of certified nursing facilities (Source: www.kff.org)

| Location | Number of Nursing Facilities |
| --- | --- |
| Alabama | 228 |
| Alaska | 20 |
| Arizona | 146 |
| Arkansas | 224 |
| California | 1,186 |
| Colorado | 224 |
| Connecticut | 210 |
| Delaware | 46 |
| Florida | 704 |
| Georgia | 359 |
| Hawaii | 43 |
| Idaho | 82 |
| Illinois | 713 |
| Indiana | 533 |
| Iowa | 431 |
| Kansas | 326 |
| Kentucky | 284 |
| Louisiana | 277 |
| Maine | 93 |
| Maryland | 226 |
| Massachusetts | 372 |
| Michigan | 435 |
| Minnesota | 367 |
| Mississippi | 204 |
| Missouri | 517 |
| Montana | 70 |
| Nebraska | 196 |
| Nevada | 66 |
| New Hampshire | 73 |
| New Jersey | 362 |
| New Mexico | 71 |
| New York | 617 |
| North Carolina | 428 |
| North Dakota | 80 |
| Ohio | 954 |
| Oklahoma | 298 |
| Oregon | 129 |
| Pennsylvania | 688 |
| Rhode Island | 79 |
| South Carolina | 189 |
| South Dakota | 104 |
| Tennessee | 314 |


Table 13.36 (cont.)

| Location | Number of Nursing Facilities |
| --- | --- |
| Texas | 1,211 |
| Utah | 97 |
| Vermont | 35 |
| Virginia | 286 |
| Washington | 202 |
| Washington, DC | 17 |
| West Virginia | 123 |
| Wisconsin | 352 |
| Wyoming | 36 |

Table 13.37 Total number of residents in certified nursing facilities

| Location | Number of Nursing Facility Residents |
| --- | --- |
| Alabama | 22,659 |
| Alaska | 626 |
| Arizona | 11,515 |
| Arkansas | 16,242 |
| California | 101,171 |
| Colorado | 16,283 |
| Connecticut | 21,732 |
| Delaware | 4,094 |
| Florida | 70,149 |
| Georgia | 33,069 |
| Hawaii | 3,584 |
| Idaho | 4,071 |
| Illinois | 63,601 |
| Indiana | 38,528 |
| Iowa | 22,602 |
| Kansas | 16,399 |
| Kentucky | 22,537 |
| Louisiana | 25,515 |
| Maine | 5,812 |
| Maryland | 23,788 |
| Massachusetts | 36,675 |
| Michigan | 36,749 |
| Minnesota | 23,618 |
| Mississippi | 16,063 |
| Missouri | 36,775 |
| Montana | 3,912 |
| Nebraska | 10,364 |
| Nevada | 5,567 |
| New Hampshire | 6,235 |
| New Jersey | 41,396 |
| New Mexico | 5,231 |
| New York | 101,326 |
| North Carolina | 36,397 |
| North Dakota | 5,310 |
| Ohio | 71,767 |
| Oklahoma | 18,233 |
| Oregon | 7,111 |
| Pennsylvania | 72,936 |
| Rhode Island | 7,223 |
| South Carolina | 16,266 |
| South Dakota | 5,601 |
| Tennessee | 26,531 |
| Texas | 88,809 |
| Utah | 5,529 |
| Vermont | 2,428 |
| Virginia | 27,853 |
| Washington | 14,975 |
| Washington, DC | 2,134 |
| West Virginia | 9,397 |
| Wisconsin | 21,495 |
| Wyoming | 2,294 |


55. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.47 (2021).

56. Construct, compare, and interpret control charts for the proportion in each group for the data presented in Table 13.48 (September 24, 2021).
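As a starting point for Exercises 53 to 56, the following is a minimal Python sketch of the limits for a control chart for proportions (a p-chart). It rests on stated assumptions rather than on anything the tables provide: the tables report only shares, so a common subgroup size `N_ASSUMED` is hypothetical and must be replaced by the actual survey or population sizes; the function names `p_chart_limits` and `out_of_control` are illustrative. The center line is the mean share, and the limits are the usual three-sigma limits, truncated to the interval from 0 to 1.

```python
import math


def p_chart_limits(proportions, n):
    """Return (center line, LCL, UCL) for a p-chart with equal subgroup size n."""
    if n <= 0 or not proportions:
        raise ValueError("need a positive n and at least one proportion")
    # With equal subgroup sizes the pooled proportion is the plain mean.
    p_bar = sum(proportions) / len(proportions)
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = max(0.0, p_bar - 3.0 * sigma)  # a proportion cannot fall below 0
    ucl = min(1.0, p_bar + 3.0 * sigma)  # or rise above 1
    return p_bar, lcl, ucl


def out_of_control(labels, proportions, n):
    """List the labels whose proportion falls outside the three-sigma limits."""
    _, lcl, ucl = p_chart_limits(proportions, n)
    return [lab for lab, p in zip(labels, proportions) if p < lcl or p > ucl]


# First six rows of Table 13.45 (share of adults told they have kidney disease).
labels = ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"]
shares = [0.035, 0.018, 0.042, 0.040, 0.030, 0.018]

N_ASSUMED = 1000  # hypothetical subgroup size; not reported in the table

p_bar, lcl, ucl = p_chart_limits(shares, N_ASSUMED)
print(f"center={p_bar:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}")
print("outside limits:", out_of_control(labels, shares, N_ASSUMED))
```

Because the limits narrow as the subgroup size grows, the interpretation asked for in the exercises depends on the sample sizes behind each share; when the sizes differ by location, the limits should be computed separately for each location with its own n.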



Table 13.38 Percent change in state tax revenue (Source: www.kff.org)

| Location | Total Tax Revenue | Personal Income Tax | Corporate Income Tax | Sales Tax |
| --- | --- | --- | --- | --- |
| Alabama | 0.22 | 0.24 | 1.57 | 0.14 |
| Alaska | −0.18 | N/A | −0.02 | N/A |
| Arizona | 0.30 | 0.44 | 0.66 | 0.16 |
| Arkansas | 0.16 | 0.19 | 0.32 | 0.13 |
| California | 0.53 | 0.73 | 1.66 | 0.07 |
| Colorado | 0.32 | 0.39 | 0.88 | 0.07 |
| Connecticut | 0.23 | 0.34 | 0.08 | 0.12 |
| Delaware | 0.19 | 0.28 | 0.46 | N/A |
| Florida | 0.16 | N/A | 0.37 | 0.10 |
| Georgia | 0.14 | 0.15 | 0.42 | 0.13 |
| Hawaii | 0.09 | 0.42 | 3.74 | −0.11 |
| Idaho | 0.23 | 0.28 | 0.43 | 0.20 |
| Illinois | 0.23 | 0.22 | 0.72 | 0.13 |
| Indiana | 0.21 | 0.27 | 0.90 | 0.13 |
| Iowa | 0.18 | 0.18 | 0.69 | 0.12 |
| Kansas | 0.26 | 0.38 | 0.69 | 0.10 |
| Kentucky | 0.11 | 0.08 | 0.38 | 0.12 |
| Louisiana | 0.16 | 0.15 | 1.79 | 0.09 |
| Maine | 0.14 | 0.13 | 0.32 | 0.16 |
| Maryland | 0.14 | 0.21 | 0.10 | −0.02 |
| Massachusetts | 0.15 | 0.13 | 0.49 | 0.11 |
| Michigan | 0.26 | 0.33 | 1.10 | 0.20 |
| Minnesota | 0.16 | 0.17 | 0.52 | 0.07 |
| Mississippi | 0.17 | 0.23 | 0.54 | 0.12 |
| Missouri | 0.22 | 0.28 | 0.72 | 0.07 |
| Montana | 0.22 | 0.32 | 0.43 | N/A |
| Nebraska | 0.21 | 0.28 | 0.46 | 0.09 |
| Nevada | 0.03 | N/A | N/A | 0.07 |
| New Hampshire | 0.19 | N/A | 0.46 | N/A |
| New Jersey | 0.17 | 0.06 | 0.26 | 0.16 |
| New Mexico | 0.01 | 0.04 | −2.96 | −0.03 |
| New York | 0.37 | 0.51 | 0.47 | 0.05 |
| North Carolina | 0.26 | 0.28 | 1.30 | 0.16 |
| North Dakota | −0.08 | 0.32 | 0.60 | −0.12 |
| Ohio | 0.15 | 0.29 | N/A | 0.12 |
| Oklahoma | 0.18 | 0.23 | 0.94 | 0.10 |
| Oregon | 0.35 | 0.36 | 0.44 | N/A |
| Pennsylvania | 0.23 | 0.28 | 0.65 | 0.19 |
| Rhode Island | 0.19 | 0.31 | 0.57 | 0.13 |
| South Carolina | 0.09 | 0.08 | 0.47 | 0.05 |
| South Dakota | 0.13 | N/A | 0.72 | 0.12 |
| Tennessee | 0.22 | N/A | 0.18 | 0.14 |
| Texas | 0.13 | N/A | N/A | 0.20 |
| Utah | 0.36 | 0.54 | 1.09 | 0.17 |
| Vermont | 0.22 | 0.61 | 0.19 | 0.17 |
| Virginia | 0.14 | 0.13 | 0.50 | 0.13 |
| Washington | 0.13 | N/A | N/A | 0.10 |
| West Virginia | 0.11 | 0.16 | 1.11 | 0.11 |
| Wisconsin | 0.15 | 0.17 | 0.57 | 0.08 |
| Wyoming | NR | N/A | N/A | −0.05 |

Table 13.39 Population distribution by age (Source: www.kff.org)

| Location | Children 0–18 | Adults 19–25 | Adults 26–34 | Adults 35–54 | Adults 55–64 | 65+ |
| --- | --- | --- | --- | --- | --- | --- |
| Alabama | 0.237 | 0.087 | 0.115 | 0.251 | 0.135 | 0.175 |
| Alaska | 0.259 | 0.093 | 0.139 | 0.246 | 0.132 | 0.130 |
| Arizona | 0.240 | 0.093 | 0.121 | 0.241 | 0.124 | 0.182 |
| Arkansas | 0.247 | 0.087 | 0.115 | 0.248 | 0.131 | 0.173 |
| California | 0.237 | 0.092 | 0.137 | 0.263 | 0.123 | 0.149 |
| Colorado | 0.231 | 0.089 | 0.141 | 0.265 | 0.126 | 0.148 |
| Connecticut | 0.217 | 0.086 | 0.113 | 0.259 | 0.148 | 0.177 |
| Delaware | 0.218 | 0.081 | 0.118 | 0.242 | 0.144 | 0.197 |
| Florida | 0.209 | 0.080 | 0.114 | 0.249 | 0.137 | 0.211 |
| Georgia | 0.253 | 0.089 | 0.122 | 0.267 | 0.125 | 0.145 |
| Hawaii | 0.228 | 0.072 | 0.116 | 0.255 | 0.132 | 0.197 |
| Idaho | 0.267 | 0.085 | 0.120 | 0.242 | 0.125 | 0.162 |
| Illinois | 0.237 | 0.087 | 0.126 | 0.258 | 0.132 | 0.160 |
| Indiana | 0.246 | 0.091 | 0.118 | 0.250 | 0.134 | 0.161 |
| Iowa | 0.244 | 0.089 | 0.114 | 0.243 | 0.136 | 0.174 |
| Kansas | 0.257 | 0.092 | 0.116 | 0.241 | 0.131 | 0.164 |
| Kentucky | 0.237 | 0.089 | 0.114 | 0.254 | 0.137 | 0.170 |
| Louisiana | 0.250 | 0.088 | 0.123 | 0.248 | 0.132 | 0.160 |
| Maine | 0.190 | 0.075 | 0.110 | 0.252 | 0.160 | 0.213 |
| Maryland | 0.233 | 0.081 | 0.123 | 0.265 | 0.138 | 0.160 |
| Massachusetts | 0.209 | 0.087 | 0.132 | 0.260 | 0.141 | 0.171 |
| Michigan | 0.227 | 0.091 | 0.118 | 0.246 | 0.142 | 0.177 |
| Minnesota | 0.243 | 0.082 | 0.125 | 0.252 | 0.137 | 0.162 |
| Mississippi | 0.254 | 0.089 | 0.110 | 0.251 | 0.131 | 0.165 |
| Missouri | 0.236 | 0.085 | 0.121 | 0.246 | 0.139 | 0.172 |
| Montana | 0.226 | 0.085 | 0.114 | 0.236 | 0.143 | 0.196 |
| Nebraska | 0.260 | 0.088 | 0.122 | 0.240 | 0.129 | 0.162 |
| Nevada | 0.235 | 0.083 | 0.131 | 0.262 | 0.126 | 0.163 |
| New Hampshire | 0.201 | 0.078 | 0.115 | 0.258 | 0.163 | 0.186 |
| New Jersey | 0.230 | 0.082 | 0.117 | 0.267 | 0.139 | 0.165 |
| New Mexico | 0.240 | 0.090 | 0.117 | 0.238 | 0.132 | 0.183 |
| New York | 0.217 | 0.086 | 0.134 | 0.257 | 0.136 | 0.169 |
| North Carolina | 0.235 | 0.086 | 0.118 | 0.260 | 0.133 | 0.169 |
| North Dakota | 0.245 | 0.102 | 0.137 | 0.232 | 0.128 | 0.156 |
| Ohio | 0.234 | 0.085 | 0.119 | 0.248 | 0.14 | 0.174 |
| Oklahoma | 0.256 | 0.091 | 0.123 | 0.242 | 0.127 | 0.160 |
| Oregon | 0.214 | 0.084 | 0.128 | 0.261 | 0.129 | 0.183 |
| Pennsylvania | 0.219 | 0.079 | 0.123 | 0.248 | 0.145 | 0.187 |
| Rhode Island | 0.207 | 0.088 | 0.127 | 0.253 | 0.148 | 0.178 |
| South Carolina | 0.230 | 0.082 | 0.116 | 0.249 | 0.138 | 0.184 |
| South Dakota | 0.257 | 0.082 | 0.115 | 0.235 | 0.136 | 0.174 |
| Tennessee | 0.235 | 0.086 | 0.124 | 0.254 | 0.134 | 0.167 |
| Texas | 0.272 | 0.094 | 0.130 | 0.261 | 0.114 | 0.129 |
| Utah | 0.308 | 0.109 | 0.132 | 0.242 | 0.095 | 0.115 |
| Vermont | 0.195 | 0.086 | 0.111 | 0.247 | 0.158 | 0.204 |
| Virginia | 0.234 | 0.084 | 0.120 | 0.265 | 0.135 | 0.163 |
| Washington | 0.230 | 0.084 | 0.138 | 0.260 | 0.129 | 0.160 |
| Washington, DC | 0.195 | 0.090 | 0.216 | 0.269 | 0.103 | 0.127 |
| West Virginia | 0.212 | 0.081 | 0.105 | 0.251 | 0.146 | 0.205 |
| Wisconsin | 0.230 | 0.085 | 0.116 | 0.251 | 0.143 | 0.175 |
| Wyoming | 0.251 | 0.083 | 0.117 | 0.242 | 0.135 | 0.172 |
| Puerto Rico | 0.196 | 0.099 | 0.105 | 0.253 | 0.135 | 0.213 |


Table 13.40 Adults at higher risk of serious illness if infected with coronavirus (Source: www.kff.org)

| Location | At-risk adults, as a share of all adults ages 18 and older | Share of adults under age 65 at risk | Older adults, as a share of all at-risk adults |
| --- | --- | --- | --- |
| Alabama | 0.431 | 0.271 | 0.510 |
| Alaska | 0.328 | 0.198 | 0.494 |
| Arizona | 0.391 | 0.208 | 0.591 |
| Arkansas | 0.435 | 0.277 | 0.503 |
| California | 0.333 | 0.180 | 0.560 |
| Colorado | 0.313 | 0.157 | 0.591 |
| Connecticut | 0.360 | 0.184 | 0.599 |
| Delaware | 0.413 | 0.227 | 0.583 |
| Florida | 0.421 | 0.220 | 0.612 |
| Georgia | 0.362 | 0.218 | 0.509 |
| Hawaii | 0.391 | 0.194 | 0.625 |
| Idaho | 0.362 | 0.187 | 0.594 |
| Illinois | 0.362 | 0.202 | 0.553 |
| Indiana | 0.399 | 0.244 | 0.514 |
| Iowa | 0.369 | 0.189 | 0.601 |
| Kansas | 0.380 | 0.217 | 0.547 |
| Kentucky | 0.436 | 0.283 | 0.488 |
| Louisiana | 0.421 | 0.273 | 0.484 |
| Maine | 0.425 | 0.231 | 0.594 |
| Maryland | 0.371 | 0.215 | 0.536 |
| Massachusetts | 0.346 | 0.174 | 0.600 |
| Michigan | 0.412 | 0.247 | 0.532 |
| Minnesota | 0.339 | 0.170 | 0.600 |
| Mississippi | 0.425 | 0.272 | 0.496 |
| Missouri | 0.405 | 0.240 | 0.538 |
| Montana | 0.390 | 0.194 | 0.623 |
| Nebraska | 0.366 | 0.200 | 0.567 |
| Nevada | 0.361 | 0.196 | 0.569 |
| New Hampshire | 0.405 | 0.233 | 0.553 |
| New Jersey | 0.346 | 0.176 | 0.596 |
| New Mexico | 0.394 | 0.211 | 0.588 |
| New York | 0.369 | 0.204 | 0.561 |
| North Carolina | 0.390 | 0.227 | 0.542 |
| North Dakota | 0.346 | 0.187 | 0.565 |
| Ohio | 0.398 | 0.229 | 0.549 |
| Oklahoma | 0.408 | 0.253 | 0.509 |
| Oregon | 0.398 | 0.225 | 0.562 |
| Pennsylvania | 0.398 | 0.218 | 0.578 |
| Rhode Island | 0.383 | 0.214 | 0.561 |
| South Carolina | 0.414 | 0.240 | 0.554 |
| South Dakota | 0.353 | 0.174 | 0.615 |
| Tennessee | 0.416 | 0.260 | 0.506 |
| Texas | 0.348 | 0.214 | 0.488 |
| Utah | 0.300 | 0.170 | 0.522 |
| Vermont | 0.391 | 0.196 | 0.621 |
| Virginia | 0.359 | 0.194 | 0.569 |
| Washington | 0.351 | 0.190 | 0.568 |
| Washington, DC | 0.318 | 0.195 | 0.479 |
| West Virginia | 0.493 | 0.323 | 0.511 |
| Wisconsin | 0.365 | 0.190 | 0.593 |
| Wyoming | 0.364 | 0.185 | 0.604 |

Table 13.41 Adults who are severely obese (Source: www.kff.org)

| Location | Share of Adults Reporting Severe Obesity (BMI of 40 or Higher) |
| --- | --- |
| Alabama | 0.070 |
| Alaska | 0.056 |
| Arizona | 0.048 |
| Arkansas | 0.070 |
| California | 0.036 |
| Colorado | 0.029 |
| Connecticut | 0.041 |
| Delaware | 0.064 |
| Florida | 0.040 |
| Georgia | 0.061 |
| Hawaii | 0.037 |
| Idaho | 0.056 |
| Illinois | 0.052 |
| Indiana | 0.066 |
| Iowa | 0.060 |
| Kansas | 0.060 |
| Kentucky | 0.066 |
| Louisiana | 0.071 |
| Maine | 0.056 |
| Maryland | 0.050 |
| Massachusetts | 0.035 |


Table 13.41 (cont.)

| Location | Share of Adults Reporting Severe Obesity (BMI of 40 or Higher) |
| --- | --- |
| Michigan | 0.064 |
| Minnesota | 0.043 |
| Mississippi | 0.076 |
| Missouri | 0.051 |
| Montana | 0.044 |
| Nebraska | 0.056 |
| Nevada | 0.042 |
| New Hampshire | 0.047 |
| New Jersey | N/A |
| New Mexico | 0.050 |
| New York | 0.045 |
| North Carolina | 0.055 |
| North Dakota | 0.049 |
| Ohio | 0.065 |
| Oklahoma | 0.074 |
| Oregon | 0.045 |
| Pennsylvania | 0.056 |
| Rhode Island | 0.051 |
| South Carolina | 0.072 |
| South Dakota | 0.048 |
| Tennessee | 0.062 |
| Texas | 0.056 |
| Utah | 0.045 |
| Vermont | 0.040 |
| Virginia | 0.052 |
| Washington | 0.052 |
| Washington, DC | 0.057 |
| West Virginia | 0.075 |
| Wisconsin | 0.058 |
| Wyoming | 0.049 |
| Guam | 0.075 |
| Puerto Rico | 0.048 |

Table 13.42 Share of private-sector enrollees enrolled in self-insured plans (Source: www.kff.org)

| Location | Total | Firms with Fewer than 50 Employees | Firms with 50 Employees or More |
| --- | --- | --- | --- |
| Alabama | 0.585 | N/A | 0.676 |
| Alaska | 0.645 | N/A | 0.709 |
| Arizona | 0.676 | N/A | 0.741 |
| Arkansas | 0.542 | N/A | 0.619 |
| California | 0.425 | 0.13 | 0.486 |
| Colorado | 0.678 | 0.246 | 0.752 |
| Connecticut | 0.483 | 0.145 | 0.559 |
| Delaware | 0.552 | 0.238 | 0.611 |
| Florida | 0.609 | N/A | 0.683 |
| Georgia | 0.584 | N/A | 0.645 |
| Hawaii | 0.289 | 0.258 | 0.302 |
| Idaho | 0.614 | N/A | 0.752 |
| Illinois | 0.585 | 0.201 | 0.670 |
| Indiana | 0.696 | N/A | 0.774 |
| Iowa | 0.639 | 0.13 | 0.735 |
| Kansas | 0.568 | N/A | 0.659 |
| Kentucky | 0.635 | N/A | 0.701 |
| Louisiana | 0.56 | N/A | 0.653 |
| Maine | 0.531 | N/A | 0.626 |
| Maryland | 0.527 | N/A | 0.616 |
| Massachusetts | 0.565 | 0.168 | 0.637 |
| Michigan | 0.624 | N/A | 0.726 |
| Minnesota | 0.618 | N/A | 0.684 |
| Mississippi | 0.659 | N/A | 0.748 |
| Missouri | 0.597 | 0.176 | 0.683 |
| Montana | 0.543 | N/A | 0.690 |
| Nebraska | 0.703 | 0.219 | 0.774 |
| Nevada | 0.564 | N/A | 0.647 |
| New Hampshire | 0.561 | N/A | 0.644 |
| New Jersey | 0.529 | 0.175 | 0.599 |
| New Mexico | 0.642 | N/A | 0.733 |
| New York | 0.575 | 0.096 | 0.678 |
| North Carolina | 0.614 | N/A | 0.696 |
| North Dakota | 0.609 | 0.132 | 0.726 |
| Ohio | 0.629 | N/A | 0.693 |
| Oklahoma | 0.576 | N/A | 0.675 |
| Oregon | 0.537 | N/A | 0.642 |
| Pennsylvania | 0.631 | 0.168 | 0.711 |
| Rhode Island | 0.417 | N/A | 0.488 |
| South Carolina | 0.59 | N/A | 0.685 |
| South Dakota | 0.536 | N/A | 0.647 |
| Tennessee | 0.683 | N/A | 0.743 |


Table 13.42 (cont.)

| Location | Total | Firms with Fewer than 50 Employees | Firms with 50 Employees or More |
| --- | --- | --- | --- |
| Texas | 0.66 | 0.1 | 0.749 |
| Utah | 0.624 | N/A | 0.708 |
| Vermont | 0.648 | N/A | 0.716 |
| Virginia | 0.624 | 0.125 | 0.754 |
| Washington | 0.545 | 0.127 | 0.624 |
| Washington, DC | 0.510 | N/A | 0.606 |
| West Virginia | 0.680 | N/A | 0.787 |
| Wisconsin | 0.642 | N/A | 0.727 |
| Wyoming | 0.693 | 0.309 | 0.835 |

Table 13.43 Adults who report being told by a doctor they have diabetes (Source: www.kff.org)

| Location | Yes | Yes, Pregnancy-Related | No | No, Prediabetes or Borderline Diabetes |
| --- | --- | --- | --- | --- |
| Alabama | 0.140 | 0.010 | 0.839 | 0.012 |
| Alaska | 0.073 | NSD | 0.892 | 0.021 |
| Arizona | 0.109 | 0.010 | 0.858 | 0.023 |
| Arkansas | 0.136 | 0.013 | 0.832 | 0.019 |
| California | 0.101 | 0.014 | 0.848 | 0.037 |
| Colorado | 0.070 | 0.008 | 0.901 | 0.020 |
| Connecticut | 0.096 | 0.012 | 0.874 | 0.019 |
| Delaware | 0.128 | 0.012 | 0.842 | 0.019 |
| Florida | 0.117 | 0.007 | 0.856 | 0.020 |
| Georgia | 0.120 | 0.009 | 0.855 | 0.016 |
| Hawaii | 0.105 | 0.014 | 0.850 | 0.032 |
| Idaho | 0.103 | 0.018 | 0.868 | 0.012 |
| Illinois | 0.113 | 0.008 | 0.866 | 0.013 |
| Indiana | 0.124 | 0.012 | 0.850 | 0.014 |
| Iowa | 0.103 | 0.008 | 0.875 | 0.014 |
| Kansas | 0.108 | 0.009 | 0.868 | 0.015 |
| Kentucky | 0.133 | 0.008 | 0.840 | 0.019 |
| Louisiana | 0.126 | 0.014 | 0.844 | 0.016 |
| Maine | 0.106 | 0.016 | 0.854 | 0.024 |
| Maryland | 0.110 | 0.009 | 0.860 | 0.021 |
| Massachusetts | 0.084 | 0.017 | 0.881 | 0.019 |
| Michigan | 0.111 | 0.010 | 0.859 | 0.020 |
| Minnesota | 0.088 | 0.011 | 0.883 | 0.018 |
| Mississippi | 0.148 | 0.011 | 0.827 | 0.014 |
| Missouri | 0.103 | 0.006 | 0.867 | 0.024 |
| Montana | 0.076 | 0.007 | 0.898 | 0.019 |
| Nebraska | 0.102 | 0.009 | 0.875 | 0.015 |
| Nevada | 0.109 | 0.007 | 0.849 | 0.034 |
| New Hampshire | 0.092 | 0.008 | 0.887 | 0.013 |
| New Mexico | 0.123 | 0.006 | 0.846 | 0.025 |
| New York | 0.105 | 0.011 | 0.868 | 0.016 |
| North Carolina | 0.118 | 0.009 | 0.846 | 0.026 |
| North Dakota | 0.089 | 0.013 | 0.882 | 0.016 |
| Ohio | 0.120 | 0.011 | 0.849 | 0.021 |
| Oklahoma | 0.122 | 0.010 | 0.847 | 0.021 |
| Oregon | 0.086 | 0.013 | 0.873 | 0.028 |
| Pennsylvania | 0.108 | 0.009 | 0.861 | 0.022 |
| Rhode Island | 0.104 | 0.012 | 0.869 | 0.015 |
| South Carolina | 0.134 | 0.016 | 0.832 | 0.018 |
| South Dakota | 0.106 | 0.017 | 0.867 | 0.010 |
| Tennessee | 0.138 | 0.010 | 0.837 | 0.015 |
| Texas | 0.122 | 0.009 | 0.853 | 0.015 |
| Utah | 0.080 | 0.014 | 0.880 | 0.027 |
| Vermont | 0.087 | 0.010 | 0.891 | 0.012 |
| Virginia | 0.109 | 0.010 | 0.858 | 0.022 |
| Washington | 0.094 | 0.012 | 0.878 | 0.017 |
| Washington, DC | 0.087 | 0.004 | 0.894 | 0.015 |
| West Virginia | 0.157 | 0.007 | 0.814 | 0.023 |
| Wisconsin | 0.087 | 0.005 | 0.900 | 0.008 |
| Wyoming | 0.078 | 0.006 | 0.903 | 0.013 |
| Guam | 0.117 | 0.011 | 0.842 | 0.030 |
| Puerto Rico | 0.167 | 0.006 | 0.786 | 0.042 |

Table 13.44 Adults who report being told they have COPD, emphysema, or chronic bronchitis (Source: www.kff.org)

| Location | Share of Adults Told They Have COPD, Emphysema, or Chronic Bronchitis |
| --- | --- |
| Alabama | 0.099 |
| Alaska | 0.046 |
| Arizona | 0.067 |
| Arkansas | 0.105 |
| California | 0.044 |


Table 13.44 (cont.)

| Location | Share of Adults Told They Have COPD, Emphysema, or Chronic Bronchitis |
| --- | --- |
| Colorado | 0.045 |
| Connecticut | 0.052 |
| Delaware | 0.085 |
| Florida | 0.077 |
| Georgia | 0.074 |
| Hawaii | 0.043 |
| Idaho | 0.051 |
| Illinois | 0.058 |
| Indiana | 0.087 |
| Iowa | 0.061 |
| Kansas | 0.064 |
| Kentucky | 0.108 |
| Louisiana | 0.086 |
| Maine | 0.092 |
| Maryland | 0.054 |
| Massachusetts | 0.049 |
| Michigan | 0.084 |
| Minnesota | 0.044 |
| Mississippi | 0.094 |
| Missouri | 0.088 |
| Montana | 0.068 |
| Nebraska | 0.057 |
| Nevada | 0.080 |
| New Hampshire | 0.062 |
| New Mexico | 0.056 |
| New York | 0.058 |
| North Carolina | 0.078 |
| North Dakota | 0.051 |
| Ohio | 0.090 |
| Oklahoma | 0.087 |
| Oregon | 0.061 |
| Pennsylvania | 0.072 |
| Rhode Island | 0.069 |
| South Carolina | 0.081 |
| South Dakota | 0.059 |
| Tennessee | 0.097 |
| Texas | 0.052 |
| Utah | 0.041 |
| Vermont | 0.066 |
| Virginia | 0.065 |
| Washington | 0.052 |
| Washington, DC | 0.045 |
| West Virginia | 0.123 |
| Wisconsin | 0.056 |
| Wyoming | 0.068 |
| Guam | 0.032 |
| Puerto Rico | 0.048 |

Table 13.45 Adults who report being told they have kidney disease (Source: www.kff.org)

| Location | Share of Adults Told They Have Kidney Disease |
| --- | --- |
| Alabama | 0.035 |
| Alaska | 0.018 |
| Arizona | 0.042 |
| Arkansas | 0.040 |
| California | 0.030 |
| Colorado | 0.018 |
| Connecticut | 0.024 |
| Delaware | 0.044 |
| Florida | 0.04 |
| Georgia | 0.038 |
| Hawaii | 0.029 |
| Idaho | 0.029 |
| Illinois | 0.027 |
| Indiana | 0.034 |
| Iowa | 0.022 |
| Kansas | 0.027 |
| Kentucky | 0.039 |
| Louisiana | 0.04 |
| Maine | 0.031 |
| Maryland | 0.028 |
| Massachusetts | 0.023 |
| Michigan | 0.034 |
| Minnesota | 0.025 |
| Mississippi | 0.029 |
| Missouri | 0.031 |
| Montana | 0.024 |
| Nebraska | 0.024 |
| Nevada | 0.030 |
| New Hampshire | 0.026 |


Table 13.45 (cont.)

| Location | Share of Adults Told They Have Kidney Disease |
| --- | --- |
| New Mexico | 0.036 |
| New York | 0.025 |
| North Carolina | 0.039 |
| North Dakota | 0.027 |
| Ohio | 0.033 |
| Oklahoma | 0.040 |
| Oregon | 0.031 |
| Pennsylvania | 0.031 |
| Rhode Island | 0.024 |
| South Carolina | 0.029 |
| South Dakota | 0.029 |
| Tennessee | 0.037 |
| Texas | 0.033 |
| Utah | 0.025 |
| Vermont | 0.024 |
| Virginia | 0.027 |
| Washington | 0.027 |
| Washington, DC | 0.020 |
| West Virginia | 0.042 |
| Wisconsin | 0.028 |
| Wyoming | 0.023 |
| Guam | 0.030 |
| Puerto Rico | 0.035 |

Table 13.46 Unemployment rate (seasonally adjusted) (Source: www.kff.org)

| Location | Unemployed |
| --- | --- |
| Alabama | 0.033 |
| Alaska | 0.066 |
| Arizona | 0.068 |
| Arkansas | 0.044 |
| California | 0.077 |
| Colorado | 0.062 |
| Connecticut | 0.079 |
| Delaware | 0.058 |
| Florida | 0.050 |
| Georgia | 0.040 |
| Hawaii | 0.077 |
| Idaho | 0.030 |
| Illinois | 0.072 |
| Indiana | 0.041 |
| Iowa | 0.040 |
| Kansas | 0.037 |
| Kentucky | 0.044 |
| Louisiana | 0.069 |
| Maine | 0.048 |
| Maryland | 0.062 |
| Massachusetts | 0.049 |
| Michigan | 0.050 |
| Minnesota | 0.040 |
| Mississippi | 0.062 |
| Missouri | 0.043 |
| Montana | 0.037 |
| Nebraska | 0.025 |
| Nevada | 0.078 |
| New Hampshire | 0.029 |
| New Jersey | 0.073 |
| New Mexico | 0.079 |
| New York | 0.077 |
| North Carolina | 0.046 |
| North Dakota | 0.040 |
| Ohio | 0.052 |
| Oklahoma | 0.037 |
| Oregon | 0.056 |
| Pennsylvania | 0.069 |
| Rhode Island | 0.059 |
| South Carolina | 0.045 |
| South Dakota | 0.029 |
| Tennessee | 0.049 |
| Texas | 0.065 |
| Utah | 0.027 |
| Vermont | 0.031 |
| Virginia | 0.043 |
| Washington | 0.052 |
| Washington, DC | 0.070 |
| West Virginia | 0.053 |
| Wisconsin | 0.039 |
| Wyoming | 0.054 |
| Puerto Rico | 0.081 |


Table 13.47 Uninsured rates for the nonelderly by federal poverty level (FPL) (Source: www.kff.org)

| Location | Under 100% | 100–199% | 200–399% | 400%+ |
| --- | --- | --- | --- | --- |
| Alabama | 0.211 | 0.172 | 0.105 | 0.045 |
| Alaska | 0.126 | 0.215 | 0.136 | 0.099 |
| Arizona | 0.179 | 0.201 | 0.155 | 0.060 |
| Arkansas | 0.141 | 0.166 | 0.119 | 0.033 |
| California | 0.130 | 0.143 | 0.117 | 0.041 |
| Colorado | 0.138 | 0.160 | 0.123 | 0.042 |
| Connecticut | 0.101 | 0.139 | 0.107 | 0.033 |
| Delaware | 0.134 | 0.150 | 0.094 | 0.039 |
| Florida | 0.260 | 0.229 | 0.176 | 0.077 |
| Georgia | 0.281 | 0.245 | 0.163 | 0.057 |
| Hawaii | 0.099 | 0.076 | 0.052 | 0.032 |
| Idaho | 0.214 | 0.190 | 0.104 | 0.070 |
| Illinois | 0.138 | 0.158 | 0.101 | 0.039 |
| Indiana | 0.168 | 0.165 | 0.109 | 0.045 |
| Iowa | 0.092 | 0.099 | 0.060 | 0.025 |
| Kansas | 0.220 | 0.203 | 0.099 | 0.042 |
| Kentucky | 0.103 | 0.119 | 0.077 | 0.037 |
| Louisiana | 0.128 | 0.148 | 0.122 | 0.051 |
| Maine | 0.165 | 0.164 | 0.112 | 0.047 |
| Maryland | 0.126 | 0.135 | 0.091 | 0.034 |
| Massachusetts | 0.064 | 0.065 | 0.052 | 0.019 |
| Michigan | 0.101 | 0.114 | 0.081 | 0.030 |
| Minnesota | 0.106 | 0.101 | 0.080 | 0.023 |
| Mississippi | 0.235 | 0.211 | 0.138 | 0.066 |
| Missouri | 0.247 | 0.209 | 0.109 | 0.042 |
| Montana | 0.106 | 0.179 | 0.100 | 0.068 |
| Nebraska | 0.239 | 0.179 | 0.082 | 0.031 |
| Nevada | 0.204 | 0.208 | 0.140 | 0.065 |
| New Hampshire | 0.136 | 0.159 | 0.099 | 0.044 |
| New Jersey | 0.173 | 0.185 | 0.139 | 0.040 |
| New Mexico | 0.153 | 0.150 | 0.129 | 0.063 |
| New York | 0.087 | 0.087 | 0.084 | 0.036 |
| North Carolina | 0.231 | 0.217 | 0.143 | 0.052 |
| North Dakota | 0.187 | 0.074 | 0.050 | 0.143 |
| Ohio | 0.111 | 0.134 | 0.095 | 0.033 |
| Oklahoma | 0.313 | 0.238 | 0.167 | 0.076 |
| Oregon | 0.122 | 0.146 | 0.097 | 0.044 |
| Pennsylvania | 0.118 | 0.120 | 0.077 | 0.033 |
| Rhode Island | 0.073 | 0.107 | 0.063 | 0.023 |
| South Carolina | 0.230 | 0.200 | 0.126 | 0.057 |
| South Dakota | 0.252 | 0.197 | 0.106 | 0.043 |
| Tennessee | 0.218 | 0.196 | 0.112 | 0.048 |
| Texas | 0.335 | 0.327 | 0.225 | 0.086 |
| Utah | 0.246 | 0.183 | 0.106 | 0.045 |
| Vermont | 0.075 | 0.071 | 0.080 | 0.029 |
| Virginia | 0.161 | 0.187 | 0.122 | 0.038 |
| Washington | 0.124 | 0.138 | 0.107 | 0.032 |
| Washington, DC | 0.061 | 0.099 | 0.064 | 0.015 |
| West Virginia | 0.094 | 0.121 | 0.089 | 0.043 |
| Wisconsin | 0.110 | 0.143 | 0.078 | 0.028 |
| Wyoming | 0.193 | 0.28 | 0.156 | 0.076 |
| Puerto Rico | 0.092 | 0.127 | 0.082 | 0.049 |


Table 13.48 COVID-19 vaccines delivered and administered (Source: www.kff.org)

| Location | Share of Delivered Vaccines That Have Been Administered | Share of Population Vaccinated with at Least One Dose | Share of Population Fully Vaccinated |
| --- | --- | --- | --- |
| Alabama | 0.674 | 0.521 | 0.417 |
| Alaska | 0.751 | 0.568 | 0.497 |
| Arizona | 0.830 | 0.589 | 0.506 |
| Arkansas | 0.748 | 0.553 | 0.450 |
| California | 0.864 | 0.712 | 0.584 |
| Colorado | 0.877 | 0.651 | 0.589 |
| Connecticut | 0.893 | 0.756 | 0.682 |
| Delaware | 0.767 | 0.658 | 0.571 |
| Florida | 0.816 | 0.664 | 0.565 |
| Georgia | 0.733 | 0.543 | 0.446 |
| Hawaii | 0.808 | 0.763 | 0.573 |
| Idaho | 0.709 | 0.465 | 0.411 |
| Illinois | 0.855 | 0.679 | 0.530 |
| Indiana | 0.812 | 0.518 | 0.480 |
| Iowa | 0.826 | 0.577 | 0.536 |
| Kansas | 0.797 | 0.593 | 0.505 |
| Kentucky | 0.854 | 0.600 | 0.514 |
| Louisiana | 0.779 | 0.513 | 0.448 |
| Maine | 0.852 | 0.736 | 0.680 |
| Maryland | 0.797 | 0.701 | 0.636 |
| Massachusetts | 0.894 | 0.770 | 0.675 |
| Michigan | 0.785 | 0.566 | 0.519 |
| Minnesota | 0.857 | 0.632 | 0.577 |
| Mississippi | 0.725 | 0.496 | 0.426 |
| Missouri | 0.799 | 0.544 | 0.473 |
| Montana | 0.820 | 0.543 | 0.481 |
| Nebraska | 0.857 | 0.590 | 0.542 |
| Nevada | 0.847 | 0.602 | 0.502 |
| New Hampshire | 0.815 | 0.690 | 0.612 |
| New Jersey | 0.823 | 0.720 | 0.638 |
| New Mexico | 0.944 | 0.720 | 0.625 |
| New York | 0.877 | 0.704 | 0.630 |
| North Carolina | 0.781 | 0.587 | 0.490 |
| North Dakota | 0.801 | 0.505 | 0.435 |
| Ohio | 0.809 | 0.538 | 0.498 |
| Oklahoma | 0.825 | 0.560 | 0.469 |
| Oregon | 0.777 | 0.661 | 0.602 |
| Pennsylvania | 0.840 | 0.717 | 0.573 |
| Rhode Island | 0.854 | 0.746 | 0.675 |
| South Carolina | 0.765 | 0.546 | 0.464 |
| South Dakota | 0.814 | 0.585 | 0.511 |
| Tennessee | 0.799 | 0.522 | 0.447 |
| Texas | 0.773 | 0.595 | 0.506 |
| Utah | 0.857 | 0.585 | 0.500 |
| Vermont | 0.867 | 0.773 | 0.691 |
| Virginia | 0.849 | 0.678 | 0.598 |
| Washington | 0.844 | 0.667 | 0.602 |
| Washington, DC | 0.816 | 0.699 | 0.595 |
| West Virginia | 0.520 | 0.479 | 0.402 |
| Wisconsin | 0.919 | 0.606 | 0.559 |
| Wyoming | 0.790 | 0.479 | 0.410 |
| American Samoa | 0.916 | 0.602 | 0.495 |
| Federated States of Micronesia | 0.843 | 0.426 | 0.364 |
| Guam | 0.967 | 0.755 | 0.664 |
| Marshall Islands | 0.749 | 0.386 | 0.335 |
| Northern Mariana Islands | 0.806 | 0.626 | 0.596 |
| Puerto Rico | 0.933 | 0.786 | 0.692 |
| Republic of Palau | 0.923 | 0.983 | 0.853 |
| Virgin Islands | 0.891 | 0.509 | 0.439 |


Malins, R. J. (2018). Case study: application of DoD architecture framework to characterizing a hospital emergency department as the intended use environment for medical devices. INCOSE International Symposium, 28(1), 896–911.

Ricciardi, C., Ponsiglione, A. M., Converso, G. et al. (2021). Implementation and validation of a new method to model voluntary departures from emergency departments. Mathematical Biosciences and Engineering, 18, 253–273.

Matthews, L. (2013). Process Mining to Facilitate Process Improvement in a Healthcare Environment: An Emergency Department Case Study. Binghamton: State University of New York.

Ricciardi, C., Sorrentino, A., Improta, G. et al. (2020). A health technology assessment between two pharmacological therapies through Six Sigma methods: the case study of bone cancer. TQM Journal, 32, 1507–1524.

Michelin, L., Ricci, B., Barbagallo, V. et al. (2021). Can operating room efficiency be increased by applying the lean Six Sigma methods models? Archives of Clinical and Medical Case Reports, 5, 549–558.

Rodgers, B., Antony, J., Edgeman R., & Cudney, E. A. (2021). Lean Six sigma methods in the public sector: yesterday, today, and tomorrow. Total Quality Management & Business Excellence, 32 (5–6), 528–540. https://doi.org/10.1080/14783363.2019.1599714.

424 https://doi.org/10.1017/9781009212021.015 Published online by Cambridge University Press

Six Sigma and Lean Management in Healthcare Sectors

Romeijn, H. E., Schaefer, A., & Thomas, R. (2019). Six Sigma methods project recognition and prioritization. In Proceedings of the 2019 IISE Annual Conference, edited by H. E. Romeijn, A. Schaefer, & R. Thomas. Institute of Industrial and Systems Engineers.

Swee, M. L., Sanders, M. L., Phisitkul, K. et al. (2020). Development and implementation of a Tele nephrology dashboard for active surveillance of kidney disease: a quality improvement project. BMC Nephrology, 21(1), 1–10.

Sanders, D., & Hild, C. R. (2000). Common myths about Six Sigma methods. Quality Engineering, 13(2), 269–276.

Tadlaoui, K., Chafi, A., & Ennadi, A. (2018). The lean Six Sigma methods in a public hospital. Journal of Applied Engineering Science, 16(1), 60–69.

Shamsi, M. A., & Alam, A. (2018). Exploring lean Six Sigma methods implementation barriers in information technology industry. International Journal of Lean Six Sigma Methods, 9(4). https://doi.org/10.1108/IJLSS06-2017-0054.

Trakulsunti, Y., Antony, J., & Douglas, J. A. (2020). Lean Six Sigma methods implementation and sustainability roadmap for reducing medication errors in hospitals. TQM Journal, 33(1), 33– 55.

Shanmugam, R. (1991). Incidence rate restricted Poissonness. Sankhyā: The Indian Journal of Statistics, Series B, 53, 191–201.

Trakulsunti, Y., Antony, J., Dempsey, M., & Brennan, A. (2020). Reducing medication errors using lean Six Sigma methodology in a Thai hospital: an action research study. International Journal of Quality & Reliability Management, 38(1), 339–362.

Shanmugam, R. (1999). Kullback–Leibler information and interval estimation. Communications in Statistics, 28(9), 2057–2063. Shanmugam, R., Fulton, L., Ramamonjiarivelo, Z. et al. (2021). A report card on prevention efforts of COVID-19 deaths in US. Healthcare, 9, 1175. https://doi.org/10.3390/healthcare9091175. Sony, M., Antony, J., Park, S., & Mutingi, M. (2019). Key criticisms of Six Sigma methods: a systematic literature review. IEEE Transactions on Engineering Management, 67(3), 950–962. Suman, G., & Prajapati, D. (2018). Control chart applications in healthcare: a literature review. International Journal of Metrology and Quality Engineering, 9(5). https://doi.org/10.1051/ijmqe/ 2018003.

Van Den Heuvel, J., Does, R. J., & Verver, J. P. (2005). Six Sigma methods in healthcare: lessons learned from a hospital. International Journal of Six Sigma Methods and Competitive Advantage, 1(4), 380–388. Vincent, A., Pocius, D., & Huang, Y. (2021). Six Sigma methods performance of quality indicators in total testing process of pointof-care glucose measurement: a two-year review. Practical Laboratory Medicine, e00215. https://doi.org/10.1016/j.plabm .2021.e00215. Womack, J. P., & Jones, D. T. (1996). Lean Thinking. New York: Simon & Schuster.

425 https://doi.org/10.1017/9781009212021.015 Published online by Cambridge University Press

Chapter 14

Forecasting in Healthcare Sectors

After studying the chapter, readers will be able to:
• Comprehend techniques that can be utilized to construct forecasts in the healthcare sector.
• Forecast revenue and expenses so as to advance and maintain efficient healthcare services.
• Acquire sufficient skill and background knowledge to formulate improved healthcare policies.
• Draft a report addressing the sensitivity of healthcare decision-making based on time series concepts and data-analytic results.
• Assess the consequences of decisions in healthcare sectors with high-quality time series analytic methodologies.

14.1 Motivation

Healthcare administrators working in hospitals, clinics, government agencies, and financial and insurance institutions must probe whether the healthcare services they provide are effective, efficient, and optimal. Health economists and data analysts invest time and effort to project the future performance of the healthcare services their institutions provide. These and related concerns can be addressed by time series data analysis and forecasting. Hospital and clinic administrators, healthcare professionals, insurance agents, and patients all desire high-quality healthcare services that use a minimal amount of resources. Developed and developing nations alike find that a percentage of their populations has inadequate health insurance coverage. Resource providers encounter restrictions that prevent financial support from reaching underserved populations, including those with no health insurance coverage. Attempts have frequently been made to raise efficiency and cost-effectiveness in healthcare services. To achieve these goals, however, background knowledge and skill are essential, and a good understanding of forecasting methods can help healthcare administrators learn from, apply, and utilize data. Every constituency involved in healthcare is aware of the necessity to improve the quality of healthcare services on a daily or periodic basis.

426 https://doi.org/10.1017/9781009212021.016 Published online by Cambridge University Press

To resolve these and related issues, education is a prerequisite. For this purpose, values of variables documented at successive periods of time in the past need to be extracted and interpreted as time series data. Time series analysis is the analytic method used to extract and interpret evidence from such data. Time-oriented data enable forecasting. Everyone in the healthcare sector is aware of the importance of familiarity with forecasting, which can be accomplished via two approaches: deterministic and stochastic. In the stochastic approach, either the Box–Jenkins approach or the Fourier transform can be used. The Box–Jenkins approach requires the identification of the autoregressive (AR) component, the moving average (MA) component, and stationarity based on successive differencing of the original time series values.

It is common in practice that a few observations are missing (see Rais and Viana, 2011). The missing value $y_j$ at period $j$ can be estimated using an imputation technique. The average value of the prior observations could be such an estimate. Alternatively,

$$\hat{y}_j = \frac{\sum_{i=j-k}^{j-1} y_i + \sum_{i=j+1}^{j+k} y_i}{2k}$$

can replace the missing value, where $k$ is an arbitrarily selected number of periods on either side of period $j$.

Consider daily incidences of the spread of the COVID-19 virus over a period of time. The deadly COVID-19 virus was first noticed in Wuhan, China, during December 2019. The World Health Organization (WHO) declared the outbreak a public health emergency of international concern on January 30, 2020, and characterized it as a pandemic on March 11, 2020. The WHO created an internet dashboard to update daily the numbers of cases and deaths due to COVID-19 (https://covid19.who.int). As of May 17, 2022, 522 million cases had been reported worldwide, with 6.27 million deaths. These counts constitute time series data. Anjum et al. (2021) analyze the time series data presented on the WHO dashboard. They present a taxonomy of COVID-19 in order to forecast COVID-19 cases and deaths. Using mechanistic and/or statistical approaches, Luo (2021) also devises a method to forecast COVID-19 cases. No single approach to forecasting


COVID-19 cases or deaths is considered superior. Healthcare professionals grasp the necessity of innovative techniques to forecast COVID-19, confirming what statistics professionals have been saying. An eminent statistician, George E. P. Box, once warned that all models are wrong, but some are useful in describing the chance-oriented mechanism that generates time series data. Forecasts more often than not influence healthcare policies. However, forecasting a future incidence number amid the uncertainty of a pandemic like COVID-19 is fundamentally a challenge.

Analysis of time series data has occurred in other healthcare specialties. An example is the forecast of the prevalence of Alzheimer’s disease (AD), as Zhao et al. (2021) illustrate. Using time series techniques, they report that the prevalence and mortality of AD have been increasing. They use an autoregressive integrated moving average (ARIMA) model for their forecast, which closely matches the actual incidence of AD. The number of AD deaths in the United States increased from 31,145 females and 13,391 males to 84,062 females and 37,957 males. According to Zhao et al., the AD death rate is forecast to reach 42.40 per 100,000 in 2023.

In this context, it is worth becoming familiar with time series notations and their interpretations. The ARIMA $(p, d, q)$ model with first-order differencing ($d = 1$), autoregressive terms up to order $p$, and moving average terms up to order $q$ is

$$(1 - B)x_t = \beta z_t + \mu + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q},$$

where $B$ is the backshift operator ($Bx_t = x_{t-1}$) and $z_t$ is an exogenous predictor variable.

More than 7.56 million COVID-19 cases occurred in Iran, with 145,000 deaths, according to the World Health Organization (www.who.org). Konarasinghe (2021) utilizes the daily time series data of COVID-19 in Iran from January 22, 2020, to June 17, 2021, to forecast the future incidence of COVID-19 in Iran. In this analysis, Konarasinghe obtains the autocorrelation function (ACF) and uses the seasonal autoregressive integrated moving average (SARIMA) model to forecast daily COVID-19 infections in Iran. The forecast was judged adequate according to the Anderson–Darling and Ljung–Box Q (LBQ) tests, and small values of the mean absolute percentage error (MAPE), mean square error (MSE), and mean absolute deviation (MAD) support the model. Konarasinghe notes that a five-period seasonal ARIMA $(2, 0, 0) \times (0, 1, 2)_5$, a seven-period seasonal ARIMA $(0, 0, 1) \times (0, 1, 2)_7$, and an eight-period seasonal ARIMA $(0, 0, 1) \times (0, 1, 1)_8$ compete with each other to provide a better forecast of COVID-19 cases.
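The accuracy measures named above can be computed directly from paired actual and forecast values. The following is a minimal Python sketch that assumes the usual definitions, with MAD taken as the mean absolute deviation of the forecast errors. The function name `forecast_errors` and the sample numbers are hypothetical and are not taken from Konarasinghe (2021).

```python
def forecast_errors(actual, forecast):
    """Return (MAPE in percent, MSE, MAD) for paired actual and forecast values."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mse = sum(e * e for e in errors) / n
    mad = sum(abs(e) for e in errors) / n
    return mape, mse, mad


# Hypothetical daily case counts and two competing forecasts (illustrative only).
actual = [100, 120, 130, 150]
model_a = [110, 115, 135, 140]
model_b = [90, 130, 120, 170]

for name, forecast in (("A", model_a), ("B", model_b)):
    mape, mse, mad = forecast_errors(actual, forecast)
    print(f"model {name}: MAPE={mape:.2f}%  MSE={mse:.1f}  MAD={mad:.1f}")
```

In this made-up example, model A has the smaller value on all three measures, so it would be preferred under each criterion. With real data the measures can disagree, which is one reason several of them are usually reported together.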

The COVID-19 pandemic became virulent. The numbers of COVID-19 cases and deaths rose exponentially on all continents. Safeguarding public health became a nightmare. Consequently, more studies were undertaken after March 11, 2020. The results of one of those studies were published by Chen et al. (2021). Their analysis of Canadian COVID-19 incidences from March 18, 2020, to August 16, 2020, for the provinces of Ontario, Alberta, British Columbia, and Quebec revealed surprises. To forecast COVID-19 cases and deaths, Chen et al. selected three time series models: (1) a smooth transition autoregressive model, (2) a neural network model, and (3) a susceptible-infected-removed model. The neural network model outperformed the other two.

Another potential application of time series data is to forecast the number of patients admitted to an intensive care unit (ICU) in order to plan increases in ICU capacity. Goic et al. (2021) analyze daily COVID-19 cases and deaths in Chile, South America, and develop an approach to forecast the ability of Chilean healthcare services to accommodate COVID-19 cases in the ICU. They integrate autoregressive, machine-learning, and epidemiological concepts to complete their forecast.

Rocha and Rodrigues (2021) recorded and analyzed the daily number of patient admissions in an emergency department (ED). Their aim was to project future admissions. Their analysis revealed strong seasonality and produced forecasts with high precision. They performed a time series analysis using exponential smoothing and SARIMA, and their results suggest that improvement in healthcare services and in the management of emergency resources is possible.

Management of an ED is complicated by patient delay. In reality, patients arrive at the ED via multiple means of transportation: walking, driving, and ambulance. The nurses in the ED or the receptionist at the front desk admit patients after collecting their background information and completing triage. The physicians examine patients in the ED for the purpose of choosing specialized treatment rooms (including the surgical theater) based on the level of urgency. These protocols are also practiced in dealing with COVID-19. To forecast COVID-19 admissions in the ED, Rostami-Tabar and Rendon-Sanchez (2021) consider ARIMA and exponential trend smoothing (ETS). Additional discussions on admitting COVID-19 patients in the ED appear in Cheng et al. (2021). A forecast of patient admissions to the pediatric ED is also important, especially in the time of COVID-19. Ramgopal et al. (2021) illustrate this theme. They report that, out of 29,787,815 encounters in the


pediatric ED at Chicago’s Robert H. Lurie Children’s Hospital, 1,913,085 occurred in 2020. Kumar and Viral (2021) warn that COVID-19 has worsened the management of EDs even in developed nations like the United States, Spain, and Italy. To forecast future incidences, Kumar and Viral fit an autoregressive moving average (ARMA) model to daily COVID-19 data. They forecast the number of COVID-19 patients for India, showing an increase to 1,400,000 cases (see Figure 14.1 for the time series trend). Their forecast for daily cases rose to approximately 14,000. Cruz-Cano et al. (2021) present a related article using time series data and models based on a social distancing index. They forecast the number of COVID-19 cases for the US state of Maryland. Their illustration includes estimates of time series parameters based on observed daily COVID-19 cases. To forecast the spread of COVID-19, Alzahrani et al. (2020) fit an ARIMA model to Saudi Arabian time series data. Their forecast identifies a continuously increasing trend in Saudi Arabia. They projected 7,668 COVID-19 cases per day, in addition to 127,129 cumulative cases, within four weeks. Nyoni and Nyoni (2020) offer a related article in which they use the monthly number of observed outpatient visits to a hospital to forecast future outpatient visits. Zimbabwe experienced operational problems due to overcrowding. These problems resulted in long wait times and greater patient dissatisfaction. Motivated by the Zimbabwe experience, Nyoni and Nyoni employed artificial neural networks. Their model forecasts outpatient visits, and such forecasts are crucial for modifying public health policy, reducing overcrowding, and lowering patient dissatisfaction.

Elderly patients are empowered to enhance their health status when they use wearable healthcare technology. Using time series data and forecasting methodology, Talukder et al. (2020) demonstrate this concept. Using an extended unified theory, anxiety level, and self-actualization, they develop a two-step theoretical model to address the benefits of wearable technologies. In the first step, they apply a structural equation model to identify significant determinants. In the second step, they use a neural network model to validate their findings from the first step and to rank the determinants by importance. An accurate short-term forecast of COVID-19 cases versus recovered cases enables healthcare professionals or hospital and clinic administrators to optimize the use of limited medical resources. Such forecasts help stop or reduce the spread of COVID-19. Six countries demonstrated promising potential to reduce the spread of COVID-19: Italy, Spain, France, China, the United States, and Australia (Zeroual et al., 2020). Ismail et al. (2020) examine whether the COVID-19 pandemic will ever end using time series data and strategies adopted by healthcare organizations. They employ moving average principles, a simple but nonlinear trend, an S-curve for capturing the trend, exponential smoothing, and simple regression fitting.

The underlying idea of the moving average is that the value at time T + 1 depends functionally on past values. A window in the time period consists of m past values. The simple moving average is the mean of the values in a sliding window. At this juncture, the analyst could consider different weights to play up the most recent value and play down earlier values. Such an operation results in a weighted moving average. How far back should the past values be considered? An appropriate

Figure 14.1 COVID-19 cases and deaths reported in India, March 1–31, 2020 (Source: Kumar & Viral, 2021).


answer to the question is determined through significant autocorrelations in the time series data. Simple exponential smoothing is a method of forecasting using univariate time series data without trend or seasonality. The forecasts are weighted averages of past observations. The most recent observations receive larger weights while earlier observations get smaller weights. The weights provide leverage to downplay or play up the values. The weights decrease exponentially while moving from current to past observations. It is for this reason that professionals call it exponential smoothing.

The S-curve trend is a forecasting method in which a sigmoid relationship between the time and response variables is constructed. The S-curve trend has the same form as the curve in a logistic regression model in which the independent variable could be the time index or some other equally spaced sequence of values. Logistic regression is a special form of nonlinear regression meant for a binary response variable.

Holt’s linear trend is an extension of the simple exponential smoothing method. Holt’s linear trend generates forecasts that depict a constant, increasing, or decreasing slope. This method tends to over-forecast. If a damping parameter is included in Holt’s linear trend so that the forecast flattens toward a horizontal line, the method is called the damped trend method.

Again, the ARIMA model is a combination of the AR and MA models. The AR model is a regression model in which the forecast value of a variable is a linear function of its past values with varying weights. Instead of using the past values for the forecast as in AR, the MA model uses past forecast errors in a regression model. When the AR and MA models are combined with an order of differencing, the result is an integrated time series model, ARIMA. The ARIMA model is symbolically written as ARIMA $(p, d, q)$. The order $p$ denotes how many past values are incorporated in the AR part for a future forecast. For instance, if $p = 1$, the AR part uses only the value of the immediately preceding period. The order $p$ helps in adjusting the line being fitted to forecast the series. For example, when $p = 1$, the current observation is a linear function of the immediate past observation, $x_t = \phi_1 x_{t-1} + \varepsilon_t$ for $t = 1, 2, \dots, n$. When $p = 2$, the AR principle uses the values of the two preceding periods, that is, $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t$. The integration is indicated by the abbreviation I. In the ARIMA model, the time series values are transformed into stationary series values. Differencing of order one means taking successive differences, that is, $\nabla x_t = x_t - x_{t-1}$ for $t = 2, 3, \dots, n$. Such differencing is needed to transform a nonstationary time series into a stationary series. The first differencing value is the difference

between the values of the current time period and the previous time period. Stationarity is a necessary requirement to forecast. The notation $d$ denotes the degree of differencing needed to convert a nonstationary time series to a stationary time series. When $d = 2$, it works as follows: $\nabla^2 x_t = \nabla(\nabla x_t) = \nabla(x_t - x_{t-1}) = x_t - 2x_{t-1} + x_{t-2}$. Notice that second-degree differencing takes the difference of the first-order differences. The notation $q$ denotes the order of the MA. When $q = 1$, the noises of two consecutive periods are combined by the healthcare system to make the current observation, $x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1}$ for $t = 1, 2, \dots, n$, meaning a part $\theta_1$ of the immediate past noise $\varepsilon_{t-1}$ is combined with the current noise $\varepsilon_t$. When the MA order is $q = 2$, the observation is $x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$. This means a linear combination of the current noise, the immediate past noise, and the noise one period before that is formed with unknown weights $(1, \theta_1, \theta_2)$, respectively. The unknown moving average weights are estimated using the time series data.

Using an analysis of COVID-19 time series data, Kumar and Susan (2020) conclude that the United States, Spain, Italy, France, Germany, Russia, Iran, the United Kingdom, Turkey, and India had the least control of the spread of COVID-19. Hundreds of thousands died due to the COVID-19 pandemic in these countries from January 22, 2020, to May 20, 2020. The mean absolute error, root mean square error, root relative squared error, and mean absolute percentage error support Kumar and Susan’s ARIMA models. Dehesh et al. (2020) explain why ARIMA models provide an effective forecast of the spread of COVID-19. They utilize the daily incidences of COVID-19 cases and deaths documented in a databank kept at Johns Hopkins University. In their results, China and Thailand have a stable trend while South Korea has a decreasing trend that later stabilizes. The trend in Iran and Italy is unstable. The ARIMA models predict a downward trend in confirmed COVID-19 cases. Dehesh et al. (2020) finalize an ARIMA (2,1,0) for China, ARIMA (2,2,2) for Italy, ARIMA (1,0,0) for South Korea, ARIMA (2,3,0) for Iran, and ARIMA (3,1,0) for Thailand. As they explain, time series analysis can distinguish COVID-19 patterns across 10 countries. Choudhury (2019) captures the stochastic behavior of patient arrivals. Jones et al. (2008) illustrate the daily arrival pattern of patients to a nearby ED and provide strategies of quality enhancement for healthcare professionals to consider. They fit ARIMA, Holt–Winters, and neural network models to the data. In fact, they select


ARIMA (3,0,0) and ARIMA (2,1,0) models as the best fit using the minimum Akaike information criterion and Schwarz Bayesian criterion. Both models are stationary, satisfying the Box–Ljung correlation test and the Jarque–Bera test for normality. Readers might be convinced by Duarte and Faerman’s (2019) contention that the ARIMA model is powerful enough to provide hourly forecasts of patient arrivals to the ED. Duarte and Faerman utilize the wait time, the number of available beds, and the number of allocated versus unallocated patients in the ED. Their illustrations and comments might help readers prepare hospital plans, improve and maintain high quality in healthcare services, and train personnel in the optimal use of limited resources.

After analyzing time series data from 1982 through 2012 on the daily incidences of prostate and lung cancer in Australia, Earnest et al. (2019) describe the importance of forecasting methods, especially ARIMA, for preparedness in healthcare services. Based on their analytic results, they identify two peaks of cancer occurrence, in 1994 and 2009. Specifically, they choose ARIMA (1,1,0) for men aged 50+ years and report an increase in prostate cancer cases from 3,606 in 1982 to 20,065 in 2012. This report is valuable for comparing cancer incidences with COVID-19 incidences. A description of time series results based on the utilization rate per 10,000 patients in the period 2008 to 2017 can be seen in Arimie et al. (2018). For the sake of stationarity, they take a first-order difference. Then they estimate and plot the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The Akaike and Bayesian information criteria are applied to choose the most suitable time series model. They choose ARIMA (0,0,1), that is, MA (1), with no seasonal variation in the data.

Earlier, Shanmugam and Nevalainen (1988) provided the chronological development of fitting a time series model. Using time series concepts, they captured and interpreted environmental changes. They included a quick survey of what time series methods can contribute to environmental data analysis. A gap existed between the mathematically complicated materials in the literature and their oversimplified versions. An outline of the ideas, the assumptions for analysis, and the limitations of time series methods can be learned from their experience.
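Two data-preparation steps described in this section, the neighbor-average imputation of a missing value and differencing of order d, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `impute_neighbor_average` and `difference` and the admission counts are hypothetical, and a missing value is assumed to be stored as `None`.

```python
def impute_neighbor_average(series, j, k):
    """Estimate the missing value at index j as the mean of the k values before and after it."""
    if k < 1 or j - k < 0 or j + k >= len(series):
        raise ValueError("need k observed values on each side of index j")
    neighbors = series[j - k:j] + series[j + 1:j + k + 1]
    if any(value is None for value in neighbors):
        raise ValueError("neighboring values must be observed")
    return sum(neighbors) / (2 * k)


def difference(series, d=1):
    """Apply successive differencing d times; each pass computes x_t - x_(t-1)."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Hypothetical daily admission counts with one missing day (illustrative only).
admissions = [12, 15, None, 21, 24, 30]
admissions[2] = impute_neighbor_average(admissions, j=2, k=2)

first_diff = difference(admissions, d=1)   # x_t - x_(t-1)
second_diff = difference(admissions, d=2)  # x_t - 2*x_(t-1) + x_(t-2)
print(admissions, first_diff, second_diff)
```

Each pass of differencing shortens the series by one value, which is why the second difference has two fewer entries than the original data.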

14.2 Concepts

Healthcare administrators can choose between two approaches to analyze time series data: (1) a deterministic (i.e., non-stochastic) approach and (2) a stochastic approach (which includes the Fourier and Box–Jenkins time series models).

Non-stochastic Time Series

In the deterministic approach, the observations $x_t$, $t = 1, 2, 3, \dots$, are not random but are composed of trend, seasonality, cyclicity, and irregular values. The time series with a linear deterministic trend is $x_t = \mu + \beta t + \varepsilon_t$, $t = 1, 2, \dots$, with $E(x_t) = \mu + \beta t$ and $\mathrm{Var}(x_t) = \mathrm{Var}(\varepsilon_t) = \sigma_\varepsilon^2$. Such a time series is de-trended by creating $x_t - \beta t = \mu + \varepsilon_t$, which fluctuates randomly around the constant level $\mu$. The deterministic approach is based on the notion that observations in time series databases in the healthcare sector record what happened rather than represent one of many possibilities in the system. By graphing and/or analyzing such data, one could notice (1) the trend over the units of time, (2) seasonal components within every year, (3) the cyclical component (which is different and meaningful over a period longer than a year), and (4) the unexplained irregular component.

To capture the trend, one might fit a linear or polynomial equation. This is statistical work using regression ideas. Otherwise, one might construct a moving average (with or without weights for the observations) or use exponential smoothing to capture the trend. Moving averages and exponential smoothing can be computed without advanced knowledge of probability and statistics. The basic idea behind exponential smoothing is that the time series value $x_t$ is the sum of the signal $\mu$ and the noise $\varepsilon_t = x_t - \mu$. The noises are scattered around a zero mean with variance $\sigma_\varepsilon^2$. The maximum likelihood estimate (MLE) of the signal and the usual estimate of the noise variance are

$$\hat{\mu} = \frac{\sum_{t=1}^{n} x_t}{n} \quad \text{and} \quad \hat{\sigma}_\varepsilon^2 = \frac{\sum_{t=1}^{n} (x_t - \hat{x}_t)^2}{n - 1},$$

where $\hat{x}_t = \hat{\mu}$ is the fitted value. The ACF is then $\rho_k = 1 - \frac{|k|}{n}$ for $k < n$. From the data, the sample autocorrelation of lag $k$ is

$$\hat{\rho}_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}.$$

Note that the autocorrelations are lag-symmetric in the sense that $\rho_k = \rho_{-k}$ for $k = 1, 2, 3, \dots$.

As the name reveals, the moving average is a continuously updated mean value of a set of observations using

$$\bar{x}_{(1,n)} = \frac{\sum_{i=1}^{n} x_i}{n},$$

where $n$ is the window size. In the next run, the first observation $x_1$ is left out and a new observation $x_{n+1}$ is added to find the next moving average

$$\bar{x}_{(2,n+1)} = \frac{\sum_{i=2}^{n+1} x_i}{n},$$

and so on. A difficulty with the moving average is the selection of an appropriate $n$. However, the moving averages smooth the original observations. Using Excel, one can obtain the moving averages of a specified order as illustrated in the third section of this chapter.
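The sliding-window moving average and the lag-k sample autocorrelation can also be computed outside of Excel. The Python sketch below is a minimal illustration: the function names and the visit counts are hypothetical, and the autocorrelation uses the conventional estimator with the full sum of squared deviations in the denominator.

```python
def moving_averages(x, n):
    """Simple moving averages over a sliding window of n consecutive values."""
    if not 1 <= n <= len(x):
        raise ValueError("window size n must be between 1 and len(x)")
    return [sum(x[i:i + n]) / n for i in range(len(x) - n + 1)]


def sample_autocorrelation(x, k):
    """Lag-k sample autocorrelation; the sign of k is ignored (lag symmetry)."""
    n = len(x)
    k = abs(k)
    if k >= n:
        raise ValueError("lag must be smaller than the series length")
    mean = sum(x) / n
    denominator = sum((value - mean) ** 2 for value in x)
    numerator = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return numerator / denominator


# Hypothetical weekly clinic visits (illustrative only).
visits = [20, 22, 27, 25, 30, 32]
ma3 = moving_averages(visits, 3)           # one average per window position
rho1 = sample_autocorrelation(visits, 1)   # lag-1 autocorrelation
print(ma3, round(rho1, 4))
```

A window of size n over a series of length N yields N - n + 1 averages, so a larger window smooths more but leaves fewer smoothed values to work with.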


Alternatively, the analyst may choose the weighted moving average by selecting different weights for each observation if the analyst seeks to downplay some observations and play up others. That is,

$$\bar{x}_{(1,n),w} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \quad \bar{x}_{(2,n+1),w} = \frac{\sum_{i=2}^{n+1} w_i x_i}{\sum_{i=2}^{n+1} w_i},$$

and so on. When the weights are the same (i.e., $w_i = \frac{1}{n}$), the weighted moving average simplifies to the simple moving average as a special case.

Exponential smoothing keeps the weighting principle by giving a weight $0 < \alpha < 1$ to the most recent observation and the complementary weight $0 < 1 - \alpha < 1$ to the past observations. That is,

$$F_{t+1} = \alpha x_t + (1 - \alpha)F_t, \quad 0 < \alpha < 1, \quad F_0 = 0, \quad t = 0, 1, 2, 3, \dots,$$

where $t$ is the time. The smoothing constant $\alpha$ is arbitrary but can be made smaller (or larger) as we want to downplay (or play up) the recent observation. To remedy the arbitrariness, some practitioners try different values such as $\alpha = 0.1, 0.2, \dots, 0.9$ and select the value that yields the smallest forecast error.

To capture the linear trend $T_t = \alpha + \beta t$, $t = 1, 2, 3, \dots$, the initial effect $\alpha$ and the slope $\beta$ are estimated by the least squares equations

$$\hat{\beta} = \frac{n \sum_{t=1}^{n} t x_t - \sum_{t=1}^{n} t \sum_{t=1}^{n} x_t}{n \sum_{t=1}^{n} t^2 - \left(\sum_{t=1}^{n} t\right)^2} \quad \text{and} \quad \hat{\alpha} = \bar{x} - \hat{\beta}\,\bar{t}.$$

Alternatively, one could consider a quadratic trend $T_t = \alpha + \beta_0 t + \beta_1 t^2$, $t = 1, 2, \ldots$. Once the trend is captured, the analyst calculates the seasonal, cyclical, and irregular components only after postulating the model for the healthcare sector. Such a model could be $x_t = T_t \times S_t \times I_t$, where $T_t$, $S_t$, and $I_t$ are recognized as the trend, seasonal, and irregular amounts at time $t$. The seasonal-irregular values are generated for the entire period using $S_t \times I_t = \frac{x_t}{F_t}$. In the entire data, the values $S_t \times I_t$ are seen multiple times, and their average is the seasonal index $\bar{S}_t = \frac{\sum_{t=1}^{\text{all}} (S_t \times I_t)}{\text{all}}$. The generated values of $\frac{x_t}{\bar{S}_t}$ are called de-seasonalized values. By plotting the de-seasonalized values over the entire period of the data domain, one can visualize the cyclic component. For example, geologists mention that earthquakes repeat every certain number of years. Economists mention that depressions occur cyclically. All of these values are easily computable using Excel.
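The deterministic tools described above (moving average, weighted moving average, exponential smoothing, and the least-squares linear trend) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the example weights are assumptions made here, not part of the chapter, and the data are the first six weekly pneumonia counts from Table 14.2.

```python
def moving_averages(x, n):
    """Simple moving averages: the mean of each run of n consecutive observations."""
    return [sum(x[i:i + n]) / n for i in range(len(x) - n + 1)]


def weighted_moving_averages(x, w, n):
    """Weighted moving averages: each window uses the weights of its own observations."""
    out = []
    for i in range(len(x) - n + 1):
        xs, ws = x[i:i + n], w[i:i + n]
        out.append(sum(wi * xi for wi, xi in zip(ws, xs)) / sum(ws))
    return out


def exponential_smoothing(x, alpha, f0=0.0):
    """Return [F_0, F_1, ...] with F_{t+1} = alpha * x_t + (1 - alpha) * F_t."""
    forecasts = [f0]
    for xt in x:
        forecasts.append(alpha * xt + (1 - alpha) * forecasts[-1])
    return forecasts


def linear_trend(x):
    """Least-squares intercept and slope of x_t regressed on t = 1, ..., n."""
    n = len(x)
    t = range(1, n + 1)
    st, sx = sum(t), sum(x)
    stx = sum(ti * xi for ti, xi in zip(t, x))
    stt = sum(ti * ti for ti in t)
    slope = (n * stx - st * sx) / (n * stt - st ** 2)
    intercept = sx / n - slope * st / n
    return intercept, slope


# First six weekly pneumonia death counts from Table 14.2.
pneumonia = [4111, 4153, 4066, 3915, 3818, 3823]
weights = [1, 2, 3, 4, 5, 6]  # hypothetical weights that play up recent weeks

print(moving_averages(pneumonia, 3))
print(weighted_moving_averages(pneumonia, weights, 3))
print(exponential_smoothing(pneumonia, 0.6))
print(linear_trend(pneumonia))
```

With equal weights the weighted version reproduces the simple moving average, which is the special case noted in the text.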

Stochastic Time Series

The most popular stochastic approach is the so-called Box–Jenkins approach for equally spaced time series, which is explored in what follows. A non-equally spaced time series should be analyzed using special advanced techniques that are beyond the scope of this book. Another stochastic approach is frequency based (involving Fourier series) and utilizes trigonometric functions like the sine and cosine values of angles, periodicity, and so forth using advanced calculus; hence it is not explicated here.

The Box–Jenkins approach consists of the AR and MA models and stationarity based on differencing of the original series values. The MA is always stationary but requires invertibility. On the contrary, the AR is always invertible but requires stationarity. Such complementarity between the MA and AR pushes for creating formulas that can be used to check the stationarity as well as the invertibility. The derivation of such formulas is beyond the scope of this book. However, the formulas will be cited and it will be shown how to compute and interpret their values.

Within the Box–Jenkins approach are two basic ideas: (1) the amalgamation of noises and (2) the chance-oriented system’s memory. The first type is an MA model. The second type is an AR model. It is feasible for a system to utilize both the MA and AR models to generate its time series values, but this mixing is hidden from the analyst. The analyst has to diagnose the presence of mixing and its proportions using the ACF and PACF. A mixed version is the ARIMA model. Recall George E. P. Box’s comment that all models are wrong, but some are useful. What did Box mean by this? Perhaps Box meant the nature of the chance mechanism might not follow mathematical principle(s) but we, the analysts, utilize them to easily comprehend and configure the system’s unknown functionality based on the chance mechanism’s output, or time series data. The two litmus tests are stationarity and invertibility.
Unless a time series is stationary, it is not amenable to forecasting. A time series has stationarity if its structure xt ; xtþ1 ; …:; xtþn is the same as its structure in another time domain xtþk ; xtþkþ1 ; …:; xtþkþn . The MA model of order q is always stationary. Whether a model MA(q) is invertible needs to be checked out. If MA(q) is invertible, then it is compatible with an infinite autoregressive version. Invertibility is the passage from one version to the other between the MA and AR versions. The AR model of order p is always invertible to MA, though the AR(p) model may or may not be stationary. Only if it is stationary is its infinite version of the MA model obtainable. Hence, the validity of both the stationarity and invertibility of time series data is a necessity before it can be analyzed using the Box–Jenkins approach.


Autoregressive Time Series

The AR(p) model is $x_t = \mu + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t$ and, when it is stationary, could be rewritten as an infinite moving average of the noises, $x_t = E(x_t) + \varepsilon_t + \psi_1\varepsilon_{t-1} + \psi_2\varepsilon_{t-2} + \cdots$, where the weights $\psi_j$ are determined by $\phi_1, \ldots, \phi_p$, with mean $E(x_t) = \frac{\mu}{1 - \sum_{j=1}^{p}\phi_j}$.

The autoregressive concept refers to when a recent observation is influenced by past observations up to some period as controlled by the chance-oriented system’s memory capacity. That is, in an AR model, $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p}$, where $\phi_1, \phi_2, \ldots, \phi_p$ are autoregressive parameters estimated using available data. When the current observation is determined by the immediate past observation, the system is Markovian.

When a system lacks the required stationarity, it is feasible to create a stationary series $y_t$ from the original nonstationary series $x_t$ by differencing:

$$y_t = (1 - B)^d x_t = \left[1 + (-1)\binom{d}{1}B + (-1)^2\binom{d}{2}B^2 + \cdots + (-1)^d\binom{d}{d}B^d\right]x_t = x_t - d\,x_{t-1} + \frac{d(d-1)}{2}x_{t-2} - \cdots + (-1)^d x_{t-d},$$

for $t = 1, 2, \ldots$, where $B$ is called the backward operator in the sense that $B^k x_t = x_{t-k}$. The differenced time series is likely to have stationarity even if the original time series does not, if we have succeeded in identifying the appropriate value for $d$.

Because autocorrelations are exponentially decaying in autoregressive series, partial autocorrelations are needed to characterize AR(p) models. What is partial autocorrelation? Imagine three random variables: $X$, $Y$, and $Z$. A regression of $X_t$ on $Z_t$ gives a prediction $\hat{X}_t$. Likewise, a regression of $Y_t$ on $Z_t$ gives a prediction $\hat{Y}_t$. Then the correlation $\rho_{xy|z} = \mathrm{corr}(X_t - \hat{X}_t, Y_t - \hat{Y}_t)$ is recognized as a partial correlation. The PACF is defined as the correlation between $x_t$ and $x_{t-k}$ after adjusting for $x_{t-1}, x_{t-2}, \ldots, x_{t-k+1}$. Note that for an AR(p) model the partial autocorrelation $\phi_{kk}$ between $x_t$ and $x_{t-k}$ equals $\phi_p$ at lag $k = p$ and is zero for $k > p$. The system of Yule–Walker equations $\rho_j = \sum_{i=1}^{k}\phi_{ik}\,\rho_{(j-i)}$, $j = 1, 2, \ldots, k$, is solved for $\phi_{1k}, \phi_{2k}, \ldots, \phi_{kk}$, and the last coefficient $\phi_{kk}$ is the partial autocorrelation at lag $k$.

The pure random time series $x_t = \mu + \varepsilon_t$ with the time identifier $t = 1, 2, \ldots, n$ is called the white noise model. The white noise model has the following properties: mean $E(x_t) = \mu$, variance $\mathrm{Var}(x_t) = \sigma^2_\varepsilon$, and covariance $\mathrm{Cov}(x_i, x_j) = 0$ for $i \neq j$. The model has stationarity. The noise components $\varepsilon_t$, $t = 1, 2, \ldots, n$, follow a Gaussian (i.e., normal) probability pattern with zero mean and a constant (meaning it is free of the time effect) variance, $\sigma^2_\varepsilon$. That means $E(x_t) = E(\mu + \varepsilon_t) = \mu + E(\varepsilon_t) = \mu$ and $\mathrm{Var}(x_t) = \mathrm{Var}(\mu + \varepsilon_t) = \mathrm{Var}(\varepsilon_t) = \sigma^2_\varepsilon$.
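The differencing operator $(1 - B)^d$ can be checked numerically. The sketch below, with function names chosen here purely for illustration, applies the operator in two ways: by differencing the series $d$ times in succession, and by the binomial expansion in powers of $B$. The two should agree.

```python
from math import comb


def difference(x, d=1):
    """Apply (1 - B)^d by taking first differences d times in succession."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


def difference_binomial(x, d=1):
    """Apply (1 - B)^d via its expansion: sum_j (-1)^j * C(d, j) * x_{t-j}."""
    return [
        sum((-1) ** j * comb(d, j) * x[t - j] for j in range(d + 1))
        for t in range(d, len(x))
    ]


# A quadratic sequence is nonstationary in level; its second differences are constant.
series = [t * t for t in range(1, 9)]
print(difference(series, 2))
print(difference_binomial(series, 2))
```

For the quadratic example, $d = 2$ removes the trend entirely, which illustrates why identifying an appropriate $d$ matters.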

AR (1) Process

The first-order autoregressive process is symbolically stated as AR (1) and its model is $x_t = \mu + \phi x_{t-1} + \varepsilon_t$ or, equivalently, $x_t = \frac{\mu}{1-\phi} + \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \cdots$ for $t = 1, 2, \ldots, n$, with mean $E(x_t) = \frac{\mu}{1-\phi}$ and variance $\mathrm{Var}(x_t) = \frac{\sigma^2_\varepsilon}{1-\phi^2}$. When $|\phi| < 1$, the AR (1) process is stationary. Otherwise (i.e., $|\phi| > 1$ or equivalently $|\phi^{-1}| < 1$), the past noises are exponentially increasing in an explosive manner and hence the series is not convergent. Its autocorrelations of order $k$ are $\rho_k = \phi^k$, which decrease exponentially. The partial autocorrelation at lag one is $\phi_{11} = \rho_1$, and the higher-order partial autocorrelations are zero. The sampling variance of AR (1) is $\mathrm{Var}(\hat{\phi}) = \frac{(1-\phi^2)}{n}\sigma^2_\varepsilon$.

AR (2) Process

The second-order autoregressive process is symbolically stated as AR (2) and its model is $x_t = \mu + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \varepsilon_t$ for $t = 1, 2, \ldots, n$, with mean $E(x_t) = \frac{\mu}{1-\phi_1-\phi_2}$ and variance

$$\mathrm{Var}(x_t) = \frac{(1-\phi_2)\,\sigma^2_\varepsilon}{(1+\phi_2)\left[(1-\phi_2)^2 - \phi_1^2\right]}.$$

When the parametric conditions $|\phi_2| < 1$, $\phi_2 - \phi_1 < 1$, and $\phi_2 + \phi_1 < 1$ are validated, the AR (2) process is stationary. Otherwise, the past noises are exponentially increasing in an explosive manner and hence the series is not convergent. Its autocorrelations are $\rho_1 = \frac{\phi_1}{1-\phi_2}$, $\rho_2 = \phi_1\rho_1 + \phi_2$, $\rho_3 = \phi_1\rho_2 + \phi_2\rho_1$, and so on. The autocorrelations are exponentially decreasing for AR (2) also. The partial autocorrelations are $\phi_{11} = \rho_1 = \frac{\phi_1}{1-\phi_2}$ and $\phi_{22} = \phi_2$, and the higher-order partial autocorrelations are zero. The sampling variance of AR (2) is $\mathrm{Var}(\hat{\phi}_i) = \frac{(1-\phi_2^2)}{n}\sigma^2_\varepsilon$, $i = 1, 2$.
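The stationarity conditions and autocorrelation recursions for AR (1) and AR (2) lend themselves to a short numerical check. The following sketch uses illustrative function names and parameter values that are assumptions made here; it simply evaluates the formulas quoted in the text.

```python
def ar1_acf(phi, max_lag):
    """Theoretical AR(1) autocorrelations rho_k = phi^k for k = 0, ..., max_lag."""
    return [phi ** k for k in range(max_lag + 1)]


def ar2_is_stationary(phi1, phi2):
    """Check |phi2| < 1, phi2 - phi1 < 1, and phi2 + phi1 < 1."""
    return abs(phi2) < 1 and phi2 - phi1 < 1 and phi2 + phi1 < 1


def ar2_acf(phi1, phi2, max_lag):
    """AR(2) autocorrelations: rho_1 = phi1 / (1 - phi2), then
    rho_k = phi1 * rho_{k-1} + phi2 * rho_{k-2}."""
    rho = [1.0, phi1 / (1 - phi2)]
    for k in range(2, max_lag + 1):
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[:max_lag + 1]


# Hypothetical parameter values, chosen only to illustrate the formulas.
print(ar2_is_stationary(0.5, 0.3))   # all three conditions hold
print(ar2_is_stationary(0.5, 0.6))   # phi2 + phi1 is not below 1
print(ar2_acf(0.5, 0.3, 5))
print(ar1_acf(0.5, 5))
```

Setting $\phi_2 = 0$ in the AR (2) recursion recovers the AR (1) pattern $\rho_k = \phi^k$, which is a convenient consistency check.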

Moving Average Time Series

The moving average concept is based on the notion that the system puts together the noises of a certain period into a current observation. That is, $x_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q}$, where $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ are the noises at the subscripted times and $\theta_1, \theta_2, \ldots, \theta_q$ are the unknown proportions of those noises (the moving average coefficients) to be estimated from the given data. The MA model of order $q$ is always stationary. Stationarity is a necessity to make a forecast of a system. Unless the system is stationary, the forecast is not possible. The model MA(q) of order $q$ is then an aggregate of the past $q$ noises with unknown weights $\theta_1, \theta_2, \ldots, \theta_q$ to be estimated from the data itself. Note that $E(x_t) = \mu$ because $E(\varepsilon_t) = 0$ at every time $t$, $\mathrm{Var}(x_t) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2_\varepsilon$, and the covariance of observations $x_t$ and $x_{t+k}$ at lag $k$ is $\mathrm{Cov}(x_t, x_{t+k}) = (\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q)\sigma^2_\varepsilon$, with autocorrelation of lag $k$

$$\rho_k = \frac{\mathrm{Cov}(x_t, x_{t+k})}{\mathrm{Cov}(x_t, x_t)} = \frac{\theta_k + \theta_1\theta_{k+1} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \cdots + \theta_q^2}, \qquad k = 1, 2, \ldots, q.$$

The MA(q) model explicates that a current observation $x_t$ is composed of an expected value plus the current noise level $\varepsilon_t$ and a combination of the proportions of the past noises $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$. The sample autocorrelation of order $k$ has approximate sampling variance

$$\mathrm{Var}(\hat{\rho}_k) \approx \frac{1 + 2\sum_{i=1}^{k-1}\hat{\rho}_i^2}{n},$$

so its standard error is $\sqrt{\left(1 + 2\sum_{j=1}^{k-1}\hat{\rho}_j^2\right)/n}$. For example, the MA (1) has autocorrelation $\rho_1 = \frac{\theta_1}{1+\theta_1^2}$. The MA (2) is $x_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$, with $E(x_t) = \mu$, variance $\mathrm{Var}(x_t) = (1 + \theta_1^2 + \theta_2^2)\sigma^2_\varepsilon$, and autocorrelations $\rho_1 = \frac{\theta_1 + \theta_1\theta_2}{1+\theta_1^2+\theta_2^2}$ and $\rho_2 = \frac{\theta_2}{1+\theta_1^2+\theta_2^2}$, and so on. The autocorrelations after lag $q$ are insignificant in MA(q). The partial autocorrelations of the MA(q) are a mixture of exponential decay and mixed sinusoid expressions.

MA (1) Process

The first-order moving average is symbolically stated as MA (1) and its model is $x_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1}$ for $t = 1, 2, \ldots, n$. Note that $E(x_t) = \mu$ and $\mathrm{Var}(x_t) = (1 + \theta_1^2)\sigma^2_\varepsilon$. Its first autocorrelation satisfies $|\rho_1| = \frac{|\theta_1|}{1+\theta_1^2} \leq 1/2$. The autocorrelations of MA (1) cut off after lag one. The partial autocorrelations decrease exponentially. The sampling variance of MA (1) is $\mathrm{Var}(\hat{\theta}) = \frac{(1-\theta_1^2)}{n}\sigma^2_\varepsilon$.

MA (2) Process

The second-order moving average is symbolically stated as MA (2) and its model is $x_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$ for $t = 1, 2, \ldots, n$. Its first and second autocorrelations are respectively $\rho_1 = \frac{\theta_1 + \theta_1\theta_2}{1+\theta_1^2+\theta_2^2}$ and $\rho_2 = \frac{\theta_2}{1+\theta_1^2+\theta_2^2}$. The autocorrelations of MA (2) cut off after lag two. The partial autocorrelations decrease exponentially to zero. The sampling variance of MA (2) is $\mathrm{Var}(\hat{\theta}_i) = \frac{(1-\theta_2^2)}{n}\sigma^2_\varepsilon$, $i = 1, 2$.

Integrated Autoregressive Moving Average Time Series

The underlying model for the integrated time series data is abbreviated as ARIMA (p, d, q). The ARIMA (p, d, q) model is symbolically stated as

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)(1 - B)^d x_t = (1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q)\varepsilon_t,$$

where $x_t$, $t = 1, 2, 3, \ldots$; $\varepsilon_t$, $t = 1, 2, \ldots$; $\phi_1, \phi_2, \ldots, \phi_p$; $\theta_1, \theta_2, \ldots, \theta_q$; $B^p x_t = x_{t-p}$; $B^q\varepsilon_t = \varepsilon_{t-q}$; and $(1-B)^{d=1}x_t = x_t - x_{t-1}$, $(1-B)^{d=2}x_t = x_t - 2x_{t-1} + x_{t-2}$, $(1-B)^{d=3}x_t = x_t - 3x_{t-1} + 3x_{t-2} - x_{t-3}$ are recognized respectively as observation, noises, autoregressive parameters, moving average parameters, backward shift operation on observation, backward shift operation on noise, and the binomial expansion of differencing for stationarity in the time series. We assume the noises $\varepsilon_t$, $t = 1, 2, \ldots$, follow a Gaussian (i.e., normal) distribution with zero mean and variance $\sigma^2_\varepsilon$. When the variance is not constant but varies over time, the time series is called heterogeneous and, in this situation, a transformation called the Box–Cox transformation,

$$y_t = \begin{cases} \dfrac{x_t^{\lambda} - 1}{\lambda\,\dot{x}^{\lambda-1}}, & \lambda \neq 0 \\[2mm] \dot{x}\ln x_t, & \lambda = 0, \end{cases}$$

is applied to have homogeneity, where $\dot{x} = \exp\left\{\frac{1}{n}\sum_{t=1}^{n}\ln x_t\right\}$. Consequently, the observations $x_t$, $t = 1, 2, \ldots$, also have a Gaussian (or normal) distribution because of their linear relation with the noises. To check the normality, the data can be sketched as a probability plot. The mean of the stationary observations $x_t$ is $\mu$ for all time $t = 1, 2, \ldots$. The autocorrelation (sometimes called serial correlation) of order $k$ among the stationary time series is $\rho_k$, $k = 0, 1, 2, 3, \ldots$. The standard error of the sample autocorrelation is approximately $\frac{1}{\sqrt{n}}$. Both the autocorrelation and the partial autocorrelation are utilized to select the underlying model for the given data. When the series has stationarity, there is no need to select a nonzero value for $d$. When $q = 0$, the series becomes an autoregressive process. When $p = 0$, the series is a moving average process. For the ARMA (p, q) process to be stationary, the roots of the equation $m^p - \phi_1 m^{p-1} - \phi_2 m^{p-2} - \cdots - \phi_p = 0$ must be less than one in absolute value. For the ARMA (p, q) process to be invertible, the roots of the equation $m^q + \theta_1 m^{q-1} + \theta_2 m^{q-2} + \cdots + \theta_q = 0$ must be less than one in absolute value. We may summarize this as shown in Table 14.1.

Table 14.1 Tendency of autocorrelations and partial autocorrelations for stationarity

| Model | ACF | PACF |
| --- | --- | --- |
| MA (q) | Cuts off after lag q | Exponentially decaying |
| AR (p) | Exponentially decaying | Cuts off after lag p |
| ARMA (p, q) | Exponentially decaying | Exponentially decaying |

The ARIMA (1, 0, 1) is $x_t = \mu + \phi_1 x_{t-1} + \varepsilon_t + \theta_1\varepsilon_{t-1}$ with the autocovariances $\gamma_0 = \frac{(1 + 2\phi_1\theta_1 + \theta_1^2)}{(1-\phi_1^2)}\sigma^2_\varepsilon$ and $\gamma_1 = \frac{(\phi_1+\theta_1)(1+\phi_1\theta_1)}{(1-\phi_1^2)}\sigma^2_\varepsilon$, so that $\rho_1 = \gamma_1/\gamma_0$, and the higher-order autocorrelations $\rho_k = \phi_1\rho_{k-1}$, $k \geq 2$, decay exponentially. The partial autocorrelations decay in magnitude. The sampling variances of ARIMA (1, 0, 1) are $\mathrm{Var}(\hat{\phi}_1) = \frac{(1-\phi_1^2)(1+\phi_1\theta_1)^2}{n(\phi_1+\theta_1)^2}\sigma^2_\varepsilon$ and $\mathrm{Var}(\hat{\theta}_1) = \frac{(1-\theta_1^2)(1+\phi_1\theta_1)^2}{n(\phi_1+\theta_1)^2}\sigma^2_\varepsilon$.

ARIMA (0,1,1) Process

The model for the ARIMA (0,1,1) process (alternatively called an exponentially weighted integrated moving average process) is $x_t = \mu + x_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}$. In this process, the ACF dies down slowly and the PACF has significant spikes at lag one and lag two.

Random Walk Process ARIMA (0,1,0)

The simplest model of non-stationarity is the random walk model, whose structure is $x_t = \mu + x_{t-1} + \varepsilon_t$. Here the autocorrelations die out slowly and only the partial autocorrelation at lag one is significant. When $\mu = 0$, the random walk model is without a drift. Notice that the random walk model is different from the white noise model we discussed earlier.

Seasonal Integrated Autoregressive Moving Average Time Series

When the autoregressive, stationarity, or moving average terms corresponding to seasonality ($S$) are to be described in the model, the seasonal stationarity terms $(1 - B^S)^D$ and the seasonal autoregressive terms $\Phi_P(B^S) = (1 - \Phi_1 B^S - \Phi_2 B^{2S} - \cdots - \Phi_P B^{PS})$ need to be added in the model. Then the full seasonal model is SARIMA $(p, d, q) \times (P, D, Q)$. The parameters are estimated using the maximum likelihood principle and software such as JMP 15.
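The MA(q) autocorrelation formula and the approximate sampling variance of a sample autocorrelation can be evaluated directly. The sketch below uses the plus-sign convention $x_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q}$; the function names and the parameter values are illustrative assumptions made here.

```python
def ma_acf(theta, max_lag):
    """Theoretical MA(q) autocorrelations for lags 0, ..., max_lag.

    theta holds theta_1, ..., theta_q; theta_0 = 1 is added internally.
    """
    coef = [1.0] + list(theta)
    q = len(theta)
    denom = sum(c * c for c in coef)
    rho = []
    for k in range(max_lag + 1):
        if k > q:
            rho.append(0.0)  # the ACF cuts off after lag q
        else:
            rho.append(sum(coef[j] * coef[j + k] for j in range(q - k + 1)) / denom)
    return rho


def acf_sampling_variance(sample_rho, k, n):
    """Approximate Var(rho_hat_k) = (1 + 2 * sum_{i=1}^{k-1} rho_hat_i^2) / n.

    sample_rho[0] is lag 0, sample_rho[1] is lag 1, and so on.
    """
    return (1 + 2 * sum(r * r for r in sample_rho[1:k])) / n


# Hypothetical coefficients, for illustration only.
print(ma_acf([0.5], 3))         # MA(1): nonzero only at lags 0 and 1
print(ma_acf([0.5, 0.25], 4))   # MA(2): nonzero only up to lag 2
print(acf_sampling_variance([1.0, 0.5, 0.2], 2, 100))
```

Scanning $\theta_1$ over any range shows that the MA (1) value $|\rho_1|$ never exceeds 1/2, in line with the bound stated for that process.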

Model Selection Criteria

Which among MA(1), MA(2), AR(1), AR(2), and ARMA (1,1) performs better for given data is decided based on a criterion. The Schwarz criterion (SC) requires seeking a model with the minimum possible value of $SC = n\ln\left(\frac{RSS}{n}\right) + k\ln(n)$, where $RSS = \sum_{t=1}^{n}(x_t - \hat{x}_t)^2$ is the residual sum of squares, $\hat{x}_t$ is the forecast, $x_t - \hat{x}_t$ is the error of forecast, and $k$ is the number of estimated parameters. The Akaike criterion (AC) requires a minimum value of $AC = n\ln\left(\frac{RSS}{n}\right) + 2k$. Another criterion that could be used is the adjusted coefficient of determination

$$R^2_{adj} = 1 - \frac{\sum_{t=1}^{n}(x_t - \hat{x}_t)^2/(n-p)}{\sum_{t=1}^{n}(x_t - \bar{x})^2/(n-1)},$$

where $p$ is the number of parameters used to make the forecast $\hat{x}_t$.

Sahai et al. (2020) compared ARIMA models for the COVID-19 pandemic in five countries: the United States, Brazil, India, Russia, and Spain. The ARIMA models for these countries came out different. The models for India, Brazil, Russia, Spain, and the United States are respectively ARIMA (4, 2, 4), ARIMA (3, 1, 2), ARIMA (3, 0, 0), ARIMA (4, 2, 4), and ARIMA (1, 2, 1). That is, expanding $(1 - \phi_1 B)(1 - B)^2 x_t = (1 + \theta_1 B)\varepsilon_t$, the underlying model for the US pandemic is $x_t = (2 + \phi_1)x_{t-1} - (1 + 2\phi_1)x_{t-2} + \phi_1 x_{t-3} + \varepsilon_t + \theta_1\varepsilon_{t-1}$, and for Russia it is $x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \phi_3 x_{t-3} + \varepsilon_t$. They concluded Russia and Spain reached an inflexion point in the pandemic while the United States, Brazil, and India experienced an exponential curve.

The parameters of a model are estimated using the maximum likelihood (ML) principle. The ML estimate of a function of the parameters is the same function of the ML estimates of the parameters. Finding the ML estimates involves solving nonlinear equations; hence practitioners adopt software to find the ML estimates of the model parameters. The beta version (not the Windows version) of JASP computes the ML estimates of the models. Of course practitioners can use Excel commands to do the calculations, but the task could be tedious. An easier choice for practitioners is using the software JMP 15.0, as done in this chapter.
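The three selection criteria can be computed from any set of observations and forecasts. The sketch below is a minimal illustration: the function names and the tiny data set are hypothetical and were made up here only to exercise the formulas.

```python
from math import log


def residual_sum_of_squares(actual, forecast):
    """RSS = sum of squared forecast errors (x_t - x_hat_t)^2."""
    return sum((x - f) ** 2 for x, f in zip(actual, forecast))


def schwarz_criterion(rss, n, k):
    """SC = n * ln(RSS / n) + k * ln(n)."""
    return n * log(rss / n) + k * log(n)


def akaike_criterion(rss, n, k):
    """AC = n * ln(RSS / n) + 2 * k."""
    return n * log(rss / n) + 2 * k


def adjusted_r_squared(actual, forecast, p):
    """1 - [RSS / (n - p)] / [total sum of squares / (n - 1)]."""
    n = len(actual)
    mean = sum(actual) / n
    rss = residual_sum_of_squares(actual, forecast)
    tss = sum((x - mean) ** 2 for x in actual)
    return 1 - (rss / (n - p)) / (tss / (n - 1))


# Hypothetical observations and forecasts, for illustration only.
actual = [10, 12, 14, 13, 15, 17]
forecast = [11, 12, 13, 14, 15, 16]
n, k = len(actual), 2

rss = residual_sum_of_squares(actual, forecast)
print(rss)
print(schwarz_criterion(rss, n, k))
print(akaike_criterion(rss, n, k))
print(adjusted_r_squared(actual, forecast, k))
```

The two information criteria differ only in the penalty term, $k\ln(n)$ versus $2k$, so SC penalizes extra parameters more heavily whenever $\ln(n) > 2$.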

14.3 Illustration

The data in Table 14.2 on the deaths due to COVID-19, pneumonia, and influenza in the United States during January through December 2020 are considered for illustration. Epidemiologists suggest pneumonia is seasonal. The deadly COVID-19 virus is more serious than pneumonia. Pneumonia is an inflammation in the lungs that affects the small air sacs called alveoli. Pneumonia symptoms are dry cough, chest pain, fever, and difficulty breathing. Influenza, the flu, is an infectious disease caused by a virus. Influenza symptoms range from mild to severe fever, runny nose, sore throat, muscle pain, headache, coughing, diarrhea, and fatigue. Influenza might progress to pneumonia. Frequent hand washing and covering the mouth and nose when coughing reduce transmission.

Table 14.2 Comparative causes of deaths in the United States, 2020 (Source: www.kff.org)

| Week ending date | COVID-19 deaths | Pneumonia deaths | Influenza deaths | Exponential smoothing with α = 0.6 for recent |
| --- | --- | --- | --- | --- |
| 1/4/2020 | 0 | 4,111 | 434 | |
| 1/11/2020 | 1 | 4,153 | 475 | |
| 1/18/2020 | 2 | 4,066 | 467 | |
| 1/25/2020 | 2 | 3,915 | 500 | |
| 2/1/2020 | 0 | 3,818 | 481 | 0.736 |
| 2/8/2020 | 2 | 3,823 | 521 | 1.4944 |
| 2/15/2020 | 2 | 3,845 | 561 | 1.79776 |
| 2/22/2020 | 6 | 3,719 | 565 | 4.319104 |
| 2/29/2020 | 9 | 3,842 | 656 | 7.127642 |
| 3/7/2020 | 37 | 3,977 | 635 | 25.05106 |
| 3/14/2020 | 60 | 3,974 | 628 | 46.02042 |
| 3/21/2020 | 588 | 4,553 | 559 | 371.2082 |
| 3/28/2020 | 3,214 | 6,186 | 446 | 2,076.883 |
| 4/4/2020 | 10,127 | 9,934 | 478 | 6,906.953 |
| 4/11/2020 | 16,317 | 12,016 | 474 | 12,552.98 |
| 4/18/2020 | 17,200 | 11,419 | 264 | 15,341.19 |
| 4/25/2020 | 15,550 | 10,388 | 143 | 15,466.48 |
| 5/2/2020 | 13,215 | 8,952 | 66 | 14,115.59 |
| 5/9/2020 | 11,231 | 7,836 | 48 | 12,384.84 |
| 5/16/2020 | 9,234 | 6,791 | 22 | 10,494.33 |
| 5/23/2020 | 7,248 | 5,910 | 24 | 8,546.534 |
| 5/30/2020 | 6,170 | 5,283 | 12 | 7,120.614 |
| 6/6/2020 | 5,056 | 4,906 | 11 | 5,881.845 |
| 6/13/2020 | 4,230 | 4,546 | 11 | 4,890.738 |
| 6/20/2020 | 3,847 | 4,371 | 8 | 4,264.495 |
| 6/27/2020 | 3,841 | 4,279 | 12 | 4,010.398 |
| 7/4/2020 | 4,551 | 4,575 | 5 | 4,334.759 |
| 7/11/2020 | 5,785 | 5,551 | 10 | 5,204.904 |
| 7/18/2020 | 7,192 | 6,199 | 13 | 6,397.161 |
| 7/25/2020 | 8,242 | 6,777 | 11 | 7,504.065 |
| 8/1/2020 | 8,302 | 6,855 | 13 | 7,982.826 |
| 8/8/2020 | 7,867 | 6,822 | 10 | 7,913.33 |
| 8/15/2020 | 7,261 | 6,520 | 5 | 7,521.932 |
| 8/22/2020 | 6,383 | 5,978 | 12 | 6,838.573 |
| 8/29/2020 | 5,747 | 5,555 | 12 | 6,183.629 |
| 9/5/2020 | 5,015 | 5,220 | 9 | 5,482.452 |
| 9/12/2020 | 4,626 | 4,947 | 7 | 4,968.581 |
| 9/19/2020 | 4,276 | 4,780 | 5 | 4,553.032 |
| 9/26/2020 | 4,299 | 4,947 | 4 | 4,400.613 |
| 10/3/2020 | 4,241 | 4,833 | 8 | 4,304.845 |
| 10/10/2020 | 4,817 | 5,189 | 13 | 4,612.138 |
| 10/17/2020 | 5,198 | 5,222 | 17 | 4,963.655 |
| 10/24/2020 | 5,995 | 5,652 | 15 | 5,582.462 |
| 10/31/2020 | 7,022 | 6,158 | 21 | 6,446.185 |
| 11/7/2020 | 8,756 | 7,095 | 21 | 7,832.074 |
| 11/14/2020 | 10,647 | 8,068 | 20 | 9,521.03 |
| 11/21/2020 | 13,356 | 9,429 | 30 | 11,822.01 |
| 11/28/2020 | 15,619 | 10,472 | 27 | 14,100.2 |
| 12/5/2020 | 18,560 | 12,094 | 34 | 16,776.08 |
| 12/12/2020 | 20,921 | 13,289 | 30 | 19,263.03 |
| 12/19/2020 | 22,322 | 14,317 | 32 | 21,098.41 |
| 12/26/2020 | 23,361 | 14,908 | 32 | |



Figures 14.2 through 14.5 reveal that COVID-19 deaths surpassed pneumonia or influenza deaths in April, May, June, August, and November 2020. While influenza deaths were stable, COVID-19 deaths had an upward trend. The forecast is a prediction of a future outcome in healthcare. Forecasting is based on qualitative or quantitative information. The Delphi method is an approach to make qualitative forecasts. The quantitative entity is forecast using a smoothing technique or a time series model.

Figure 14.2 Comparative causes of deaths in the United States, 2020

Figure 14.3 Original, moving averages versus trend of COVID deaths (series shown: COVID-19 deaths, moving average of order 9, and linear trend of COVID-19 deaths)

Figure 14.4 Exponential smoothing with 0.6 (actual versus forecast value, by data point)

Figure 14.5 How does differencing create more stationarity? The first difference (light grey) is more stationary than the original (dark grey) series.

Table 14.3 Comparison of models fitting the COVID-19 data

| Model | DF | AIC | SBC | R-Square | −2LogLH |
| --- | --- | --- | --- | --- | --- |
| Winters Method (Additive) | 36 | 684.82 | 689.81 | 0.89 | 678.82 |
| ARMA (1, 1) | 49 | 889.27 | 895.13 | 0.91 | 883.27 |
| Simple Exponential Smoothing | 50 | 910.72 | 912.65 | 0.91 | 908.72 |
| Seasonal ARIMA (1, 0, 1) X (1, 0, 1)12 | 47 | 1,752.56 | 1,762.32 | 6e+5 | 1,742.56 |

Alzahrani et al. (2020) compare the prediction with the actual number of cases. Recall the ACF values in the AR (p) process and the PACF values in the MA (q) process converge to zero as the lag increases. When the ACF and PACF values do not converge to zero, then differencing may be needed. If all of the ACF values are near zero, then the time series is a random process and it is $x_t = \mu + \varepsilon_t$ for $t = 1, 2, \ldots, n$. When the ACF values of the first differences, $(1 - B)x_t = x_t - x_{t-1}$, are near zero, then the process is a random walk model: $x_t = \mu + x_{t-1} + \varepsilon_t$, for $t = 1, 2, \ldots, n$.

Comparative Analyses of COVID-19, Pneumonia, and Influenza Deaths in Year 2020

The incidences of COVID-19, pneumonia, and influenza deaths in 2020 are displayed in Table 14.2. Let us first analyze the COVID-19 data, obtain a time series model, and create the forecast. The average number of COVID-19 deaths is 7,068 (see the line in Figure 14.6) with $n = 52$ observations. The autocorrelations decay exponentially, with a significant spike at lag one (see Figure 14.7). The partial autocorrelations have a significant spike only at the first lag (Figures 14.6 through 14.8). According to the identification rules in Table 14.1, the time series model for the COVID-19 deaths is autoregressive AR (1) of order one. Ljung and Box (1978) introduced a method to test whether a set of autocorrelations is significant. Overall randomness is assessed with this portmanteau test. The p-value of the Ljung–Box test is less than 0.0001 for the COVID-19 deaths in 2020, so the autocorrelations are jointly significant. The fit of the AR (1) model for the COVID-19 deaths is $\hat{x}_t = 7{,}068 + 1.08x_{t-1} + \hat{\varepsilon}_t$, with noise variance $\hat{\sigma}_{x_t} = 6{,}163$.
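The Ljung–Box portmanteau statistic can be computed from the sample autocorrelations reported in Figure 14.7. The chapter does not print the formula, so the sketch below assumes the standard form $Q = n(n+2)\sum_{k=1}^{m}\hat{\rho}_k^2/(n-k)$; the function name and the choice of $m = 25$ lags are assumptions made here for illustration.

```python
def ljung_box_q(acf, n):
    """Ljung-Box statistic Q = n (n + 2) * sum_k rho_hat_k^2 / (n - k),
    where acf lists the sample autocorrelations for lags 1, 2, ..., m."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


# Sample autocorrelations of COVID-19 deaths at lags 1 to 25 (Figure 14.7).
covid_acf = [
    0.8788, 0.7005, 0.4993, 0.3077, 0.1431, 0.0066, -0.0939, -0.1638,
    -0.2048, -0.2228, -0.2219, -0.2079, -0.1779, -0.1289, -0.0680,
    -0.0190, 0.0117, 0.0230, 0.0203, 0.0034, -0.0236, -0.0543,
    -0.0830, -0.0985, -0.0943,
]

q = ljung_box_q(covid_acf, n=52)
print(round(q, 2))
```

Under the hypothesis of randomness, $Q$ is compared with a chi-square distribution with $m$ degrees of freedom; for $m = 25$ the 5% critical value is about 37.65, and the statistic computed here lies well above it, in line with the small p-value reported in the text.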

Time Series Basic Diagnostics

The model that best fits the data is decided using R-square, AIC, SBC, or −2LogLH (see Table 14.3). The R-square is the proportion, in the domain $[0, 1]$, of the variation in the main variable $x_t$ that is predictable through the model. The Akaike information criterion (AIC) estimates the prediction error of the models. A lower AIC value means a better fit. Increasing the number of parameters leads to over-fitting. The Schwarz information criterion (SBC) introduces a penalty for increasing the number of parameters. Both SBC and AIC attempt to resolve over-fitting by penalties. The likelihood ratio $-2\ln LH$ is attributed to statistician S. S. Wilks.
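The relation between −2LogLH and the two penalized criteria can be checked against the ARMA (1, 1) row of Table 14.3. The sketch assumes the usual definitions AIC = −2LogLH + 2k and SBC = −2LogLH + k ln(n); the values k = 3 parameters and n = 52 observations are assumptions inferred here from the table, not values stated in the chapter.

```python
from math import log


def aic(neg2_loglik, k):
    """AIC = -2 log-likelihood + 2 * (number of parameters)."""
    return neg2_loglik + 2 * k


def sbc(neg2_loglik, k, n):
    """SBC = -2 log-likelihood + (number of parameters) * ln(sample size)."""
    return neg2_loglik + k * log(n)


# ARMA (1, 1) row of Table 14.3: -2LogLH = 883.27.
# k = 3 and n = 52 are assumed values, used only for this illustration.
print(round(aic(883.27, 3), 2))
print(round(sbc(883.27, 3, 52), 2))
```

Under these assumptions the computed AIC matches the tabled 889.27 and the computed SBC is within rounding of the tabled 895.13.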

Figure 14.6 COVID-19 deaths, 2020

Figure 14.7 Autocorrelations of COVID-19 deaths at different lags

Figure 14.8 Partial autocorrelations of COVID-19 deaths, 2020

Time Series Basic Diagnostics (values plotted in Figures 14.7 and 14.8)

| Lag | AutoCorr | Partial |
| --- | --- | --- |
| 1 | 0.8788 | 0.8788 |
| 2 | 0.7005 | -0.3152 |
| 3 | 0.4993 | -0.1622 |
| 4 | 0.3077 | -0.0596 |
| 5 | 0.1431 | -0.0320 |
| 6 | 0.0066 | -0.0586 |
| 7 | -0.0939 | -0.0150 |
| 8 | -0.1638 | -0.0345 |
| 9 | -0.2048 | -0.0194 |
| 10 | -0.2228 | -0.0233 |
| 11 | -0.2219 | -0.0142 |
| 12 | -0.2079 | -0.0185 |
| 13 | -0.1779 | 0.0237 |
| 14 | -0.1289 | 0.0524 |
| 15 | -0.0680 | 0.0315 |
| 16 | -0.0190 | -0.0568 |
| 17 | 0.0117 | -0.0395 |
| 18 | 0.0230 | -0.0285 |
| 19 | 0.0203 | -0.0128 |
| 20 | 0.0034 | -0.0384 |
| 21 | -0.0236 | -0.0338 |
| 22 | -0.0543 | -0.0246 |
| 23 | -0.0830 | -0.0168 |
| 24 | -0.0985 | 0.0196 |
| 25 | -0.0943 | 0.0376 |

Winters (1960) proposed an additive model to capture the initial level, trends, seasonal versions, and their frequencies. The simple exponential smoothing model is non-stochastic and deterministic, and it is not a natural candidate for the COVID-19 data. The ARMA (1, 1) is a natural model for the COVID-19 data. Its seasonal version ARIMA (1, 0, 1) X (1, 0, 1)12 with seasonality of order 12 (as if the COVID-19 pandemic were recurring year after year on a 12-month calendar) is inappropriate. After examining the fit of these four models to the COVID-19 data (see Table 14.3), the seasonal time series model ARIMA (1, 0, 1) X (1, 0, 1)12 is not considered a better fit. The Winters additive models are not a good fit for these data.

The seasonal ARIMA model (p, d, q) X (P, D, Q)s is to be interpreted as follows. The notation (p, d, q) refers to non-seasonal behavior in the time series,

$$\phi_p(B)\nabla^d x_t = \mu + \theta_q(B)\varepsilon_t,$$

in which $\mu$ is the mean value, $\nabla x_t = x_t - x_{t-1}$, $B^p x_t = x_{t-p}$, and $B^q\varepsilon_t = \varepsilon_{t-q}$. Likewise, the notation (P, D, Q)s refers to seasonal behavior in the time series,

$$\Phi_P(B^S)\nabla_S^D x_t = \mu + \Theta_Q(B^S)\varepsilon_t,$$

in which $S$ indicates seasonality (i.e., $S = 4$ for quarterly seasons), $\mu$ is the mean value, $\nabla_S x_t = x_t - x_{t-S}$, $B^{PS}x_t = x_{t-PS}$, and $B^{QS}\varepsilon_t = \varepsilon_{t-QS}$.

The exponential smoothing and ARMA (1, 1) models compete with each other to be the best fit for the time series data. The exponential smoothing model is too deterministic to fit the stochastic nature of the COVID-19 pandemic. Hence we could select the ARMA (1, 1) model as the best fit and use it to forecast. The ARMA (1, 1) fit is $\hat{x}_t = 9876.76 + 0.97\hat{x}_{t-1} + \hat{\varepsilon}_t - 0.83\hat{\varepsilon}_{t-1}$ for the COVID-19 data. The forecast errors $x_t - \hat{x}_t$ widen as the forecast horizon lengthens. The forecasts based on ARMA (1, 1) are shown in Figure 14.9. The point forecasts (dots in Figures 14.9 through 14.11) and the 95% prediction intervals of the number of COVID-19 deaths are based on the chosen fit ARMA (1, 1).

A few comments are worthwhile on the excessive occurrences of COVID-19 deaths. According to Figure 14.12 and ARMA (1, 1), the COVID-19 deaths happened to be excessive (i.e., above the upper control limit, UCL) during May and June 2020 in the pandemic’s first wave and during December 2020 in the pandemic’s second wave. The excessiveness in the second wave is not picked up by the moving range method. The moving range method is a nonparametric and approximate method.

Figure 14.9 Forecast of COVID-19 deaths

Figure 14.10 Dots are original series, the solid line is point forecast, and the blue segment is interval forecast.

Figure 14.11 One-step-ahead forecast error

Figure 14.12 When were COVID-19 deaths excessive?

Now, let us turn to pneumonia deaths for a comparison (see Figure 14.14). The average number of pneumonia deaths is 6,578 (see the line in Figure 14.14) with $n = 52$ observations. The pattern of pneumonia deaths is similar to the trend of COVID-19 deaths. The autocorrelations decay exponentially, with a significant spike at lag one (see Figure 14.15). The partial autocorrelations have a significant spike only at lag one (Figure 14.16). Hence, according to Table 14.1, the time series model for pneumonia deaths is autoregressive AR (1) of order one. Ljung and Box (1978) introduced a test to determine whether a set of autocorrelations is significant. Overall randomness is assessed with this portmanteau test. The p-value of the Ljung–Box test is less than 0.0001 for the pneumonia deaths in 2020. The fit of the AR (1) model is $\hat{x}_t = 6{,}578 + 1.05x_{t-1} + \hat{\varepsilon}_t$, with noise variance $\hat{\sigma}_{x_t} = 2{,}916$. The ARMA (1, 1) model is the best fit for the pneumonia deaths data as well as for the COVID-19 deaths data. The best-fitted model is ARMA (1, 1): $\hat{x}_t = 8112.23 + 0.96x_{t-1} + \varepsilon_t - 0.72\varepsilon_{t-1}$. The forecast error is $\hat{\varepsilon}_t = x_t - \hat{x}_t$. See Figure 14.13 for the points (dots) of the incidences, the (red) line for the point forecast, and the (blue) curved lines for the 95% prediction intervals of the pneumonia deaths. It is evident that the pneumonia deaths had many similarities with the COVID-19 deaths. However, there are differences between the pneumonia and COVID-19 death trends.

A few comments are worthwhile on the excessive occurrence of pneumonia deaths. According to Figure 14.17 and ARMA (1, 1), the pneumonia deaths happened to be excessive (i.e., above the UCL) during March and April 2020 in the winter’s first wave and during November and December 2020 in the winter’s second wave. The excessiveness in the second wave is not picked up by the moving range method. The moving range method is a nonparametric and approximate method.

Now, let us discuss the patterns of influenza deaths in the United States. The average number of influenza deaths is 171 (see the line in Figure 14.18) with $n = 52$ observations. The pattern increases in the winter (first three months of the year) and then decreases. The autocorrelations decay exponentially, except for significant spikes at four lags (see Figure 14.19). The partial autocorrelations have a significant spike only at lag one (Figure 14.20). Hence, according to Table 14.1, the time series model for the influenza deaths is autoregressive AR (1) of order one. Ljung and Box’s (1978) test helps check whether the autocorrelations are significant. Overall randomness is revealed in a portmanteau test. The p-value of the Ljung–Box test is less than 0.0001 for the influenza deaths in 2020, and this means the fit is significant. The fit of the AR (1) model is $\hat{x}_t = 171 + 1.20x_{t-1} + \hat{\varepsilon}_t$, with noise variance $\hat{\sigma}_{x_t} = 231.61$. The exponential smoothing model is chosen as the best fit for the influenza deaths in the United States because it offers the highest R-square value (see Tables 14.4 and 14.5). Recall that $F_t$, $x_t$, and $\hat{\varepsilon}_t = x_t - F_t$ are called the forecast, observation, and error in the exponential smoothing methodology. The optimal choice for the smoothing coefficient $\alpha$ is the one among its various choices that offers the minimum value for the error sum of squares (ErrorSS), where

$$\mathrm{ErrorSS} = \sum_{t=1}^{n} \hat{\varepsilon}_t^{2} = \sum_{t=1}^{n} (x_t - F_t)^2 .$$

By applying this criterion, we identified the optimal value for the smoothing constant (i.e., $\alpha = 0.6$). The exponential smoothing process works like this. With $\alpha = 0.6$ and $F_0 = 0$, $F_1 = 0.6x_1 = 0.6(434) = 260.4$ and $\hat{\varepsilon}_1^{2} = (x_1 - F_1)^2 = (434 - 260.4)^2 = 30136.96$. With $\alpha = 0.6$ and $F_1 = 260.4$, $F_2 = 0.6x_2 + 0.4F_1 = 389.16$ and $\hat{\varepsilon}_2^{2} = (x_2 - F_2)^2 = (475 - 389.16)^2 = 7368.50$, and so on. Hence, the forecasts are done using $F_{t+1} = 0.6x_t + (1 - 0.6)F_t$, $\alpha = 0.6$, $t = 3, 4, 5, \ldots$ (see Figure 14.21). However, there are differences among the pneumonia and COVID-19 death trends and the influenza

Table 14.4 Summary of forecasting models and their scores for pneumonia deaths

| Model | DF | AIC | SBC | R-Square | −2LogLH |
|---|---|---|---|---|---|
| Winters Method (Additive) | 36 | 638.22 | 643.21 | 0.88 | 632.22 |
| ARMA (1, 1) | 49 | 828.97 | 834.82 | 0.91 | 822.97 |
| Simple Exponential Smoothing | 50 | 841.65 | 843.58 | 0.90 | 839.65 |
| Seasonal ARIMA (1, 0, 1) (1, 0, 1)12 | 47 | 1,614.38 | 1,624.14 | −2e+5 | 1,604.38 |

Figure 14.13 Forecast error

Figure 14.14 Pneumonia deaths, 2020 (Source: www.kff.org)

deaths. A few comments are worthwhile on the excessive occurrence of influenza deaths. According to Figure 14.22, the influenza deaths happened to be excessive (i.e., above its UCL) during January and May 2020 in the winter and spring’s first wave and are much less (even below the LCL) after May 2020. The low level of influenza deaths is noticed by the moving range method. The moving range method is a nonparametric and approximate method.
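The exponential smoothing arithmetic worked through earlier in this section (smoothing constant 0.6, starting value 0, and the influenza counts 434 and 475) can be checked with a small sketch. It follows the indexing of the worked numbers, in which each smoothed value uses the observation of the same period; the recursion quoted in the text labels the same quantity one period ahead. The function names are hypothetical.

```python
def smooth(observations, alpha, start=0.0):
    """Return F_1, F_2, ... with F_t = alpha * x_t + (1 - alpha) * F_{t-1}."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    forecasts = []
    previous = start  # F_0 in the worked example
    for x in observations:
        previous = alpha * x + (1.0 - alpha) * previous
        forecasts.append(previous)
    return forecasts


def error_sum_of_squares(observations, forecasts):
    """ErrorSS = sum over t of (x_t - F_t)^2."""
    return sum((x - f) ** 2 for x, f in zip(observations, forecasts))


influenza = [434, 475]  # first two weekly counts used in the worked example
forecasts = smooth(influenza, alpha=0.6)
squared_errors = [(x - f) ** 2 for x, f in zip(influenza, forecasts)]
print([round(f, 2) for f in forecasts])
print([round(e, 2) for e in squared_errors])
```

The two smoothed values agree with the 260.4 and 389.16 obtained by hand, and extending `influenza` with further weeks from Table 14.10 continues the same recursion.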

14.4 Summary

When observations are collected periodically, the data are called time series data. When the observations are nonstochastic, the time series data are analyzed using deterministic techniques. In deterministic techniques, the trend, seasonality, and cyclic behavior are captured and utilized to make a forecast. Another approach, which assumes that the observations are stochastic, analyzes the time series data using either the frequency-based technique or the time domain-based Box–Jenkins technique. In the frequency-based technique, the Fourier series involving the sine and/or cosine functions is utilized and it identifies amplitude and periodicity. In the Box–Jenkins technique, the principles of moving average, autoregressive average, and an integrated average all define seasonality. In all these applications, the data should have stationarity and invertibility. When the required stationarity or invertibility is not noticed in time series data, a differencing $(1 - B)^d$ of the data values is adopted, with an appropriate level $d = 1, 2, \ldots$ and the operational meaning that $B^b x_t = x_{t-b}$.
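As a minimal sketch of the differencing operation, the function below applies the first difference repeatedly, d times. For d = 1 it returns the changes between consecutive observations, and for d = 2 the changes of those changes. The function name and the toy series are illustrative assumptions, not part of the chapter.

```python
def difference(series, d=1):
    """Apply the first difference (current value minus previous value) d times."""
    if d < 0:
        raise ValueError("d must be non-negative")
    result = list(series)
    for _ in range(d):
        # Each pass shortens the series by one observation
        result = [current - previous
                  for previous, current in zip(result, result[1:])]
    return result


quadratic_trend = [1, 4, 9, 16, 25]
print(difference(quadratic_trend, d=1))
print(difference(quadratic_trend, d=2))
```

In this toy series a quadratic trend becomes constant after two rounds of differencing, which is the sense in which differencing removes a trend before a stationary model is fitted.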

443 https://doi.org/10.1017/9781009212021.016 Published online by Cambridge University Press


Lag 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

AutoCorr 0.8670 0.6726 0.4578 0.2519 0.0773 -0.0689 -0.1746 -0.2451 -0.2805 -0.2880 -0.2701 -0.2347 -0.1758 -0.1020 -0.0255 0.0292 0.0583 0.0617 0.0427 0.0039 -0.0446 -0.0945 -0.1398 -0.1676 -0.1741

Figure 14.15 Autocorrelations of pneumonia deaths at different lags

Lag 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Partial 0.8670 -0.3184 -0.1567 -0.0873 -0.0386 -0.0857 -0.0235 -0.0454 -0.0211 -0.0309 -0.0049 -0.0147 0.0508 0.0308 0.0145 -0.0622 -0.0354 -0.0409 -0.0390 -0.0613 -0.0371 -0.0394 -0.0383 -0.0021 0.0064

Figure 14.16 Partial autocorrelations of pneumonia deaths at different lags, 2020
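The autocorrelations plotted in these figures and the Ljung–Box portmanteau statistic can be sketched in a few lines. The code assumes the usual sample autocorrelation (lagged cross-products divided by the total sum of squares) and the statistic Q = n(n + 2) Σ r_k² / (n − k); whether the software behind the figures used exactly these conventions is not stated in the text, and the function names and toy series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation r_k of a series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    center = sum(series) / n
    total = sum((x - center) ** 2 for x in series)
    if total == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    cross = sum((series[t] - center) * (series[t + lag] - center)
                for t in range(n - lag))
    return cross / total


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k)."""
    n = len(series)
    return n * (n + 2) * sum(
        autocorrelation(series, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


toy = [1, 2, 3, 4, 5]
print(autocorrelation(toy, 1))
print(ljung_box_q(toy, 1))
```

Q is referred to a chi-square distribution whose degrees of freedom equal the number of lags (reduced by the number of fitted parameters when the test is applied to model residuals); the sketch stops at Q and does not compute a p-value.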


Figure 14.17 When were pneumonia deaths excessive?

Figure 14.18 Influenza deaths, 2020

As we try to discover a model for data, it so happens that there are competing models. One or several of them could be better than others based on objective criteria. Such criteria include the mean squared error, Akaike information criterion, or Schwartz Bayesian criterion, among others. Once a model for the given data is chosen, it is utilized in the forecast as a point estimate or an interval estimate. This process has been an immense help in many healthcare settings.
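To make the information criteria concrete, the sketch below uses the standard definitions AIC = −2 log L + 2k and SBC = −2 log L + k ln n. The parameter count k and sample size n are not reported in Table 14.4; the values used here (k = 3 and n = 52 for ARMA (1, 1), k = 1 and n = 51 for simple exponential smoothing) are inferred assumptions chosen because they reproduce the tabulated scores.

```python
import math


def aic(neg2_loglik, k):
    """Akaike information criterion from -2 log-likelihood and k parameters."""
    return neg2_loglik + 2 * k


def sbc(neg2_loglik, k, n):
    """Schwarz Bayesian criterion; the penalty grows with the sample size n."""
    return neg2_loglik + k * math.log(n)


# -2LogLH values copied from Table 14.4; k and n are inferred, not reported
arma_aic = aic(822.97, k=3)
arma_sbc = sbc(822.97, k=3, n=52)
ses_aic = aic(839.65, k=1)
ses_sbc = sbc(839.65, k=1, n=51)
print(round(arma_aic, 2), round(arma_sbc, 2))
print(round(ses_aic, 2), round(ses_sbc, 2))
```

Smaller values of either criterion indicate a better trade-off between fit and the number of parameters, so candidate models are usually ranked from lowest to highest score.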

14.5 Exercises

1. Download the COVID-19 daily cases, hospitalized, recovered, and deaths data (for each country) from the webpage https://ourworldindata.org/coronavirus, construct comparison charts, and interpret them using time series concepts.


Lag 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

AutoCorr 0.9662 0.9142 0.8556 0.7847 0.7084 0.6199 0.5250 0.4332 0.3360 0.2468 0.1638 0.0909 0.0311 -0.0303 -0.0877 -0.1236 -0.1463 -0.1618 -0.1744 -0.1852 -0.1965 -0.2065 -0.2165 -0.2261 -0.2349

Figure 14.19 Autocorrelations of influenza deaths at different lags

Lag 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Partial 0.9662 -0.2918 -0.0577 -0.2011 -0.0385 -0.2240 -0.0504 0.0163 -0.1454 0.1298 -0.0474 0.1339 0.0106 -0.1594 -0.0305 0.2025 0.0374 -0.0876 -0.0483 -0.0598 -0.1681 -0.0007 -0.0516 -0.0343 0.0069

Figure 14.20 Partial autocorrelations of influenza deaths at different lags


Figure 14.21 Forecast of influenza deaths

Figure 14.22 When were influenza deaths excessive?

Table 14.5 Models of influenza deaths and their scores

| Model | DF | AIC | SBC | R-Square | −2LogLH |
|---|---|---|---|---|---|
| Winters Method (Additive) | 36 | 438.98 | 443.97 | 0.644 | 432.98 |
| Simple Exponential Smoothing | 50 | 533.40 | 535.33 | 0.96 | 531.40 |
| ARMA (1, 1) | 49 | 541.69 | 547.55 | 0.95 | 535.69 |
| Seasonal ARIMA (1, 0, 1) (1, 0, 1)12 | 47 | 544.60 | 554.36 | 0.94 | 534.60 |


2. Download data on the arrival and departure times of flights in major airports in the United States and obtain time series models for their times.
3. Describe a situation in a hospital in which the time series method is appropriate. Comment on trend, seasonality, cyclicity, system’s memory, and so forth. Is a point estimate or an interval estimate appropriate for future observation?
4. Consider scheduling physicians, nurses, staff, and so forth in a hospital. What criteria would you use? What measures of effectiveness are appropriate to assess healthcare services given to patients?
5. Consider the data from patient surveys in a hospital in Table 14.6. Analyze them using multiple linear regression, interpret them, and forecast patient satisfaction. Evaluate the significance of each predictor. Generate the residuals and plot them against the predicted values. Also check whether the residuals are Gaussian (normally) distributed. Is there an outlier? If so, how would you proceed? Would you consider transforming the data? If so, what transformations would you do and on which variables?
6. Consider the data presented in Table 14.7 on the seasonal meteorological index and the number of days in which the ozone level exceeded 30 ppm in a US city. Fit a time series model and make a projection for the year 2021.
7. Write down the forecast equation for each of the following models to project the value at lead times $\tau = 1, 2, \ldots, L$: (a) AR (1) (b) AR (2) (c) MA (1) (d) MA (2) (e) ARMA (1, 1) (f) ARMA (1, 2) (g) ARMA (2, 1) (h) IMA (1, 1) (i) ARIMA (1, 1, 0)
8. Analyze the COVID-19 data for 2020 presented in Table 14.8, as reported in Lucia, Deisboeck, and Grisolia (2020), using an appropriate time series method.
9. Find an appropriate time series model for the tetanus data presented in Table 14.9 over the years 1974–2020 in India, Japan, Mexico, Nepal, Philippines, the United States, and Vietnam.

Table 14.6 Severity of illness, satisfaction of patients with their ages (Source: www.kff.org)

| Patient | Age (years) | Severity of illness (0–100) | Satisfaction (0–100) |
|---|---|---|---|
| 1 | 81 | 38 | 35 |
| 2 | 84 | 73 | 28 |
| 3 | 89 | 67 | 40 |
| 4 | 98 | 64 | 1 |
| 5 | 44 | 33 | 88 |
| 6 | 3 | 55 | 29 |
| 7 | 25 | 93 | 56 |
| 8 | 92 | 98 | 98 |
| 9 | 90 | 17 | 5 |
| 10 | 48 | 69 | 26 |
| 11 | 11 | 14 | 7 |
| 12 | 18 | 16 | 85 |
| 13 | 98 | 5 | 28 |
| 14 | 17 | 0 | 7 |
| 15 | 99 | 7 | 88 |
| 16 | 95 | 55 | 55 |
| 17 | 27 | 4 | 21 |
| 18 | 28 | 17 | 5 |
| 19 | 52 | 96 | 96 |
| 20 | 15 | 62 | 89 |
| 21 | 34 | 6 | 70 |

Table 14.7 Meteorological index over the years (Source: www.kff.org)

Year

Index

Days

Year

Index

Days

1971

90.32

97

1996

50.05

36

1972

56.19

5

1997

33.07

73

1973

25.06

82

1998

64.1

4

1974

53.48

82

1999

67.09

46

1975

84.75

8

2000

1976

67.64

2

2001

84.45

21

1977

29.32

68

2002

47.3

91

1978

67.15

45

2003

52.15

33

1979

55.41

1

2004

1980

79.22

86

2005

74.74

89

9.776

97

2006

61.22

49

0.642

1981 1982

7.298

1.601

15

56

98

2007

61.52

33

1983

65.68

4

2008

35.27

39

1984

99.98

19

2009

21.8

10

1985

32.37

45

2010

49.78

25

35

2011

86.48

23

31

2012

41.35

94

5

2013

19.27

67

1986 1987 1988

2.167 15.34 2.987

1989

17.47

54

2014

68.11

95

1990

48.56

12

2015

72.79

29

1991

20.7

88

2016

74.15

13

1992

27.98

36

2017

14.12

34

1993

71.83

65

2018

51.29

56

24

2019

16.55

14

42

2020

14.9

35

1994 1995

8.153 72.05


Table 14.8 Number of COVID-19 cases earlier in the pandemic, 2020 (Source: www.kff.org)

| Date | China | Italy | Spain | USA |
|---|---|---|---|---|
| March 2 | 205 | 561 | 17 | 20 |
| March 3 | 127 | 347 | 31 | 14 |
| March 4 | 119 | 466 | 37 | 22 |
| March 5 | 117 | 587 | 49 | 34 |
| March 6 | 170 | 769 | 61 | 74 |
| March 7 | 101 | 778 | 113 | 105 |
| March 8 | 46 | 1,247 | 56 | 95 |
| March 9 | 45 | 1,492 | 159 | 121 |
| March 10 | 20 | 1,797 | 615 | 200 |
| March 11 | 29 | 977 | 435 | 271 |
| March 12 | 24 | 2,313 | 501 | 287 |
| March 13 | 22 | 2,651 | 864 | 351 |
| March 14 | 19 | 2,547 | 1,227 | 511 |
| March 15 | 22 | 3,497 | 1,522 | 777 |
| March 16 | 25 | 2,823 | 2,000 | 823 |
| March 17 | 43 | 4,000 | 1,438 | 887 |
| March 18 | 23 | 3,526 | 1,987 | 1,766 |
| March 19 | 44 | 4,207 | 2,538 | 2,988 |
| March 20 | 99 | 5,322 | 3,431 | 4,835 |
| March 21 | 52 | 5,986 | 2,833 | 5,374 |
| March 22 | 65 | 6,557 | 4,946 | 7,123 |
| March 23 | 138 | 5,560 | 3,646 | 8,459 |
| March 24 | 69 | 4,789 | 4,517 | 11,236 |
| March 25 | 78 | 5,249 | 6,584 | 8,789 |
| March 26 | 102 | 5,210 | 7,937 | 13,963 |
| March 27 | 94 | 6,153 | 8,578 | 16,797 |
| March 28 | 119 | 5,959 | 7,871 | 18,695 |
| March 29 | 113 | 5,974 | 8,189 | 19,979 |
| March 30 | 98 | 5,217 | 6,549 | 18,360 |
| March 31 | 84 | 4,050 | 6,398 | 21,595 |
| April 1 | 54 | 4,053 | 9,222 | 24,998 |
| April 2 | 100 | 4,782 | 7,719 | 27,103 |
| April 3 | 70 | 4,668 | 8,102 | 28,819 |
| April 4 | 62 | 4,585 | 7,472 | 32,425 |
| April 5 | 48 | 4,805 | 7,026 | 34,272 |
| April 6 | 67 | 4,316 | 6,023 | 25,398 |
| April 7 | 56 | 3,599 | 4,273 | 30,561 |


Table 14.9 Number of tetanus cases in India, Japan, Mexico, Nepal, Philippines, the United States, and Vietnam, 1974–2020 (Source: www.kff.org)

Country

India

Japan

Mexico

Nepal

Philippines

United States

Vietnam

2020

1,361

105

15

336

620

26

361

2019

7,071

126

35

546

953

0

400

2018

7,000

133

23

485

943

0

401

2017

4,946

125

27

880

1,057

33

433

2016

3,781

129

35

766

1,082

34

258

2015

2,268

120

27

888

880

30

360

2014

5,017

126

26

883

839

25

244

2013

2,814

127

20

377

1,069

26

306

2012

2,404

116

28

359

37

253

2011

2,843

111

23

193

1,537

37

186

2010

1,756

104

43

547

1,140

26

196

2009

2,126

113

39

276

1,022

18

247

2008

2,959

123

45

308

813

19

221

2007

7,491

45

155

1,261

28

116

2006

2,815

54

240

1,232

41

57

2005

2,981

71

112

922

27

85

2004

3,883

69

68

104

1,293

34

72

2003

4,020

69

82

114

1,151

20

119

2002

12,197

80

61

241

1,165

25

151

2001

5,764

91

101

440

1,479

38

177

2000

8,997

90

103

305

1,135

35

267

1999

2,125

93

349

1,439

32

685

1998

6,705

47

148

556

648

34

401

1997

7,323

169

514

1,627

42

257

229

632 1,443

34

464

1996

117

44

1995

195

1994

32,551

1993

15,354

1992

11,268

1991

15,036

34

340

1990

23,356

47

381

1989

24,774

42

285

1988

24,343

53

430

33

399

260

4

1,998

51

553

139

79

2,116

48

479

170

48

710

45

338

1,237

57

893

2,286

64

628

3,102

53

1,703

3,199

53

2,158

35 185

1987

31,926

50

311

149

2,910

48

1,581

1986

32,453

62

359

235

3,381

64

1,532

1985

37,647

43

436

246

3,474

83

1,658

1984

31,366

42

399

3,353

74

1,371

1983

34,442

56

339

2,912

91

1,351

1982

41,726

36

307

174

2,441

88

1,371

1981

41,161

41

359

102

2,242

72

1,324

1980

45,948

50

363

116

3,080

95

1,948

1979

42,378

59

383

125

2,905

81

1,695


Table 14.9 (cont.)

Country

India

1978

43,048

1977

36,636

1976

Japan

Mexico

Nepal

Philippines

United States

Vietnam

74

439

110

4,236

86

72

490

256

5,539

87

2,231

72,965

90

391

73

4,745

75

1,521

1975

76,066

103

532

4,575

102

182

1974

47,051

155

465

4,576

101

755

2,334

Table 14.10 Number of deaths due to COVID-19, pneumonia, and influenza, 2019–2020 (Source: www.kff.org)

| Start date | End date | COVID-19 deaths | Pneumonia deaths | Influenza deaths |
|---|---|---|---|---|
| 12/29/2019 | 1/4/2020 | 0 | 4,111 | 434 |
| 1/5/2020 | 1/11/2020 | 1 | 4,153 | 475 |
| 1/12/2020 | 1/18/2020 | 2 | 4,066 | 467 |
| 1/19/2020 | 1/25/2020 | 2 | 3,915 | 500 |
| 1/26/2020 | 2/1/2020 | 0 | 3,818 | 481 |
| 2/2/2020 | 2/8/2020 | 2 | 3,823 | 521 |
| 2/9/2020 | 2/15/2020 | 2 | 3,845 | 561 |
| 2/16/2020 | 2/22/2020 | 6 | 3,719 | 565 |
| 2/23/2020 | 2/29/2020 | 9 | 3,842 | 656 |
| 3/1/2020 | 3/7/2020 | 37 | 3,977 | 635 |
| 3/8/2020 | 3/14/2020 | 60 | 3,974 | 628 |
| 3/15/2020 | 3/21/2020 | 588 | 4,553 | 559 |
| 3/22/2020 | 3/28/2020 | 3,214 | 6,186 | 446 |
| 3/29/2020 | 4/4/2020 | 10,127 | 9,934 | 478 |
| 4/5/2020 | 4/11/2020 | 16,317 | 12,016 | 474 |
| 4/12/2020 | 4/18/2020 | 17,200 | 11,419 | 264 |
| 4/19/2020 | 4/25/2020 | 15,550 | 10,388 | 143 |
| 4/26/2020 | 5/2/2020 | 13,215 | 8,952 | 66 |
| 5/3/2020 | 5/9/2020 | 11,231 | 7,836 | 48 |
| 5/10/2020 | 5/16/2020 | 9,234 | 6,791 | 22 |
| 5/17/2020 | 5/23/2020 | 7,248 | 5,910 | 24 |
| 5/24/2020 | 5/30/2020 | 6,170 | 5,283 | 12 |
| 5/31/2020 | 6/6/2020 | 5,056 | 4,906 | 11 |
| 6/7/2020 | 6/13/2020 | 4,230 | 4,546 | 11 |
| 6/14/2020 | 6/20/2020 | 3,847 | 4,371 | 8 |
| 6/21/2020 | 6/27/2020 | 3,841 | 4,279 | 12 |
| 6/28/2020 | 7/4/2020 | 4,551 | 4,575 | 5 |
| 7/5/2020 | 7/11/2020 | 5,785 | 5,551 | 10 |
| 7/12/2020 | 7/18/2020 | 7,192 | 6,199 | 13 |
| 7/19/2020 | 7/25/2020 | 8,242 | 6,777 | 11 |
| 7/26/2020 | 8/1/2020 | 8,302 | 6,855 | 13 |
| 8/2/2020 | 8/8/2020 | 7,867 | 6,822 | 10 |
| 8/9/2020 | 8/15/2020 | 7,261 | 6,520 | 5 |
| 8/16/2020 | 8/22/2020 | 6,383 | 5,978 | 12 |
| 8/23/2020 | 8/29/2020 | 5,747 | 5,555 | 12 |
| 8/30/2020 | 9/5/2020 | 5,015 | 5,220 | 9 |
| 9/6/2020 | 9/12/2020 | 4,626 | 4,947 | 7 |
| 9/13/2020 | 9/19/2020 | 4,276 | 4,780 | 5 |
| 9/20/2020 | 9/26/2020 | 4,299 | 4,947 | 4 |
| 9/27/2020 | 10/3/2020 | 4,241 | 4,833 | 8 |
| 10/4/2020 | 10/10/2020 | 4,817 | 5,189 | 13 |
| 10/11/2020 | 10/17/2020 | 5,198 | 5,222 | 17 |
| 10/18/2020 | 10/24/2020 | 5,995 | 5,652 | 15 |
| 10/25/2020 | 10/31/2020 | 7,022 | 6,158 | 21 |
| 11/1/2020 | 11/7/2020 | 8,756 | 7,095 | 21 |
| 11/8/2020 | 11/14/2020 | 10,647 | 8,068 | 20 |
| 11/15/2020 | 11/21/2020 | 13,356 | 9,429 | 30 |
| 11/22/2020 | 11/28/2020 | 15,619 | 10,472 | 27 |
| 11/29/2020 | 12/5/2020 | 18,560 | 12,094 | 34 |
| 12/6/2020 | 12/12/2020 | 20,921 | 13,289 | 30 |
| 12/13/2020 | 12/19/2020 | 22,322 | 14,317 | 32 |
| 12/20/2020 | 12/26/2020 | 23,361 | 14,908 | 32 |
| 12/27/2020 | 1/2/2021 | 24,830 | 16,060 | 44 |

10. Obtain a time series model using the data presented in Table 14.10 and make a forecast for each of the three types of deaths in 2020.
11. Using the data presented in Table 14.11, obtain a time series model and make a forecast for each of the three types of deaths in 2021.
12. Using the data presented in Table 14.12, compare time series models for COVID-19 deaths in Saudi Arabia to confirm whether they predict the confidence intervals as shown.
13. Using the data presented in Table 14.13, identify a time series model for the number of weekly deaths due to COVID-19, pneumonia, and influenza in New York City over the period January through December 2020. Make a forecast with the selected model and compare its performance with that of a deterministic model.

Selected References

Alzahrani, S. I., Aljamaan, I. A., & Al-Fakih, E. A. (2020). Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. Journal of Infection and Public Health, 13(7), 914–919.
Anjum, N., Asif, A., Kiran, M. et al. (2021). Intelligent COVID-19 forecasting, diagnoses and monitoring systems: a survey. IEEE Communications Surveys & Tutorials, 14(8), 1–20.
Arimie, C. O., Biu, E. O., & Ijomah, M. A. (2018). Forecasting diagnostic imaging utilization rate for effective healthcare delivery. African Journal of Economic and Sustainable Development, 7(1), 73–87.


Table 14.11 Number of deaths due to COVID-19, pneumonia, and influenza, 2021 (Source: www.kff.org)

| Start date | End date | COVID-19 deaths | Pneumonia deaths | Influenza deaths |
|---|---|---|---|---|
| 1/3/2021 | 1/9/2021 | 25,959 | 16,920 | 32 |
| 1/10/2021 | 1/16/2021 | 25,677 | 16,964 | 35 |
| 1/17/2021 | 1/23/2021 | 23,641 | 15,838 | 31 |
| 1/24/2021 | 1/30/2021 | 20,279 | 13,880 | 34 |
| 1/31/2021 | 2/6/2021 | 16,976 | 12,291 | 28 |
| 2/7/2021 | 2/13/2021 | 13,473 | 10,230 | 24 |
| 2/14/2021 | 2/20/2021 | 10,792 | 8,975 | 15 |
| 2/21/2021 | 2/27/2021 | 8,622 | 7,578 | 18 |
| 2/28/2021 | 3/6/2021 | 6,696 | 6,578 | 26 |
| 3/7/2021 | 3/13/2021 | 5,689 | 5,763 | 13 |
| 3/14/2021 | 3/20/2021 | 4,876 | 5,311 | 7 |
| 3/21/2021 | 3/27/2021 | 4,436 | 5,096 | 17 |
| 3/28/2021 | 4/3/2021 | 4,171 | 4,768 | 18 |
| 4/4/2021 | 4/10/2021 | 4,280 | 4,958 | 12 |
| 4/11/2021 | 4/17/2021 | 4,421 | 4,943 | 8 |
| 4/18/2021 | 4/24/2021 | 4,556 | 5,109 | 15 |
| 4/25/2021 | 5/1/2021 | 4,133 | 4,867 | 5 |
| 5/2/2021 | 5/8/2021 | 3,955 | 4,706 | 9 |
| 5/9/2021 | 5/15/2021 | 3,642 | 4,478 | 10 |
| 5/16/2021 | 5/22/2021 | 3,191 | 4,292 | 9 |
| 5/23/2021 | 5/29/2021 | 2,739 | 3,929 | 4 |
| 5/30/2021 | 6/5/2021 | 2,319 | 3,805 | 8 |
| 6/6/2021 | 6/12/2021 | 2,014 | 3,675 | 6 |
| 6/13/2021 | 6/19/2021 | 1,746 | 3,597 | 11 |
| 6/20/2021 | 6/26/2021 | 1,600 | 3,401 | 8 |
| 6/27/2021 | 7/3/2021 | 1,503 | 3,274 | 7 |
| 7/4/2021 | 7/10/2021 | 1,604 | 3,434 | 3 |
| 7/11/2021 | 7/17/2021 | 1,934 | 3,676 | 8 |
| 7/18/2021 | 7/24/2021 | 2,686 | 4,036 | 9 |
| 7/25/2021 | 7/31/2021 | 3,971 | 4,876 | 11 |
| 8/1/2021 | 8/7/2021 | 6,234 | 6,242 | 12 |
| 8/8/2021 | 8/14/2021 | 8,958 | 8,126 | 9 |
| 8/15/2021 | 8/21/2021 | 11,537 | 9,686 | 11 |
| 8/22/2021 | 8/28/2021 | 13,181 | 10,753 | 14 |
| 8/29/2021 | 9/4/2021 | 13,814 | 11,118 | 26 |
| 9/5/2021 | 9/11/2021 | 12,828 | 10,408 | 13 |
| 9/12/2021 | 9/18/2021 | 10,937 | 8,787 | 9 |
| 9/19/2021 | 9/25/2021 | 7,142 | 5,766 | 7 |
| 9/26/2021 | 10/2/2021 | 2,374 | 1,936 | 4 |


Table 14.12 Actual versus predicted COVID-19 cases in Saudi Arabia (Source: www.kff.org)

| Date | Actual | ARIMA | ARMA | AR | MA |
|---|---|---|---|---|---|
| 04–06–2020 | 203 | 199 (98, 108) | 210 (143, 277) | 245 (170, 320) | 187 (94, 280) |
| 04–07–2020 | 327 | 340 (328, 352) | 302 (235, 369) | 399 (324, 474) | 289 (196, 382) |
| 04–08–2020 | 355 | 348 (336, 360) | 338 (271, 405) | 439 (364, 514) | 310 (217, 403) |
| 04–09–2020 | 364 | 360 (348, 372) | 356 (289, 423) | 505 (430, 580) | 403 (310, 496) |
| 04–10–2020 | 382 | 390 (378, 402) | 377 (310, 444) | 585 (510, 660) | 467 (374, 560) |
| 04–11–2020 | 429 | 420 (408, 432) | 442 (375, 509) | 604 (529, 679) | 579 (486, 672) |
| 04–12–2020 | 472 | 466 (454, 478) | 460 (393, 527) | 678 (603, 753) | 688 (595, 781) |
| 04–13–2020 | 435 | 442 (430, 454) | 461 (394, 528) | 689 (614, 764) | 735 (642, 828) |
| 04–14–2020 | 493 | 487 (475, 499) | 473 (406, 540) | 783 (708, 858) | 895 (802, 988) |
| 04–15–2020 | 518 | 525 (513, 537) | 509 (406, 576) | 842 (767, 917) | 981 (888, 1074) |
| 04–16–2020 | 762 | 758 (770, 746) | 689 (756, 622) | 936 (1011, 861) | 944 (1037, 851) |
| 04–17–2020 | 1132 | 1099 (1087, 1111) | 899 (832, 966) | 1054 (979, 1129) | 1256 (1163, 1349) |
| 04–18–2020 | 1088 | 1112 (1100, 1124) | 892 (825, 959) | 1259 (1184, 1334) | 1355 (1262, 1448) |
| 04–19–2020 | 1122 | 1179 (1167, 1191) | 1322 (1255, 1389) | 1056 (981, 1131) | 1216 (1123, 1309) |
| 04–20–2020 | 1147 | 1182 (1170, 1194) | 1403 (1336, 1470) | 1021 (946, 1096) | 1589 (1496, 1682) |

The predicted confidence intervals are within the brackets (Alzahrani, Aljamaan, and Al-Fakih, 2020)

Table 14.13 Weekly deaths due to COVID-19, pneumonia, and influenza in New York City, January–December 2020 (Source: www.kff.org)

Week ending date

COVID-19 deaths

Pneumonia deaths

Influenza deaths

1/4/2020

0

83

16

1/11/2020

0

97

1/18/2020

0

98

1/25/2020

0

106

12

2/1/2020

0

95

20

2/8/2020

0

101

18

18


Table 14.13 (cont.)

Week ending date

COVID-19 deaths

Pneumonia deaths

Influenza deaths

2/15/2020

0

91

16

2/22/2020

0

78

10

2/29/2020

0

84

3/7/2020

0

94

3/14/2020 3/21/2020 3/28/2020

105 125

157

14

982

607

47

4/4/2020

3,319

1,783

221

4/11/2020

4,867

2,146

329

4/18/2020

3,912

1,565

174

4/25/2020

2,627

1,153

77

5/2/2020

1,724

757

11

5/9/2020

1,051

481

5/16/2020

618

308

5/23/2020

417

203

5/30/2020

284

149

0

6/6/2020

196

130

0

6/13/2020

141

110

6/20/2020

90

94

6/27/2020

75

71

7/4/2020

53

67

7/11/2020

43

74

7/18/2020

53

75

0

7/25/2020

46

78

0

8/1/2020

35

79

8/8/2020

24

66

0

8/15/2020

29

61

0

8/22/2020

24

65

0

8/29/2020

19

56

9/5/2020

23

68

0

9/12/2020

18

69

0

9/19/2020

26

65

0

9/26/2020

20

72

0

10/3/2020

35

68

0

10/10/2020

24

60

10/17/2020

40

63

10/24/2020

28

66


0

0

0


Table 14.13 (cont.)

Week ending date

COVID-19 deaths

Pneumonia deaths

Influenza deaths

10/31/2020

41

78

0

11/7/2020

70

106

0

11/14/2020

55

83

0

11/21/2020

68

95

0

11/28/2020

74

93

0

12/5/2020

133

100

0

12/12/2020

171

124

12/19/2020

224

139

12/26/2020

267

160

0

Chen, L.-P., Zhang, Q., Yi, G. Y., & He, W. (2021). Model-based forecasting for Canadian COVID-19 data. PLoS One, 16(1), e0244536. https://doi.org/10.1371/journal.pone.0244536.
Cheng, Q., Argon, N. T., Evans, C. S. et al. (2021). Forecasting emergency department hourly occupancy using time series analysis. American Journal of Emergency Medicine, 48, 177–182.
Choudhury, A. (2019). Hourly forecasting of emergency department arrivals: time series analysis. SSRN Electronic Journal, 26. arXiv preprint arXiv:1901.02714.
Cruz-Cano, R., Ma, T., Yu, Y., Lee, M., & Liu, H. (2021). Forecasting COVID-19 cases based on social distancing in Maryland, USA: a time-series approach. Disaster Medicine and Public Health Preparedness, 1–4. https://doi.org/10.1017/dmp.2021.153.
Dehesh, T., Mardani-Fard, H. A., & Dehesh, P. (2020). Forecasting of COVID-19 confirmed cases in different countries with ARIMA models. MedRxiv.
Duarte, D., & Faerman, J. (2019). Comparison of time series prediction of healthcare emergency department indicators with ARIMA and Prophet. In 9th International Conference on Computer Science, Engineering and Applications (ICCSEA 2019), edited by N. Meghanathan & D. Nagamalai (pp. 123–133). Dubai: International Conference on Computer Science, Engineering and Applications.
Earnest, A., Evans, S. M., Sampurno, F., & Millar, J. (2019). Forecasting annual incidence and mortality rate for prostate cancer in Australia until 2022 using autoregressive integrated moving average (ARIMA) models. BMJ Open, 9(8), e031331.
Goic, M., Bozanic-Leal, M. S., Badal, M., & Basso, L. J. (2021). COVID-19: short-term forecast of ICU beds in times of crisis. PLoS One, 16(1), e0245272. https://doi.org/10.1371/journal.pone.0245272.
Ismail, L., Materwala, H., Znati, T., Turaev, S., & Khan, M. A. (2020). Tailoring time series models for forecasting coronavirus spread: case studies of 187 countries. Computational and Structural Biotechnology Journal, 18, 2972–3206.
Jones, S. S., Thomas, A., Evans, R. S. et al. (2008). Forecasting daily patient volumes in the emergency department. Academic Emergency Medicine, 15(2), 159–170.
Konarasinghe, K. M. U. B. (2021). Forecasting wave-like patterns of COVID-19 daily infected cases in Iran. Journal of New Frontiers in Healthcare and Biological Sciences, 2(1), 39–56.
Kumar, N., & Susan, S. (2020). COVID-19 pandemic prediction using time series forecasting models. In 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India (pp. 1–7). Piscataway, NJ: Institute of Electrical and Electronics Engineers.
Kumar, S., & Viral, R. (2021). Effect, challenges, and forecasting of COVID-19 situation in India using an ARMA model. IEEE Transactions on Computational Social Systems, 8(4), 955–963.
Ljung, G. M., & Box, G. E. (1978). On a measure of lack of fit in time series models. Biometrika, 65(2), 297–303.
Lucia, U., Deisboeck, T. S., & Grisolia, G. (2020). Entropy-based pandemics forecasting. Frontiers in Physics, 8, 274. https://doi.org/10.3389/fphy.2020.00274.
Luo, J. (2021). Forecasting COVID-19 pandemic: unknown unknowns and predictive monitoring. Journal of Technological Forecasting and Social Change, 166, 120602–120605.
Nyoni, S. P., & Nyoni, M. T. (2020). Forecasting the number of outpatient visits at Silobela district hospital in Zimbabwe using artificial neural networks. EPRA International Journal of Multidisciplinary Research, 6(2), 1–10. https://doi.org/10.36713/epra2013.
Rais, A., & Viana, A. (2011). Operations research in healthcare: a survey. International Transactions in Operational Research, 18(1), 1–31.
Ramgopal, S., Pelletier, J. H., Rakkar, J., & Horvat, C. M. (2021). Forecast modeling to identify changes in pediatric emergency department utilization during the COVID-19 pandemic. American Journal of Emergency Medicine, 49, 142–147.
Rocha, C. N., & Rodrigues, F. (2021). Forecasting emergency department admissions. Journal of Intelligent Information Systems, 56(3), 509–528.
Rostami-Tabar, B., & Rendon-Sanchez, J. F. (2021). Forecasting COVID-19 daily cases using phone call data. Applied Soft Computing, 100, 106932.
Sahai, A. K., Rath, N., Sood, V., & Singh, M. P. (2020). ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(5), 1419–1427.
Shanmugam, R., & Nevalainen, K. (1988). Analyzing environmental time series: a review and Finnish case study. Statistical Journal of the United Nations Economic Commission for Europe, 5(3), 315–338.
Talukder, M. S., Sorwar, G., Bao, Y., Ahmed, J., & Palash, M. A. S. (2020). Predicting antecedents of wearable healthcare technology acceptance by elderly: a combined SEM-neural network approach. Technological Forecasting and Social Change, 150, 119793. https://doi.org/10.1016/j.techfore.2019.119793.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages. Management Science, 6(3), 324–342.
Zeroual, A., Harrou, F., Dairi, A., & Sun, Y. (2020). Deep learning methods for forecasting COVID-19 time-series data: a comparative study. Chaos, Solitons & Fractals, 140, 110121. https://doi.org/10.1016/j.chaos.2020.110121.
Zhao, X., Li, C., Ding, G. et al. (2021). The burden of Alzheimer’s disease mortality in the United States, 1999–2018. Journal of Alzheimer’s Disease, 82(2), 803–813.

Epilogue

The aim of this epilogue is to enable readers to:
• Summarize, apply, and reap benefits from the main approaches to healthcare decision-making covered in this book.
• Comprehend how the contents of this book assist healthcare providers in practicing the decision-making concepts that are necessary to reach optimal decisions in situations in which resources are limited while expectations are high.

15.1 Motivation for This Epilogue

This epilogue recaps the major approaches to healthcare decision-making covered in this book, whether those decisions are reached by patients, insurance agents, government agents including policy makers, or healthcare professionals including hospital administrators.

15.2 The Concept of an Epilogue

An epilogue is a return to the themes mentioned in the introduction and later chapters of a book. The term ethics originates from the Greek word ethicos or ethos, which means a custom or habit. History suggests Roman philosophers introduced the term moralis as the Latin equivalent of ethicos. A distinction is often made between values, ethics, and morality. Values are core beliefs that direct attitudes and actions. Ethics refers to standards that govern individuals’ moral behavior. Morality denotes judgments. Ensuring an ethical decision-making environment is a challenge. Bloomberg’s system of ethics recognizes four characters who undertake the decision-making process. A conformist follows the rules of the system without question. A negotiator changes the rules as needed so as to improve the system. A navigator frames the decisions, is an effective leader whom the other members of the decision-making team can trust, and is the first to leave if he or she discovers unethical behavior. A wiggler is generally guided by self-interest. Sometimes, approaches to decision-making are hazy for healthcare professionals because of the symbols, notations, and mathematical operations involved. For the sake of easy readability, this book has summarized them in three ways – a glossary, a list of notations, and lists of the tables and figures presented throughout this volume.

In essence, this book contains 14 chapters covering a wide range of healthcare decision-making topics including but not limited to why and how decisions are made; the importance of data-guided decisions; the convenience offered by software programs like Excel, Microsoft Math Solver, and JASP; how to collect authentic data; resolving uncertainties; the integral role of models; decision trees; strengthening decisions by involving all relevant groups; learning from the root causes of adversities; attaining optimal healthcare decisions; diminishing risk; and methods of evaluating existing healthcare programs.

Chapter 1 articulated the importance of formulating a functional team of decision makers and analysts. Chapter 2 presented five litmus tests for reaching the best possible healthcare decisions along with the basic concepts of models and values. The chapter explained that the analytical decision-making process is a joint venture between analysts and decision makers, and listed the advantages of data mining in refining the decision-making process. Standardization of data was emphasized. The Zika virus data was explained with interpretations. Chapter 3 featured a brief tutorial in the Microsoft Excel, Microsoft Math Solver, and JASP software programs. Learning the core ideas outlined in this chapter does not require readers to access the software, but practicing these concepts does. In our advanced computing age of the twenty-first century, software costs quite a large amount. These three free software programs were highlighted because every healthcare professional has Excel as a part of the Microsoft Office platform. Microsoft Math Solver and JASP are in the public domain. Chapter 4 emphasized the importance of collecting authentic data so as to reach optimal healthcare decisions. The chapter introduced the randomized response technique as a strategy to obtain data when responders are hesitant. The advantages of the online survey were presented. The chapter also illustrated methods of data collection, sampling techniques, randomized response techniques, the probability of accepting a false research hypothesis, the probability of not accepting the true research hypothesis, and sample size selection. Chapter 5 addressed uncertainties in healthcare decision-making. The basic concepts of analyzing marginal and conditional probability via Venn diagrams were presented. The total probability, probability of simultaneous outcomes, and the Bayes theorem were introduced and their relevance in the healthcare setting was explained. The relationship between odds and probability was explored. Chapter 6 articulated the importance of models and values in the process of healthcare decision-making. The impression that numbers are misleading is wrong. The concern, rather, is how professionals misinterpret numbers. The chapter presented four types of data – nominal, ordinal, interval, and ratio. Unless the data type is characterized, the analyst cannot configure a method to extract the evidence from the data. The chapter introduced various models and explained the situations in which they are appropriate. Chapter 7 concentrated on how healthcare decision trees emerge and function and listed six steps to developing such decision trees. The advantages and disadvantages of decision trees were explicated with examples. Chapter 8 studied how group decisions are reached in healthcare settings and explicated the advantages and difficulties of group decision-making. Shared decision-making was highlighted as a special case in the group decision-making process. The principles of three major decision-making approaches – nominal, Delphi, and integrated – were explained with illustrations. Chapter 9 discussed tracing and remedying the root causes of adversities. Root-cause analysis is an opportunity to discover the factors that contribute to errors. The chapter gave 12 steps to prevent adversity and improve patient safety, along with three types of root-cause analysis – divergent, serial, and convergent. Chapter 10 illustrated cost-effective healthcare decision-making in a hospital/clinic.
The aim of a cost-effectiveness analysis is to maximize the level of benefits – health effects – relative to the available resources. To perform a costeffectiveness analysis, decision makers can use the decision tree concept. Chapter 11 stressed the importance of performing risk analysis before making healthcare decisions. The chapter enumerated, with examples, steps for conducting a focused risk and/or cost-effectiveness analysis using Excel, Microsoft Math Solver, and JASP. The chapter introduced SWOT analysis and various similarity measures and illustrated them with examples from the healthcare setting. Chapter 12 presented the concepts of program evaluation, sensitivity analysis, and authoring reports. The accreditation of programs is instrumental to terminate unpopular programs and to open up new, innovative services. Healthcare policy

460 https://doi.org/10.1017/9781009212021.017 Published online by Cambridge University Press

makers benefit from the findings of the program evaluation, and six sigma tools could help in the evaluation process. Chapter 13 is not seen in comparable books on decision-making. Yet the six sigma concept is quite useful in analyzing healthcare data. An associated concept is lean management, which advances the quality of healthcare services in hospitals, clinics, insurance industries, and state or federal agencies. Last but not least important is Chapter 14, which illustrated the time series concept and methods of forecasting. The chapter also pointed out the advantages of forecasting revenue and expenses.


Appendix: Statistical Tables

This appendix prepares readers to read and interpret tables of:
• The area under the standard normal curve.
• The percentile and tail area under a chi-squared distribution.
• The conversion of the correlation coefficient r to Fisher’s z score.
• Critical values of the correlation coefficient with specified degrees of freedom.
• The percentile and tail area under a t distribution with specified degrees of freedom.
• The percentile and tail area under an F distribution with numerator and denominator degrees of freedom.

Area under Standard Normal Curve

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.500 | 0.504 | 0.508 | 0.512 | 0.516 | 0.519 | 0.523 | 0.527 | 0.531 | 0.535 |
| 0.1 | 0.539 | 0.543 | 0.547 | 0.551 | 0.555 | 0.559 | 0.563 | 0.567 | 0.571 | 0.575 |
| 0.2 | 0.579 | 0.583 | 0.587 | 0.591 | 0.594 | 0.598 | 0.602 | 0.606 | 0.610 | 0.614 |
| 0.3 | 0.617 | 0.621 | 0.625 | 0.629 | 0.633 | 0.636 | 0.640 | 0.644 | 0.648 | 0.651 |
| 0.4 | 0.655 | 0.659 | 0.662 | 0.666 | 0.670 | 0.673 | 0.677 | 0.680 | 0.684 | 0.687 |
| 0.5 | 0.691 | 0.695 | 0.698 | 0.701 | 0.705 | 0.708 | 0.712 | 0.715 | 0.719 | 0.722 |
| 0.6 | 0.725 | 0.729 | 0.732 | 0.735 | 0.7389 | 0.742 | 0.745 | 0.748 | 0.751 | 0.754 |
| 0.7 | 0.758 | 0.761 | 0.764 | 0.767 | 0.7704 | 0.773 | 0.776 | 0.779 | 0.782 | 0.785 |
| 0.8 | 0.788 | 0.791 | 0.793 | 0.796 | 0.7995 | 0.802 | 0.805 | 0.807 | 0.810 | 0.813 |
| 0.9 | 0.815 | 0.818 | 0.821 | 0.823 | 0.8264 | 0.828 | 0.831 | 0.834 | 0.836 | 0.838 |
| 1.0 | 0.841 | 0.843 | 0.846 | 0.848 | 0.8508 | 0.853 | 0.855 | 0.857 | 0.859 | 0.862 |
| 1.1 | 0.864 | 0.866 | 0.868 | 0.870 | 0.8729 | 0.874 | 0.877 | 0.879 | 0.881 | 0.883 |
| 1.2 | 0.884 | 0.886 | 0.888 | 0.890 | 0.8925 | 0.894 | 0.896 | 0.898 | 0.899 | 0.901 |
| 1.3 | 0.903 | 0.904 | 0.906 | 0.908 | 0.9099 | 0.911 | 0.913 | 0.914 | 0.916 | 0.917 |
| 1.4 | 0.919 | 0.920 | 0.922 | 0.923 | 0.9251 | 0.926 | 0.927 | 0.929 | 0.930 | 0.931 |
| 1.5 | 0.933 | 0.934 | 0.935 | 0.937 | 0.9382 | 0.939 | 0.940 | 0.941 | 0.942 | 0.944 |
| 1.6 | 0.945 | 0.946 | 0.947 | 0.948 | 0.9495 | 0.950 | 0.951 | 0.952 | 0.953 | 0.954 |
| 1.7 | 0.955 | 0.956 | 0.957 | 0.958 | 0.9591 | 0.959 | 0.960 | 0.961 | 0.962 | 0.963 |
| 1.8 | 0.964 | 0.964 | 0.965 | 0.966 | 0.9671 | 0.967 | 0.968 | 0.969 | 0.969 | 0.970 |
| 1.9 | 0.971 | 0.971 | 0.972 | 0.973 | 0.9738 | 0.974 | 0.975 | 0.975 | 0.976 | 0.976 |
| 2.0 | 0.977 | 0.977 | 0.978 | 0.978 | 0.9793 | 0.979 | 0.980 | 0.980 | 0.981 | 0.981 |
| 2.1 | 0.982 | 0.982 | 0.983 | 0.983 | 0.9838 | 0.984 | 0.984 | 0.985 | 0.985 | 0.985 |
| 2.2 | 0.986 | 0.986 | 0.986 | 0.987 | 0.9875 | 0.987 | 0.988 | 0.988 | 0.988 | 0.989 |
| 2.3 | 0.989 | 0.989 | 0.989 | 0.990 | 0.9904 | 0.990 | 0.990 | 0.991 | 0.991 | 0.991 |
| 2.4 | 0.991 | 0.992 | 0.992 | 0.992 | 0.9927 | 0.992 | 0.993 | 0.993 | 0.993 | 0.993 |
| 2.5 | 0.993 | 0.994 | 0.994 | 0.994 | 0.9945 | 0.994 | 0.994 | 0.994 | 0.995 | 0.995 |
| 2.6 | 0.995 | 0.995 | 0.995 | 0.995 | 0.9959 | 0.996 | 0.996 | 0.996 | 0.996 | 0.996 |
| 2.7 | 0.996 | 0.996 | 0.996 | 0.996 | 0.9969 | 0.997 | 0.997 | 0.997 | 0.997 | 0.997 |
| 2.8 | 0.997 | 0.997 | 0.997 | 0.997 | 0.9977 | 0.997 | 0.997 | 0.997 | 0.998 | 0.998 |
| 2.9 | 0.998 | 0.998 | 0.998 | 0.998 | 0.9984 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 |
| 3.0 | 0.998 | 0.998 | 0.998 | 0.998 | 0.9988 | 0.998 | 0.998 | 0.998 | 0.999 | 0.999 |
| 3.1 | 0.999 | 0.999 | 0.999 | 0.999 | 0.9992 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |
| 3.2 | 0.999 | 0.999 | 0.999 | 0.999 | 0.9994 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |
| 3.3 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |
| 3.4 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |

| z | Area |
|---|---|
| 3.50 | 0.99976737 |
| 4.00 | 0.99996833 |
| 4.50 | 0.99999660 |
| 5.00 | 0.99999971 |

| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09 |
|---|---|---|---|---|---|---|---|---|---|---|
| −3.4 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0002 |
| −3.3 | 0.0005 | 0.0005 | 0.0005 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0003 |
| −3.2 | 0.0007 | 0.0007 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0006 | 0.0005 | 0.0005 | 0.0005 |
| −3.1 | 0.0010 | 0.0009 | 0.0009 | 0.0009 | 0.0008 | 0.0008 | 0.0008 | 0.0008 | 0.0007 | 0.0007 |
| −3.0 | 0.0013 | 0.0013 | 0.0013 | 0.0012 | 0.0012 | 0.0011 | 0.0011 | 0.0011 | 0.0010 | 0.0010 |
| −2.9 | 0.0019 | 0.0018 | 0.0018 | 0.0017 | 0.0016 | 0.0016 | 0.0015 | 0.0015 | 0.0014 | 0.0014 |
| −2.8 | 0.0026 | 0.0025 | 0.0024 | 0.0023 | 0.0023 | 0.0022 | 0.0021 | 0.0021 | 0.0020 | 0.0019 |
| −2.7 | 0.0035 | 0.0034 | 0.0033 | 0.0032 | 0.0031 | 0.0030 | 0.0029 | 0.0028 | 0.0027 | 0.0026 |
| −2.6 | 0.0047 | 0.0045 | 0.0044 | 0.0043 | 0.0041 | 0.0040 | 0.0039 | 0.0038 | 0.0037 | 0.0036 |
| −2.5 | 0.0062 | 0.0060 | 0.0059 | 0.0057 | 0.0055 | 0.0054 | 0.0052 | 0.0051 | 0.0049 | 0.0048 |
| −2.4 | 0.0082 | 0.0080 | 0.0078 | 0.0075 | 0.0073 | 0.0071 | 0.0069 | 0.0068 | 0.0066 | 0.0064 |
| −2.3 | 0.0107 | 0.0104 | 0.0102 | 0.0099 | 0.0096 | 0.0094 | 0.0091 | 0.0089 | 0.0087 | 0.0084 |
| −2.2 | 0.0139 | 0.0136 | 0.0132 | 0.0129 | 0.0125 | 0.0122 | 0.0119 | 0.0116 | 0.0113 | 0.0110 |
| −2.1 | 0.0179 | 0.0174 | 0.0170 | 0.0166 | 0.0162 | 0.0158 | 0.0154 | 0.0150 | 0.0146 | 0.0143 |
| −2.0 | 0.0228 | 0.0222 | 0.0217 | 0.0212 | 0.0207 | 0.0202 | 0.0197 | 0.0192 | 0.0188 | 0.0183 |
| −1.9 | 0.0287 | 0.0281 | 0.0274 | 0.0268 | 0.0262 | 0.0256 | 0.0250 | 0.0244 | 0.0239 | 0.0233 |
| −1.8 | 0.0359 | 0.0351 | 0.0344 | 0.0336 | 0.0329 | 0.0322 | 0.0314 | 0.0307 | 0.0301 | 0.0294 |
| −1.7 | 0.0446 | 0.0436 | 0.0427 | 0.0418 | 0.0409 | 0.0401 | 0.0392 | 0.0384 | 0.0375 | 0.0367 |
| −1.6 | 0.0548 | 0.0537 | 0.0526 | 0.0516 | 0.0505 | 0.0495 | 0.0485 | 0.0475 | 0.0465 | 0.0455 |
| −1.5 | 0.0668 | 0.0655 | 0.0643 | 0.0630 | 0.0618 | 0.0606 | 0.0594 | 0.0582 | 0.0571 | 0.0559 |
| −1.4 | 0.0808 | 0.0793 | 0.0778 | 0.0764 | 0.0749 | 0.0735 | 0.0721 | 0.0708 | 0.0694 | 0.0681 |
| −1.3 | 0.0968 | 0.0951 | 0.0934 | 0.0918 | 0.0901 | 0.0885 | 0.0869 | 0.0853 | 0.0838 | 0.0823 |
| −1.2 | 0.1151 | 0.1131 | 0.1112 | 0.1093 | 0.1075 | 0.1056 | 0.1038 | 0.1020 | 0.1003 | 0.0985 |
| −1.1 | 0.1357 | 0.1335 | 0.1314 | 0.1292 | 0.1271 | 0.1251 | 0.1230 | 0.1210 | 0.1190 | 0.1170 |
| −1.0 | 0.1587 | 0.1562 | 0.1539 | 0.1515 | 0.1492 | 0.1469 | 0.1446 | 0.1423 | 0.1401 | 0.1379 |
| −0.9 | 0.1841 | 0.1814 | 0.1788 | 0.1762 | 0.1736 | 0.1711 | 0.1685 | 0.1660 | 0.1635 | 0.1611 |
| −0.8 | 0.2119 | 0.2090 | 0.2061 | 0.2033 | 0.2005 | 0.1977 | 0.1949 | 0.1922 | 0.1894 | 0.1867 |
| −0.7 | 0.2420 | 0.2389 | 0.2358 | 0.2327 | 0.2296 | 0.2266 | 0.2236 | 0.2206 | 0.2177 | 0.2148 |
| −0.6 | 0.2743 | 0.2709 | 0.2676 | 0.2643 | 0.2611 | 0.2578 | 0.2546 | 0.2514 | 0.2483 | 0.2451 |
| −0.5 | 0.3085 | 0.3050 | 0.3015 | 0.2981 | 0.2946 | 0.2912 | 0.2877 | 0.2843 | 0.2810 | 0.2776 |
| −0.4 | 0.3446 | 0.3409 | 0.3372 | 0.3336 | 0.3300 | 0.3264 | 0.3228 | 0.3192 | 0.3156 | 0.3121 |
| −0.3 | 0.3821 | 0.3783 | 0.3745 | 0.3707 | 0.3669 | 0.3632 | 0.3594 | 0.3557 | 0.3520 | 0.3483 |
| −0.2 | 0.4207 | 0.4168 | 0.4129 | 0.4090 | 0.4052 | 0.4013 | 0.3974 | 0.3936 | 0.3897 | 0.3859 |
| −0.1 | 0.4602 | 0.4562 | 0.4522 | 0.4483 | 0.4443 | 0.4404 | 0.4364 | 0.4325 | 0.4286 | 0.4247 |
| −0.0 | 0.5000 | 0.4960 | 0.4920 | 0.4880 | 0.4840 | 0.4801 | 0.4761 | 0.4721 | 0.4681 | 0.4641 |

| z | Area |
|---|---|
| −3.50 | 0.00023263 |
| −4.00 | 0.00003167 |
| −4.50 | 0.00000340 |
| −5.00 | 0.00000029 |
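The entries in the standard normal tables can be reproduced numerically. The following is a minimal sketch rather than anything taken from the book: it assumes the tabulated area is the cumulative area to the left of z, which is consistent with the entry 0.5000 at z = 0. The function name `standard_normal_cdf` is illustrative, and the only library call is `math.erf` from the Python standard library.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Illustrative z values only; any real number can be passed.
for z in (-1.96, -1.0, 0.0, 1.0, 1.96):
    print(f"z = {z:+.2f}  area to the left = {standard_normal_cdf(z):.4f}")
```

The row label and column heading of the table combine to give z; for example, row −1.0 with column 0.00 corresponds to `standard_normal_cdf(-1.0)`.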

Area under Chi-Square Distribution

Each column heading gives the area in the upper tail to the right of the tabulated χ² value.

| df | χ² .995 | χ² .990 | χ² .975 | χ² .950 | χ² .900 | χ² .100 | χ² .050 | χ² .025 | χ² .010 | χ² .005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
| 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
| 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
| 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
| 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
| 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
| 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
| 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
| 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
| 70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
| 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
| 90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
| 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |
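A few chi-square entries can be checked by hand without statistical software. The sketch below is an illustration under a stated restriction, not a general routine: it uses the closed-form upper-tail probability that holds only when the degrees of freedom are even, P(X > x) = exp(−x/2) · Σ (x/2)^k / k! for k = 0 to df/2 − 1. The function name `chi2_upper_tail_even_df` is hypothetical; for odd degrees of freedom a statistics package would be needed.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """Upper-tail area P(X > x) for a chi-square variable with EVEN df.

    Assumption: df is a positive even integer; the finite sum below is
    not valid for odd df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Table look-ups taken from the .050 column (df = 2, 4, 10).
for df, x in ((2, 5.991), (4, 9.488), (10, 18.307)):
    print(f"df = {df:2d}  x = {x:6.3f}  upper tail = {chi2_upper_tail_even_df(x, df):.4f}")
```

Each printed tail area should be close to the column heading .050, which is how the table is read: the body gives x, the heading gives the area to its right.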


Correlation r to Fisher’s z

| r | z | r | z | r | z | r | z | r | z |
|---|---|---|---|---|---|---|---|---|---|
| 0.000 | 0.000 | 0.114 | 0.114 | 0.228 | 0.232 | 0.342 | 0.356 | 0.456 | 0.492 |
| 0.002 | 0.002 | 0.116 | 0.117 | 0.230 | 0.234 | 0.344 | 0.359 | 0.458 | 0.495 |
| 0.004 | 0.004 | 0.118 | 0.119 | 0.232 | 0.236 | 0.346 | 0.361 | 0.460 | 0.497 |
| 0.006 | 0.006 | 0.120 | 0.121 | 0.234 | 0.238 | 0.348 | 0.363 | 0.462 | 0.500 |
| 0.008 | 0.008 | 0.122 | 0.123 | 0.236 | 0.241 | 0.350 | 0.365 | 0.464 | 0.502 |
| 0.010 | 0.010 | 0.124 | 0.125 | 0.238 | 0.243 | 0.352 | 0.368 | 0.466 | 0.505 |
| 0.012 | 0.012 | 0.126 | 0.127 | 0.240 | 0.245 | 0.354 | 0.370 | 0.468 | 0.508 |
| 0.014 | 0.014 | 0.128 | 0.129 | 0.242 | 0.247 | 0.356 | 0.372 | 0.470 | 0.510 |
| 0.016 | 0.016 | 0.130 | 0.131 | 0.244 | 0.249 | 0.358 | 0.375 | 0.472 | 0.513 |
| 0.018 | 0.018 | 0.132 | 0.133 | 0.246 | 0.251 | 0.360 | 0.377 | 0.474 | 0.515 |
| 0.020 | 0.020 | 0.134 | 0.135 | 0.248 | 0.253 | 0.362 | 0.379 | 0.476 | 0.518 |
| 0.022 | 0.022 | 0.136 | 0.137 | 0.250 | 0.255 | 0.364 | 0.381 | 0.478 | 0.520 |
| 0.024 | 0.024 | 0.138 | 0.139 | 0.252 | 0.258 | 0.366 | 0.384 | 0.480 | 0.523 |
| 0.026 | 0.026 | 0.140 | 0.141 | 0.254 | 0.260 | 0.368 | 0.386 | 0.482 | 0.526 |
| 0.028 | 0.028 | 0.142 | 0.143 | 0.256 | 0.262 | 0.370 | 0.388 | 0.484 | 0.528 |
| 0.030 | 0.030 | 0.144 | 0.145 | 0.258 | 0.264 | 0.372 | 0.391 | 0.486 | 0.531 |
| 0.032 | 0.032 | 0.146 | 0.147 | 0.260 | 0.266 | 0.374 | 0.393 | 0.488 | 0.533 |
| 0.034 | 0.034 | 0.148 | 0.149 | 0.262 | 0.268 | 0.376 | 0.395 | 0.490 | 0.536 |
| 0.036 | 0.036 | 0.150 | 0.151 | 0.264 | 0.270 | 0.378 | 0.398 | 0.492 | 0.539 |
| 0.038 | 0.038 | 0.152 | 0.153 | 0.266 | 0.273 | 0.380 | 0.400 | 0.494 | 0.541 |
| 0.040 | 0.040 | 0.154 | 0.155 | 0.268 | 0.275 | 0.382 | 0.402 | 0.496 | 0.544 |
| 0.042 | 0.042 | 0.156 | 0.157 | 0.270 | 0.277 | 0.384 | 0.405 | 0.498 | 0.547 |
| 0.044 | 0.044 | 0.158 | 0.159 | 0.272 | 0.279 | 0.386 | 0.407 | 0.500 | 0.549 |
| 0.046 | 0.046 | 0.160 | 0.161 | 0.274 | 0.281 | 0.388 | 0.409 | 0.502 | 0.552 |
| 0.048 | 0.048 | 0.162 | 0.163 | 0.276 | 0.283 | 0.390 | 0.412 | 0.504 | 0.555 |
| 0.050 | 0.050 | 0.164 | 0.165 | 0.278 | 0.286 | 0.392 | 0.414 | 0.506 | 0.557 |
| 0.052 | 0.052 | 0.166 | 0.168 | 0.280 | 0.288 | 0.394 | 0.417 | 0.508 | 0.560 |
| 0.054 | 0.054 | 0.168 | 0.170 | 0.282 | 0.290 | 0.396 | 0.419 | 0.510 | 0.563 |
| 0.056 | 0.056 | 0.170 | 0.172 | 0.284 | 0.292 | 0.398 | 0.421 | 0.512 | 0.565 |
| 0.058 | 0.058 | 0.172 | 0.174 | 0.286 | 0.294 | 0.400 | 0.424 | 0.514 | 0.568 |
| 0.060 | 0.060 | 0.174 | 0.176 | 0.288 | 0.296 | 0.402 | 0.426 | 0.516 | 0.571 |
| 0.062 | 0.062 | 0.176 | 0.178 | 0.290 | 0.299 | 0.404 | 0.428 | 0.518 | 0.574 |
| 0.064 | 0.064 | 0.178 | 0.180 | 0.292 | 0.301 | 0.406 | 0.431 | 0.520 | 0.576 |
| 0.066 | 0.066 | 0.180 | 0.182 | 0.294 | 0.303 | 0.408 | 0.433 | 0.522 | 0.579 |
| 0.068 | 0.068 | 0.182 | 0.184 | 0.296 | 0.305 | 0.410 | 0.436 | 0.524 | 0.582 |
| 0.070 | 0.070 | 0.184 | 0.186 | 0.298 | 0.307 | 0.412 | 0.438 | 0.526 | 0.585 |
| 0.072 | 0.072 | 0.186 | 0.188 | 0.300 | 0.310 | 0.414 | 0.440 | 0.528 | 0.587 |
| 0.074 | 0.074 | 0.188 | 0.190 | 0.302 | 0.312 | 0.416 | 0.443 | 0.530 | 0.590 |
| 0.076 | 0.076 | 0.190 | 0.192 | 0.304 | 0.314 | 0.418 | 0.445 | 0.532 | 0.593 |
| 0.078 | 0.078 | 0.192 | 0.194 | 0.306 | 0.316 | 0.420 | 0.448 | 0.534 | 0.596 |
| 0.080 | 0.080 | 0.194 | 0.196 | 0.308 | 0.318 | 0.422 | 0.450 | 0.536 | 0.599 |
| 0.082 | 0.082 | 0.196 | 0.199 | 0.310 | 0.321 | 0.424 | 0.453 | 0.538 | 0.601 |
| 0.084 | 0.084 | 0.198 | 0.201 | 0.312 | 0.323 | 0.426 | 0.455 | 0.540 | 0.604 |
| 0.086 | 0.086 | 0.200 | 0.203 | 0.314 | 0.325 | 0.428 | 0.457 | 0.542 | 0.607 |
| 0.088 | 0.088 | 0.202 | 0.205 | 0.316 | 0.327 | 0.430 | 0.460 | 0.544 | 0.610 |
| 0.090 | 0.090 | 0.204 | 0.207 | 0.318 | 0.329 | 0.432 | 0.462 | 0.546 | 0.613 |
| 0.092 | 0.092 | 0.206 | 0.209 | 0.320 | 0.332 | 0.434 | 0.465 | 0.548 | 0.616 |
| 0.094 | 0.094 | 0.208 | 0.211 | 0.322 | 0.334 | 0.436 | 0.467 | 0.550 | 0.618 |
| 0.096 | 0.096 | 0.210 | 0.213 | 0.324 | 0.336 | 0.438 | 0.470 | 0.552 | 0.621 |
| 0.098 | 0.098 | 0.212 | 0.215 | 0.326 | 0.338 | 0.440 | 0.472 | 0.554 | 0.624 |
| 0.100 | 0.100 | 0.214 | 0.217 | 0.328 | 0.341 | 0.442 | 0.475 | 0.556 | 0.627 |
| 0.102 | 0.102 | 0.216 | 0.219 | 0.330 | 0.343 | 0.444 | 0.477 | 0.558 | 0.630 |
| 0.104 | 0.104 | 0.218 | 0.222 | 0.332 | 0.345 | 0.446 | 0.480 | 0.560 | 0.633 |
| 0.106 | 0.106 | 0.220 | 0.224 | 0.334 | 0.347 | 0.448 | 0.482 | 0.562 | 0.636 |
| 0.108 | 0.108 | 0.222 | 0.226 | 0.336 | 0.350 | 0.450 | 0.485 | 0.564 | 0.639 |
| 0.110 | 0.110 | 0.224 | 0.228 | 0.338 | 0.352 | 0.452 | 0.487 | 0.566 | 0.642 |
| 0.112 | 0.112 | 0.226 | 0.230 | 0.340 | 0.354 | 0.454 | 0.490 | 0.568 | 0.645 |

| r | z | r | z | r | z | r | z |
|---|---|---|---|---|---|---|---|
| 0.570 | 0.648 | 0.684 | 0.837 | 0.798 | 1.093 | 0.912 | 1.539 |
| 0.572 | 0.650 | 0.686 | 0.840 | 0.800 | 1.099 | 0.914 | 1.551 |
| 0.574 | 0.653 | 0.688 | 0.844 | 0.802 | 1.104 | 0.916 | 1.564 |
| 0.576 | 0.656 | 0.690 | 0.848 | 0.804 | 1.110 | 0.918 | 1.576 |
| 0.578 | 0.659 | 0.692 | 0.852 | 0.806 | 1.116 | 0.920 | 1.589 |
| 0.580 | 0.662 | 0.694 | 0.856 | 0.808 | 1.121 | 0.922 | 1.602 |
| 0.582 | 0.665 | 0.696 | 0.860 | 0.810 | 1.127 | 0.924 | 1.616 |
| 0.584 | 0.669 | 0.698 | 0.863 | 0.812 | 1.133 | 0.926 | 1.630 |
| 0.586 | 0.672 | 0.700 | 0.867 | 0.814 | 1.139 | 0.928 | 1.644 |
| 0.588 | 0.675 | 0.702 | 0.871 | 0.816 | 1.145 | 0.930 | 1.658 |
| 0.590 | 0.678 | 0.704 | 0.875 | 0.818 | 1.151 | 0.932 | 1.673 |
| 0.592 | 0.681 | 0.706 | 0.879 | 0.820 | 1.157 | 0.934 | 1.689 |
| 0.594 | 0.684 | 0.708 | 0.883 | 0.822 | 1.163 | 0.936 | 1.705 |
| 0.596 | 0.687 | 0.710 | 0.887 | 0.824 | 1.169 | 0.938 | 1.721 |
| 0.598 | 0.690 | 0.712 | 0.891 | 0.826 | 1.175 | 0.940 | 1.738 |
| 0.600 | 0.693 | 0.714 | 0.895 | 0.828 | 1.182 | 0.942 | 1.756 |
| 0.602 | 0.696 | 0.716 | 0.899 | 0.830 | 1.188 | 0.944 | 1.774 |
| 0.604 | 0.699 | 0.718 | 0.904 | 0.832 | 1.195 | 0.946 | 1.792 |
| 0.606 | 0.703 | 0.720 | 0.908 | 0.834 | 1.201 | 0.948 | 1.812 |
| 0.608 | 0.706 | 0.722 | 0.912 | 0.836 | 1.208 | 0.950 | 1.832 |
| 0.610 | 0.709 | 0.724 | 0.916 | 0.838 | 1.214 | 0.952 | 1.853 |
| 0.612 | 0.712 | 0.726 | 0.920 | 0.840 | 1.221 | 0.954 | 1.874 |
| 0.614 | 0.715 | 0.728 | 0.924 | 0.842 | 1.228 | 0.956 | 1.897 |
| 0.616 | 0.719 | 0.730 | 0.929 | 0.844 | 1.235 | 0.958 | 1.921 |
| 0.618 | 0.722 | 0.732 | 0.933 | 0.846 | 1.242 | 0.960 | 1.946 |
| 0.620 | 0.725 | 0.734 | 0.937 | 0.848 | 1.249 | 0.962 | 1.972 |
| 0.622 | 0.728 | 0.736 | 0.942 | 0.850 | 1.256 | 0.964 | 2.000 |
| 0.624 | 0.732 | 0.738 | 0.946 | 0.852 | 1.263 | 0.966 | 2.029 |
| 0.626 | 0.735 | 0.740 | 0.950 | 0.854 | 1.271 | 0.968 | 2.060 |
| 0.628 | 0.738 | 0.742 | 0.955 | 0.856 | 1.278 | 0.970 | 2.092 |
| 0.630 | 0.741 | 0.744 | 0.959 | 0.858 | 1.286 | 0.972 | 2.127 |
| 0.632 | 0.745 | 0.746 | 0.964 | 0.860 | 1.293 | 0.974 | 2.165 |
| 0.634 | 0.748 | 0.748 | 0.968 | 0.862 | 1.301 | 0.976 | 2.205 |
| 0.636 | 0.751 | 0.750 | 0.973 | 0.864 | 1.309 | 0.978 | 2.249 |
| 0.638 | 0.755 | 0.752 | 0.978 | 0.866 | 1.317 | 0.980 | 2.298 |
| 0.640 | 0.758 | 0.754 | 0.982 | 0.868 | 1.325 | 0.982 | 2.351 |
| 0.642 | 0.762 | 0.756 | 0.987 | 0.870 | 1.333 | 0.984 | 2.410 |
| 0.644 | 0.765 | 0.758 | 0.991 | 0.872 | 1.341 | 0.986 | 2.477 |
| 0.646 | 0.768 | 0.760 | 0.996 | 0.874 | 1.350 | 0.988 | 2.555 |
| 0.648 | 0.772 | 0.762 | 1.001 | 0.876 | 1.358 | 0.990 | 2.647 |
| 0.650 | 0.775 | 0.764 | 1.006 | 0.878 | 1.367 | 0.992 | 2.759 |
| 0.652 | 0.779 | 0.766 | 1.011 | 0.880 | 1.376 | 0.994 | 2.903 |
| 0.654 | 0.782 | 0.768 | 1.015 | 0.882 | 1.385 | 0.996 | 3.106 |
| 0.656 | 0.786 | 0.770 | 1.020 | 0.884 | 1.394 | 0.998 | 3.453 |
| 0.658 | 0.789 | 0.772 | 1.025 | 0.886 | 1.403 | | |
| 0.660 | 0.793 | 0.774 | 1.030 | 0.888 | 1.412 | | |
| 0.662 | 0.796 | 0.776 | 1.035 | 0.890 | 1.422 | | |
| 0.664 | 0.800 | 0.778 | 1.040 | 0.892 | 1.432 | | |
| 0.666 | 0.804 | 0.780 | 1.045 | 0.894 | 1.442 | | |
| 0.668 | 0.807 | 0.782 | 1.050 | 0.896 | 1.452 | | |
| 0.670 | 0.811 | 0.784 | 1.056 | 0.898 | 1.462 | | |
| 0.672 | 0.814 | 0.786 | 1.061 | 0.900 | 1.472 | | |
| 0.674 | 0.818 | 0.788 | 1.066 | 0.902 | 1.483 | | |
| 0.676 | 0.822 | 0.790 | 1.071 | 0.904 | 1.494 | | |
| 0.678 | 0.825 | 0.792 | 1.077 | 0.906 | 1.505 | | |
| 0.680 | 0.829 | 0.794 | 1.082 | 0.908 | 1.516 | | |
| 0.682 | 0.833 | 0.796 | 1.088 | 0.910 | 1.528 | | |
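Fisher’s z is the inverse hyperbolic tangent of r, z = ½ ln((1 + r)/(1 − r)), so any entry in the table can be recomputed directly. The sketch below uses only `math.atanh` and `math.tanh` from the Python standard library; the function names `fisher_z` and `inverse_fisher_z` are illustrative and do not come from the book.

```python
import math


def fisher_z(r: float) -> float:
    """Convert a correlation coefficient r (strictly between -1 and 1) to Fisher's z."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)   # same as 0.5 * log((1 + r) / (1 - r))


def inverse_fisher_z(z: float) -> float:
    """Convert Fisher's z back to a correlation coefficient."""
    return math.tanh(z)


# Illustrative look-ups; compare with the r = 0.500 and r = 0.900 table rows.
for r in (0.100, 0.500, 0.900):
    print(f"r = {r:.3f}  z = {fisher_z(r):.3f}")
```

The transformation is nearly the identity for small r and grows without bound as r approaches 1, which is why the tabulated z values pull away from r in the later rows.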


Critical Values for Pearson’s Correlation Coefficient

Each column heading gives the proportion in ONE tail, followed in parentheses by the corresponding proportion in TWO tails.

| DF | .25 (.50) | .10 (.20) | .05 (.10) | .025 (.05) | .01 (.02) | .005 (.01) |
|---|---|---|---|---|---|---|
| 1 | .7071 | .9511 | .9877 | .9969 | .9995 | .9999 |
| 2 | .5000 | .8000 | .9000 | .9500 | .9800 | .9900 |
| 3 | .4040 | .6870 | .8054 | .8783 | .9343 | .9587 |
| 4 | .3473 | .6084 | .7293 | .8114 | .8822 | .9172 |
| 5 | .3091 | .5509 | .6694 | .7545 | .8329 | .8745 |
| 6 | .2811 | .5067 | .6215 | .7067 | .7887 | .8343 |
| 7 | .2596 | .4716 | .5822 | .6664 | .7498 | .7977 |
| 8 | .2423 | .4428 | .5494 | .6319 | .7155 | .7646 |
| 9 | .2281 | .4187 | .5214 | .6021 | .6851 | .7348 |
| 10 | .2161 | .3981 | .4973 | .5760 | .6581 | .7079 |
| 11 | .2058 | .3802 | .4762 | .5529 | .6339 | .6835 |
| 12 | .1968 | .3646 | .4575 | .5324 | .6120 | .6614 |
| 13 | .1890 | .3507 | .4409 | .5140 | .5923 | .6411 |
| 14 | .1820 | .3383 | .4259 | .4973 | .5742 | .6226 |
| 15 | .1757 | .3271 | .4124 | .4821 | .5577 | .6055 |
| 16 | .1700 | .3170 | .4000 | .4683 | .5425 | .5897 |
| 17 | .1649 | .3077 | .3887 | .4555 | .5285 | .5751 |
| 18 | .1602 | .2992 | .3783 | .4438 | .5155 | .5614 |
| 19 | .1558 | .2914 | .3687 | .4329 | .5034 | .5487 |
| 20 | .1518 | .2841 | .3598 | .4227 | .4921 | .5368 |
| 21 | .1481 | .2774 | .3515 | .4132 | .4815 | .5256 |
| 22 | .1447 | .2711 | .3438 | .4044 | .4716 | .5151 |
| 23 | .1415 | .2653 | .3365 | .3961 | .4622 | .5052 |
| 24 | .1384 | .2598 | .3297 | .3882 | .4534 | .4958 |
| 25 | .1356 | .2546 | .3233 | .3809 | .4451 | .4869 |
| 26 | .1330 | .2497 | .3172 | .3739 | .4372 | .4785 |
| 27 | .1305 | .2451 | .3115 | .3673 | .4297 | .4705 |
| 28 | .1281 | .2407 | .3061 | .3610 | .4226 | .4629 |
| 29 | .1258 | .2366 | .3009 | .3550 | .4158 | .4556 |
| 30 | .1237 | .2327 | .2960 | .3494 | .4093 | .4487 |
| 31 | .1217 | .2289 | .2913 | .3440 | .4032 | .4421 |
| 32 | .1197 | .2254 | .2869 | .3388 | .3972 | .4357 |
| 33 | .1179 | .2220 | .2826 | .3338 | .3916 | .4296 |
| 34 | .1161 | .2187 | .2785 | .3291 | .3862 | .4238 |
| 35 | .1144 | .2156 | .2746 | .3246 | .3810 | .4182 |
| 36 | .1128 | .2126 | .2709 | .3202 | .3760 | .4128 |
| 37 | .1113 | .2097 | .2673 | .3160 | .3712 | .4076 |
| 38 | .1098 | .2070 | .2638 | .3120 | .3665 | .4026 |
| 39 | .1084 | .2043 | .2605 | .3081 | .3621 | .3978 |
| 40 | .1070 | .2018 | .2573 | .3044 | .3578 | .3932 |
| 41 | .1057 | .1993 | .2542 | .3008 | .3536 | .3887 |
| 42 | .1044 | .1970 | .2512 | .2973 | .3496 | .3843 |
| 43 | .1032 | .1947 | .2483 | .2940 | .3457 | .3801 |
| 44 | .1020 | .1925 | .2455 | .2907 | .3420 | .3761 |
| 45 | .1008 | .1903 | .2429 | .2876 | .3384 | .3721 |
| 46 | .0997 | .1883 | .2403 | .2845 | .3348 | .3683 |
| 47 | .0987 | .1863 | .2377 | .2816 | .3314 | .3646 |
| 48 | .0976 | .1843 | .2353 | .2787 | .3281 | .3610 |
| 49 | .0966 | .1825 | .2329 | .2759 | .3249 | .3575 |
| 50 | .0956 | .1806 | .2306 | .2732 | .3218 | .3542 |
| 51 | .0947 | .1789 | .2284 | .2706 | .3188 | .3509 |
| 52 | .0938 | .1772 | .2262 | .2681 | .3158 | .3477 |
| 53 | .0929 | .1755 | .2241 | .2656 | .3129 | .3445 |
| 54 | .0920 | .1739 | .2221 | .2632 | .3102 | .3415 |
| 55 | .0912 | .1723 | .2201 | .2609 | .3074 | .3385 |
| 56 | .0904 | .1708 | .2181 | .2586 | .3048 | .3357 |
| 57 | .0896 | .1693 | .2162 | .2564 | .3022 | .3328 |
| 58 | .0888 | .1678 | .2144 | .2542 | .2997 | .3301 |
| 59 | .0880 | .1664 | .2126 | .2521 | .2972 | .3274 |
| 60 | .0873 | .1650 | .2108 | .2500 | .2948 | .3248 |
| 61 | .0866 | .1636 | .2091 | .2480 | .2925 | .3223 |
| 62 | .0858 | .1623 | .2075 | .2461 | .2902 | .3198 |
| 63 | .0852 | .1610 | .2058 | .2441 | .2880 | .3173 |
| 64 | .0845 | .1598 | .2042 | .2423 | .2858 | .3150 |
| 65 | .0838 | .1586 | .2027 | .2404 | .2837 | .3126 |
| 66 | .0832 | .1574 | .2012 | .2387 | .2816 | .3104 |
| 67 | .0826 | .1562 | .1997 | .2369 | .2796 | .3081 |
| 68 | .0820 | .1550 | .1982 | .2352 | .2776 | .3060 |
| 69 | .0814 | .1539 | .1968 | .2335 | .2756 | .3038 |
| 70 | .0808 | .1528 | .1954 | .2319 | .2737 | .3017 |
| 71 | .0802 | .1517 | .1940 | .2303 | .2718 | .2997 |
| 72 | .0796 | .1507 | .1927 | .2287 | .2700 | .2977 |
| 73 | .0791 | .1497 | .1914 | .2272 | .2682 | .2957 |
| 74 | .0786 | .1486 | .1901 | .2257 | .2664 | .2938 |
| 75 | .0780 | .1477 | .1888 | .2242 | .2647 | .2919 |
| 76 | .0775 | .1467 | .1876 | .2227 | .2630 | .2900 |
| 77 | .0770 | .1457 | .1864 | .2213 | .2613 | .2882 |
| 78 | .0765 | .1448 | .1852 | .2199 | .2597 | .2864 |
| 79 | .0760 | .1439 | .1841 | .2185 | .2581 | .2847 |
| 80 | .0755 | .1430 | .1829 | .2172 | .2565 | .2830 |
| 81 | .0751 | .1421 | .1818 | .2159 | .2550 | .2813 |
| 82 | .0746 | .1412 | .1807 | .2146 | .2535 | .2796 |
| 83 | .0742 | .1404 | .1796 | .2133 | .2520 | .2780 |
| 84 | .0737 | .1396 | .1786 | .2120 | .2505 | .2764 |
| 85 | .0733 | .1387 | .1775 | .2108 | .2491 | .2748 |
| 86 | .0728 | .1379 | .1765 | .2096 | .2477 | .2732 |
| 87 | .0724 | .1371 | .1755 | .2084 | .2463 | .2717 |
| 88 | .0720 | .1364 | .1745 | .2072 | .2449 | .2702 |
| 89 | .0716 | .1356 | .1735 | .2061 | .2435 | .2687 |
| 90 | .0712 | .1348 | .1726 | .2050 | .2422 | .2673 |
| 91 | .0708 | .1341 | .1716 | .2039 | .2409 | .2659 |
| 92 | .0704 | .1334 | .1707 | .2028 | .2396 | .2645 |
| 93 | .0700 | .1327 | .1698 | .2017 | .2384 | .2631 |
| 94 | .0697 | .1320 | .1689 | .2006 | .2371 | .2617 |
| 95 | .0693 | .1313 | .1680 | .1996 | .2359 | .2604 |
| 96 | .0689 | .1306 | .1671 | .1986 | .2347 | .2591 |
| 97 | .0686 | .1299 | .1663 | .1975 | .2335 | .2578 |
| 98 | .0682 | .1292 | .1654 | .1966 | .2324 | .2565 |
| 99 | .0679 | .1286 | .1646 | .1956 | .2312 | .2552 |
| 100 | .0675 | .1279 | .1638 | .1946 | .2301 | .2540 |
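The critical values of r above are tied to the critical values of the t distribution with the same degrees of freedom. The sketch below rests on an assumption the appendix does not state: it uses the standard identity r = t / √(t² + df), which follows from the usual test statistic t = r√df / √(1 − r²). The function name `critical_r` is illustrative, and the t values passed in are read from the t distribution table in this appendix, not computed.

```python
import math


def critical_r(t_critical: float, df: int) -> float:
    """Critical Pearson r implied by a critical t value with the same df.

    Assumption: the standard relation r = t / sqrt(t^2 + df), where
    df = n - 2 for a sample of n paired observations.
    """
    if df <= 0:
        raise ValueError("df must be a positive integer")
    return t_critical / math.sqrt(t_critical ** 2 + df)


# (df, t critical at two-tailed .05) pairs read from the t table.
for df, t_crit in ((5, 2.571), (10, 2.228), (20, 2.086)):
    print(f"df = {df:2d}  t = {t_crit:.3f}  critical r = {critical_r(t_crit, df):.4f}")
```

An observed correlation whose absolute value exceeds the critical r is significant at the chosen level, which is the same decision a t test on that correlation would give.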

T table

Table entry for p and C is the critical value t* with probability p lying to its right and probability C lying between −t* and t*.t*

470 https://doi.org/10.1017/9781009212021.018 Published online by Cambridge University Press

Appendix

t distribution critical values Upper-tail probability p df

.25

.20

.15

.10

.05

.025

.001

.0005

127.3

318.3

636.6

3.078

6.314

2

0.816

1.061

1.386

1.886

2.920

4.303

4.849

6.965

9.925

3

0.765

0.978

1.250

1.638

2.353

3.182

3.482

4.541

5.841

7.453

4

0.741

0.941

1.190

1.533

2.132

2.776

2.999

3.747

4.604

5.598

7.173

8.610

5

0.727

0.920

1.156

1.476

2.015

2.571

2.757

3.365

4.032

4.773

5.893

6.869

6

0.718

0.906

1.134

1.440

1.943

2.447

2.612

3.143

3.707

4.317

5.208

5.959

7

0.711

0.896

1.119

1.415

1.895

2.365

2.517

2.998

3.499

4.029

4.785

5.408

8

0.706

0.889

1.108

1.397

1.860

2.306

2.449

2.896

3.355

3.833

4.501

5.041

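| df | 50% | 60% | 70% | 80% | 90% | 95% | 96% | 98% | 99% | 99.5% | 99.8% | 99.9% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 0.703 | 0.883 | 1.100 | 1.383 | 1.833 | 2.262 | 2.398 | 2.821 | 3.250 | 3.690 | 4.297 | 4.781 |
| 10 | 0.700 | 0.879 | 1.093 | 1.372 | 1.812 | 2.228 | 2.359 | 2.764 | 3.169 | 3.581 | 4.144 | 4.587 |
| 11 | 0.697 | 0.876 | 1.088 | 1.363 | 1.796 | 2.201 | 2.328 | 2.718 | 3.106 | 3.497 | 4.025 | 4.437 |
| 12 | 0.695 | 0.873 | 1.083 | 1.356 | 1.782 | 2.179 | 2.303 | 2.681 | 3.055 | 3.428 | 3.930 | 4.318 |
| 13 | 0.694 | 0.870 | 1.079 | 1.350 | 1.771 | 2.160 | 2.282 | 2.650 | 3.012 | 3.372 | 3.852 | 4.221 |
| 14 | 0.692 | 0.868 | 1.076 | 1.345 | 1.761 | 2.145 | 2.264 | 2.624 | 2.977 | 3.326 | 3.787 | 4.140 |
| 15 | 0.691 | 0.866 | 1.074 | 1.341 | 1.753 | 2.131 | 2.249 | 2.602 | 2.947 | 3.286 | 3.733 | 4.073 |
| 16 | 0.690 | 0.865 | 1.071 | 1.337 | 1.746 | 2.120 | 2.235 | 2.583 | 2.921 | 3.252 | 3.686 | 4.015 |
| 17 | 0.689 | 0.863 | 1.069 | 1.333 | 1.740 | 2.110 | 2.224 | 2.567 | 2.898 | 3.222 | 3.646 | 3.965 |
| 18 | 0.688 | 0.862 | 1.067 | 1.330 | 1.734 | 2.101 | 2.214 | 2.552 | 2.878 | 3.197 | 3.611 | 3.922 |
| 19 | 0.688 | 0.861 | 1.066 | 1.328 | 1.729 | 2.093 | 2.205 | 2.539 | 2.861 | 3.174 | 3.579 | 3.883 |
| 20 | 0.687 | 0.860 | 1.064 | 1.325 | 1.725 | 2.086 | 2.197 | 2.528 | 2.845 | 3.153 | 3.552 | 3.850 |
| 21 | 0.686 | 0.859 | 1.063 | 1.323 | 1.721 | 2.080 | 2.189 | 2.518 | 2.831 | 3.135 | 3.527 | 3.819 |
| 22 | 0.686 | 0.858 | 1.061 | 1.321 | 1.717 | 2.074 | 2.183 | 2.508 | 2.819 | 3.119 | 3.505 | 3.792 |
| 23 | 0.685 | 0.858 | 1.060 | 1.319 | 1.714 | 2.069 | 2.177 | 2.500 | 2.807 | 3.104 | 3.485 | 3.768 |
| 24 | 0.685 | 0.857 | 1.059 | 1.318 | 1.711 | 2.064 | 2.172 | 2.492 | 2.797 | 3.091 | 3.467 | 3.745 |
| 25 | 0.684 | 0.856 | 1.058 | 1.316 | 1.708 | 2.060 | 2.167 | 2.485 | 2.787 | 3.078 | 3.450 | 3.725 |
| 26 | 0.684 | 0.856 | 1.058 | 1.315 | 1.706 | 2.056 | 2.162 | 2.479 | 2.779 | 3.067 | 3.435 | 3.707 |
| 27 | 0.684 | 0.855 | 1.057 | 1.314 | 1.703 | 2.052 | 2.158 | 2.473 | 2.771 | 3.057 | 3.421 | 3.690 |
| 28 | 0.683 | 0.855 | 1.056 | 1.313 | 1.701 | 2.048 | 2.154 | 2.467 | 2.763 | 3.047 | 3.408 | 3.674 |
| 29 | 0.683 | 0.854 | 1.055 | 1.311 | 1.699 | 2.045 | 2.150 | 2.462 | 2.756 | 3.038 | 3.396 | 3.659 |
| 30 | 0.683 | 0.854 | 1.055 | 1.310 | 1.697 | 2.042 | 2.147 | 2.457 | 2.750 | 3.030 | 3.385 | 3.646 |
| 40 | 0.681 | 0.851 | 1.050 | 1.303 | 1.684 | 2.021 | 2.123 | 2.423 | 2.704 | 2.971 | 3.307 | 3.551 |
| 50 | 0.679 | 0.849 | 1.047 | 1.299 | 1.676 | 2.009 | 2.109 | 2.403 | 2.678 | 2.937 | 3.261 | 3.496 |
| 60 | 0.679 | 0.848 | 1.045 | 1.296 | 1.671 | 2.000 | 2.099 | 2.390 | 2.660 | 2.915 | 3.232 | 3.460 |
| 80 | 0.678 | 0.846 | 1.043 | 1.292 | 1.664 | 1.990 | 2.088 | 2.374 | 2.639 | 2.887 | 3.195 | 3.416 |
| 100 | 0.677 | 0.845 | 1.042 | 1.290 | 1.660 | 1.984 | 2.081 | 2.364 | 2.626 | 2.871 | 3.174 | 3.390 |
| 1000 | 0.675 | 0.842 | 1.037 | 1.282 | 1.646 | 1.962 | 2.056 | 2.330 | 2.581 | 2.813 | 3.098 | 3.300 |
| z* | 0.674 | 0.841 | 1.036 | 1.282 | 1.645 | 1.960 | 2.054 | 2.326 | 2.576 | 2.807 | 3.091 | 3.291 |
| Confidence level C | 50% | 60% | 70% | 80% | 90% | 95% | 96% | 98% | 99% | 99.5% | 99.8% | 99.9% |

Entries for df = 1: 1.000 (50%), 1.376 (60%), 1.963 (70%), 12.71 (95%), 15.89 (96%), 31.82 (98%), 63.66 (99%).
Entries for df = 2: 14.09 (99.5%), 22.33 (99.8%), 31.60 (99.9%).
Entries for df = 3: 10.21 (99.8%), 12.92 (99.9%).
Upper-tail probabilities: .02 (96%), .01 (98%), .005 (99%), .0025 (99.5%).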
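The z* row and the confidence levels in the bottom row of the table are linked by a simple rule: a central area of C under the curve leaves (1 − C)/2 in each tail. The following is a minimal Python sketch, using only the standard library, that recovers several z* entries under that rule. The function name `z_star` is illustrative and does not come from the text.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Critical value z* that encloses a central area `confidence`
    under the standard normal curve."""
    upper_tail = (1.0 - confidence) / 2.0  # area left in each tail
    return NormalDist().inv_cdf(1.0 - upper_tail)


if __name__ == "__main__":
    for c in (0.50, 0.80, 0.90, 0.95, 0.99):
        print(f"C = {c:.0%}: z* = {z_star(c):.3f}")
```

The printed values can be compared with the z* row of the table; the t rows need the t distribution itself, which this sketch does not attempt.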

471 https://doi.org/10.1017/9781009212021.018 Published online by Cambridge University Press

Appendix

The F Distribution

The F distribution is an asymmetric distribution with a minimum value of 0 and no maximum value. The curve peaks not far to the right of 0 and then gradually approaches the horizontal axis as the F value increases, without ever quite touching it. The F distribution has two degrees of freedom: d1 for the numerator and d2 for the denominator. Each combination of these degrees of freedom gives a different F distribution. The distribution is most spread out when the degrees of freedom are small and becomes less dispersed as they increase.

The following figure shows the shape of the distribution. The F value is on the horizontal axis, and the probability density P(F) at each F value is on the vertical axis. The shaded area in the diagram represents the level of significance α shown in the table.

[Figure: density curve of the F distribution; horizontal axis F, vertical axis P(F), with the right-tail area α shaded.]

Because there are so many F distributions, the F tables are organized somewhat differently from the tables for the other distributions. The three tables that follow are organized by level of significance. The first table gives the F values that leave α = 0.10 of the area in the right tail of the distribution. The second table gives the F values for α = 0.05 in the right tail. The third table gives the F values for the α = 0.01 level of significance. In each of these tables, the F values are given for various combinations of degrees of freedom.
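The shape described above (a peak not far to the right of 0 and less spread as the degrees of freedom grow) can be checked numerically. The sketch below evaluates the standard F density in pure Python. The helper names `f_pdf` and `approx_mode` are hypothetical, and the grid search is only a rough illustration, not a method taken from the text.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution with d1 (numerator) and
    d2 (denominator) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)


def approx_mode(d1: int, d2: int) -> float:
    """Locate the peak of the density on a grid over (0, 5] with step 0.001."""
    grid = [i / 1000 for i in range(1, 5001)]
    return max(grid, key=lambda x: f_pdf(x, d1, d2))


if __name__ == "__main__":
    # For d1 > 2 the peak sits at ((d1 - 2) / d1) * (d2 / (d2 + 2)).
    print("peak for d1=5, d2=7:", approx_mode(5, 7))
    # Larger degrees of freedom put far less density out in the right tail.
    print("density at F=3:", f_pdf(3.0, 5, 7), "vs", f_pdf(3.0, 50, 70))
```

For d1 = 5 and d2 = 7 the closed-form peak is (3/5)(7/9) ≈ 0.47, which is where the grid search should land.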


For example, if the α = 0.10 level of significance is selected, use the first F table. With 5 degrees of freedom in the numerator and 7 degrees of freedom in the denominator, the F value from the table is 2.88. This means that exactly 0.10 of the area under the F curve lies to the right of F = 2.88.

When the significance level is α = 0.05, use the second F table. With 20 degrees of freedom in the numerator and 5 degrees of freedom in the denominator, the critical F value is 4.56. This can be written F_{20,5;0.05} = 4.56. That is, for 20 and 5 degrees of freedom, the F value that leaves exactly 0.05 of the area under the F curve in the right tail of the distribution is 4.56.

For the α = 0.01 level of significance, use the third F table. Suppose there is 1 degree of freedom in the numerator and 12 degrees of freedom in the denominator. Then F_{1,12;0.01} = 9.33. An F value of 9.33 leaves exactly 0.01 of the area under the curve in the right tail of the distribution when there are 1 and 12 degrees of freedom.
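The three table lookups above can be reproduced without a table. The sketch below is one possible approach, not the method used to build the book's tables: it writes the F cumulative probability as a regularized incomplete beta function (evaluated by a standard continued fraction) and finds the critical value by bisection. All function names are illustrative.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for num in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            if abs(c) < tiny:
                c = tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(x, d1, d2):
    """P(F <= x) for d1 numerator and d2 denominator degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * x / (d1 * x + d2))


def f_critical(alpha, d1, d2):
    """F value that leaves area `alpha` in the right tail (bisection search)."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, d1, d2) > alpha:  # widen until the tail is small enough
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - f_cdf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for alpha, d1, d2 in ((0.10, 5, 7), (0.05, 20, 5), (0.01, 1, 12)):
        print(f"alpha={alpha}, d1={d1}, d2={d2}: F = {f_critical(alpha, d1, d2):.2f}")
```

The printed values should agree, to two decimals, with the table entries quoted in the examples (2.88, 4.56 and 9.33). Bisection is slow but simple, and it needs nothing beyond a cumulative probability function.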


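F values for α = 0.10 in the right tail (columns: numerator degrees of freedom d1 = 1 to 9; rows: denominator degrees of freedom d2)

| d2 \ d1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 39.86 | 49.50 | 53.59 | 55.83 | 57.24 | 58.20 | 58.91 | 59.44 | 59.86 |
| 2 | 8.53 | 9.00 | 9.16 | 9.24 | 9.29 | 9.33 | 9.35 | 9.37 | 9.38 |
| 3 | 5.54 | 5.46 | 5.39 | 5.34 | 5.31 | 5.28 | 5.27 | 5.25 | 5.24 |
| 4 | 4.54 | 4.32 | 4.19 | 4.11 | 4.05 | 4.01 | 3.98 | 3.95 | 3.94 |
| 5 | 4.06 | 3.78 | 3.62 | 3.52 | 3.45 | 3.40 | 3.37 | 3.34 | 3.32 |
| 6 | 3.78 | 3.46 | 3.29 | 3.18 | 3.11 | 3.05 | 3.01 | 2.98 | 2.96 |
| 7 | 3.59 | 3.26 | 3.07 | 2.96 | 2.88 | 2.83 | 2.78 | 2.75 | 2.72 |
| 8 | 3.46 | 3.11 | 2.92 | 2.81 | 2.73 | 2.67 | 2.62 | 2.59 | 2.56 |
| 9 | 3.36 | 3.01 | 2.81 | 2.69 | 2.61 | 2.55 | 2.51 | 2.47 | 2.44 |
| 10 | 3.29 | 2.92 | 2.73 | 2.61 | 2.52 | 2.46 | 2.41 | 2.38 | 2.35 |
| 11 | 3.23 | 2.86 | 2.66 | 2.54 | 2.45 | 2.39 | 2.34 | 2.30 | 2.27 |
| 12 | 3.18 | 2.81 | 2.61 | 2.48 | 2.39 | 2.33 | 2.28 | 2.24 | 2.21 |
| 13 | 3.14 | 2.76 | 2.56 | 2.43 | 2.35 | 2.28 | 2.23 | 2.20 | 2.16 |
| 14 | 3.10 | 2.73 | 2.52 | 2.39 | 2.31 | 2.24 | 2.19 | 2.15 | 2.12 |
| 15 | 3.07 | 2.70 | 2.49 | 2.36 | 2.27 | 2.21 | 2.16 | 2.12 | 2.09 |
| 16 | 3.05 | 2.67 | 2.46 | 2.33 | 2.24 | 2.18 | 2.13 | 2.09 | 2.06 |
| 17 | 3.03 | 2.64 | 2.44 | 2.31 | 2.22 | 2.15 | 2.10 | 2.06 | 2.03 |
| 18 | 3.01 | 2.62 | 2.42 | 2.29 | 2.20 | 2.13 | 2.08 | 2.04 | 2.00 |
| 19 | 2.99 | 2.61 | 2.40 | 2.27 | 2.18 | 2.11 | 2.06 | 2.02 | 1.98 |
| 20 | 2.97 | 2.59 | 2.38 | 2.25 | 2.16 | 2.09 | 2.04 | 2.00 | 1.96 |
| 21 | 2.96 | 2.57 | 2.36 | 2.23 | 2.14 | 2.08 | 2.02 | 1.98 | 1.95 |
| 22 | 2.95 | 2.56 | 2.35 | 2.22 | 2.13 | 2.06 | 2.01 | 1.97 | 1.93 |
| 23 | 2.94 | 2.55 | 2.34 | 2.21 | 2.11 | 2.05 | 1.99 | 1.95 | 1.92 |
| 24 | 2.93 | 2.54 | 2.33 | 2.19 | 2.10 | 2.04 | 1.98 | 1.94 | 1.91 |
| 25 | 2.92 | 2.53 | 2.32 | 2.18 | 2.09 | 2.02 | 1.97 | 1.93 | 1.89 |
| 26 | 2.91 | 2.52 | 2.31 | 2.17 | 2.08 | 2.01 | 1.96 | 1.92 | 1.88 |
| 27 | 2.90 | 2.51 | 2.30 | 2.17 | 2.07 | 2.00 | 1.95 | 1.91 | 1.87 |
| 28 | 2.89 | 2.50 | 2.29 | 2.16 | 2.06 | 2.00 | 1.94 | 1.90 | 1.87 |
| 29 | 2.89 | 2.50 | 2.28 | 2.15 | 2.06 | 1.99 | 1.93 | 1.89 | 1.86 |
| 30 | 2.88 | 2.49 | 2.28 | 2.14 | 2.05 | 1.98 | 1.93 | 1.88 | 1.85 |
| 40 | 2.84 | 2.44 | 2.23 | 2.09 | 2.00 | 1.93 | 1.87 | 1.83 | 1.79 |
| 60 | 2.79 | 2.39 | 2.18 | 2.04 | 1.95 | 1.87 | 1.82 | 1.77 | 1.74 |
| 120 | 2.75 | 2.35 | 2.13 | 1.99 | 1.90 | 1.82 | 1.77 | 1.72 | 1.68 |
| inf | 2.71 | 2.30 | 2.08 | 1.94 | 1.85 | 1.77 | 1.72 | 1.67 | 1.63 |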


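F values for α = 0.10 in the right tail (columns: numerator degrees of freedom d1 = 10 to inf; rows: denominator degrees of freedom d2)

| d2 \ d1 | 10 | 12 | 15 | 20 | 24 | 30 | 40 | 60 | 120 | inf |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60.19 | 60.71 | 61.22 | 61.74 | 62.00 | 62.26 | 62.53 | 62.79 | 63.06 | 63.33 |
| 2 | 9.39 | 9.41 | 9.42 | 9.44 | 9.45 | 9.46 | 9.47 | 9.47 | 9.48 | 9.49 |
| 3 | 5.23 | 5.22 | 5.20 | 5.18 | 5.18 | 5.17 | 5.16 | 5.15 | 5.14 | 5.13 |
| 4 | 3.92 | 3.90 | 3.87 | 3.84 | 3.83 | 3.82 | 3.80 | 3.79 | 3.78 | 3.76 |
| 5 | 3.30 | 3.27 | 3.24 | 3.21 | 3.19 | 3.17 | 3.16 | 3.14 | 3.12 | 3.10 |
| 6 | 2.94 | 2.90 | 2.87 | 2.84 | 2.82 | 2.80 | 2.78 | 2.76 | 2.74 | 2.72 |
| 7 | 2.70 | 2.67 | 2.63 | 2.59 | 2.58 | 2.56 | 2.54 | 2.51 | 2.49 | 2.47 |
| 8 | 2.54 | 2.50 | 2.46 | 2.42 | 2.40 | 2.38 | 2.36 | 2.34 | 2.32 | 2.29 |
| 9 | 2.42 | 2.38 | 2.34 | 2.30 | 2.28 | 2.25 | 2.23 | 2.21 | 2.18 | 2.16 |
| 10 | 2.32 | 2.28 | 2.24 | 2.20 | 2.18 | 2.16 | 2.13 | 2.11 | 2.08 | 2.06 |
| 11 | 2.25 | 2.21 | 2.17 | 2.12 | 2.10 | 2.08 | 2.05 | 2.03 | 2.00 | 1.97 |
| 12 | 2.19 | 2.15 | 2.10 | 2.06 | 2.04 | 2.01 | 1.99 | 1.96 | 1.93 | 1.90 |
| 13 | 2.14 | 2.10 | 2.05 | 2.01 | 1.98 | 1.96 | 1.93 | 1.90 | 1.88 | 1.85 |
| 14 | 2.10 | 2.05 | 2.01 | 1.96 | 1.94 | 1.91 | 1.89 | 1.86 | 1.83 | 1.80 |
| 15 | 2.06 | 2.02 | 1.97 | 1.92 | 1.90 | 1.87 | 1.85 | 1.82 | 1.79 | 1.76 |
| 16 | 2.03 | 1.99 | 1.94 | 1.89 | 1.87 | 1.84 | 1.81 | 1.78 | 1.75 | 1.72 |
| 17 | 2.00 | 1.96 | 1.91 | 1.86 | 1.84 | 1.81 | 1.78 | 1.75 | 1.72 | 1.69 |
| 18 | 1.98 | 1.93 | 1.89 | 1.84 | 1.81 | 1.78 | 1.75 | 1.72 | 1.69 | 1.66 |
| 19 | 1.96 | 1.91 | 1.86 | 1.81 | 1.79 | 1.76 | 1.73 | 1.70 | 1.67 | 1.63 |
| 20 | 1.94 | 1.89 | 1.84 | 1.79 | 1.77 | 1.74 | 1.71 | 1.68 | 1.64 | 1.61 |
| 21 | 1.92 | 1.87 | 1.83 | 1.78 | 1.75 | 1.72 | 1.69 | 1.66 | 1.62 | 1.59 |
| 22 | 1.90 | 1.86 | 1.81 | 1.76 | 1.73 | 1.70 | 1.67 | 1.64 | 1.60 | 1.57 |
| 23 | 1.89 | 1.84 | 1.80 | 1.74 | 1.72 | 1.69 | 1.66 | 1.62 | 1.59 | 1.55 |
| 24 | 1.88 | 1.83 | 1.78 | 1.73 | 1.70 | 1.67 | 1.64 | 1.61 | 1.57 | 1.53 |
| 25 | 1.87 | 1.82 | 1.77 | 1.72 | 1.69 | 1.66 | 1.63 | 1.59 | 1.56 | 1.52 |
| 26 | 1.86 | 1.81 | 1.76 | 1.71 | 1.68 | 1.65 | 1.61 | 1.58 | 1.54 | 1.50 |
| 27 | 1.85 | 1.80 | 1.75 | 1.70 | 1.67 | 1.64 | 1.60 | 1.57 | 1.53 | 1.49 |
| 28 | 1.84 | 1.79 | 1.74 | 1.69 | 1.66 | 1.63 | 1.59 | 1.56 | 1.52 | 1.48 |
| 29 | 1.83 | 1.78 | 1.73 | 1.68 | 1.65 | 1.62 | 1.58 | 1.55 | 1.51 | 1.47 |
| 30 | 1.82 | 1.77 | 1.72 | 1.67 | 1.64 | 1.61 | 1.57 | 1.54 | 1.50 | 1.46 |
| 40 | 1.76 | 1.71 | 1.66 | 1.61 | 1.57 | 1.54 | 1.51 | 1.47 | 1.42 | 1.38 |
| 60 | 1.71 | 1.66 | 1.60 | 1.54 | 1.51 | 1.48 | 1.44 | 1.40 | 1.35 | 1.29 |
| 120 | 1.65 | 1.60 | 1.55 | 1.48 | 1.45 | 1.41 | 1.37 | 1.32 | 1.26 | 1.19 |
| inf | 1.60 | 1.55 | 1.49 | 1.42 | 1.38 | 1.34 | 1.30 | 1.24 | 1.17 | 1.00 |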


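F values for α = 0.05 in the right tail (columns: numerator degrees of freedom d1 = 1 to 9; rows: denominator degrees of freedom d2)

| d2 \ d1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 161.4 | 199.5 | 215.7 | 224.6 | 230.2 | 234.0 | 236.8 | 238.9 | 240.5 |
| 2 | 18.51 | 19.00 | 19.16 | 19.25 | 19.30 | 19.33 | 19.35 | 19.37 | 19.38 |
| 3 | 10.13 | 9.55 | 9.28 | 9.12 | 9.01 | 8.94 | 8.89 | 8.85 | 8.81 |
| 4 | 7.71 | 6.94 | 6.59 | 6.39 | 6.26 | 6.16 | 6.09 | 6.04 | 6.00 |
| 5 | 6.61 | 5.79 | 5.41 | 5.19 | 5.05 | 4.95 | 4.88 | 4.82 | 4.77 |
| 6 | 5.99 | 5.14 | 4.76 | 4.53 | 4.39 | 4.28 | 4.21 | 4.15 | 4.10 |
| 7 | 5.59 | 4.74 | 4.35 | 4.12 | 3.97 | 3.87 | 3.79 | 3.73 | 3.68 |
| 8 | 5.32 | 4.46 | 4.07 | 3.84 | 3.69 | 3.58 | 3.50 | 3.44 | 3.39 |
| 9 | 5.12 | 4.26 | 3.86 | 3.63 | 3.48 | 3.37 | 3.29 | 3.23 | 3.18 |
| 10 | 4.96 | 4.10 | 3.71 | 3.48 | 3.33 | 3.22 | 3.14 | 3.07 | 3.02 |
| 11 | 4.84 | 3.98 | 3.59 | 3.36 | 3.20 | 3.09 | 3.01 | 2.95 | 2.90 |
| 12 | 4.75 | 3.89 | 3.49 | 3.26 | 3.11 | 3.00 | 2.91 | 2.85 | 2.80 |
| 13 | 4.67 | 3.81 | 3.41 | 3.18 | 3.03 | 2.92 | 2.83 | 2.77 | 2.71 |
| 14 | 4.60 | 3.74 | 3.34 | 3.11 | 2.96 | 2.85 | 2.76 | 2.70 | 2.65 |
| 15 | 4.54 | 3.68 | 3.29 | 3.06 | 2.90 | 2.79 | 2.71 | 2.64 | 2.59 |
| 16 | 4.49 | 3.63 | 3.24 | 3.01 | 2.85 | 2.74 | 2.66 | 2.59 | 2.54 |
| 17 | 4.45 | 3.59 | 3.20 | 2.96 | 2.81 | 2.70 | 2.61 | 2.55 | 2.49 |
| 18 | 4.41 | 3.55 | 3.16 | 2.93 | 2.77 | 2.66 | 2.58 | 2.51 | 2.46 |
| 19 | 4.38 | 3.52 | 3.13 | 2.90 | 2.74 | 2.63 | 2.54 | 2.48 | 2.42 |
| 20 | 4.35 | 3.49 | 3.10 | 2.87 | 2.71 | 2.60 | 2.51 | 2.45 | 2.39 |
| 21 | 4.32 | 3.47 | 3.07 | 2.84 | 2.68 | 2.57 | 2.49 | 2.42 | 2.37 |
| 22 | 4.30 | 3.44 | 3.05 | 2.82 | 2.66 | 2.55 | 2.46 | 2.40 | 2.34 |
| 23 | 4.28 | 3.42 | 3.03 | 2.80 | 2.64 | 2.53 | 2.44 | 2.37 | 2.32 |
| 24 | 4.26 | 3.40 | 3.01 | 2.78 | 2.62 | 2.51 | 2.42 | 2.36 | 2.30 |
| 25 | 4.24 | 3.39 | 2.99 | 2.76 | 2.60 | 2.49 | 2.40 | 2.34 | 2.28 |
| 26 | 4.23 | 3.37 | 2.98 | 2.74 | 2.59 | 2.47 | 2.39 | 2.32 | 2.27 |
| 27 | 4.21 | 3.35 | 2.96 | 2.73 | 2.57 | 2.46 | 2.37 | 2.31 | 2.25 |
| 28 | 4.20 | 3.34 | 2.95 | 2.71 | 2.56 | 2.45 | 2.36 | 2.29 | 2.24 |
| 29 | 4.18 | 3.33 | 2.93 | 2.70 | 2.55 | 2.43 | 2.35 | 2.28 | 2.22 |
| 30 | 4.17 | 3.32 | 2.92 | 2.69 | 2.53 | 2.42 | 2.33 | 2.27 | 2.21 |
| 40 | 4.08 | 3.23 | 2.84 | 2.61 | 2.45 | 2.34 | 2.25 | 2.18 | 2.12 |
| 60 | 4.00 | 3.15 | 2.76 | 2.53 | 2.37 | 2.25 | 2.17 | 2.10 | 2.04 |
| 120 | 3.92 | 3.07 | 2.68 | 2.45 | 2.29 | 2.17 | 2.09 | 2.02 | 1.96 |
| inf | 3.84 | 3.00 | 2.60 | 2.37 | 2.21 | 2.10 | 2.01 | 1.94 | 1.88 |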


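F values for α = 0.05 in the right tail (columns: numerator degrees of freedom d1 = 10 to inf; rows: denominator degrees of freedom d2)

| d2 \ d1 | 10 | 12 | 15 | 20 | 24 | 30 | 40 | 60 | 120 | inf |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 241.9 | 243.9 | 245.9 | 248.0 | 249.1 | 250.1 | 251.1 | 252.2 | 253.3 | 254.3 |
| 2 | 19.40 | 19.41 | 19.43 | 19.45 | 19.45 | 19.46 | 19.47 | 19.48 | 19.49 | 19.50 |
| 3 | 8.79 | 8.74 | 8.70 | 8.66 | 8.64 | 8.62 | 8.59 | 8.57 | 8.55 | 8.53 |
| 4 | 5.96 | 5.91 | 5.86 | 5.80 | 5.77 | 5.75 | 5.72 | 5.69 | 5.66 | 5.63 |
| 5 | 4.74 | 4.68 | 4.62 | 4.56 | 4.53 | 4.50 | 4.46 | 4.43 | 4.40 | 4.36 |
| 6 | 4.06 | 4.00 | 3.94 | 3.87 | 3.84 | 3.81 | 3.77 | 3.74 | 3.70 | 3.67 |
| 7 | 3.64 | 3.57 | 3.51 | 3.44 | 3.41 | 3.38 | 3.34 | 3.30 | 3.27 | 3.23 |
| 8 | 3.35 | 3.28 | 3.22 | 3.15 | 3.12 | 3.08 | 3.04 | 3.01 | 2.97 | 2.93 |
| 9 | 3.14 | 3.07 | 3.01 | 2.94 | 2.90 | 2.86 | 2.83 | 2.79 | 2.75 | 2.71 |
| 10 | 2.98 | 2.91 | 2.85 | 2.77 | 2.74 | 2.70 | 2.66 | 2.62 | 2.58 | 2.54 |
| 11 | 2.85 | 2.79 | 2.72 | 2.65 | 2.61 | 2.57 | 2.53 | 2.49 | 2.45 | 2.40 |
| 12 | 2.75 | 2.69 | 2.62 | 2.54 | 2.51 | 2.47 | 2.43 | 2.38 | 2.34 | 2.30 |
| 13 | 2.67 | 2.60 | 2.53 | 2.46 | 2.42 | 2.38 | 2.34 | 2.30 | 2.25 | 2.21 |
| 14 | 2.60 | 2.53 | 2.46 | 2.39 | 2.35 | 2.31 | 2.27 | 2.22 | 2.18 | 2.13 |
| 15 | 2.54 | 2.48 | 2.40 | 2.33 | 2.29 | 2.25 | 2.20 | 2.16 | 2.11 | 2.07 |
| 16 | 2.49 | 2.42 | 2.35 | 2.28 | 2.24 | 2.19 | 2.15 | 2.11 | 2.06 | 2.01 |
| 17 | 2.45 | 2.38 | 2.31 | 2.23 | 2.19 | 2.15 | 2.10 | 2.06 | 2.01 | 1.96 |
| 18 | 2.41 | 2.34 | 2.27 | 2.19 | 2.15 | 2.11 | 2.06 | 2.02 | 1.97 | 1.92 |
| 19 | 2.38 | 2.31 | 2.23 | 2.16 | 2.11 | 2.07 | 2.03 | 1.98 | 1.93 | 1.88 |
| 20 | 2.35 | 2.28 | 2.20 | 2.12 | 2.08 | 2.04 | 1.99 | 1.95 | 1.90 | 1.84 |
| 21 | 2.32 | 2.25 | 2.18 | 2.10 | 2.05 | 2.01 | 1.96 | 1.92 | 1.87 | 1.81 |
| 22 | 2.30 | 2.23 | 2.15 | 2.07 | 2.03 | 1.98 | 1.94 | 1.89 | 1.84 | 1.78 |
| 23 | 2.27 | 2.20 | 2.13 | 2.05 | 2.01 | 1.96 | 1.91 | 1.86 | 1.81 | 1.76 |
| 24 | 2.25 | 2.18 | 2.11 | 2.03 | 1.98 | 1.94 | 1.89 | 1.84 | 1.79 | 1.73 |
| 25 | 2.24 | 2.16 | 2.09 | 2.01 | 1.96 | 1.92 | 1.87 | 1.82 | 1.77 | 1.71 |
| 26 | 2.22 | 2.15 | 2.07 | 1.99 | 1.95 | 1.90 | 1.85 | 1.80 | 1.75 | 1.69 |
| 27 | 2.20 | 2.13 | 2.06 | 1.97 | 1.93 | 1.88 | 1.84 | 1.79 | 1.73 | 1.67 |
| 28 | 2.19 | 2.12 | 2.04 | 1.96 | 1.91 | 1.87 | 1.82 | 1.77 | 1.71 | 1.65 |
| 29 | 2.18 | 2.10 | 2.03 | 1.94 | 1.90 | 1.85 | 1.81 | 1.75 | 1.70 | 1.64 |
| 30 | 2.16 | 2.09 | 2.01 | 1.93 | 1.89 | 1.84 | 1.79 | 1.74 | 1.68 | 1.62 |
| 40 | 2.08 | 2.00 | 1.92 | 1.84 | 1.79 | 1.74 | 1.69 | 1.64 | 1.58 | 1.51 |
| 60 | 1.99 | 1.92 | 1.84 | 1.75 | 1.70 | 1.65 | 1.59 | 1.53 | 1.47 | 1.39 |
| 120 | 1.91 | 1.83 | 1.75 | 1.66 | 1.61 | 1.55 | 1.50 | 1.43 | 1.35 | 1.25 |
| inf | 1.83 | 1.75 | 1.67 | 1.57 | 1.52 | 1.46 | 1.39 | 1.32 | 1.22 | 1.00 |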


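F values for α = 0.01 in the right tail (columns: numerator degrees of freedom d1 = 1 to 9; rows: denominator degrees of freedom d2)

| d2 \ d1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4,052 | 4,999.5 | 5,403 | 5,625 | 5,764 | 5,859 | 5,928 | 5,982 | 6,022 |
| 2 | 98.50 | 99.00 | 99.17 | 99.25 | 99.30 | 99.33 | 99.36 | 99.37 | 99.39 |
| 3 | 34.12 | 30.82 | 29.46 | 28.71 | 28.24 | 27.91 | 27.67 | 27.49 | 27.35 |
| 4 | 21.20 | 18.00 | 16.69 | 15.98 | 15.52 | 15.21 | 14.98 | 14.80 | 14.66 |
| 5 | 16.26 | 13.27 | 12.06 | 11.39 | 10.97 | 10.67 | 10.46 | 10.29 | 10.16 |
| 6 | 13.75 | 10.92 | 9.78 | 9.15 | 8.75 | 8.47 | 8.26 | 8.10 | 7.98 |
| 7 | 12.25 | 9.55 | 8.45 | 7.85 | 7.46 | 7.19 | 6.99 | 6.84 | 6.72 |
| 8 | 11.26 | 8.65 | 7.59 | 7.01 | 6.63 | 6.37 | 6.18 | 6.03 | 5.91 |
| 9 | 10.56 | 8.02 | 6.99 | 6.42 | 6.06 | 5.80 | 5.61 | 5.47 | 5.35 |
| 10 | 10.04 | 7.56 | 6.55 | 5.99 | 5.64 | 5.39 | 5.20 | 5.06 | 4.94 |
| 11 | 9.65 | 7.21 | 6.22 | 5.67 | 5.32 | 5.07 | 4.89 | 4.74 | 4.63 |
| 12 | 9.33 | 6.93 | 5.95 | 5.41 | 5.06 | 4.82 | 4.64 | 4.50 | 4.39 |
| 13 | 9.07 | 6.70 | 5.74 | 5.21 | 4.86 | 4.62 | 4.44 | 4.30 | 4.19 |
| 14 | 8.86 | 6.51 | 5.56 | 5.04 | 4.69 | 4.46 | 4.28 | 4.14 | 4.03 |
| 15 | 8.68 | 6.36 | 5.42 | 4.89 | 4.56 | 4.32 | 4.14 | 4.00 | 3.89 |
| 16 | 8.53 | 6.23 | 5.29 | 4.77 | 4.44 | 4.20 | 4.03 | 3.89 | 3.78 |
| 17 | 8.40 | 6.11 | 5.18 | 4.67 | 4.34 | 4.10 | 3.93 | 3.79 | 3.68 |
| 18 | 8.29 | 6.01 | 5.09 | 4.58 | 4.25 | 4.01 | 3.84 | 3.71 | 3.60 |
| 19 | 8.18 | 5.93 | 5.01 | 4.50 | 4.17 | 3.94 | 3.77 | 3.63 | 3.52 |
| 20 | 8.10 | 5.85 | 4.94 | 4.43 | 4.10 | 3.87 | 3.70 | 3.56 | 3.46 |
| 21 | 8.02 | 5.78 | 4.87 | 4.37 | 4.04 | 3.81 | 3.64 | 3.51 | 3.40 |
| 22 | 7.95 | 5.72 | 4.82 | 4.31 | 3.99 | 3.76 | 3.59 | 3.45 | 3.35 |
| 23 | 7.88 | 5.66 | 4.76 | 4.26 | 3.94 | 3.71 | 3.54 | 3.41 | 3.30 |
| 24 | 7.82 | 5.61 | 4.72 | 4.22 | 3.90 | 3.67 | 3.50 | 3.36 | 3.26 |
| 25 | 7.77 | 5.57 | 4.68 | 4.18 | 3.85 | 3.63 | 3.46 | 3.32 | 3.22 |
| 26 | 7.72 | 5.53 | 4.64 | 4.14 | 3.82 | 3.59 | 3.42 | 3.29 | 3.18 |
| 27 | 7.68 | 5.49 | 4.60 | 4.11 | 3.78 | 3.56 | 3.39 | 3.26 | 3.15 |
| 28 | 7.64 | 5.45 | 4.57 | 4.07 | 3.75 | 3.53 | 3.36 | 3.23 | 3.12 |
| 29 | 7.60 | 5.42 | 4.54 | 4.04 | 3.73 | 3.50 | 3.33 | 3.20 | 3.09 |
| 30 | 7.56 | 5.39 | 4.51 | 4.02 | 3.70 | 3.47 | 3.30 | 3.17 | 3.07 |
| 40 | 7.31 | 5.18 | 4.31 | 3.83 | 3.51 | 3.29 | 3.12 | 2.99 | 2.89 |
| 60 | 7.08 | 4.98 | 4.13 | 3.65 | 3.34 | 3.12 | 2.95 | 2.82 | 2.72 |
| 120 | 6.85 | 4.79 | 3.95 | 3.48 | 3.17 | 2.96 | 2.79 | 2.66 | 2.56 |
| inf | 6.63 | 4.61 | 3.78 | 3.32 | 3.02 | 2.80 | 2.64 | 2.51 | 2.41 |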


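F values for α = 0.01 in the right tail (columns: numerator degrees of freedom d1 = 10 to inf; rows: denominator degrees of freedom d2)

| d2 \ d1 | 10 | 12 | 15 | 20 | 24 | 30 | 40 | 60 | 120 | inf |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 6,056 | 6,106 | 6,157 | 6,209 | 6,235 | 6,261 | 6,287 | 6,313 | 6,339 | 6,366 |
| 2 | 99.40 | 99.42 | 99.43 | 99.45 | 99.46 | 99.47 | 99.47 | 99.48 | 99.49 | 99.50 |
| 3 | 27.23 | 27.05 | 26.87 | 26.69 | 26.60 | 26.50 | 26.41 | 26.32 | 26.22 | 26.13 |
| 4 | 14.55 | 14.37 | 14.20 | 14.02 | 13.93 | 13.84 | 13.75 | 13.65 | 13.56 | 13.46 |
| 5 | 10.05 | 9.89 | 9.72 | 9.55 | 9.47 | 9.38 | 9.29 | 9.20 | 9.11 | 9.02 |
| 6 | 7.87 | 7.72 | 7.56 | 7.40 | 7.31 | 7.23 | 7.14 | 7.06 | 6.97 | 6.88 |
| 7 | 6.62 | 6.47 | 6.31 | 6.16 | 6.07 | 5.99 | 5.91 | 5.82 | 5.74 | 5.65 |
| 8 | 5.81 | 5.67 | 5.52 | 5.36 | 5.28 | 5.20 | 5.12 | 5.03 | 4.95 | 4.86 |
| 9 | 5.26 | 5.11 | 4.96 | 4.81 | 4.73 | 4.65 | 4.57 | 4.48 | 4.40 | 4.31 |
| 10 | 4.85 | 4.71 | 4.56 | 4.41 | 4.33 | 4.25 | 4.17 | 4.08 | 4.00 | 3.91 |
| 11 | 4.54 | 4.40 | 4.25 | 4.10 | 4.02 | 3.94 | 3.86 | 3.78 | 3.69 | 3.60 |
| 12 | 4.30 | 4.16 | 4.01 | 3.86 | 3.78 | 3.70 | 3.62 | 3.54 | 3.45 | 3.36 |
| 13 | 4.10 | 3.96 | 3.82 | 3.66 | 3.59 | 3.51 | 3.43 | 3.34 | 3.25 | 3.17 |
| 14 | 3.94 | 3.80 | 3.66 | 3.51 | 3.43 | 3.35 | 3.27 | 3.18 | 3.09 | 3.00 |
| 15 | 3.80 | 3.67 | 3.52 | 3.37 | 3.29 | 3.21 | 3.13 | 3.05 | 2.96 | 2.87 |
| 16 | 3.69 | 3.55 | 3.41 | 3.26 | 3.18 | 3.10 | 3.02 | 2.93 | 2.84 | 2.75 |
| 17 | 3.59 | 3.46 | 3.31 | 3.16 | 3.08 | 3.00 | 2.92 | 2.83 | 2.75 | 2.65 |
| 18 | 3.51 | 3.37 | 3.23 | 3.08 | 3.00 | 2.92 | 2.84 | 2.75 | 2.66 | 2.57 |
| 19 | 3.43 | 3.30 | 3.15 | 3.00 | 2.92 | 2.84 | 2.76 | 2.67 | 2.58 | 2.49 |
| 20 | 3.37 | 3.23 | 3.09 | 2.94 | 2.86 | 2.78 | 2.69 | 2.61 | 2.52 | 2.42 |
| 21 | 3.31 | 3.17 | 3.03 | 2.88 | 2.80 | 2.72 | 2.64 | 2.55 | 2.46 | 2.36 |
| 22 | 3.26 | 3.12 | 2.98 | 2.83 | 2.75 | 2.67 | 2.58 | 2.50 | 2.40 | 2.31 |
| 23 | 3.21 | 3.07 | 2.93 | 2.78 | 2.70 | 2.62 | 2.54 | 2.45 | 2.35 | 2.26 |
| 24 | 3.17 | 3.03 | 2.89 | 2.74 | 2.66 | 2.58 | 2.49 | 2.40 | 2.31 | 2.21 |
| 25 | 3.13 | 2.99 | 2.85 | 2.70 | 2.62 | 2.54 | 2.45 | 2.36 | 2.27 | 2.17 |
| 26 | 3.09 | 2.96 | 2.81 | 2.66 | 2.58 | 2.50 | 2.42 | 2.33 | 2.23 | 2.13 |
| 27 | 3.06 | 2.93 | 2.78 | 2.63 | 2.55 | 2.47 | 2.38 | 2.29 | 2.20 | 2.10 |
| 28 | 3.03 | 2.90 | 2.75 | 2.60 | 2.52 | 2.44 | 2.35 | 2.26 | 2.17 | 2.06 |
| 29 | 3.00 | 2.87 | 2.73 | 2.57 | 2.49 | 2.41 | 2.33 | 2.23 | 2.14 | 2.03 |
| 30 | 2.98 | 2.84 | 2.70 | 2.55 | 2.47 | 2.39 | 2.30 | 2.21 | 2.11 | 2.01 |
| 40 | 2.80 | 2.66 | 2.52 | 2.37 | 2.29 | 2.20 | 2.11 | 2.02 | 1.92 | 1.80 |
| 60 | 2.63 | 2.50 | 2.35 | 2.20 | 2.12 | 2.03 | 1.94 | 1.84 | 1.73 | 1.60 |
| 120 | 2.47 | 2.34 | 2.19 | 2.03 | 1.95 | 1.86 | 1.76 | 1.66 | 1.53 | 1.38 |
| inf | 2.32 | 2.18 | 2.04 | 1.88 | 1.79 | 1.70 | 1.59 | 1.47 | 1.32 | 1.00 |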


Index

abortion example, 160
accreditation and certification example, 313
accreditation program example, 314
adversity
  medical error occurrence, 217
  root cause analysis (RCA), 212
  seriousness of, 213
advocacy, 4
AIDS example, 97, 296–298
air pollution example, 287
Alzheimer’s disease example, 97, 167–170, 427
Amanuensis, 92
ambulance services per day example, 167
analysis paralysis
  defined, 6
  three types of, 6
analyst
  background of, 2
  in data guided healthcare, 9
  importance of, 1
  role serving decision maker, 9
analyst steps
  alternative expected value scoring, 11
  effective communication with team, 10–11
  example of, 11
  findings and warning disclosure, 12
  lessen uncertainty, 11
  outcome medical utilization, 12
  outcome reports, 12
  public input, 12
  sensitivity analysis, 12
  specific aim/hypotheses before model, 10
  statistical analysis explanation, 12
  structure problem, 11
  timeline, 11
analytic hierarchy process (AHP), 199, 200
anesthesiologist intervention in recovery example, 160–161
artificial intelligence (AI)
  defined, 16
  electronic health records (EHR), 16
  lean management principles, 350
  limitations of, 16
artificial intelligence (AI) in healthcare decision making
  advantages/disadvantages, 5
  facets, 5

  need for, 5
attribute (model), 156
Austin Regional Hospital example, 289
autoregressive integrated moving average (ARIMA)
  cancer forecasting example, 430
  concepts of, 429
  COVID-19 example, 429
  Cox transformation, 433
  emergency department decisions example, 429–430
  environmental decision strategies, 430
  integrated moving average process (0,1,1), 434
  model, 433–434
  process of, 434
  random walk process (0,1,0), 434
  stationarity, 434
autoregressive moving average (ARMA), 428, 431
autoregressive time series
  first order process (AR (1)), 432
  Markovian, 432
  model, 432
  partial correlation, 432
  second order process (AR (2)), 432
  stationarity, 432
  white noise model, 432
availability, 388
average differences, 203
average time between two failures, 388
average time to first episode, 388
average time to repair, 388
Bayes, Thomas, 2, 111
Bayes factor (BF), 17
Bayes theorem
  chain smoking and lung disease example, 122–124
  formula, 111
  in group decisions, 201
Bayesian analysis
  causal networks, 215
  conquer and rule principle, 2
  JASP (software), 67
  posterior probability, 115
Bayesian approach to uncertainty
  likelihood function, 111
  posterior opinion, 111
  prior opinion, 111
behavioral consensus, 203, 205

benchmarking, 353–354
Berg balance scale (BBS), 189–190
Bernoulli distribution, 162
best decision. See optimal decision
bias function, 162
bias in decision-making example, 287
big data
  communications technologies, 93
  decision tree analysis (DTA), 190
  healthcare importance, 93
binomial probability distribution
  Bernoulli distribution and, 162
  COVID-19 example, 369–371
  formula, 250, 369
  versus geometric, 249–250
  inverse, 250
  medication error example, 250–252
bioterrorism attack readiness, 14–15, 287
bioterrorism attack readiness example, 96, 97, 159
bootstrap analysis, 247
Box, G.E.P., 3
box plot, 374, 386
Box-Jenkins approach, 426, 431
brainstorming, 203
breast cancer medication cost example, 245
bumped-up binomial distribution (BBD), 159
cancer forecasting example, 430
cancer recurrence example, 97
case study approach to healthcare evaluation, 312, 316
cataract surgery example, 248
cause-and-effect diagram, root cause analysis (RCA), 214
census, 98
chain smoking and lung disease, 116–121, 122–124
checking independence example, 290
chi-squared distribution pattern
  pediatric hospital waiting times example, 380
  probability density function, 380
chi-squared test, 18
cognitive mapping (CM), 205
Cohen’s kappa, 108
coherence (dependency), 114
community-based program sustainability example, 313


conditional confidence, 115
conditional independence, 110, 216
conditional probability, 108–109, 110, 289–290
conflict management in group decisions, 200, 202–203
conjunctive outcome probability, 113
contiguity, 98
continuous models, 177
control chart, 355
convergent root cause analysis (RCA) types, 216
convex Poisson probability distribution model, 158
correlation, 110–111
corroboration (dependency), 114
cost-benefit analysis
  defined, 10
  six sigma concepts, 361–370
cost-effective beneficial approach to healthcare evaluation, 312, 316
cost-effectiveness analysis
  advantages of, 244
  aim of, 244
  background of, 245
  better healthcare decisions, 248
  bootstrap analysis and, 247
  checklist, 244–245
  community expectations, 246
  constraints, 245–246
  cost-effectiveness ratio (ICER), 244, 245
  health technology assessment (HTA), 247
  impact on priorities, 248
  incremental cost-effectiveness ratio (ICER), 247
  medical tourism and, 246
  outcomes due to, 248, 249
  probability bounds analysis (PBA), 245
  public health policy and, 246
  quality-adjusted life years (QALY), 244
  reservations around, 247
  sensitivity analysis and, 244
  WHO guidelines, 249
cost-effectiveness analysis examples
  cataract surgery example, 248
  insured patient percentages example, 255–259
  Medicare policies, 248
  patient satisfaction example, 255–257
  supply demand example, 254–255
  telehealth example, 248–249
  vendor analysis example, 253–254
cost-effectiveness ratio (ICER), 244, 245
cost-effectiveness steps
  competitor comparison, 249
  decision model, 249
  expected cost, 249
  final report, 249
  operational cost estimate, 252
  probability of uncertain components, 249–252

  sensitivity analysis, 249
COVID-19
  autoregressive integrated moving average (ARIMA), 429
  autoregressive moving average (ARMA), 428
  binomial probability distribution, 369–371
  death evaluation example, 322–329
  forecasting example, 426–427
  ICU capacity forecasting, 427
  inverse binomial distribution, 252, 375–377
  Poisson probability distribution model, 373–376
  risk analysis, 286–287, 297–299
  six sigma concepts, 358
  SWOT analysis, 293
  time series forecasting technique, 427
COVID-19, pneumonia, influenza example, 435–443
COVID-19 example
  binomial probability distribution, 162–163
  p-chart, 383–386
  root cause analysis (RCA), 213
Cox proportional analysis, 190
Cox transformation, 433
critical thinking, 15
C-section birth example, 294–295
cumulative probability plot, 374–375
cyber insecurity example, 97
cyclic component, 431
daily cost, 252
daily probability, 249
data
  big, 93
  defined, 91
  versus information and knowledge, 91
  processing steps, 91
data ambiguity, 17
data analysis distortions
  sampling error, 158
  selection bias, 158
data collection. See also surveys
  AIDS example, 97
  Alzheimer’s disease example, 97
  benchmarking, 93
  bioterrorism attack readiness example, 96, 97
  cancer recurrence example, 97
  cyber insecurity example, 97
  data envelopment analysis (DEA), 96
  earthquakes and aftershocks example, 95
  Ebola virus example, 93–94
  emergency department decisions example, 95
  goal clarification, 93
  heart-lung transplant example, 97–98


  hospital queue system example, 94–95
  kidney cancer example, 97
  menopause example, 94
  menstrual cycle example, 97
  motivation for, 91
  patient medicine noncompliance example, 95
  purpose of, 91–92
  randomized response technique (RRT), 95
  rape incidences example, 97
  SARS hospital site infection example, 96
data collection concepts
  census, 98
  contiguity, 98
  interim analysis, 98
  probability, 98
  randomized response technique (RRT), 98
  sampling, 98
data collection methodology
  data envelopment analysis (DEA), 95
  data envelopment technique (DET), 95
  mean exponential family (MEF), 98
  stochastic frontier analysis (SFA), 95
data donors, 93
data envelopment analysis (DEA), 95
data envelopment technique (DET), 15, 95
data guidance, 16–17
data guided healthcare decision studies, 13–14
data guided healthcare decisions, concepts
  Bayes factor (BF), 17
  data ambiguity, 17
  data guidance, 16–17
  goodness of fit tests, 17–18
  Markov chain, 17
  Shannon type information, 17
  survival function, 18
  transformations, 17
data guided healthcare decisions, examples
  hospital queue system, 20–25
  Zika virus, 18–21
data guided healthcare decisions, steps
  check model validity, 12
  data gathering, 12
  decision presentation, 12
  decision proposal, 12
  model construction, 12
  model update and implementation, 12
  problem organization, 12
data integration, 92
data mining
  data quality and, 98
  decision tree analysis (DTA) and, 189
  defined, 15
  goal of, 93
  steps in, 15
data mining steps
  alarm monitoring, 367
  business intelligence, 366
  data exploration, 366


  data preparation, 365
  data scoring, 366
  data selection, 365
  decision support system, 366
  goal definition, 365
  pattern deployment, 366
data profiling, 91, 92
data scientists, 15
data transformation
  min-max transformation, 17
  Z-transformation, 17
data types
  interval, 13
  methodology used and, 13
  nominal, 12
  ordinal, 12–13
  primary, 91
  ratio, 13
  secondary, 91
data visualization, 93
data-based root cause analysis, 218
datum, 91
decision
  defined, 1, 2
  optimal, 1
decision analysis, 9
decision counselors
  formal clinicians, 16
  objective, 16
  usual care clinicians, 16
decision fatigue, 5
decision maker
  accountability, 15
  group decisions, 200
  responsibilities of, 2, 9
  surrogate, 6
  value system, 3
decision maker types
  risk-adverse, 2
  risk-neutral, 2
  risk-taker, 2
decision making, healthcare
  Bayesian conquer and rule principle, 2
  versus conclusion, 5–6
  diagnostic test results, 6
  difficulty of, 1, 15
  environmental decisions, 5
  ethics and, 6
  forecasting knowledge, 426–445
  importance of, 1, 4
  importance of analyst to, 1
  intraoperative strategies, 5
  limitations of, 2
  literature on, 4
  negotiation, 6
  Occam’s principle, 6
  steps in, 1
decision making, healthcare concepts
  cost-effectiveness analysis, 244–259
  data collection, 91–100
  decision tree, 188–194
  forecasting, 426–445

  group decisions, 199–206
  importance of data to, 16–18
  motivation, 4–16
  program evaluation, 312–330
  risk in, 286–298
  six-sigma and lean management principles, 349–390
  software, 55–70
  statistical models, 156–177
  uncertainties, 107–124
decision tree
  advantages of, 191
  defined, 188
  disadvantages, 191
  root cause analysis (RCA), 214
decision tree analysis (DTA)
  benefits of, 190
  Berg balance scale (BBS), 189–190
  big data, 190
  and data mining, 189
  defined, 188
  history of, 190
  lead poisoning and, 190
  school counselors efficiency, 190
  steps in, 189
decision tree analysis (DTA) components
  chance node, 189
  decision node, 188
  decision outcomes, 189
decision tree concepts
  expected value, 191
  folding back, 191
  game theory, 191
  influence diagram, 194
  Markov chain, 191
  multiplication rule of the joint outcomes, 191
  outcome role, 190–191
  prediction process, 194
decision tree examples
  Medicaid example, 191–192
  patient activity time example, 192–196
  residential facility availability example, 192
decision-analytic model framework, 4
deep learning models, 93
degrees of freedom (df), 116, 381
Delphi method
  chairperson responsibilities, 204
  consensus strategy, 200
  forecasting knowledge, 436
  group decisions, 199, 200–201
  hospital durable goods procurement example, 205
Deming, William Edwards, 3–4
DeMorgan’s laws, 111–112
depression example, 312–313
de-seasonalized values, 431
deterministic approach. See non-stochastic time series forecasting technique
diagnostic testing and decision making, 6
digital health interventions example, 313

dilution level of independence, 114
direct cost, 244
discrete model, 177
disjunctive outcome probability, 113
distribution plots, 374
divergent root cause analysis (RCA) types, 216, 217
DMAIC process (models), 158
Doctors without Borders, 247
double anchored syllogisms, 108
double anchoring, 108
drinking water quality study, 15
drug abuse model, 170–172
drug discovery
  clinical trial phases, 16
  steps to identify goals, 16
durable power of attorney, 6
earthquakes and aftershocks example, 95
Ebola virus example, 93–94, 159
efficiency example, 314
efficiency score, 95
electronic health records (EHR)
  artificial intelligence (AI) for, 16
  risk analysis, 286
emergency center admissions example, 224–225
emergency decision model, 203
emergency department admissions example, 427
emergency department decisions, 360
emergency department decisions example, 95, 429–430
emergency department pediatric admissions example, 427–428
emergency room overcrowding, 214
environmental decision strategies, 5, 430
epileptic pattern example, 160
epileptic pattern model, 170–172
equilibrium (divergent RCA), 217
ethics
  decision making types of, 459
  defined, 6, 459
  subdivisions of, 6
Euclidean distance, 212
evaluation of healthcare programs
  functions of, 316
  implementation difficulties, 316
  issues covered, 316
  issues in, 312
  necessity of, 316
  negatives of, 312
  process steps, 316–317
  purpose of, 312
  reasons for ignoring, 316
  success considerations, 316
  SWOT analysis, 353
evaluation of healthcare programs, approaches to
  case study, 312, 316
  cost-effective beneficial, 312, 316
  experimental, 312, 316


evaluation of healthcare programs, concepts
  component deliverables versus metrics table, 315
  improvement suggestions, 315
  pitfalls, 314–315
  priorities of, 315–316
  sensitivity analysis, 315
  six sigma concepts, 315
  tips for, 315
evaluation of healthcare programs, examples
  accreditation and certification example, 313
  accreditation program example, 314
  community-based program sustainability example, 313
  COVID-19 death example, 322–329
  depression example, 312–313
  digital health interventions example, 313
  efficiency example, 314
  historical, 314
  hospital healthcare performance example, 318–320
  indices evaluation example, 321–322
  infection prevention example, 313
  interprofessional education example, 313
  interrupted time series analysis example, 314
  leaded gasoline example, 314
  malpractice liability system example, 314
  management competency, 313
  methodology traditions example, 314
  natural language processing for breast cancer treatment example, 314
  nursing home cost example, 316–318
  nursing support example, 313
  patient advisory councils example, 313
  patient care quality and safety example, 313
  patient data inflow example, 321
  patient disease severity example, 313
  patient participation example, 313
  public health interventions example, 314
  quality improvement example, 314
  seasonal flu example, 326–330
  services provided for, 351
  Triangle model example, 314
  workforce productivity example, 318–321
expected value
  in decision tree analysis, 191
  defined, 11, 188
  formula, 11
experimental approach to healthcare evaluation
  defined, 312
  groups, 312
  random assignment, 316

  variations, 312
exponential smoothing, 429, 430, 431
extinction by instinct, 6
factual support (dependency), 114
favorite (dependency), 114
F-distribution pattern
  pediatric hospital waiting times example, 382
  probability density function, 381
  process of, 381
flexing and bonding trivariate distribution (meta analysis), 14
folding back (decision tree), 191
forecasting knowledge, approaches
  Box-Jenkins approach, 426
  Fourier transform approach, 426
forecasting knowledge, concepts
  COVID-19, pneumonia, influenza example, 435–443
  non-stochastic time series forecasting technique, 430–431
  stochastic approach, 431
forecasting knowledge, examples
  Alzheimer’s disease, 427
  COVID-19, 426–427
  wearable healthcare technology example, 428
forecasting knowledge, healthcare
  autoregressive integrated moving average (ARIMA), 429
  exponential smoothing, 429
  Holt’s linear trend, 429
  imputation technique, 426
  quality improvement, 426
  reasons for, 426
  S-curve trend, 429
  time series forecasting technique, 426
  weighted moving average, 428–429
Fourier transform approach, 426
frequency-based technique, 109
game theory, 191
Gaussian distribution pattern
  exponential distribution patterns, 378–379
  pediatric hospital waiting times, 378–380
  probability density function, 378
  process of, 378
GDP health percentage example, 167
geometric probability distribution
  versus binomial, 249–250
  defined, 161
  formula, 250
  inverse binomial distribution, 375
goodness of fit tests
  chi-squared test, 18
  Kolmogorov-Smirnov (KS) test, 18
green belt training, 355
group communication strategy (GCS), 205
group decision concepts


average differences, 203 behavioral consensus, 203, 205 chairperson responsibilities, 204 majority rule, 203 group decision examples emergency decision model, 203 hospital durable goods procurement example, 205–206 group decision strategies Delphi method, 199, 200–201 integrated group process (IGP), 201 nominal group technique (NGT), 200, 202 group decisions analytic hierarchy process (AHP), 199 Bayes theorem, 201 brainstorming, 203 cognitive mapping (CM), 205 conflict management, 202–203 consensus strategies, 200, 202 meta analysis, 202 motivation for, 199 multi-criteria decision analysis (MCDA), 199 patient consultation topics, 201 risks and uncertainty, 201 social judgment analysis (SJA), 205 steps for effective, 205–206 group decisions process shared decision making (SDM), 199, 201–202 team decisions, 200 harmonized lift measure (HLM), 115 hazard rate, 160 Health Insurance Portability and Accountability Act (HIPAA), 14 health literacy and healthcare decision making, 14 nominal group technique (NGT) and, 189 health technology assessment (HTA), 247 healthcare administration example, 245 healthcare decisions group decisions, 199–206 health literacy and, 14 risk analysis, 286–298 heart-lung transplant example, 97–98 Holt’s linear trend, 429 Hopkins statistic, 116 hospital adversity correlation example, 226 hospital durable goods procurement example, 205–206 hospital healthcare performance example, 318–320 hospital queue system example, 20–25, 94–95 hospital site infection example, 163–167 hospital supply procurement example, 121 human intelligence entities, 12 human papillomavirus (HPV) interventions example, 245


hypergeometric distribution versus binomial, 377 process of, 377–378 imbalance measure, 115 imbalanced Bernoulli distribution, 296 imbalanced binomial distribution, 296 imputation technique, 426 incidence jump rate, 160 incremental cost-effectiveness ratio (ICER), 247 indices evaluation example, 321–322 indirect cost, 244 infection prevention example, 313 influence diagram, 194 information overload, 6 insured patient percentages example, 255–259 integrated group process (IGP) advantages of, 204 hospital durable goods procurement example, 206 recommendations, 201 steps in, 204 interestingness measures (data mining), 15 interim analysis, 98 interprofessional education example, 313 interrupted time series analysis example, 314 intersectional approach to risk analysis, 287 interval variable, 13 intervened 2-tier Poisson probability distribution model, 160 intervened exponential distribution (IED), 160–161 intervened geometric distribution (IGD), 161 intervened Poisson distribution (IPD), 161 intraoperative surgery AI strategies, 5 inverse binomial distribution COVID-19 example, 252, 375–377 defined, 375 formula, 250 geometric probability distribution, 375 inverted correlation matrix, 212 invertibility, 431 Jaccard index, 114–115 JASP (software) analysis types, 57–58 Bayesian analysis, 58–74 Bayesian calculations, 57 graphic user interfaces, 60–67 mediation analysis, 69 network analysis, 69 overview, 57 R language, 57 statistical graphs, 57 structural equation modeling (SEM), 69 Jumping at Zero Mass Point Convex Poisson model, 14

Kaizen technique, 350 Katrina example, 217–218, 287 kidney cancer example, 97, 159 knowledge discovery database (KDD), 93 Kolmogorov-Smirnov (KS) test, 18 lab diagnostic time example, 121 language, 12 lavaan syntax, 69 lead poisoning example, 190 leaded gasoline example, 314 lean management principles applications of, 350 artificial intelligence (AI), 350 Kaizen technique, 350 Kano evaluation, 351 principles of, 350 lean six sigma methods (LSS) phases, 355 roadmap for, 359–363 versus six sigma, 364–373 success stories in, 358 takt time, 358 lean six sigma methods (LSS) examples benefits of, 360–361 emergency department wait times examples, 360 pharmacological therapies example, 359–362 life-critical shared decision-making support systems, 201 likelihood function, 111 likelihood ratio (LR), 113 linear model, 14–15, 95 linear trend, 431 long-term care insurance example, 245 machine learning importance of, 92–93 pattern identification, 93 Mahalanobis distances, 212, 217 majority rule, 203 malfunction relationship with accident example, 214 malpractice liability system example, 314 management competency example, 313 marginal probability, 112–113 market concentration, 321 Markov chain autoregressive time series, 432 in decision tree analysis, 191 defined, 17 in probability theory, 161 risk analysis and, 290 shared decisions, 199 Markov models, 114 matrix, 212 Matthew’s correlation, 116 mean exponential family (MEF), 98 mediation analysis, JASP (software), 69 Medicaid example, 191–192 medical tourism

Canadian hip transplants and, 246 cost of, 246 Doctors without Borders and, 247 popularity of, 246 Medicare cost-effectiveness analysis, 248 medication errors example, 220, 250–252, 290 menopause example, 94 menstrual cycle example, 97 meta analysis, group decisions, 202 methodology traditions example, 314 Microsoft Excel (software) Analysis Tool Pack, 55 commands, 81 computer simulations, 56 data entry, 58 error display, 55 formulas and functions list, 56 graphic user interfaces, 56–57 overview, 55 Pivot features, 55 R language and, 57 Regression Analysis tool, 55–56 Scientific Data Analysis Toolkit (SDAT), 55, 59 short-cut keys, 81 Solver Add-In, 55 step-by-step learning, 56 templates, 55 Visual Basic for Applications (VBA), 55, 57 Microsoft Math Solver (software) download, 70 graphic user interfaces, 77 probability, 70–80 min-max transformation, 17 models, concepts bias function, 162 observational bias, 162 probability density function, 162 probability mass function, 162 random sampling, 162 sample space (Ω), 162 models, healthcare attribute, 156 Bernoulli distribution, 162 binomial probability distribution, 162 deconstructing complex into simple parts, 12 defined, 3, 9–10 DMAIC process, 158 geometric distribution, 161 intervened exponential distribution (IED), 160–161 intervened geometric distribution (IGD), 161 intervened Poisson distribution (IPD), 161 linear, 95 Markov chain, 161 multi attribute value (MAV), 156 need for, 156


models, healthcare (cont.) odds ratios, 160 orthogonal principle, 157 Poisson probability distribution model, 158 posterior distribution, 160 posterior probability, 161 reasons for, 156 sampling biased gamma probability distribution model, 162, 163 spiral binomial probability distribution (SBPD), 160 uses for, 158 values, 156 models, rates hazard rate, 160 incidence jump rate, 160 prevalence rate, 160 vital rate, 160 models, types continuous, 177 discrete, 177 motivation in data guided healthcare decisions, 9 moving average time series coefficients, 433 components of, 430 first order process (MR (1)), 433 models, 433 second order process (MR (2)), 433 stationarity, 433 unknown properties of the noises, 432 multi attribute value (MAV), 156 multi-criteria decision analysis (MCDA), 199 multiplication rule of the joint outcomes, 191 multivariate attribute value (MAV), 188 natural language processing for breast cancer treatment example, 314 negative predictive value (NPV), 108, 113 negotiation, decision making, 6 Netica (software), 216 network analysis, JASP (software), 69 noise (error), stochastic frontier analysis (SFA), 14 nominal group technique (NGT) advantages of, 189 background of, 200 chairperson responsibilities, 204 conflict management, 203 decision challenges in, 202 health literacy and, 189 hospital durable goods procurement example, 205 steps to improve, 189 nominal variable, 12 nonmaleficence intent, 4 non-stochastic time series forecasting technique cyclic component, 431

de-seasonalized values, 431 exponential smoothing, 430, 431 linear trend, 431 linear/polynomial equation, 430 moving average time series, 430 process of, 430–431 quadratic trend, 431 values, 430 weighted moving average, 431 normal distribution. See Gaussian distribution pattern nuclear accident example, 288 number system, 12 nursing autonomy, 4 nursing complaints example, 223–224 nursing home cost example, 316–318 nursing support example, 313 observational bias, 162 Occam’s principle, 6 odds (dependency), 114 odds (outcome), 111 odds ratios, 160 online surveys advantages of, 92 popularity of, 92 opioid crisis example, 213–214 opportunity cost, 244 optimal decision Bayesian conquer and rule principle, 2 defined, 1 five tests of, 9 steps in, 158 weighing benefits and harms, 9 ordinal variable, 12–13 organ transplant chances example, 96, 159–160 orthogonal principle, 157 outlier identification, 291–292, 382 overlap index (dependence), 114 Pareto analysis, 351 Pareto Priority Index (PPI), 361 partial correlation, 111 participatory decision making disadvantages of, 6 need for democratic process, 6 suitability of, 6 patient activity time example, 165–167, 192–196 patient advisory councils example, 313 patient autonomy, 4 patient body weight example, 217 patient care quality and safety example, 313 patient data inflow example, 321 patient disease severity example, 313 patient education decision counselors, 16 importance of, 16 systemic insufficiencies, 16 patient falls example, 214, 225


patient medicine noncompliance example, 95 patient satisfaction example, 255–257 patient waiting times example, 386–394 patients and decisions example, 313 p-chart COVID-19, 383–386 patient waiting times example, 386–394 process of, 383 Pearson’s correlation, 115 pediatric hospital waiting times example, 380, 382 pediatric physician waiting times example, 383 pharmacological therapies example, 359–362 Poisson probability distribution model background of, 158, 371 bumped-up binomial distribution (BBD), 159 convex, 158 COVID-19 example, 373–376 formula, 371 intervened 2-tier, 160 Q-Q plot, 371 tweaked negative binomial distribution (TNBD) model, 159–160 portfolio analysts, 289 positive predictive value (PPV), 108, 113 posterior distribution, 160 posterior opinion, 2, 111 posterior probability, 161 prediction process, 194 prevalence rate, 160 primary data, 91 principal component analysis (PCA), 218 prior opinion, 2, 111 probability defined, 188 in uncertainty, 109 probability bounds analysis (PBA), 245 probability density function, 162 probability mass function, 162 probability mass plot, 375 probability sampling, 92, 98 productivity cost, 244 program cost, 252 prototype defined, 10 five principles of, 10 public health interventions example, 314 Q-Q plot, 371, 374–377, 378 quadratic trend, 431 quality in healthcare goals of, 349–350 improvement example, 314 quality-adjusted life years (QALY), 244 quota sampling, 98 R language download, 69


JASP (software), 57 Microsoft Excel, 57 random noises (error), 95 random sampling, 98, 162, 250 randomized response technique (RRT), 92, 95, 98 rape incidences example, 97 ratio variable, 13 receiver operating characteristic curve (ROC), 15, 112, 113 relevance (dependency), 114 reliability, 387 residential facility availability example, 192 reverse prediction, 216 risk, defined, 189 risk analysis in healthcare cognitive ability and risk aversion, 287 concepts, 288 conditional probability, 289–290 defined, 189, 286 difficulties, 287 imbalanced Bernoulli distribution, 296 imbalanced binomial distribution, 296 importance of, 288 indices, 293–294 intersectional approach, 287 nuclear accident example, 288 outlier identification, 291–292 probability of security violation, 288 steps in, 288–289 SWOT analysis, 292–293 versus threat analysis, 287 vanishing correlation, 290 risk analysis in healthcare, examples AIDS example, 296–298 air pollution example, 287 Austin Regional Hospital example, 289 bias in decision making example, 287 bioterrorism, 287 checking independence example, 290 COVID-19, 286–287 COVID-19 example, 297–299 C-section birth example, 294–295 electronic health records (EHR), 286 Katrina example, 287 medication errors example, 290 terror incident risk example, 290–291, 292 risk-averse decision maker, 2, 189, 290 risk-neutral decision maker, 2, 189, 290 risk-taker decision maker, 2, 189, 290 root cause analysis (RCA) algorithm, 218 data types, 212 inverted correlation matrix, 212 Mahalanobis distances, 212 medical need for, 217 Netica (software), 216 steps in, 214, 225–228 types of, 212 root cause analysis (RCA) analytic tools

cause-and-effect diagram, 214 decision tree, 214 root cause analysis (RCA) concepts Bayesian causal networks, 215 error factors discovery, 215 Mahalanobis distances, 217 practical nature of, 215 principal component analysis (PCA), 218 root cause analysis (RCA) examples communal problems, 214 COVID-19, 213 emergency center admissions example, 224–225 emergency room overcrowding, 214 hospital adversity correlation example, 226 malfunction relationship with accident, 214 medical uses for, 213 medication errors example, 220 nursing complaints example, 223–224 opioid crisis, 213–214 patient falls example, 214, 225 patient safety and, 213 sentinel events example, 214 root cause analysis (RCA) types conditional independence, 216 convergent, 216 divergent, 216 examples of, 217–220 predictive probability, 216 reverse prediction, 216 serial, 215 run chart, 355 safety index value, 212 sample space (Ω), 109, 162 sampling advantages of, 98 defined, 98 quota, 98 random, 98 size determination, 98, 99 systemic, 98 without replacement, 98 sampling bias, 14 sampling biased gamma probability distribution model, 162, 163 sampling error, 158 SARS hospital site infection example, 96, 159 Schwarz information criterion (SBC), 438 Scientific Data Analysis Toolkit (SDAT), 55 S-curve trend, 429 seasonal flu example, 326–330 seat belt example, 364 secondary data, 91 selection bias, 158 sensitivity analysis analyst step, 12

cost-effectiveness analysis, 244 defined, 188 program evaluation, 315 sentinel. See adversity sentinel events example, 214 serial root cause analysis (RCA) types, 215, 217 Shanmugam index, 113 Shannon type information, 17 shared decision making (SDM) advantages of, 199 decision challenges in, 201–202 Markov chain, 199 sigma chart, 383 similarity measure, 114 six sigma methods Bayesian nature of, 353 benchmarking, 353–354 benefits of, 349, 367–368 capability indices, 350–351 communication tips, 364–365 concepts of, 349 conflict management, 354, 355 control chart, 355 control chart types, 382, 384 cost-benefit analysis, 361–370 COVID-19, 358 data mining, 365–367 evaluation, 315 genesis of, 351 history of, 349 Kano evaluation, 351 lean versus, 364–373 literature on, 355–356 model performance timetable, 354 myths of, 350 phases, 354, 364 project measurement, 354–355 risk assessment, 387–389 run chart, 355 seat belt example, 364 soft skill management, 355 steps of, 350 team member classification, 354 six sigma methods frequency patterns binomial probability distribution, 369–376 chi-squared distribution pattern, 379–382 F-distribution pattern, 381–384 Gaussian distribution patterns, 378–381 hypergeometric distribution, 377–379 inverse binomial distribution, 375–377 student’s t distribution pattern, 380–383 six sigma methods training black belt training, 355–358 expectations in, 356–357 green belt training, 355 requirements, 354 60–30-10 rule, 16 social judgment analysis (SJA), 205


software JASP, 57–70 Microsoft Excel, 55–57 Microsoft Math Solver, 70–80 motivation for, 55 Spearman’s correlation, 116 specificity (Sp), 108 spinned Poisson distribution, 15 spiral binomial probability distribution (SBPD), 160 stationarity autoregressive integrated moving average (ARIMA), 434 autoregressive time series, 432 Box-Jenkins approach, 431 defined, 431 moving average time series, 433 stochastic frontier analysis (SFA) data collection and, 95 defined, 14–15 study, 14 stochastic time series Box-Jenkins approach, 431 frequency-based technique, 431 structural equation modeling (SEM), 69 student’s t distribution pattern pediatric physician waiting times example, 383 probability density function, 380 process of, 380–381 supply demand example, 254–255 surrogate decision making, 4–5, 6 surveys. See also data collection online, 92 probability sampling, 92 questionnaire design, 92 randomized response technique (RRT), 92 survival function, 18 SWOT analysis, 292–293 systemic sampling, 98 takt time, 358 team decisions background of, 200 win-win philosophy, 200 technical efficiency, 95 technical inefficiency (error), 14, 95 telehealth example, 248–249 terror incident risk example, 290–291, 292 threat analysis, 287 time series data, defined, 443 time series forecasting technique autoregressive, 432 autoregressive integrated moving average time series (ARIMA), 433–434

COVID-19, 427 COVID-19 ICU capacity, 427 defined, 426, 431 emergency department admissions, 427, 428 emergency department pediatric admissions, 427–428 model selection criteria, 434–435 moving average time series, 432–433 non-stochastic, 430–431 notations, 427 seasonal auto-regressive integrated moving average (SARIMA), 434 stochastic, 431 time value of money (TVM), 364 Triangle model example, 314 tumor recurrence example, 160 tweaked negative binomial distribution (TNBD) model, 159–160 uncertainty in healthcare, concepts Bayesian approach, 111 Bayesian posterior probability, 115 coherence, 114 communication importance, 109 conditional confidence, 115 conditional independence, 110 conjunctive outcome probability, 113 correlation, 110–111 corroboration, 114 cost-effectiveness analysis, 249–252 degrees of freedom (df), 116 DeMorgan’s laws, 111–112 dilution level of independence, 114 disjunctive outcome probability, 113 factual support, 114 favorite, 114 frequency-based technique, 109 harmonized lift measure (HLM), 115 Hopkins statistic, 116 imbalance measure, 115 Jaccard index, 114 likelihood ratio (LR), 113 marginal probability, 112–113 Markov models, 114 Matthew’s correlation, 116 negative predictive value (NPV), 113 odds (dependency), 114 odds (outcome), 111 overlap index, 114 partial correlation, 111 Pearson’s correlation, 115 positive predictive value (PPV), 113 probability, 109 receiver operating characteristic curve (ROC), 112, 113 relevance, 114


sample space (Ω), 109 Shanmugam index, 113 similarity measure, 114 Spearman’s correlation, 116 three ingredients, 109 union, 111 vanishing partial correlation, 111 Youden’s index, 112 uncertainty in healthcare, examples chain smoking and lung disease, 116–121 hospital supply procurement, 121 lab diagnostic time, 121 uncertainty in healthcare, motivation for authentication, 108 borderline cases, 108 fragility, 107 hospital site infection, 108 reporting delays, 107–108 translating to knowledge discovery approach, 107 vagueness of, 107 uncertainty in healthcare, principles Cohen’s kappa, 108 conditional probability, 108–109, 110 double anchored syllogisms, 108 double anchoring, 108 negative predictive value (NPV), 108 positive predictive value (PPV), 108 prevalence, 108 specificity (Sp), 108 union (odds), 111 unknown properties of the noises, 432 values (model) defined, 10, 156, 188 steps in, 156–157 vanishing correlation, 290 vanishing partial correlation, 111 vendor analysis example, 253–254 veracity, 4 Visual Basic for Applications (VBA), 55, 57 vital rate, 160 volcano eruption victims, 14 wearable healthcare technology example, 428 weighted moving average, 428–429, 431 white noise model, 432 workforce productivity example, 318–321 Youden’s index, 112 Zika virus, 14, 18–21 Z-transformation, 17