Success in Academic Surgery: Health Services Research 9781447147176, 9781447147183, 1447147170


Table of contents :
Foreword......Page 8
Acknowledgement......Page 10
Contents......Page 12
Contributors......Page 14
Part I Main Research Areas......Page 17
1.1 What Is Health Services Research?......Page 18
1.2 What Is Outcomes Research?......Page 19
1.3 Part I. Main Research Areas......Page 20
1.4 Part II. Emerging Areas of Research......Page 21
1.6 Part IV. Career Development......Page 22
Further Reading......Page 23
2.1 Introduction......Page 24
2.2 What Is Comparative Effectiveness Research?......Page 25
2.3 Why Comparative Effectiveness Research?......Page 26
2.4 Investments and Activities in Comparative Effectiveness Research......Page 27
2.5 CER and Stakeholder Engagement......Page 29
2.6.1 Randomized Trials......Page 30
2.6.2 Observational Studies......Page 31
2.6.2.1 Measuring Associations and Managing Confounding......Page 32
2.6.3 Research Synthesis......Page 33
2.7 Conclusion......Page 34
Landmark Studies......Page 35
3.1 Introduction......Page 36
3.2 The First Classic Papers: In the Beginning......Page 37
3.3 Gaining Momentum: Bigger Is Better......Page 38
3.4 Innovating Approaches, and Integrating Ideas – From Medicine to Surgery......Page 42
3.5.1 Variation in Spine Surgery......Page 43
3.5.2 Variation in Vascular Surgery......Page 44
3.6.1 Moving Towards Policy Implementation: A Unique Tool, the Dartmouth Atlas......Page 47
References......Page 50
Landmark Papers......Page 51
4.1 Introduction......Page 52
4.1.1 Common Pitfalls in Health Policy Research......Page 53
4.1.2 Physician and Hospital Payment Reform......Page 54
4.1.3 Surgical Training and Workforce Policy......Page 57
4.4 Where Should I Get Started in Health Policy Research?......Page 59
Further Reading......Page 60
5.1 Introduction......Page 61
5.2.1 Race......Page 62
5.2.3 Age......Page 63
5.3 What Are the Underlying Mechanisms?......Page 64
5.3.1.3 Timely Access to Surgical Care......Page 65
5.3.2.1 Provision of Appropriate Care......Page 66
5.3.3.2 Hospital Volume......Page 67
5.3.3.3 Patient Case-Mix......Page 68
5.4 How Can We Solve This Problem?......Page 69
5.4.1.4 Not Properly Risk Adjusting......Page 71
5.4.2 Final Comment......Page 72
References......Page 73
Suggested Readings......Page 75
6.1 Introduction......Page 76
6.1.1 Surgical Quality Measurement......Page 77
6.1.2 Data Source and Collection......Page 79
6.1.3.1 Risk-Adjustment......Page 80
6.1.3.2 Reliability......Page 81
6.1.3.4 Defining and Reporting Results......Page 82
6.1.4.2 Quality Improvement Programs......Page 83
6.1.5.1 Conclusions......Page 86
Landmark Papers......Page 87
Chapter 7 Using Data for Local Quality Improvement......Page 88
7.1 Local Quality Improvement......Page 89
7.3 Engaging Stakeholders......Page 92
7.4 Future Directions......Page 93
Landmark Papers......Page 94
Part II Emerging Areas of Research......Page 95
Chapter 8 Implementation Science and Quality Improvement......Page 96
8.1 Theories, Frameworks, and Models......Page 97
8.2 QI and Implementation Interventions (or Strategies)......Page 99
8.3 Research Designs......Page 102
8.5 Measurement of Constructs and Outcomes......Page 107
8.6 Getting Published and Funded in QI and Implementation Research......Page 108
8.8 Resources......Page 109
Recommended Readings......Page 111
9.1 Introduction......Page 112
9.2 Climate vs. Culture......Page 113
9.3 Measuring Safety Climate......Page 115
9.5 Changing Safety Climate and Culture......Page 116
9.6 Conclusion......Page 117
References......Page 118
10.1 A Paradigm Shift......Page 120
10.2 Principles of Patient Centered Care......Page 122
10.3 Defining Patient-Centered Outcomes......Page 123
10.4 Areas for Further Development......Page 125
References......Page 127
Landmark Papers......Page 128
11.1 Introduction......Page 129
11.2 Key Steps in Point-of-Care Research......Page 131
11.2.2 Develop or Adapt a Conceptual Framework......Page 132
11.2.3.1 Types of Data......Page 133
11.2.3.2 Sources of Data......Page 134
11.2.4 Determine Your Approach to Data Analysis......Page 135
11.3 Examples of Point-of-Care Studies in the Operating Room......Page 136
11.4 Conclusions......Page 140
References......Page 141
12.1.1 Need for Improvement......Page 142
12.1.2 Current Strategies in Quality Improvement – The Top Down Approach......Page 143
12.2.1 Defining Collaborative Quality Improvement......Page 144
12.2.3 MHA Keystone Center ICU Project......Page 145
12.3 Collaborative Quality Improvement in Surgery......Page 146
12.3.1 Northern New England Cardiovascular Disease Study Group......Page 147
12.3.2 Surgical Care and Outcomes Assessment Program......Page 148
12.3.3 Partnering with Payers – The Michigan Plan......Page 149
12.5 Keys to Success with Collaborative Quality Improvement......Page 154
References......Page 156
Part III Tools of the Trade......Page 160
13.1 Introduction......Page 161
13.2.1 Studying Rare Diagnoses, Procedures, or Complications......Page 162
13.2.2 Defining Temporal Trends or Regional Differences in Utilization......Page 163
13.2.4 Studying Outcomes Across Diverse Practice Settings......Page 164
13.3.1 Strengths and Weaknesses of Administrative Data......Page 165
13.3.2 Identifying Comorbidities in Administrative Data......Page 166
13.3.4 Clinical Registries......Page 167
13.4.1 Administrative Data......Page 168
13.4.1.4 Healthcare Cost and Utilization Project......Page 169
13.4.2 Clinical Registries......Page 170
13.4.2.1 STS National Database......Page 171
13.4.2.4 National Cancer Data Base......Page 172
References......Page 173
Chapter 14 Methods for Enhancing Causal Inference in Observational Studies......Page 175
14.1 Using Observational Data for Health Services and Comparative Effectiveness Research......Page 176
14.2.1.1 Selection Bias......Page 177
14.2.1.3 Measurement Bias......Page 178
14.3 Controlling for Bias in Observational Studies......Page 180
14.3.1 Study Design......Page 181
14.3.2 Statistical Techniques......Page 182
14.3.2.2 Stratification or Restriction Prior to Multivariate Regression......Page 183
14.3.2.3 Propensity Score Analysis......Page 184
14.3.2.4 Instrumental Variable (IV) Analysis......Page 186
14.3.3 Limits of Advanced Statistical Techniques......Page 188
References......Page 189
Landmark Papers to Recommend to Readers......Page 190
15.1 Introduction......Page 191
15.3 Meta-analysis......Page 192
15.3.1 Methodology......Page 193
15.4 Conclusions......Page 200
References......Page 201
Landmark References......Page 202
Chapter 16 Medical Decision-Making Research in Surgery......Page 203
16.2.2 Outcome Measures......Page 204
16.3.2 Decision Aids......Page 206
16.5 Opportunities for Surgeon Scientists......Page 207
16.6 What Is Decision-Analytic Modeling?......Page 208
16.7.2 Advantages of Decision Analytic Modeling......Page 209
16.9 How Do I Make a Model?......Page 210
16.10 Conclusion......Page 211
Landmark Papers......Page 212
17.1 Introduction......Page 213
17.3 Developing Questions......Page 214
17.4 Population......Page 215
17.5.1 Face to Face......Page 216
17.5.4 Internet......Page 217
17.7 Response Rates......Page 218
17.8 Nonresponse Bias......Page 219
17.9 Likert Scales......Page 220
17.11 Reporting Results......Page 221
References......Page 222
Chapter 18 Qualitative Research Methods......Page 224
18.2 Formulating a Research Question......Page 225
18.3 Sampling Strategy......Page 226
18.4.1 Focus Groups......Page 227
18.4.2 Open Ended Interviews......Page 228
18.5 Analysis......Page 229
18.6.1 Credibility......Page 231
18.6.4 Transferability......Page 232
References......Page 233
Landmark Papers......Page 234
Part IV Career Development......Page 235
Chapter 19 Engaging Students in Surgical Outcomes Research......Page 236
19.3 Approachability and the Faculty Mentor......Page 237
19.5 Effort by the Faculty Mentor......Page 238
19.6 Effort by the Student......Page 240
19.7 Productivity and the Faculty Mentor......Page 241
19.8 Productivity and the Student......Page 242
19.9 Summary......Page 243
Chapter 20 Finding a Mentor in Outcomes Research......Page 244
20.1 Introduction......Page 245
20.2 What Makes a Good Outcomes Research Mentor?......Page 246
20.3 How to Locate a Good Outcomes Research Mentor and Determine Mutual Compatibility?......Page 250
20.4 Summary......Page 251
Selected Readings......Page 252
21.1 Finding a Mentor (& a Lab)......Page 253
21.3 Setting Goals......Page 255
21.4 Getting an Education......Page 256
21.6 Having a Life......Page 257
21.7 Re-entering Residency......Page 258
Chapter 22 Funding Opportunities for Outcomes Research......Page 259
22.1 What Costs Money......Page 260
22.2 Who Has the Money......Page 261
22.3.2 K Awards......Page 263
22.4 The Road to Riches......Page 264
22.5 “Selling the Drama”......Page 266
22.6 “You See What You Look For”......Page 267
23.1 Introduction......Page 268
23.2 Preparing for the Job Search......Page 269
23.3 Looking for Opportunities......Page 270
23.4 Screening Jobs......Page 272
23.5 Crafting the Job......Page 273
References......Page 275
24.1 Introduction......Page 276
24.3 Organization and Governance......Page 277
24.4 Challenges in Faculty Development......Page 278
24.5 Creating a “Faculty Pipeline”......Page 280
24.7 Research Platforms......Page 282
24.7.2 “Local Labs”......Page 283
24.8 Collaborations......Page 284
References......Page 285
Index......Page 286


Success in Academic Surgery

Justin B. Dimick Caprice C. Greenberg Editors

Success in Academic Surgery: Health Services Research

Success in Academic Surgery

Series Editors:
Herbert Chen, University of Wisconsin, Madison, Wisconsin, USA
Lillian Kao, The University of Texas Health Sciences Center at Houston, Houston, TX, USA

For further volumes: http://www.springer.com/series/11216

Justin B. Dimick • Caprice C. Greenberg Editors

Success in Academic Surgery: Health Services Research


Editors Justin B. Dimick Henry King Ransom Professor of Surgery Chief, Division of Minimally Invasive Surgery Associate Chair for Faculty Development Department of Surgery University of Michigan Ann Arbor, MI, USA

Caprice C. Greenberg Associate Professor of Surgery WARF Professor of Surgical Research Director, Wisconsin Surgical Outcomes Research (WiSOR) Department of Surgery University of Wisconsin Madison, WI, USA

ISSN 2194-7481 ISSN 2194-749X (electronic) ISBN 978-1-4471-4717-6 ISBN 978-1-4471-4718-3 (eBook) DOI 10.1007/978-1-4471-4718-3 Springer London Heidelberg New York Dordrecht Library of Congress Control Number: 2014942652 © Springer-Verlag London 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)

This book is dedicated to the following individuals: To our role models, John Birkmeyer and John Clarke, for pioneering health services research in surgery, and creating a career path for us and subsequent generations of health services researchers And, To our current and past chairmen, K. Craig Kent, Michael Mulholland, and Michael J. Zinner for embracing this fledgling field and investing the resources and support necessary to give our field a strong foothold in academic surgery.

Foreword

Twenty years ago, as I was completing my own research training, a book on surgical health services and outcomes research would have been inconceivable. There were too few surgeons with sufficient expertise in this field to support an anthology of reasonable breadth and rigor. More importantly, the book would have struggled to find an audience—and a publisher. Health services research was considered a niche in academic surgery, and not a particularly important one.

This book underscores how times have changed. The annual meetings of our professional organizations are now dominated by clinical outcomes research, rather than laboratory science. As more surgeons pursue advanced research training in the field, a growing cadre of surgeon health services researchers is attacking the most pressing issues in surgical care and policy with remarkable sophistication. The authors and the comprehensiveness of this text serve as testimony to the critical intellectual mass that has been achieved.

Interest in health services and outcomes research is soaring because it matters—to practicing surgeons, to surgical leaders, and to society at large. By some estimates, surgery is at least a $500 billion annual industry in the United States. In the wake of the Affordable Care Act and tightening resource constraints, understanding the comparative effectiveness of surgical interventions has become a leading priority of policy makers, payers, and the National Institutes of Health. In that context, several excellent chapters in this anthology outline research strategies for conducting high-impact comparative effectiveness research, including techniques for enhancing causal inference in observational studies.

Dealing with large, unexplained variations in surgical performance across hospitals and surgeons is the second imperative for the field. At least 100,000 Americans die undergoing surgery every year; 20-fold more experience surgical complications. Improving surgical quality is of course a complicated task, but insightful chapters in this text highlight strategies for measuring quality on a large scale and in terms relevant to patients as well as surgeons. Others provide interesting frameworks for understanding variation in performance and, more importantly, how to break down barriers to change and optimal practice.


In addition to addressing the major issues facing the surgical profession today, this text will be an invaluable resource as we develop the next generation of surgeon health services researchers. The “Tools of the Trade” section provides thorough, readable primers on large databases, survey research, qualitative research, and data synthesis methodologies. The “Career Development” section provides great practical guidance—mentorship, funding and other survival skills—for new surgeon investigators looking to make their own mark on the field.

It is my distinct pleasure to write the foreword for this important book. This resource will help ensure that this generation of surgeons and those that follow will not only take health services research to a new scientific level, but also continue our quest to make surgical practice as safe, equitable, and responsible as it can be.

John D. Birkmeyer
George D. Zuidema Professor of Surgery
Center for Healthcare Outcomes & Policy
University of Michigan
Ann Arbor, MI, USA

Acknowledgement

We would like to thank the current and past leadership of the Association for Academic Surgery (AAS) for their tireless promotion of academic excellence and career development across the diverse spectrum of scientific endeavor. Most notably, we are grateful to past-presidents Herb Chen, MD and Lillian Kao, MD, MS for conceiving this book series, including this volume, and past-president Daniel Albo, MD, for promoting health services research within the AAS and Academic Surgical Congress. We hope that this book will serve as an introduction to the critical concepts of this important and growing discipline of scientific inquiry.


Contents

Part I Main Research Areas

1 An Introduction to Health Services Research (Justin B. Dimick and Caprice C. Greenberg) ...... 3
2 Comparative Effectiveness Research (George J. Chang) ...... 9
3 Understanding Variations in the Use of Surgery (Philip P. Goodney) ...... 21
4 Health Policy Research in Surgery (Justin B. Dimick, Terry Shih, and Andrew M. Ryan) ...... 37
5 Studying Surgical Disparities: It's Not All Black and White (Diane Schwartz and Adil Haider) ...... 47
6 Measuring Health Care Quality in Surgery: Strategies, Limitations, and Currently Available Initiatives (Ryan P. Merkow and Karl Y. Bilimoria) ...... 63
7 Using Data for Local Quality Improvement (Sandra L. Wong) ...... 75

Part II Emerging Areas of Research

8 Implementation Science and Quality Improvement (Lillian S. Kao) ...... 85
9 Understanding and Changing Organizational Culture in Surgery (Amir A. Ghaferi) ...... 101
10 Assessing Patient-Centered Outcomes (Arden M. Morris) ...... 109
11 Studying What Happens in the OR (Lane Frasier and Caprice C. Greenberg) ...... 119
12 Collaborative Quality Improvement (Jonathan F. Finks) ...... 133

Part III Tools of the Trade

13 Large Databases Used for Outcomes Research (Terry Shih and Justin B. Dimick) ...... 153
14 Methods for Enhancing Causal Inference in Observational Studies (Kristin M. Sheffield and Taylor S. Riall) ...... 167
15 Systematic Review and Meta-analysis: A Clinical Exercise (Melinda A. Gibbons) ...... 183
16 Medical Decision-Making Research in Surgery (Clara N. Lee and Carrie C. Lubitz) ...... 195
17 Survey Research (Karen J. Brasel) ...... 205
18 Qualitative Research Methods (Margaret L. Schwarze) ...... 217

Part IV Career Development

19 Engaging Students in Surgical Outcomes Research (Kyle H. Sheetz and Michael J. Englesbe) ...... 231
20 Finding a Mentor in Outcomes Research (Omar Hyder and Timothy M. Pawlik) ...... 239
21 What Every Outcomes Research Fellow Should Learn (Yue-Yung Hu) ...... 249
22 Funding Opportunities for Outcomes Research (Dorry Segev) ...... 255
23 Choosing Your First Job as a Surgeon and Health Services Researcher (Scott E. Regenbogen) ...... 265
24 Building a Health Services Research Program (Samuel R.G. Finlayson) ...... 273

Index ...... 283

Contributors

Karl Y. Bilimoria, M.D., M.S. Surgical Outcomes and Quality Improvement Center, Department of Surgery, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Karen J. Brasel, M.D., M.P.H. Division of Trauma/Critical Care, Medical College of Wisconsin, Milwaukee, WI, USA
George J. Chang, M.D., M.S. Department of Surgical Oncology; Colorectal Center; and Minimally Invasive and New Technologies in Oncologic Surgery Program, University of Texas, MD Anderson Cancer Center, Houston, TX, USA
Justin B. Dimick, M.D., M.P.H. Henry King Ransom Professor of Surgery, Chief, Division of Minimally Invasive Surgery, Associate Chair for Faculty Development, Department of Surgery, University of Michigan, Ann Arbor, MI, USA
Michael J. Englesbe, M.D. Department of Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
Jonathan F. Finks, M.D. Department of Surgery, University of Michigan Health System, Ann Arbor, MI, USA
Samuel R.G. Finlayson, M.D., M.P.H. Department of Surgery, University of Utah School of Medicine, Salt Lake City, UT, USA
Lane Frasier, M.D. Wisconsin Surgical Outcomes Research Program, Department of Surgery, University of Wisconsin Hospitals and Clinics, Madison, WI, USA
Amir A. Ghaferi, M.D., M.S. Department of Surgery, Center for Healthcare Outcomes and Policy, University of Michigan, Ann Arbor, MI, USA
Melinda A. Gibbons, M.D., M.S.H.S. Department of Surgery, David Geffen School of Medicine at University of California, Los Angeles, CA, USA; Olive View UCLA Medical Center, Sylmar, CA, USA; and Greater Los Angeles VA Medical Center, Los Angeles, CA, USA
Philip P. Goodney, M.D., M.S. Director, Center for the Evaluation of Surgical Care (CESC), Department of Surgery, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA; Co-Director, VA Outcomes Group, White River Junction VA Medical Center, White River Junction, VT, USA
Caprice C. Greenberg, M.D., M.P.H. Associate Professor of Surgery, WARF Professor of Surgical Research, Director, Wisconsin Surgical Outcomes Research (WiSOR), Department of Surgery, University of Wisconsin, Madison, WI, USA
Adil Haider, M.D., M.P.H. Center for Surgical Trials and Outcomes Research (CSTOR), Johns Hopkins University School of Medicine, Baltimore, MD, USA
Yue-Yung Hu, M.D., M.P.H. Department of Surgery, Beth Israel Deaconess Medical Center, Boston, MA, USA
Omar Hyder Department of Anesthesia, Massachusetts General Hospital, Boston, MA, USA
Lillian S. Kao, M.D., M.S. Department of Surgery, University of Texas Health Science Center at Houston, Houston, TX, USA
Clara N. Lee, M.D., M.P.P., F.A.C.S. University of North Carolina, Chapel Hill, NC, USA
Carrie C. Lubitz, M.D., M.P.H. Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Ryan P. Merkow, M.D., M.S. Surgical Outcomes and Quality Improvement Center, Department of Surgery, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Arden M. Morris, M.D., M.P.H. Health Behavior Health Education; Department of Surgery; and Division of Colorectal Surgery, University of Michigan, Ann Arbor, MI, USA
Timothy M. Pawlik Department of Surgery, Johns Hopkins Hospital, Baltimore, MD, USA
Scott E. Regenbogen, M.D., M.P.H. Department of Surgery, University of Michigan, Ann Arbor, MI, USA
Taylor S. Riall, M.D., Ph.D. Department of Surgery, The University of Texas Medical Branch, Galveston, TX, USA
Andrew M. Ryan, Ph.D. Division of Healthcare Policy and Economics, Weill Cornell Medical College, New York, NY, USA
Diane Schwartz, M.D. Assistant Professor, Department of Surgery, Johns Hopkins Bayview Medical Center, Baltimore, MD, USA
Margaret L. Schwarze Department of Surgery, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
Dorry Segev, M.D., Ph.D. Department of Surgery and Epidemiology, Johns Hopkins Medical Institutions, Baltimore, MD, USA
Kyle Sheetz Department of Surgery, University of Michigan Medical School, Ann Arbor, MI, USA
Kristin M. Sheffield, Ph.D. Department of Surgery, The University of Texas Medical Branch, Galveston, TX, USA
Terry Shih, M.D. Department of Surgery, University of Michigan Health System, Ann Arbor, MI, USA
Sandra L. Wong, M.D., M.S. Department of Surgery, University of Michigan, Ann Arbor, MI, USA

Part I

Main Research Areas

Chapter 1

An Introduction to Health Services Research

Justin B. Dimick and Caprice C. Greenberg

Abstract The scientific focus of academic surgery has changed dramatically over the past decade. Historically, surgeon-scientists engaged almost exclusively in basic science research. With the rise of health services and outcomes research, more trainees and junior faculty are pursuing research in these disciplines. Despite the increasing popularity of this field, there are very few resources for young surgeons interested in learning about this discipline as applied to surgery. We hope this book helps to fill this gap. We start with a description of the main research areas in health services research followed by a look ahead into emerging areas of investigation. We then include several chapters that introduce the tools necessary to conduct this type of research. The final chapters provide practical advice on career development and program building for surgeon-scientists interested in pursuing this area of scholarly work. Keywords Outcomes • Quality • Surgery • Health services research • Research methods

1.1 What Is Health Services Research?

We often get asked how health services research is different from traditional "clinical research". Studying the end results of surgical care is clearly not new. As long as surgeons have been operating, we have been studying our patients' outcomes. Although there is clearly overlap with traditional, clinically focused scientific work, health services research often takes different perspectives and looks at health care through a much broader lens.

To illustrate these differences in perspective, it is useful to consider two popular definitions of health services research (HSR). AcademyHealth, the leading professional organization for health services researchers (their Annual Research Meeting is a great meeting to attend, by the way), defines HSR as follows:

AcademyHealth defines health services research as the multidisciplinary field of scientific investigation that studies how social factors, financing systems, organizational structures and processes, health technologies, and personal behaviors affect access to health care, the quality and cost of health care, and ultimately our health and well-being. Its research domains are individuals, families, organizations, institutions, communities, and populations.

The Agency for Healthcare Research and Quality (AHRQ), one of the leading funding agencies for HSR, uses the following definition: Health services research examines how people get access to health care, how much care costs, and what happens to patients as a result of this care. The main goals of health services research are to identify the most effective ways to organize, manage, finance, and deliver high quality care; reduce medical errors; and improve patient safety.

1.2 What Is Outcomes Research?

Outcomes research is sometimes used interchangeably with health services research but is probably best considered a chief discipline within HSR. The Agency for Healthcare Research and Quality (AHRQ) defines outcomes research as follows:

Outcomes research seeks to understand the end results of particular health care practices and interventions. End results include effects that people experience and care about, such as change in the ability to function. In particular, for individuals with chronic conditions—where cure is not always possible—end results include quality of life as well as mortality. By linking the care people get to the outcomes they experience, outcomes research has become the key to developing better ways to monitor and improve the quality of care.

While this formal definition of outcomes research is not as broad as the definition of health services research above, it is still different from traditional clinical research in a few important ways. Most importantly, there is a focus on a broader set of outcomes beyond clinical endpoints (e.g., mortality and morbidity), including quality of life and patient-centered outcomes. With the increasing popularity of Patient Centered Outcomes Research (PCOR), and the creation and funding of the Patient Centered Outcomes Research Institute (PCORI), researchers who primarily focus on this area would probably label themselves "outcomes researchers", whereas investigators who focus on health care policy evaluation may refer to themselves as "health services researchers". However, for the purposes of this overview, we view the two as comprising a single area of scientific endeavor, which we will refer to as HSR.


As is evident in the definitions above, there are several key distinctions between HSR and traditional clinical research, including important differences in the questions, the settings, the outcomes, the data, and the tools (Table 1.1).

Table 1.1 Key differences between health services research and traditional clinical research

The questions: HSR asks broader questions. Rather than assessing clinical treatments, HSR questions often address the organization, delivery, financing, and regulation of the health care system.
The setting: HSR studies health care in "real world" settings as opposed to the carefully constructed environment of a clinical trial. This difference is often described as studying "effectiveness" (real world) vs. "efficacy" (randomized clinical trial).
The outcomes: HSR often uses different end-points. Rather than focusing on clinical endpoints (morbidity and mortality), HSR often uses patient-centered outcomes, such as quality of life and symptom bother.
The data: Rather than directly collecting data from the medical record, HSR often uses large datasets to conduct observational research; or, at the other end of the spectrum, surveys or interviews with patients are used to gather very detailed information.
The tools: The research tools necessary to perform sophisticated HSR vary with the nature of the question and span from large database analysis and econometrics to qualitative research and psychometrics.

1.3 Part I. Main Research Areas

The book begins with an introduction to the main research themes that investigators are currently pursuing. Dr. Chang (Chap. 2) provides an overview of comparative effectiveness research and describes how this field goes beyond randomized clinical trials—which represent only a narrow part of the field. He describes the spectrum of study designs (e.g., pragmatic trials, observational studies) available for assessing which treatments are most effective, and how effectiveness may vary across different patient and provider subgroups.

As described above, HSR often asks questions much more broadly than traditional clinical research, including investigating differences in practice style and treatment across large areas and understanding how these are shaped by healthcare policy. Dr. Goodney (Chap. 3) provides an overview of the seminal work on variations across geographic areas by the Dartmouth Atlas group, which opened our eyes to wide, unwarranted practice variations in the United States. Dr. Dimick (Chap. 4) then discusses how health care policy research can help improve the context in which we work by critically evaluating the incentives and structures that are largely invisible but shape our daily work. Dr. Haider (Chap. 5) considers the inequities in our health care system that lead to disparities in the use and outcomes of surgery. He emphasizes that the field of disparities research needs to move beyond documenting quality gaps and, instead, begin fixing them.


Another important focus of HSR is on measuring and improving quality. Dr. Bilimoria, a leader within the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP), describes the field of quality measurement, including the pros and cons of the structure, process, and outcome measures that are used as performance indicators (Chap. 6). Dr. Wong, who serves as an Associate Chair for Quality at her institution, then discusses how these quality indicators can be used locally to improve care (Chap. 7).
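To give a sense of what an outcome-based performance indicator can look like in practice, the following minimal sketch (in Python, using entirely hypothetical data and a made-up risk model) computes a risk-adjusted observed-to-expected (O/E) mortality ratio per hospital; it illustrates the general idea only, not the method of any specific program.

    # Hypothetical illustration of a risk-adjusted observed-to-expected (O/E) mortality ratio.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 2000
    patients = pd.DataFrame({
        "hospital": rng.choice(["A", "B"], size=n),
        "age": rng.normal(70, 8, size=n),
        "emergency": rng.binomial(1, 0.3, size=n),
    })
    # Expected risk of death for each patient from a pretend, pre-existing risk model.
    patients["expected_risk"] = 1 / (1 + np.exp(-(-7 + 0.07 * patients["age"] + 0.8 * patients["emergency"])))
    # Observed deaths are simulated here; in a real registry these would be recorded outcomes.
    patients["died"] = rng.binomial(1, patients["expected_risk"])

    # O/E ratio per hospital: observed deaths divided by the sum of expected risks.
    summary = patients.groupby("hospital").agg(
        observed=("died", "sum"),
        expected=("expected_risk", "sum"),
    )
    summary["o_e_ratio"] = summary["observed"] / summary["expected"]
    print(summary)  # an O/E ratio near 1 suggests performance in line with case mix

An O/E ratio well above 1, after adequate risk adjustment and with enough cases to be reliable, is the kind of signal such programs flag for further review.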

1.4 Part II. Emerging Areas of Research

We next consider several emerging areas within HSR that are likely to become integral to our field within the next 5–10 years. While many of these fields are well-established outside surgery, only a small number of investigators are pursuing them within our profession, creating a large opportunity for young surgeon-scientists. Dr. Kao (Chap. 8) describes the field of implementation and dissemination research. Implementation science explicitly recognizes the gap in translating evidence into practice, providing a rich set of theoretical frameworks and research tools to rigorously study barriers and facilitators of the adoption of evidence in real-world settings. Building on this knowledge of the importance of "context" in optimizing healthcare, Dr. Ghaferi (Chap. 9) examines the important role organizational culture plays in creating well-functioning environments that are safe and favorable to successful adoption of best practices.

There is also a growing emphasis on assessing outcomes from the patient perspective. Traditional clinical endpoints are clearly important, but there is often divergence between clinician and patient perspectives. For example, after inguinal hernia repair, surgeons usually measure the recurrence rate, which is quite low. Patients, however, are much more bothered by chronic inguinal pain, which occurs far more often than recurrence. Dr. Morris (Chap. 10) provides an overview of Patient-Centered Outcome measures, which are an increasingly important part of health services research. With the recent establishment of PCORI, and funds available for CER focused on the patient perspective, it is important for young surgeons to get involved in this area of research.

Perhaps the most cutting-edge research in our field is aiming to get inside the "black box" of what happens in the operating room. Most existing quality improvement work focuses on optimizing perioperative care (e.g., antibiotics for prevention of SSI) and completely ignores how the operation itself is conducted. Dr. Greenberg (Chap. 11) describes multidisciplinary efforts to understand and improve the performance of systems, teams, and individuals in the operating room environment. Once we have creative solutions for improving quality and performance, it is essential to have an infrastructure to disseminate and test them in the community. Dr. Finks (Chap. 12) describes one potential laboratory for evaluating these interventions: regional quality improvement collaboratives. He describes the power of such collaboratives for implementing best practices across large geographic areas and in diverse practice settings.


1.5 Part III. Tools of the Trade

The tools necessary to conduct HSR are diverse and constantly in flux. There is constant innovation in HSR, bringing in expertise from additional fields. However, certain tools are trademarks of HSR, and we will cover those in this section of the book.

Many young surgeons begin their research careers working with large datasets. These are relatively inexpensive and can help fellows and junior faculty get over "bibliopenia"—a necessary first step towards establishing yourself as an investigator. Dr. Shih (Chap. 13) provides an overview of large datasets available for conducting health services research. Because we often try to make causal inferences from these large datasets, tools are needed to address confounding and selection bias. Methods for addressing these problems and thereby enhancing causal inference are central to the HSR toolbox. Dr. Sheffield (Chap. 14) introduces commonly used methods, including multivariate regression, propensity scores, and instrumental variable analysis. Dr. Maggard (Chap. 15) describes how individual studies can be brought together and synthesized in a meta-analysis. Besides giving a single summary "best estimate" of the available studies, these techniques also allow us to systematically study how the treatment effect varies across patient and provider subgroups (i.e., establish treatment-effect heterogeneity). Dr. Lee and Dr. Lubitz (Chap. 16) then provide an overview of what can be done to synthesize data to help patients make better decisions.

Large datasets are usually a great starting point for young surgeon-scientists, but they lack the detail required to answer many important questions. Dr. Brasel (Chap. 17) discusses the use of survey research to generate primary data about knowledge, attitudes, and beliefs. Dr. Schwarze (Chap. 18) then provides an introduction to qualitative research, a rich field of inquiry that uses focus groups, interviews, and ethnographic methods to gather information. Qualitative research uses words rather than numbers as data and is an entire science unto itself. These methods are absolutely crucial for helping us understand "why" things do and do not work in healthcare.
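As a concrete taste of the large-dataset methods mentioned above, here is a minimal sketch (in Python, with numpy, pandas, and statsmodels, on an entirely simulated dataset) of estimating a propensity score and comparing outcomes within propensity-score strata. It shows only the general shape of such an analysis, not the approach of any particular chapter or study.

    # Minimal propensity-score sketch on simulated data (not from any real registry).
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 5000
    df = pd.DataFrame({
        "age": rng.normal(65, 10, n),
        "comorbidities": rng.poisson(2, n),
    })
    # Treatment assignment depends on patient characteristics (confounding by indication).
    logit_treat = -4 + 0.04 * df["age"] + 0.3 * df["comorbidities"]
    df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-logit_treat)))
    # Outcome depends on both the treatment and the same characteristics.
    logit_death = -6 + 0.05 * df["age"] + 0.4 * df["comorbidities"] - 0.3 * df["treated"]
    df["death"] = rng.binomial(1, 1 / (1 + np.exp(-logit_death)))

    # 1. Estimate the propensity score: probability of treatment given observed covariates.
    X = sm.add_constant(df[["age", "comorbidities"]])
    ps_model = sm.Logit(df["treated"], X).fit(disp=False)
    df["ps"] = ps_model.predict(X)

    # 2. Stratify on the propensity score and compare mortality within strata of similar patients.
    df["ps_stratum"] = pd.qcut(df["ps"], 5, labels=False)
    print(df.groupby(["ps_stratum", "treated"])["death"].mean().unstack())

The same skeleton (model the exposure, then balance or adjust on the resulting score) underlies the matching, weighting, and stratification approaches discussed in Chap. 14.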

1.6 Part IV. Career Development

The final section provides practical advice for young surgeons interested in building a career focused on health services research. The first few chapters focus on mentoring within HSR. Drs. Sheetz (a medical student himself when he wrote the chapter) and Englesbe (Chap. 19) describe the keys to engaging medical students in outcomes research so that it is a win/win proposition. Drs. Hyder and Pawlik (Chap. 20) offer advice on finding a mentor to conduct health services research and to guide career development more broadly. Dr. Hu (Chap. 21), who recently finished her fellowship with Dr. Greenberg, describes what the goals should be for a resident during a research fellowship. The book concludes with chapters targeting junior faculty that focus on funding for HSR (Dr. Segev, Chap. 22), choosing your first job (Dr. Regenbogen, Chap. 23), and building an outcomes research program (Dr. Finlayson, Chap. 24).

Further Reading

1. Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44:166–203.
2. Wennberg JE, Gittelsohn A. Small area variations in health care delivery. Science. 1973;182:1102–8.
3. Brook RH, Ware JE, Rogers WH, et al. Does free care improve adults' health? Results from a randomized controlled trial. N Engl J Med. 1983;309:1426–34.
4. Birkmeyer JD. Outcomes research and surgeons. Surgery. 1998;124:477–83.
5. Cabana MD, Rand CS, Powe NR, et al. Why don't physicians follow clinical practice guidelines? A framework for improvement. JAMA. 1999;282(15):1458–65.
6. Lohr KN, Steinwachs DM. Health services research: an evolving definition of the field. Health Serv Res. 2002;37:7–9.

Chapter 2

Comparative Effectiveness Research

George J. Chang

Abstract Comparative effectiveness research (CER) is a type of research involving human subjects or data from them that compares the effectiveness of one preventive, diagnostic, therapeutic, or care delivery modality to another. The purpose of this research is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers to improve decisions that affect medical care. CER studies utilize a variety of data sources and methods to conduct timely and relevant research that can be disseminated in a quickly usable form to improve outcomes and value for health care systems.

Keywords Comparative effectiveness research • Comparative effectiveness • Effectiveness • Patient centered outcomes research

2.1 Introduction

Perhaps the greatest challenge confronting the United States (U.S.) health care system is delivering effective therapies that provide the best health outcomes with high value. While some of the most innovative treatments originate from the U.S., and despite having among the highest per capita spending on health care, the U.S. still lags in important health outcomes for treatable medical conditions. Although debate remains about how it should be done, in the push for health care reform there is agreement that addressing the gaps in quality and efficiency in U.S. health care should be a top priority. Too often there is inadequate information regarding the best course of treatment for a particular patient's medical condition given that patient's characteristics and coexisting conditions. In order to aid in these decisions for both individuals and groups of patients, knowledge regarding both the benefits and harms of the treatment options is necessary. Furthermore, it is important to understand the priorities of the relevant stakeholders in these decisions so that the evidence that is generated is relevant and can be applied. Thus, comparative effectiveness research (CER) focuses on addressing these gaps to improve health care outcomes while improving value.

2.2 What Is Comparative Effectiveness Research?

Broadly speaking, comparative effectiveness research is a type of research involving human subjects or data from them that compares the effectiveness of one preventive, diagnostic, therapeutic, or care delivery modality to another. Over the past decade the definition has evolved, in part due to influence from policy makers, to become more specific. As defined by the Federal Coordinating Council for Comparative Effectiveness Research (FCCCER), comparative effectiveness research is "the conduct and synthesis of research comparing the benefits and harms of different interventions and strategies to prevent, diagnose, treat, and monitor health conditions in 'real world' settings [1]. The purpose of this research is to improve health outcomes by developing and disseminating evidence-based information to patients, clinicians, and other decision-makers, responding to their expressed needs, about which interventions are most effective for which patients under specific circumstances." It has been further refined by legislation through section 6301 of the Patient Protection and Affordable Care Act, signed into law by President Obama in 2010. CER now pertains to research comparing the clinical effectiveness of "two or more medical treatments, services, and items." These treatments, services, and items include "health care interventions, protocols for treatment, care management, and delivery, procedures, medical devices, diagnostic tools, pharmaceuticals (including drugs and biologicals), integrative health practices, and any other strategies or items being used in the treatment, management, and diagnosis of, or prevention of illness or injury in, individuals."

2 Comparative Effectiveness Research

11

emphasis on the potential harms and the associated burden to the patient or society. In contrast to efficacy, which is typically assessed under ideal circumstances within a stringent randomized study, effectiveness refers measurement of the degree of benefit under “real world” clinical settings. Unlike efficacy studies, CER extends beyond just knowledge generation to determine the best way to disseminate this new evidence in a usable way for individual patients and their providers. The scope of CER is broad and refers to the health care of both individual patients and populations. CER is highly relevant to healthcare policy and studies of organizations, delivery, and payment of health services. Indeed 50 of the 100 priority topics as recommended by the Institute of Medicine (IOM) in its Initial National Priorities for Comparative Effectiveness Research report related to comparing some aspect of the health care delivery system [3]. These topics relate to how or where services are delivered, rather than which services are rendered. Furthermore, in its scope, CER values stakeholder engagement (patients, providers, and other decision makers) throughout the CER process. Its over-arching goal is to improve the ability for patients, providers, and policy makers to make healthcare decisions that affect individual patients and to determine what works best and for whom. CER measures both benefits and harms and the effectiveness of interventions, procedures, regimens, or services in “real-world” circumstances. This is an important distinction from traditional clinical research and CER, in that CER places high value on external validity, or the ability to generalize the results to real-world decisions. Another important feature of CER is the employment of a variety of data sources and the generation of new evidence through observational studies and pragmatic clinical trials, or those with more practical and generalizable inclusion criteria and monitoring requirements [7, 8].

2.3 Why Comparative Effectiveness Research? As surgeons, perhaps one of the most compelling arguments for CER can be made in examining the hundreds of years old practice of bloodletting. While the lack of benefit and the associated harms seem obvious today, it wasn’t until Scottish surgeon Alexander Hamilton performed a comparative effectiveness study in 1809 that the harms of the practice were clearly identified. In this pragmatic clinical trial, sick solders were admitted to the infirmary where one surgeon employed “the lancet” as frequently as he wished and two others were prohibited from bloodletting. Mortality was ten-fold higher among the soldiers assigned to the blood-letting service (30 %) compared to only 3 % in the blood-letting prohibited service. While this may be quite an extreme example, there are many practices today that are performed with either only marginal benefits that may be outweighed by the harms or no clear evidence for benefit. This problem was highlighted in the 2001 IOM report Crossing the Quality Chasm that concluded that the U.S. health care delivery system does not provide consistent, high-quality medical care to all people and that a chasm exists between what health care we now have and what we could

12

G.J. Chang

(should) have [4]. The IOM further identified aims for improvement and rules for health care systems redesign. In summary, the goal is to provide safe, effective, patient-centered, efficient and equitable care that is evidence-based, coordinated, and without waste. What physician, at least fundamentally, does not share these goals? Yet it is clear that either sufficient evidence or the mechanisms for translating that evidence into practice in order to cross the chasm is lacking. For policy makers CER has become an important priority in an effort to identify ways to address the rising cost of health care. The unsustainability of the rising costs of U.S. health care has been widely recognized. With an aging population, resource use-based reimbursement, and advancing medical technology, health care spending accounted for 17.9 % of the U.S. Gross Domestic Product in 2012 and is projected to rise to 19.9 % by 2022. The rising costs are also driven by widespread variation in practice patterns and the use of new therapies leading to system waste due to the lack of high-quality evidence regarding treatment options. In fact the IOM estimates that fewer than 50 % of treatments delivered today are supported by evidence and that as much as 30 % of health care dollars are spent on medical care of uncertain value. It is imperative, therefore, to understand the incremental value of medical treatments in diverse, real-world patient populations to identify and promote the use of the most effective treatments and discourage or eliminate the use of ineffective treatments. Comparative effectiveness research (CER) hopes to address this need.

2.4 Investments and Activities in Comparative Effectiveness Research Comparative effectiveness research is not a new idea. The principles of CER applied to improve the quality and maximize the value of health care services have been in place for nearly 50 years. Efforts began with the U.S. Congress Office of Technology Assessment created in 1972 and abolished by Congress in 1995 (Fig. 2.1). The Agency for Health Care Policy and Research, later Agency for Health Care Research and Quality (AHRQ), initially focused on developing guidelines for clinical care but subsequently expanded its scope with the Medicare Modernization Act (MMA) of 2003 that ensured funding for CER. More recently efforts in CER have grown thanks in part to a greater federal emphasis in identifying value in health care through CER. In 2009 the American Recovery and Reinvestment Act (ARRA) provided $1.1 billion in research support for CER through the AHRQ to identify new research topics, evidence, gaps, and develop pragmatic studies and registries; the National Institutes of Health (NIH) to fund “challenge grants” and “grand opportunity” grants to address the Institute of Medicine (IOM) CER priority research areas; and the Department of Health and Human Services (HHS) to fund infrastructure, collection, and dissemination of CER. ARRA thus established the 15-member Federal Coordinating Council for Comparative Effectiveness Research (FCCCER), composed of senior representatives of several federal agencies to

2 Comparative Effectiveness Research

13

Fig. 2.1 Timeline representing the evolution of comparative effectiveness research in each of its forms within the U.S.

coordinate research activities and also allocated $1.5 to the IOM to investigate and recommend national CER priorities through extensive stakeholder engagement. The FCCCER report in its strategic framework identified four core categories for investment in CER: (1) research (generation or synthesis of evidence); (2) human and scientific capital (to train new researchers in CER and further develop its methods); (3) CER data infrastructure (to develop Electronic Health Records and practice based data networks); and (4) dissemination and translation of CER (to build tools and methods to disseminate CER findings to clinicians and patients to translate CER into practice). Furthermore, it recommended that the activities be related to themes that cut across the core categories. While there have been many public sector activities in CER including those funded by the AHRQ, NIH, the Veterans Health Administration (VHA), the Department of Defense (DoD), until recently it has not possible to estimate the total number of CER studies funded due to a lack of a standard, systematic means for reporting CER across the funding agencies. Additionally a number of public and private sector organizations have been engaged in CER, much of it has been fragmented and not aligned with a common definition or set of priorities for CER, resulting in numerous gaps in the research being conducted. Thus the Patient Protection and Affordable Care Act of 2010, subsequently approved by Congress as the Affordable Care Act (ACA), established the Patient

14

G.J. Chang

Centered Outcomes Research Institute (PCORI) to be the primary agency to oversee and support the conduct of CER. The ACA was enacted with provisions for up to $470 million per year of funding for Patient Centered Outcomes Research (PCOR) which includes greater involvement by patients, providers, and other stakeholders for CER. PCORI is governed by a 21 member board which has supplanted the FCCCER. The research that PCORI supports should improve the quality, increase transparency, and increase access to better health care [2]. However, the creation and use of cost-effectiveness thresholds or calculations of quality adjusted life years are explicitly prohibited. There is also specific language in the act that the reports and research findings may not be construed as practice guidelines or policy recommendations and that the Secretary of HHS may not use the findings to deny coverage, reflecting political fears that the research findings could lead to the potential for health care rationing. CER has also been a national priority in many countries including the UK (National Institute for Health and Clinical Excellence), Canada (The Canadian Agency for Drugs and Technologies in Health Care), Germany (Institute for Quality and Efficiency in Health Care), and Australia (Pharmacy Benefits Advisory Committee) to name just a few.

2.5 CER and Stakeholder Engagement A major criticism of prior work in clinical and health services research and potential explanation for the gap in practical knowledge with limited translation to real-world practice is that studies failed to maintain sustained and meaningful engagement of key decision-makers in both the design and implementation of the studies. Stakeholder engagement is felt to be a critical element for researchers to understand what clinical outcomes matter most to patients, caregivers, and clinicians in order to design “relevant” study endpoints that improve knowledge translation into usual care. Stakeholders may include patients, caregivers providers, researchers, and policy-makers. The goal of this emphasis on stakeholder engagement is to improve the dissemination and translation of CER. By involving stakeholders in identifying the key priorities for research, the most relevant questions are identified. In fact this was a major activity of the IOM when it established the initial CER priorities [5]. Now PCORI engages stakeholders in a similar fashion to identify new questions that are aligned with its five National Priorities for Research: (1) assessing options for prevention, diagnosis, and treatment; (2) improving health care systems; (3) addressing disparities; (4) communicating and disseminating research; and (5) improving patient-centered outcomes research methods and infrastructure [6]. Engagement of stakeholders can also help identify the best ways to disseminate and translate knowledge into practice.


2.6 Types of CER Studies Comparative effectiveness research requires the development, expansion, and use of a variety of data sources and methods to conduct timely and relevant research and disseminate the results in a form that is quickly usable by clinicians, patients, policymakers, and health plans and other payers. The principal methodologies employed in CER include randomized trials (experimental studies), observational research, research synthesis, and decision analysis [7]. These methods can be used to generate new evidence, to evaluate the available existing evidence about the benefits and harms of each choice for different patient groups, or to synthesize existing data to generate new evidence to inform choices. CER investigations may be based on data from clinical trials, clinical studies, or other research. As more detailed coverage of specific research methods is provided in subsequent chapters of this text, we will focus the discussion here on aspects particularly relevant to the conduct of CER.

2.6.1 Randomized Trials Randomized comparative studies represent perhaps the earliest form of comparative effectiveness research for evidence generation in medicine. Randomized trials generally provide the highest level of evidence to establish the efficacy of the intervention in question and thus can be considered the gold standard of efficacy research. Randomized trials also provide investigators with an opportunity to study patient-reported outcomes and quality of life associated with the intervention, and also to measure potential harms of treatment. However, traditional randomized trials have very strict inclusion and exclusion criteria, are typically performed after selection of the healthiest patients, and involve detailed and rigorous patient monitoring and management that is not routinely performed in day-to-day patient care. Thus while the traditional randomized controlled trial may assess the efficacy of an intervention, the real-world effectiveness of the intervention when performed in community practice may be quite different. One of the main limits to generalizability in traditional randomized controlled trials is the strict patient selection criteria designed to isolate the effect of the intervention from confounding. Moreover, treatment in randomized controlled trials often occurs in ideal clinical conditions that are not readily replicated during real-world patient care. Some of this difference stems from the fact that traditional randomized trials are often used for novel therapy development and for drug registration or label extension. In contrast, among the goals of CER trials are to compare the effectiveness of various existing treatment options, to identify patient and tumor subsets most likely to benefit from interventions, to study screening and prevention strategies, and to focus on survivorship and quality of life.


The results of CER trials should be generalizable to the broader community and easily disseminated for broad application, without the stringent criteria inherent in traditional randomized trials. A number of alternative, non-traditional trial designs may be considered for CER and can overcome some of the limitations outlined above. In Cluster Randomized Trials, the randomization is by group rather than by individual patient. Implementation of a single intervention at each site improves external validity, as patients are treated as in the real world and there is less risk of contamination across the arms. However, statistical methods such as hierarchical models must be used to adjust for cluster effects, which effectively reduces the statistical power compared with studies using individual randomization. Pragmatic Trials are highly aligned with the goals of CER, as they are performed in typical practice and in typical patients, with eligibility criteria designed to be inclusive. The study patients have the typical comorbid diseases and characteristics of patients in usual practice. In keeping with the practical nature and intent of pragmatic trials, the outcomes measured are tailored to collect only those that are most pertinent and most easily assessed or adjudicated. While these trials have good internal (individual randomization) and external validity, the lack of complete data collection can preclude meaningful subsequent subgroup analysis for the evaluation of treatment heterogeneity. Adaptive Trials change in response to the accumulating data by utilizing the Bayesian framework to formally account for prior knowledge. Key design parameters change during trial execution based upon predefined rules and the accumulating data. Adaptive designs can improve the efficiency of the study and allow for more rapid completion. But there are limitations to adaptive designs, particularly their effect on the potential for type I error. Sample size estimations can thus be complex and require careful planning with adjustment of the statistical analyses.
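To make the power penalty of cluster randomization concrete, here is a minimal sketch of the standard design-effect calculation, DE = 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation coefficient. All numbers are hypothetical and purely illustrative; they are not drawn from any trial discussed in this chapter.

```python
# Design effect for a cluster randomized trial: a minimal sketch.
# Inputs below (cluster size, ICC, number of clusters) are hypothetical.

def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (avg_cluster_size - 1) * icc

def effective_sample_size(n_total: int, avg_cluster_size: float, icc: float) -> float:
    """Number of independently randomized patients the cluster trial is 'worth'."""
    return n_total / design_effect(avg_cluster_size, icc)

if __name__ == "__main__":
    m, icc = 50, 0.05        # 50 patients per hospital, modest within-hospital correlation
    n = 20 * m               # 20 hospitals randomized as clusters
    de = design_effect(m, icc)
    print(f"Design effect: {de:.2f}")                                      # 3.45
    print(f"Effective sample size: {effective_sample_size(n, m, icc):.0f} of {n}")
```

Under these assumed values, 1,000 cluster-randomized patients carry roughly the information of about 290 individually randomized patients, which is why hierarchical (mixed-effects) models and larger samples are needed in cluster designs.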

2.6.2 Observational Studies While well-controlled experimental efficacy studies optimize internal validity, this often comes at the price of generalizability, and the therapies studied may perform differently in general practice. Furthermore, the patients for whom the benefits of therapy may be the most limited (patients who are elderly or have many comorbid conditions) are the least likely to be enrolled in randomized trials. On the other hand, observational studies use data from patient care as it occurs in real life. Observational studies use data from medical records, insurance claims, surveys, and registry databases. Although observational studies have the intrinsic benefit of being resource efficient and well suited to CER, because the exposures or treatments are assigned not by the investigator but by routine practice considerations, threats to internal validity must be considered in the interpretation of the observed findings. A central assumption in observational CER studies is that the treatment groups compared have the same underlying risk for the outcome apart from the intervention.


Of course, only in a randomized trial is this completely possible. However, because observational studies use data collected in real life without a priori intent for the CER study, issues of bias are significant concerns, principally related to confounding by indication in intervention studies and confounding by frailty in prevention studies. Unfortunately, it is difficult to measure these effects, and therefore a number of methods have been developed to handle these issues.

2.6.2.1 Measuring Associations and Managing Confounding CER with observational data begins with descriptive statistics and graphical representation of the data to broadly describe the study subjects, assess their exposure to covariates, and assess the potential for imbalances in these measures. Estimates of treatment effects can then be determined by regression analysis. One increasingly common approach to cope with confounding in CER is propensity score analysis. Propensity score analysis is mainly used in the context of binary treatment choices to answer the question of why the patient was given one treatment over another. The propensity score is defined as the probability of receiving the exposure conditional on observed covariates and is typically estimated by logistic regression models; it is the estimated probability, or propensity, for a patient to receive one treatment over another. Patients with similar propensity scores may be considered "similar" for comparing treatment outcomes. The propensity scores may be used for stratification, matching, or weighting, or included as a covariate in a regression model for the outcome. Propensity models should include covariates that are either confounders or otherwise related to the outcome, in addition to covariates that are related to the exposure. The distribution of propensity scores between the exposure groups may provide a visual assessment of the risk for biased exposure estimates among cohorts with poor overlap in propensity scores. Thus it is important, after propensity adjustment, that balance in the study covariates between the exposure groups be carefully assessed. One approach to using propensity scores to adjust for confounding is matching on the propensity score. Good overlap in the propensity score distributions can facilitate balance in the study covariates, as is achieved in randomized treatment groups. While use of propensity score adjustment can result in residual imbalances in the study covariates, matching techniques reduce sample size and power. Furthermore, one can only ensure that measured covariates are being balanced, and unmeasured confounding may still need to be addressed. Stratification by quantiles also permits comparisons among groups with heterogeneous response characteristics. Thus propensity score analyses are well suited to CER because they attempt to model the process of patient selection for therapy, focus on the treatment effect, and provide insight into subgroups with heterogeneous response characteristics. Disease risk scores (DRS) are similar to propensity scores in that the DRS is also a summary measure derived from the observed values of the covariates. It estimates the probability or rate of disease as a function of the covariates. It can be calculated by regression in the full cohort, the "full-cohort" DRS, also known as the multivariate confounder score.


It relates the study outcome to the exposure and covariates for the entire study population. The resultant DRS is then derived for each individual subject, and subjects are stratified by their DRS in order to determine the stratified estimate of the exposure effect. The DRS regression model can also be developed from the "unexposed cohort" only, with the fitted values then determined for the entire cohort. The DRS method is particularly favorable in studies with a common outcome and a rare exposure or multiple exposures. Another approach to managing incomplete information or potentially unmeasured confounders in CER is instrumental variable analysis. An "instrument" is an external cause of the intervention or exposure that is by itself unrelated to the outcome. An important assumption is that the instrument does not affect the outcome except through the treatment. Even in the presence of unmeasured confounding, an instrument that affects the treatment but has no direct effect on the outcome can be used to essentially recreate the effect of "randomization." After dividing the population into subgroups according to the value of the instrumental variable, the rates of treatment in the subgroups will differ. The probability of treatment is therefore not determined by individual characteristics, and comparing outcomes between groups with different values of the instrumental variable is analogous to comparing groups that are randomized with different probabilities of receiving the intervention. The requirement that an instrument must not affect the outcome except through the treatment is the so-called "exclusion restriction." The choice of approach for coping with confounding should be determined by the characteristics and availability of the data, and there may be situations where multiple analytic strategies should be utilized.
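As one concrete illustration of the propensity score methods described above, the sketch below simulates an observational cohort with confounding by indication, estimates propensity scores by logistic regression, and applies inverse-probability-of-treatment weighting with a simple balance check. All variable names, effect sizes, and data are hypothetical; this is a schematic sketch under stated assumptions, not the analysis of any study cited in this chapter.

```python
# Propensity score weighting for an observational comparison: a minimal sketch.
# The cohort, covariates, and effects are simulated and hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "comorbidity": rng.poisson(2, n),
    "female": rng.integers(0, 2, n),
})
# Treatment assignment depends on covariates (confounding by indication).
logit_t = -4 + 0.05 * df["age"] + 0.3 * df["comorbidity"]
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-logit_t)))
# Outcome depends on covariates and (weakly) on treatment.
logit_y = -5 + 0.04 * df["age"] + 0.4 * df["comorbidity"] - 0.2 * df["treated"]
df["complication"] = rng.binomial(1, 1 / (1 + np.exp(-logit_y)))

# 1. Estimate the propensity score with logistic regression on measured covariates.
ps_model = smf.logit("treated ~ age + comorbidity + female", data=df).fit(disp=False)
df["ps"] = ps_model.predict(df)

# 2. Inverse-probability-of-treatment weights (stabilized weights are a common refinement).
df["iptw"] = np.where(df["treated"] == 1, 1 / df["ps"], 1 / (1 - df["ps"]))

# 3. Balance check: weighted covariate means should be similar across groups.
for col in ["age", "comorbidity", "female"]:
    w_means = {t: np.average(df.loc[df.treated == t, col],
                             weights=df.loc[df.treated == t, "iptw"]) for t in (0, 1)}
    print(col, {t: round(v, 2) for t, v in w_means.items()})

# 4. Weighted outcome comparison (risk difference, under the usual assumptions of
#    no unmeasured confounding and adequate overlap).
risk = {t: np.average(df.loc[df.treated == t, "complication"],
                      weights=df.loc[df.treated == t, "iptw"]) for t in (0, 1)}
print("Weighted risk difference:", round(risk[1] - risk[0], 3))
```

The same estimated scores could instead be used for matching or stratification; weighting is shown here only because it fits in a few lines, and it illustrates why checking covariate balance and propensity score overlap after adjustment is essential.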

2.6.3 Research Synthesis Approaches to research synthesis include systematic reviews, meta-analysis, and technology assessments. Each of these approaches relies upon rigorous methods to collect, evaluate, and synthesize studies in accordance with explicit and structured methodology, some of which is outlined in AHRQ methods guides and by the IOM. There is considerable overlap between the methods of research synthesis for CER and for traditional evidence-based medicine (EBM). However, a central priority of CER evidence synthesis is the focus on the best care options in the context of usual care. Stakeholder input (e.g., citizen panels) is often solicited to select and refine the questions so that they are relevant to daily practice and to the improvement of the quality of care and system performance. Finally, CER syntheses cast a broad net with respect to the types of evidence considered, with high-quality, highly applicable evidence about effectiveness at the top of a hierarchy that may include pragmatic trials and observational studies. Systematic reviews in CER should have a pre-specified plan and analytic framework for gathering and appraising the evidence that sets the stage for the qualitative evidence synthesis.


Stakeholder input may again be solicited to refine the analysis. If the studies lend themselves to quantitative synthesis, a meta-analysis can provide a direct summary, such as a pooled relative risk. However, underlying heterogeneity among the studies can lead to exaggeration of the findings and is an important potential pitfall. By definition, CER reviews may include a broad range of study designs, not just randomized controlled trials, and the risk for amplification of bias and confounding must be carefully examined before quantitative synthesis.
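For readers who want to see the arithmetic behind a pooled estimate, the following minimal sketch performs a fixed-effect, inverse-variance pooling of relative risks and computes Cochran's Q and I² as a simple heterogeneity check. The study counts are made up purely for illustration; a real CER synthesis would follow the AHRQ and IOM guidance described above and would consider random-effects models when heterogeneity is present.

```python
# Inverse-variance pooling of relative risks: a minimal sketch with made-up study counts.
import numpy as np

# Hypothetical 2x2 counts per study: (events_treated, n_treated, events_control, n_control)
studies = [(12, 200, 20, 200), (30, 500, 45, 510), (8, 150, 15, 160)]

log_rr, var = [], []
for a, n1, c, n2 in studies:
    rr = (a / n1) / (c / n2)
    log_rr.append(np.log(rr))
    var.append(1/a - 1/n1 + 1/c - 1/n2)       # approximate variance of log RR

w = 1 / np.array(var)
pooled = np.sum(w * log_rr) / np.sum(w)        # fixed-effect pooled log RR
se = np.sqrt(1 / np.sum(w))
q = np.sum(w * (np.array(log_rr) - pooled) ** 2)          # Cochran's Q
i2 = max(0.0, (q - (len(studies) - 1)) / q) * 100         # I^2 heterogeneity (%)

print(f"Pooled RR: {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96*se):.2f}-{np.exp(pooled + 1.96*se):.2f}), I^2 = {i2:.0f}%")
```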

2.6.4 Decision Analysis Decision analysis is a method for model-based quantitative evaluation of the outcomes that result from specific choices in a given situation. It is inherently CER in that it is applied to the classic question of "which choice of treatment is right for me?" It involves evaluating a decision by considering the benefits and harms of each treatment choice for a given patient. For example, a patient may be faced with the decision between two surgical treatment options, one that has a lower risk for recurrent disease but a greater impact on function and another that has a higher risk for recurrent disease but a lower impact on function. Similarly, a decision analytic question can be framed for groups of patients. Ultimately the answer to the question will depend on the probability of each outcome and the patient's subjective value of that outcome. The decisions that are faced thus involve a tradeoff – for example, a procedure may have a higher risk for morbidity but a lower risk for recurrent disease – and the evaluation of this tradeoff permits individualization of treatment.
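The tradeoff described above can be made explicit with a simple expected-value calculation. The sketch below compares two hypothetical surgical options using assumed outcome probabilities and patient-specific utilities; every number is illustrative, and a full decision analysis would typically use a decision tree or Markov model with sensitivity analyses.

```python
# Decision analysis: expected value of two surgical options under hypothetical
# probabilities and patient utilities (all numbers illustrative, not from the chapter).

def expected_utility(p_recurrence: float, p_functional_loss: float,
                     u_recurrence: float, u_functional_loss: float, u_well: float) -> float:
    """Probability-weighted utility over three simplified, mutually exclusive outcomes."""
    p_well = 1 - p_recurrence - p_functional_loss
    return (p_recurrence * u_recurrence
            + p_functional_loss * u_functional_loss
            + p_well * u_well)

# Option A: lower recurrence risk, greater impact on function.
# Option B: higher recurrence risk, lower impact on function.
utilities = dict(u_recurrence=0.4, u_functional_loss=0.6, u_well=1.0)  # patient-specific values
option_a = expected_utility(0.05, 0.30, **utilities)   # 0.02 + 0.18 + 0.65 = 0.85
option_b = expected_utility(0.15, 0.10, **utilities)   # 0.06 + 0.06 + 0.75 = 0.87
print(f"Expected utility: option A = {option_a:.2f}, option B = {option_b:.2f}")
```

With these particular hypothetical values, the option with the higher recurrence risk but better function comes out slightly ahead, illustrating how the "right" choice depends on the probabilities and on the patient's own valuation of each outcome.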

2.7 Conclusion The need to improve value in our health care treatment options has highlighted CER as a major discipline in health services research. The goal of CER is to generate the knowledge to deliver the right treatment to the right patient at the right time and thus aid patients, providers, and policymakers in making the best healthcare decisions. It emphasizes practical comparisons and generates evidence regarding the effectiveness, benefits, and harms of different treatment options. It utilizes a variety of research methodologies including pragmatic trials and observational research to assess real-world effects of treatment decisions and engages key stakeholders to improve the way that knowledge is disseminated and translated into practice. The challenge for the future will be to continue to develop the infrastructure, networks, methodologies and techniques that will help to close the gap between the health care that we have now and the health care that we could and should have.


References
1. Federal Coordinating Council for Comparative Effectiveness Research. Report to the President and Congress. U.S. Department of Health and Human Services; 2009.
2. Fleurence R, Selby JV, Odom-Walker K, et al. How the Patient-Centered Outcomes Research Institute is engaging patients and others in shaping its research agenda. Health Aff. 2013;32(2):393–400.
3. Iglehart JK. Prioritizing comparative-effectiveness research – IOM recommendations. N Engl J Med. 2009;361(4):325–8.
4. IOM (Institute of Medicine). Crossing the quality chasm. Washington, DC: The National Academies Press; 2001. www.nap.edu/catalog/10027.html.
5. IOM (Institute of Medicine). Initial national priorities for comparative effectiveness research. Washington, DC: The National Academies Press; 2009. www.nap.edu.
6. Patient-Centered Outcomes Research Institute. National priorities for research and research agenda [Internet]. Washington, DC: PCORI; [cited 2013 Oct 1]. Available from: http://www.pcori.org/what-we-do/priorities-agenda/.
7. Sox HC, Goodman SN. The methods of comparative effectiveness research. Annu Rev Public Health. 2012;33:425–45.
8. Tunis SR, Benner J, McClellan M. Comparative effectiveness research: policy context, methods development and research infrastructure. Stat Med. 2010;29:1963–76.

Landmark Studies
• Iglehart JK. Prioritizing comparative-effectiveness research – IOM recommendations. N Engl J Med. 2009;361(4):325–8.
• IOM (Institute of Medicine). Initial national priorities for comparative effectiveness research. Washington, DC: The National Academies Press; 2009. www.nap.edu.
• Sox HC, Goodman SN. The methods of comparative effectiveness research. Annu Rev Public Health. 2012;33:425–45.

Chapter 3

Understanding Variations in the Use of Surgery Philip P. Goodney

Abstract This chapter details the origins of the study of variations in health care, with special attention to variations in the use, indications, and outcomes in surgery. We review the initial studies that demonstrated the value of this methodology, and describe how the study of variations allows insight into surgical practice. Finally, we will review the ways limiting variation can improve patient outcomes across a variety of surgical specialties, and inform health policy efforts aimed at limiting disparities in the use and outcomes of surgical procedures. Keywords Variation • Quality of care • Regression • Health policy • Patterns of care

3.1 Introduction Any surgeon "worth their salt" will attest that surgery can be technically difficult at certain times. Retro-hepatic vena cava exposure, penetrating trauma to Zone III of the neck, and drainage of a pancreatic pseudocyst are three examples of technically difficult surgical exercises that can test even the most experienced technicians. But more often than not, the most challenging aspect of surgery is deciding when – and when not – to operate. Surgeons of all specialties face these difficult scenarios every day.

P.P. Goodney, M.D., M.S. () Director, Center for the Evaluation of Surgical Care (CESC), Department of Surgery, Dartmouth Hitchcock Medical Center, Lebanon, NH 03765, USA Co-Director, VA Outcomes Group, White River Junction, VA Medical Center, White River Junction, VT 05059 e-mail: [email protected] J.B. Dimick and C.C. Greenberg (eds.), Success in Academic Surgery: Health Services Research, Success in Academic Surgery, DOI 10.1007/978-1-4471-4718-3__3, © Springer-Verlag London 2014


A 60-year-old man with an elevated prostate-specific antigen (PSA) test. A 75-year-old woman with critical but asymptomatic carotid artery stenosis. A 50-year-old man with back pain. In each of these clinical settings, there are a variety of approaches that could be the "right" answer. Some argue that each of these scenarios allows the practicing surgeon to emphasize the "art" of surgery as much as the science of health care in deciding who should be offered surgical treatment. A careful, informed conversation in the office, outlining the risks and benefits of a surgical approach, is a cornerstone of an effective clinic visit. Discussing the options with the patient and their family, and formulating a plan that leaves both patients and physicians feeling like they are "making the best choice," can be both satisfying and rewarding. But many of us approach different scenarios in different ways, and different approaches lead to different practice patterns. The study of the variation in these practice patterns is the focus of this chapter. This variation can be helpful – by introducing "natural experiments" wherein different approaches can be studied. But this variation can also be harmful, resulting in overuse or underuse of surgical treatments, with loss of their potential benefits or exacerbation of their potential harms. This chapter will introduce the study of variation and its potential implications in surgery. In our current era of patient safety, many have come to study surgery as they would other complex systematic processes, such as manufacturing an automobile. In automobile manufacturing, variations can produce remarkable results – such as an elegant, beautiful hand-made sports car. However, while beautiful and elegant, a Ferrari (for many reasons) would not serve as an ideal mode of transportation for large populations. Rather, many argue that Henry Ford's approach – standardization and eliminating variation – may be much better. By limiting variation and ensuring quality, Ford delivered a better car, at a lower price, to more drivers than any of his competitors around the world. Surgery is certainly not an assembly line. However, there are shared characteristics between complex processes in manufacturing and complex processes in patient selection and process measure performance in surgery. Limiting variation in both settings leads to better results. In this chapter, we explore the beginnings of efforts to study variations in health care, examine the progression towards variations in surgery and subspecialty surgery, and finally outline how attempts to limit variation have – and will – affect health policy in the United States.

3.2 The First Classic Papers: In the Beginning The study of variation in health care, and surgery in particular, began in the early 1970s, when J.P. Bunker examined differences in the numbers of surgeons and the numbers of operations between the United States and England and Wales [1]. This overview studied relationships between the supply of surgeons, the number of procedures, and health policy in a nationalized health care setting (England) and a fee-for-service environment (the United States). These insights prompted early interest.


But subsequent work by John Wennberg, an internist and nephrologist with public health training, prompted the first real interest and excitement surrounding variation in patient care. Interestingly, this work was not published in JAMA or the New England Journal, but in a journal with an even broader impact – Science [2]. This landmark paper was not a broad, sweeping analysis comparing entire health systems, as the Bunker analysis had been a few years earlier. Rather, Wennberg's approach was exactly the opposite. He chose to approach the problem from a different, and entirely novel, perspective – by studying small area variation. Instead of examining practice patterns across the country, he examined them across a single state, and a small one at that – Vermont, which was (and still is) home to a population of around half a million residents. As shown in Fig. 3.1, Dr. Wennberg studied patterns of care by creating a new "unit of analysis" – the hospital service area. Wennberg and his colleague, Alan Gittelsohn, painstakingly examined each community in Vermont and studied where patients sought surgical care. He categorized the state into 13 distinct regions – termed hospital service areas – where each service area represented the population of patients served by a community or academic hospital or group of hospitals. Using these methods, Wennberg was able to study variation in patients, utilization, and outcomes across these service areas. Wennberg's findings were striking. Tonsillectomy rates per 10,000 persons, adjusted for age, varied from 13 in some regions to 151 in others. A similar extent of variation was seen in appendectomy (10–32 per 10,000 population), prostatectomy (11–38 per 10,000 population), and hysterectomy (20–60 per 10,000 population). And when Wennberg looked for explanations for these striking variations, he found a simple but elegant one. The more physicians and hospitals in a service area, the more services they provided. These relationships held fast across a broad variety of measures – number of procedures, population size, and number and type of specialists. What Wennberg did not find was large differences in patients across the communities in Vermont. Patients, overall, were similar – but the amount of care they received was not. Wennberg concluded, in this early work, that there are wide variations in resource input, utilization of services, and expenditures – even in neighboring communities. Further, these variations in utilization seemed to be directly related to considerable uncertainty about the effectiveness of specific health services. His "prescription" for these uncertainties would occupy the next 40 years: using informed choice to leverage patient decision-making toward limiting variations in care.
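To make the unit of analysis concrete, the sketch below computes crude procedure rates per 10,000 residents for a handful of hypothetical hospital service areas, along with two simple summaries of variation (the extremal ratio and the coefficient of variation). Unlike Wennberg's analysis, no age adjustment is performed here; the area names and counts are invented purely for illustration.

```python
# Small-area variation: a minimal sketch of procedure rates by hospital service area (HSA).
# Area names, populations, and counts are hypothetical.
import pandas as pd

hsa = pd.DataFrame({
    "hsa": ["A", "B", "C", "D"],
    "population": [42000, 58000, 35000, 61000],
    "tonsillectomies": [55, 310, 130, 90],
})
hsa["rate_per_10k"] = hsa["tonsillectomies"] / hsa["population"] * 10_000

extremal_ratio = hsa["rate_per_10k"].max() / hsa["rate_per_10k"].min()
cov = hsa["rate_per_10k"].std() / hsa["rate_per_10k"].mean()   # coefficient of variation

print(hsa[["hsa", "rate_per_10k"]].round(1))
print(f"Extremal ratio (high/low): {extremal_ratio:.1f}, coefficient of variation: {cov:.2f}")
```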

3.3 Gaining Momentum: Bigger Is Better Building on these initial analyses, Wennberg and colleagues sought to broaden their work from a small state in New England to more representative – and generalizable – insights about the extent of variation occurring across the United States.


Fig. 3.1 Map of Vermont demonstrating hospital services areas. Dark lines represent boundaries of hospital service areas; areas without dots are served principally by New Hampshire hospitals (Reproduced with permission according to JSTOR, Gittelsohn [2])

Accordingly, Wennberg, John Birkmeyer, and colleagues used an aggregate of the hospital service area – called the hospital referral region – to study variation in common surgical procedures [3]. While the hospital service area studied care at the level of neighborhoods and communities, the hospital referral region (n = 306 across the United States) studied care at the level of a regional referral center.


Fig. 3.2 Map demonstrating variation in rates of carotid endarterectomy across the 306 hospital referral regions of the United States (Reproduced with permission from Elsevier, Birkmeyer et al. [3])

And, instead of using data from one state, Medicare claims were selected to provide a national, generalizable view of variations in surgical care. Birkmeyer's findings centered on two important principles. First, just as Wennberg found dramatic variation across different – and sometimes neighboring – communities in Vermont, Birkmeyer found dramatic variation across different hospital referral regions across the United States. For example, as shown in Fig. 3.2, rates of carotid endarterectomy varied nearly three-fold across different regions of the United States. Maps demonstrating regional variation, inspired by the work of investigators in the Dartmouth Atlas, became the standard way to demonstrate regional differences in utilization. Darker areas represented areas where procedures were performed more commonly, and lighter areas represented the areas where procedures were performed less commonly. These representations brought the differences into stark contrast, and one cannot help looking at the map and seeing what color – and utilization rate – is reflected in the region one calls home. The second important finding this work demonstrated was that the extent of variation differed across different types of operations.


Fig. 3.3 Variation profiles of 11 surgical procedures, demonstrating the ratio of observed to expected Medicare rates in the 306 hospital referral regions of the United States. Rates are adjusted for age, sex, and race, with high and low outlier HRRs distinguished by dotted lines (Reproduced with permission from Elsevier, Birkmeyer et al. [3])

As shown in Fig. 3.3, there were certain operations for which consensus existed about when to proceed with surgery. Hip fracture demonstrated this axiom quite nicely, and unsurprisingly so. The indication for surgery is clear in this setting, as a hip fracture is easy to diagnose. The benefits are easily seen as well, as all but the most moribund patients do better with surgery than with non-operative care. Therefore, there is little variation across the United States in the utilization of hip fracture surgery. Figure 3.3 demonstrates this concept by showing each hospital referral region (HRR) as a dot, and listing the procedures across the x-axis. All HRRs cluster closely together for procedures like hip fracture. However, for procedures like carotid endarterectomy, back surgery, and radical prostatectomy, the HRRs spread over a much wider range.


These procedures, unlike hip fracture, are much more discretionary in their utilization. In general, it is evident that the procedures with the highest degree of variation reflect areas of substantial disagreement about both diagnosis (what does an elevated PSA really mean?) and treatment (is back surgery really better than conservative treatment?). Dealing with this variation, Birkmeyer argues, will require a better understanding of surgical effectiveness, patient-specific outcome assessment, and a more thorough understanding of patient preferences. Patients, clinicians, payers, and policymakers all will need to work together, he argues, to determine "which rate is right."
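The variation profiles shown in Fig. 3.3 are built from observed-to-expected (O/E) ratios. The sketch below illustrates the basic indirect-standardization arithmetic with hypothetical regions, strata, and counts; the published analysis adjusted for age, sex, and race, which is only crudely mimicked here with two age strata.

```python
# Variation profiles via observed-to-expected (O/E) ratios: a minimal sketch using
# indirect standardization. Regions, strata, rates, and counts are hypothetical.
import pandas as pd

# National age-stratum procedure rates per 1,000 beneficiaries (illustrative).
national_rate = {"65-74": 3.0, "75-84": 5.0}

regions = pd.DataFrame([
    {"hrr": "Region 1", "stratum": "65-74", "beneficiaries": 20000, "observed": 75},
    {"hrr": "Region 1", "stratum": "75-84", "beneficiaries": 12000, "observed": 70},
    {"hrr": "Region 2", "stratum": "65-74", "beneficiaries": 18000, "observed": 40},
    {"hrr": "Region 2", "stratum": "75-84", "beneficiaries": 15000, "observed": 55},
])
regions["expected"] = regions["beneficiaries"] / 1000 * regions["stratum"].map(national_rate)

profile = regions.groupby("hrr")[["observed", "expected"]].sum()
profile["oe_ratio"] = profile["observed"] / profile["expected"]
print(profile.round(2))   # O/E > 1: higher-than-expected use; O/E < 1: lower than expected
```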

3.4 Innovating Approaches, and Integrating Ideas – From Medicine to Surgery After these publications in the early 1990s, Wennberg and his colleagues spent the next decade refining analytic methods and incorporating what seemed to be a recurrent theme in their work: that there was significant variation in the provision of medical care, and more care was not necessarily associated with better outcomes. But critics wondered whether this work, limited in clinical detail, actually reflected different care for similar patients – because clinical variables for risk adjustment were commonly unavailable. To deal with these limitations, researchers began to use clinical events – such as death – to create cohorts within similar risk strata. In the most prominent of these approaches, Wennberg and Fisher created cohorts of patients who were undergoing care – medical, surgical, and otherwise – at the end of life [4, 5]. By studying care provided in the last year of life, they argued, all patients in the cohort had similar 1-year mortality – 100 % – therefore limiting the effect of any un-measurable confounders. This research, published in 2003 and widely referenced, concluded that nearly 30 % of spending on end-of-life care offers little benefit, and may in fact be harmful. Surgeons were quick to translate these innovative approaches and integrate these ideas into surgical analyses. In a manuscript published in the Lancet in 2011, Gawande, Jha and colleagues adopted this technique and studied surgical care in the last year of life [6]. They had two basic questions. First, they asked whether regional "intensity" of surgical care varied by the number of hospital beds, or by the number of surgeons in a region. And second, they examined relationships between regional surgical intensity and its mortality and spending rates. Their team found that nearly one in three Medicare patients underwent a surgical procedure in the last year of life, and that this proportion was related to patient age (Fig. 3.4). Regions with the highest number of beds were most likely to operate on patients in the last year of life (R = 0.37), as were regions where overall spending in the last year of life was highest (R = 0.50). These findings reinforced earlier considerations about the need for patient-specific outcomes, and patient preferences, in the provision of care at the end of life.


Fig. 3.4 Percentage of 2008 elderly Medicare decedents who underwent at least one surgical procedure in the last year of life (Reproduced with permission from Elsevier, Kwok et al. [6])

3.5 Specialty Surgeons and Their Efforts in Describing and Limiting Variation Many of the previously described investigations approached the subject of surgical variation using broad strokes – studying procedures as diverse as hip fracture, lower extremity bypass, and hernia repair, all within the same cohorts. These approaches garnered effective, "big-picture" results, and surgeons grew interested in studying variation. Just as Wennberg sought to establish precise detail in the level of variation, surgeons now grew interested in exploring the different extent and drivers of variation across different specialties. In this section, we discuss two areas of subspecialty variation: spine surgery and vascular surgery.

3.5.1 Variation in Spine Surgery Patients presenting with back pain are a diverse cohort, and treatment with surgery is used at different rates in different parts of the country. As interest in studying the extent of variation and its causes began to build momentum, Weinstein and colleagues explored variation in the use of spine surgery for lumbar fusion [7]. These interests were brought to the fore with the development of devices such as prosthetic vertebral implants and biologics such as bone morphogenetic protein, all placed into everyday practice with a dearth of high-quality evidence from randomized trials. Weinstein and colleagues saw these changes occurring in "real time", in the context of their clinical interests as spine surgery specialists. They found that rates of spine surgery rose dramatically between 1993 and 2003.


Fig. 3.5 Total Medicare reimbursements, for lumbar spine surgery, by year (Reproduced with permission from Elsevier, Weinstein et al. [8])

By 2003, Medicare spent more than one billion dollars on spine surgery. In 1992, lumbar fusion accounted for 14 % of this spending, and by 2004, fusion accounted for almost half of total spending on spine surgery (Fig. 3.5). These observations led them to investigate the extent of this variation. What they found was truly remarkable. As shown in Fig. 3.6, there was nearly a 20-fold range in the rates of lumbar fusion across different hospital referral regions – the largest coefficient of variation reported for any surgical procedure to that date, and a value five-fold greater than the variation seen in patients undergoing hip fracture surgery. These data served to motivate extensive funding for SPORT (the Spine Patient Outcomes Research Trial), one of the largest continuously funded randomized trials supported by the National Institutes of Health [8].

3.5.2 Variation in Vascular Surgery Clinical changes have motivated research into variation in other areas as well, especially in patients with lower extremity vascular disease. Much like cages and bone proteins revolutionized spine surgery, the development of endovascular techniques revolutionized the treatment of lower extremity and cerebrovascular occlusive disease. Before the mid-1990s, patients with carotid stenosis or lower extremity vascular disease, for the most part, had only surgical options for revascularization. However, with the endovascular revolution, dramatic changes occurred in two important ways.


Fig. 3.6 Variation profiles of orthopedic procedures (Reproduced with permission from Elsevier, Weinstein et al. [8])

Given a less invasive endovascular option, many patients who were not candidates for open surgery could now undergo endovascular treatment. And, because these procedures no longer required an open surgical approach, the pool of potential practitioners grew instantly – from surgeons alone, to surgeons, radiologists, interventional cardiologists, and a variety of catheter-trained specialists. Motivated by these changes, Goodney and colleagues explored trends in lower extremity revascularization [9], aortic aneurysm repair [10], and carotid revascularization [11]. They found that temporal changes occurred in the national utilization of open surgical repair and endovascular interventions for all of these procedures (Fig. 3.7). Moreover, changes in specialty profile often were linked directly to changes in the utilization of these procedures. For example, cardiologists and vascular surgeons, over the last decade, came to supplant interventional radiologists as the principal providers of lower extremity revascularization (Fig. 3.8). Therefore, changes in the types of providers, as well as the types of procedures, often contribute to variation in utilization.


Fig. 3.7 Rates of endovascular interventions, lower extremity bypass surgery, and major lower extremity amputation in Medicare patients 1996–2006 (Reproduced with permission from Elsevier, Goodney et al. [9])

Fig. 3.8 Proportion of endovascular interventions, by specialty (1996–2006) (Reproduced with permission from Elsevier, Goodney et al. [9])


3.6 Moving Toward Health Policy – The Next Steps in Limiting Variation in Surgical Care Over the last decade, the evidence became irrefutable that unwarranted variation was present in many aspects of surgery, especially in settings where new technology and uncertainty in indications cross paths. Upon these foundations, surgeons began to study the effect of these variations on outcomes. In this section, we will review the manner in which variation in care can create "natural experiments" – settings where patients receive different care because of variation, and where these differences can be used to examine the effect of different exposures on outcomes. In our first example, Ghaferi and colleagues studied variation in surgical mortality after several common general and vascular surgery procedures [12]. Using data from the National Surgical Quality Improvement Program (NSQIP), they categorized hospitals according to their mortality rates and found significant variation in mortality across hospitals. While this was interesting in and of itself, they also noted key differences between hospitals when they compared surgical complication rates in hospitals with low mortality rates and in hospitals with high mortality rates. Their study noted that while complication rates were similar across hospitals, patients treated at hospitals with the highest mortality were most likely to die following complications – unlike patients in low-mortality hospitals, who were likely to be "rescued" following a surgical complication and were unlikely to die from it (Fig. 3.9). These data provided powerful evidence, and a guide for national quality improvement efforts aimed at limiting mortality with inpatient surgery. For hospitals with high mortality rates, the most effective strategies may not be simply to try to limit complications, but instead to teach physicians and hospitals how to better manage complications after they occur. In our second example, Goodney et al. examined the effect of different levels of intensity of vascular care on outcomes in patients with severe lower extremity peripheral arterial disease [13]. Using the regional intensity of vascular care as an exposure variable, their study noted that the regions most likely to perform invasive vascular care tended to have lower population-based rates of amputation. In other words, no matter how it was measured – diagnostic vascular care, invasive endovascular procedures, or open surgical reconstructions – more vascular care was closely correlated with lower rates of amputation in those regions (Fig. 3.10). These data were vital in arguing that patients at risk for amputation require attention and identification for more care, unlike other areas in surgery where the study of variation has primarily focused on overuse.
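The failure-to-rescue concept underlying the Ghaferi study can be summarized in a few lines of arithmetic. The hospital names and counts below are hypothetical and chosen only to reproduce the qualitative pattern described in the text (similar complication rates, very different mortality after complications); they are not data from the study itself.

```python
# Complications vs. failure to rescue: a minimal sketch with hypothetical hospital counts.
# Failure to rescue = deaths among patients who suffered a major complication.
hospitals = {
    # hospital: (cases, major_complications, deaths_after_complication)
    "Low-mortality hospital": (2000, 320, 19),
    "High-mortality hospital": (2000, 340, 58),
}
for name, (cases, comps, deaths) in hospitals.items():
    complication_rate = comps / cases
    failure_to_rescue = deaths / comps
    print(f"{name}: complications {complication_rate:.1%}, "
          f"failure to rescue {failure_to_rescue:.1%}")
# Similar complication rates (16.0% vs 17.0%) but very different rescue performance
# (5.9% vs 17.1% dying after a complication) mirror the pattern described above.
```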

3.6.1 Moving Towards Policy Implementation: A Unique Tool, the Dartmouth Atlas The description of variations in health care began more than 30 years ago, and evidence surrounding the scope and impact of these variations has been building for more than three decades.


Fig. 3.9 Rates of all complications, major complications, and death after major complications, by hospital quintile of mortality (Reproduced with permission, personal communications, Dr. Justin Dimick [12])

Fig. 3.10 Relationships between regional intensity of vascular care and amputation rate, for (a) all inpatient revascularizations, (b) open surgical bypass, (c) endovascular interventions, and (d) all outpatient/inpatient procedures (Reproduced with permission from Elsevier, Goodney et al. [13])


Fig. 3.11 The original edition of the Dartmouth Atlas of HealthCare (Reproduced with permission, personal communication, John Wennberg, AHA Publishing, Inc. Nov 1, 2013)

Publications in major journals, attention from task forces and leaders in health policy, and lengthy consensus opinions have all stated that variation in health care delivery, especially surgical care, is not helpful, and is potentially harmful. So why, then, has this trend continued? Undoubtedly, translating this evidence to effective health policy has been challenging. Measuring variation is difficult, as is defining the implications of variation. The data involved can make an accountant's head spin, and often the clearest messages only emerge after careful study of reams and reams of data. To help increase the visibility of the difficulties surrounding variation in the provision of health care, and to help those in health policy grasp its true impact, Dr. Wennberg conceived an "atlas" of health care that would graphically convey this message to broad audiences. First published in 1996, the original edition of the Dartmouth Atlas of HealthCare used maps, charts, and tables to illustrate the relationships between geography, variation, and healthcare (Fig. 3.11) [14]. These compendiums, rather than being aimed simply at medical audiences, were written for broader appeal. Wennberg felt that health policy had to be understood not just by physicians, but by patients, payers, and policymakers to have the greatest impact.


And, given the success of the original edition, several subsequent Atlases and reports have followed. These reports have garnered attention from leaders in health policy both nationally and internationally, and have served as a blueprint for health care reform aimed at limiting variation and unnecessary spending in the US health care system.

References
1. Bunker JP. Surgical manpower. A comparison of operations and surgeons in the United States and in England and Wales. N Engl J Med. 1970;282(3):135–44. PubMed PMID: 5409538.
2. Wennberg J, Gittelsohn A. Small area variations in health care delivery. Science. 1973;182(4117):1102–8. PubMed PMID: 4750608.
3. Birkmeyer JD, Sharp SM, Finlayson SR, Fisher ES, Wennberg JE. Variation profiles of common surgical procedures. Surgery. 1998;124(5):917–23. Reprinted with permission from Elsevier.
4. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: The content, quality, and accessibility of care. Ann Intern Med. 2003;138(4):273–87.
5. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: Health outcomes and satisfaction with care. Ann Intern Med. 2003;138(4):288–98.
6. Kwok AC, Semel ME, Lipsitz SR, Bader AM, Barnato AE, Gawande AA, et al. The intensity and variation of surgical care at the end of life: a retrospective cohort study. Lancet. 2011;378(9800):1408–13. PubMed PMID: 21982520.
7. Weinstein JN, Lurie JD, Olson PR, Bronner KK, Fisher ES. United States' trends and regional variations in lumbar spine surgery: 1992–2003. Spine (Phila Pa 1976). 2006;31(23):2707–14. PubMed PMID: 17077740. PubMed Central PMCID: 2913862. Reprinted with permission from LWW.
8. Weinstein JN, Lurie JD, Tosteson TD, Tosteson AN, Blood EA, Abdu WA, et al. Surgical versus nonoperative treatment for lumbar disc herniation: four-year results for the Spine Patient Outcomes Research Trial (SPORT). Spine (Phila Pa 1976). 2008;33(25):2789–800. PubMed PMID: 19018250. PubMed Central PMCID: 2756172.
9. Goodney PP, Beck AW, Nagle J, Welch HG, Zwolak RM. National trends in lower extremity bypass surgery, endovascular interventions, and major amputations. J Vasc Surg. 2009;50(1):54–60. PubMed PMID: 19481407. Copyright Elsevier, 2009.
10. Scali ST, Goodney PP, Walsh DB, Travis LL, Nolan BW, Goodman DC, et al. National trends and regional variation of open and endovascular repair of thoracic and thoracoabdominal aneurysms in contemporary practice. J Vasc Surg. 2011;53(6):1499–505. PubMed PMID: 21609795. PubMed Central PMCID: 3313472. Epub 2011/05/26.
11. Goodney PP, Travis LL, Malenka D, Bronner KK, Lucas FL, Cronenwett JL, Goodman DC, Fisher ES. Regional variation in carotid artery stenting and endarterectomy in the Medicare population. Circ Cardiovasc Qual Outcomes. 2010;3(1):15–24. doi:10.1161/CIRCOUTCOMES.109.864736. Epub 8 Dec 2009. Reprinted with permission from Elsevier.
12. Ghaferi AA, Birkmeyer JD, Dimick JB. Variation in hospital mortality associated with inpatient surgery. N Engl J Med. 2009;361(14):1368–75. PubMed PMID: 19797283. Reprinted with permission from Massachusetts Medical Society.


13. Goodney PP, Holman K, Henke PK, Travis LL, Dimick JB, Stukel TA, Fisher ES, Birkmeyer JD. Regional intensity of vascular care and lower extremity amputation rates. J Vasc Surg. 2013;57(6):1471–79, 1480.e1-3; discussion 1479–80. doi:10.1016/j.jvs.2012.11.068. Epub 1 Feb 2013. Reprinted with permission from Elsevier.
14. Dartmouth Atlas of Health Care [Internet]. Available from: www.dartmouthatlas.org. Accessed 21 Jan 2013.

Landmark Papers
• Birkmeyer JD, Sharp SM, Finlayson SR, Fisher ES, Wennberg JE. Variation profiles of common surgical procedures. Surgery. 1998;124:917–23. This paper was the first to demonstrate that small area analysis of variations in surgical practice was feasible on a national scale, and introduced the study of regional variation to surgeons.
• Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: The content, quality, and accessibility of care. Ann Intern Med. 2003a;138:273–87.
• Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: Health outcomes and satisfaction with care. Ann Intern Med. 2003b;138:288–98. This two-part manuscript is among the most commonly referenced large studies of variation in Medicare spending and its effect on outcomes.
• Wennberg J, Gittelsohn A. Small area variations in health care delivery. Science. 1973;182:1102–08. This was the first prominent publication demonstrating that small area analysis could identify important variations in the use and outcomes of medical care.

Chapter 4

Health Policy Research in Surgery Justin B. Dimick, Terry Shih, and Andrew M. Ryan

Abstract The purpose of this chapter is to provide an overview of health policy research in surgery. We will begin by considering common pitfalls in conducting health policy research. We will then provide research examples from two key areas: (1) physician and hospital payment reform and (2) surgical training and workforce policy. For each area of research, recent and impending policy changes will be discussed; examples of studies that have answered important questions will be provided; and important research questions that are not yet answered will be highlighted. Finally, we will close with a brief discussion of the research tools necessary to generate the right answers and where to find collaborators for those interested in pursuing research in this field. Keywords Health policy • Econometrics • Quality • Outcomes • Large databases

4.1 Introduction Despite being largely invisible to most practicing surgeons, health policy shapes every detail of the context in which we work, including (1) how we are paid, (2) how we are trained, and (3) whether we are incentivized for volume or value.

J.B. Dimick, M.D., M.P.H. () Department of Surgery, University of Michigan Health System, 2800 Plymouth Road, Bldg 16, Room 137E, Ann Arbor, MI 48109-2800, USA e-mail: [email protected] T. Shih, M.D. Department of Surgery, University of Michigan Health System, Ann Arbor, MI, USA A.M. Ryan, Ph.D. Division of Healthcare Policy and Economics, Weill Cornell Medical College, New York, NY, USA J.B. Dimick and C.C. Greenberg (eds.), Success in Academic Surgery: Health Services Research, Success in Academic Surgery, DOI 10.1007/978-1-4471-4718-3__4, © Springer-Verlag London 2014


However, despite shaping our environment, health policy is hard to see. Surgical researchers are often drawn to focus on topics that have immediate relevance to their practice. Surgeons are therefore naturally drawn to research that compares the effectiveness of different approaches to managing disease. After all, most of us chose surgery over other specialties because of how tightly linked outcomes are to our interventions. Surgeons thrive on immediacy. Nonetheless, the decisions made by policymakers create the reality in which we live, however remote these decisions are from our daily practice. Moreover, policymakers often make such decisions without good evidence. There is very little research on health policy in surgery and, consequently, very little "evidence-based policymaking". We need surgeons focused on evaluating the effectiveness of health policy to better inform these decisions. The purpose of this chapter is to provide an overview of health policy research in surgery. We will begin by considering common pitfalls in conducting health policy research. We will then provide research examples from two key areas: (1) physician and hospital payment reform and (2) surgical training and workforce policy. For each area of research, recent and impending policy changes will be discussed; examples of studies that have answered important questions will be provided; and important research questions that are not yet answered will be identified. Finally, we will close with a brief discussion of the research tools necessary to generate the right answers and where to find collaborators for those interested in pursuing research in this field.

4.1.1 Common Pitfalls in Health Policy Research There are several common pitfalls to be aware of when conducting health policy research. Because it is difficult (or nearly impossible) to randomize hospitals to different policy options, health care delivery system research often uses observational studies and so-called "natural experiments". Ignoring pre-existing time trends. One important flaw of many policy evaluation studies is to ignore pre-existing trends towards improved outcomes. When evaluating a policy, it is tempting to simply compare outcomes before vs. after its implementation. Such "pre-post" studies may incorrectly attribute a significant improvement in outcomes to the policy when outcomes would have improved without the policy (i.e., if there was a trend towards improved outcomes that was not adjusted for). The best way to avoid this pitfall is to include a control group of patients (or hospitals) that can be used to account for (i.e., subtract off) the time trends. This sort of quasi-experimental study design is common in health policy research and is called a "differences-in-differences" design. The first difference is the change in outcomes from the pre-period to the post-period among those exposed to the policy (post-period outcomes [A] – pre-period outcomes [B]). The second difference is the corresponding change among those not exposed to the policy (post-period outcomes [C] – pre-period outcomes [D]). The differences-in-differences "estimator" is then calculated as the delta between these two differences: (A–B)–(C–D).


The key assumption for this method is that the trends are parallel prior to the implementation of the policy, which should be directly tested before using this approach. If they are not parallel, this can be adjusted for using a time*exposure interaction term. If you are not familiar with these methods, it is extremely important to consult an economist (or econometrician) when performing these complex regression analyses. An alternative approach is needed to adjust for time trends when there is no control group of hospitals, i.e., where the intervention was implemented in all United States hospitals simultaneously (e.g., public reporting of outcomes on Medicare's Hospital Compare website). In this setting, you can control for time trends directly by including year or quarter dummy variables in your model. If the time trend is not linear, then non-linear time trends can be used. Once again, it is important to consult an econometrician or statistician to help conduct these analyses. Not considering unintended consequences. Another important consideration is to include an evaluation of unintended consequences. Changes in health policy can have several important unintended effects. There could be positive or negative "spillover" effects to other clinical areas. Positive spillover effects occur when a policy aimed at improving care in one clinical area (or one outcome) results in improvements for other areas (or outcomes). For example, policies aimed at reducing readmissions for vascular surgery could also result in fewer readmissions for other procedures (e.g., general surgery) as these patients are often cared for by the same nurses and on the same floors. Negative spillover effects, often called "multi-tasking" in the economics literature, result when resources are shifted to a targeted condition and care deteriorates in another clinical area. For example, policies aimed at reducing readmissions for vascular surgery could inadvertently increase readmissions for other surgical conditions if resources are taken from other surgical patients and care is improved only for the patients targeted by the policy. Health policy researchers need to evaluate for potential spillover effects and other potential unintended consequences.
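To show how the pieces above fit together, here is a minimal differences-in-differences sketch on simulated hospital-quarter data. It is not the analysis from any study discussed in this chapter; the variable names, the policy effect, and the secular trend are all hypothetical. The policy effect is read off the exposed-by-post interaction term, while quarter dummies absorb the time trend shared by both groups.

```python
# Differences-in-differences: a minimal sketch with simulated hospital-quarter data.
# All names and effect sizes are hypothetical and purely illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for hospital in range(200):
    exposed = hospital < 100                   # first 100 hospitals subject to the policy
    for quarter in range(8):                   # quarters 0-3 pre, 4-7 post
        post = quarter >= 4
        rate = (0.12                           # baseline complication rate
                - 0.002 * quarter              # pre-existing improvement trend (both groups)
                - 0.01 * (exposed and post)    # true policy effect
                + rng.normal(0, 0.01))         # noise
        rows.append({"exposed": int(exposed), "post": int(post),
                     "quarter": quarter, "complication_rate": rate})
df = pd.DataFrame(rows)

# The interaction term is the differences-in-differences estimate; C(quarter) dummies
# absorb the secular time trend, so the estimate is not confounded by it.
model = smf.ols("complication_rate ~ exposed + C(quarter) + exposed:post", data=df).fit()
print(round(model.params["exposed:post"], 4))   # recovers roughly the -0.01 policy effect
```

A simple pre-post comparison on the exposed hospitals alone would instead attribute both the policy effect and the background trend to the policy, which is exactly the pitfall described in the text.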

4.1.2 Physician and Hospital Payment Reform For young surgeons entering practice right now, rising health care costs and consequent reform to "bend the cost curve" will be the single driving policy force of their professional lives. For the last 20 years, rising health care expenditures have been called unsustainable, and it appears that they are finally living up to that reality. The recent passage of the Affordable Care Act (ACA) brought about several changes that will fundamentally change how physicians and hospitals are paid by Medicare. Since private payers often follow Medicare's lead, we are seeing similar changes in payment and delivery system innovation from all major insurers. Most of these changes aim to move from a volume-based payment system to one that pays for value. One of the dominant features of these reforms is to shift financial risk to providers. Such "at risk" payment models fundamentally change the incentives for improving quality of care.


Table 4.1 Centers for Medicare and Medicaid Services (CMS) policies aimed at improving quality and reducing costs in surgery

Selective referral – Refer patients to specific providers, i.e., "centers of excellence." CMS adoption: national coverage decisions for selected procedures. Example: bariatric surgery coverage linked to "center of excellence" status.

Non-payment for adverse events – Incentivize quality improvement by withholding payment for certain adverse outcomes. CMS adoption: national programs already in place. Example: in October 2008, CMS discontinued additional payments for certain hospital-acquired conditions that were deemed preventable.

Pay for performance – Reward providers for high quality or low cost care. CMS adoption: multiple large pilot programs. Example: Medicare/Premier Pilot for cardiac and orthopedic surgery.

Bundled payment – Incentivize efficient, coordinated care by bundling payments around an episode. CMS adoption: regional and national pilot programs. Examples: Medicare Acute Care Episode (ACE) Demonstration Project; the Center for Medicare and Medicaid Innovation (CMMI) Bundled Payment Pilot Program.

Accountable care organizations – Health care providers accept risk for reducing health care expenditure growth for a population of Medicare beneficiaries. CMS adoption: pilot programs. Examples: Pioneer Accountable Care Organization (ACO) Demonstration Program; Medicare Shared Savings ACO Program.

In traditional payment models, the payer bears most of the financial risk for complications. Payers would foot the bill for complications and the subsequent health care, including prolonged length of stay, physician consultations, home health care, and skilled nursing care after discharge. The Centers for Medicare and Medicaid Services (CMS), the largest health care payer, has several candidate policies that aim to improve quality and reduce costs in surgery, including selective referral, pay-for-performance, non-payment for adverse events (e.g., hospital-acquired infections and readmissions), episode bundled payments, and accountable care organizations (Table 4.1). It is essential that researchers thoroughly evaluate the benefits and harms of these policy changes. Without such research, policymakers will not know what works, and what doesn't, as we move forward with future iterations of payment reform. Below we include several examples from the literature that evaluate health care policies. These illustrate many of the concepts discussed above, including how these studies addressed common pitfalls in health policy research. Example 1 Bariatric surgery complications before vs after implementation of a national policy restricting coverage to centers of excellence. Dimick JB, Nicholas LH, Ryan AM, Thumma JR, Birkmeyer JD. JAMA 2013;309:792–799. This study from our research group evaluated the impact of the CMS national coverage decision for bariatric surgery, which was the most ambitious selective referral program in surgery to date.

4 Health Policy Research in Surgery

41

surgery to so-called centers of excellence (COEs) as defined by the American College of Surgeons (ACS) and the American Society for Metabolic and Bariatric Surgery (ASMBS). Prior studies evaluating the program had shown benefits, with reductions in morbidity and mortality. However, these studies had failed to adequately account for pre-existing trends towards improved outcomes in bariatric surgery. In our study, a control group of non-Medicare patients undergoing bariatric surgery was used to account for these trends. In this differences-in-differences analysis (discussed in detail above), there was no independent effect of the CMS policy on overall complications, serious complications, or reoperations. This study demonstrates the importance of adequately adjusting for pre-existing time trends. Without such an adjustment, policymakers would mistakenly attribute the improved outcomes to the policy.

Key Unanswered Questions Further research needs to determine the extent to which this policy limited access for Medicare beneficiaries in need of bariatric surgery. It is possible that Medicare patients had to travel further for surgery. Vulnerable populations may experience a decline in the availability of surgery if they cannot afford to travel away from their homes. Since the policy had no measurable benefit, research demonstrating such harms should strongly motivate CMS to reconsider this policy.

Example 2 The long-term effect of premier pay for performance on patient outcomes. Jha AK, Joynt KE, Orav EJ, Epstein AM. N Engl J Med 2012;366:1606–1615.

This study evaluated the impact of Medicare's flagship pay-for-performance program, the Premier Hospital Quality Incentive Demonstration (HQID), on patient outcomes. Prior studies had demonstrated improved adherence to processes of care with the implementation of the program, but its longer-term impact on risk-adjusted outcomes had not been explored. The authors evaluated 30-day mortality for coronary artery bypass grafting (and other medical conditions) at 252 hospitals participating in the Premier HQID compared to control hospitals that were not participating. They also used a differences-in-differences design to ensure that temporal trends in outcomes were taken into account. They found no improvement in outcomes, beyond existing trends, with the implementation of the pay-for-performance program. From these data, they inferred that other programs modeled after this one, such as the Hospital Value Based Purchasing Program implemented as part of the Affordable Care Act, are unlikely to have a meaningful impact on patient outcomes.

Key Unanswered Questions The important questions around pay-for-performance include whether programs with larger incentives will have an impact on outcomes. As Medicare's Hospital Value Based Purchasing Program is implemented nationally, it will be important to understand whether this large program has benefits. With programs that penalize hospitals for poor outcomes, it will be important to conduct studies to understand whether such policies improve or exacerbate racial and socioeconomic disparities in surgical outcomes.
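Both of these examples rest on the same differences-in-differences logic: the change observed in a control group is subtracted from the change observed in the group exposed to the policy, so that a shared pre-existing trend is not credited to the policy. The toy calculation below, in Python with entirely made-up complication rates (not data from either study), is a minimal sketch of that contrast.

```python
# Toy illustration with made-up rates (not data from the studies above): a
# difference-in-differences contrast removes a secular trend shared by both groups.
exposed_pre, exposed_post = 0.080, 0.060   # group targeted by the policy
control_pre, control_post = 0.075, 0.055   # control group, unaffected by the policy

naive_change = exposed_post - exposed_pre    # -0.020: pre-post alone suggests the policy "worked"
secular_trend = control_post - control_pre   # -0.020: but the controls improved just as much
did_estimate = naive_change - secular_trend  #  0.000: no independent effect of the policy
print(naive_change, secular_trend, did_estimate)
```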


4.1.3 Surgical Training and Workforce Policy

Policy around surgical training includes the implementation of the 80-hour workweek, which has dramatically changed how we train surgeons in the United States. The motivation for this policy change was the perception that, with longer workweeks, surgical trainees are fatigued and make more errors that threaten patient safety. This policy change has been written about in numerous studies that assess resident and faculty perceptions about safety. But relatively few studies have addressed the key question: Did this policy have the intended consequence of improving patient safety? We will discuss an example of a paper that examined this question below. However, it is also important to ask whether the policy had any unintended consequences. For this particular policy, unintended consequences include the potential to make patient safety worse by increasing hand-offs, and possibly impacting surgical education in a way that makes surgical trainees less prepared for independent clinical practice.

Examples 3 and 4 Mortality among hospitalized Medicare beneficiaries in the first 2 years following ACGME resident duty hour reform. Volpp KG, Rosen AK, Rosenbaum PR, Romano PS, Even-Shoshan O, Wang Y, Bellini L, Behringer T, Silber JH. JAMA 2007;298:975–983. Mortality among patients in VA hospitals in the first 2 years following ACGME resident duty hour reform. Volpp KG, Rosen AK, Rosenbaum PR, Romano PS, Even-Shoshan O, Canamucio A, Bellini L, Behringer T, Silber JH. JAMA 2007;298:984–992.

To assess whether the implementation of the 80-hour workweek was associated with improved patient safety, Volpp and colleagues conducted these two large population-based studies in the national Medicare population and the Veterans Affairs population. These studies compared mortality rates before vs. after implementation of the 80-hour workweek. They found no changes in 30-day mortality after implementation of the policy. Rather than simply using a pre-post design, comparing outcomes before vs. after the policy was implemented, this study employed an elegant analytic approach that used several control groups with varying degrees of teaching intensity. Specifically, they used the ratio of residency positions to beds to classify teaching intensity and used this as their main exposure variable, comparing outcomes in "high" vs. "low" intensity teaching hospitals. Since the policy should have a larger impact on hospitals with higher teaching intensity (vs. lower teaching intensity), they were able to evaluate the impact of the policy on safety in a controlled fashion.

Key Questions Left Unanswered These studies are important and widely cited, but they do not answer the question of whether the 80-hour workweek has impacted the quality of surgical training. There is a need for definitive studies evaluating whether surgeons are as capable when entering independent practice after the policy as they were before.


Another key health policy issue is the adequacy of the surgical workforce. There is active, passionate debate about whether there is an impending shortage of general surgeons. Many believe the aging baby boomer population and the increased need for surgery will lead to a shortage of surgeons. This conclusion is logical. However, another school of thought holds that we already have too many surgeons, and that this leads to overutilization of surgical procedures, especially discretionary procedures. This theory of "supply sensitive care" has been popularized by the Dartmouth Atlas of Healthcare, and it argues that we should not be concerned about a 10 % shortage of surgeons when we have more than two-fold (100 %) variation across regions of the United States. It is likely there is an oversupply in certain regions and an undersupply in others. These diverging schools of thought are more than theoretical musings. The practical implications of fixing the problem using available policy levers raise an important issue. If we try to fix the projected shortage by increasing the number of surgeons we train, it is very likely that we will exacerbate the distribution problems rather than alleviate the shortage: the surgeons we train will no doubt choose to live in the most desirable areas, and the regions that are currently underserved will remain underserved. In other words, increasing the overall supply of surgeons is a very blunt tool for fixing a shortage that only exists in a few regions. Once again, numerous studies have used various models to make workforce predictions in surgery. But very few have assessed the key question: How many surgeons are actually needed to provide adequate care within a region? Below we review one of the few studies that address this important question, albeit from an indirect perspective.

Example 5 Perforated appendicitis among rural and urban patients: implications of access to care. Paquette IM, Zuckerman R, Finlayson SR. Ann Surg 2011;253:534–538.

This study evaluated rates of perforated appendicitis in rural as compared to urban areas. Perforation was used as a proxy for delayed access to care in rural areas. Paquette and colleagues used the Nationwide Inpatient Sample (NIS) to compare rates of perforation across regions with different population density. They found that patients living in rural areas were more likely than those living in urban areas to present with perforated appendicitis (36 % vs. 31 %). Although this is indirect evidence, it suggests there may not be enough general surgeons in rural regions.

Key Question Left Unanswered There is very little research to guide evidence-based policymaking in decisions about the surgical workforce. Future research needs to focus on better understanding how many surgeons are needed to provide access to all necessary care in a region. The study by Finlayson and colleagues is a good start, but this work needs to be extended to a broader range of clinical conditions. In addition, research aimed at evaluating the impact of policies for increasing the supply of surgeons in underserved regions should be pursued. For example, it is unclear if incentives such as loan forgiveness for working in these areas translate into a long-term increase in the supply of surgeons there.


4.2 What Research Tools Are Needed for Health Policy Research?

One key challenge to evaluating health policy is that it is nearly impossible to conduct true randomized experiments. It is very difficult to randomize hospitals and physicians to different payment structures. As discussed above, it is necessary to draw inferences about the effectiveness of policy changes from quasi-experimental or observational studies. Economics and econometrics (the term economists use for their statistical methods) provide robust methodological tools for designing these studies. Longitudinal study designs, including differences-in-differences and other panel data approaches, are widely used in policy evaluation. These build on standard linear and logistic regression models that can be learned in any basic statistics class. Many of these econometric models (e.g., differences-in-differences) can be implemented in multivariate regression models as a simple interaction term. For example, in a study evaluating outcomes pre vs. post in an exposed and a non-exposed group of hospitals, the difference-in-differences estimate is the interaction term post*exposed (where post = 1 after implementation and exposed = 1 if the policy targeted the hospital where the patient had surgery).
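As a concrete sketch of this interaction-term approach, the code below fits a logistic regression to simulated discharge-level data. The dataset, variable names, and use of Python's statsmodels package are illustrative assumptions rather than the setup of any particular study; the point is simply that the coefficient on post:exposed is the difference-in-differences estimate.

```python
# Minimal sketch of a difference-in-differences regression on simulated data.
# All variables and values are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 20000
df = pd.DataFrame({
    "post": rng.integers(0, 2, n),     # 1 = discharge after the policy took effect
    "exposed": rng.integers(0, 2, n),  # 1 = hospital targeted by the policy
    "age": rng.normal(65, 10, n),      # example risk-adjustment covariate
})
# Simulate complications with a secular improvement over time but no true policy effect.
log_odds = -2.0 + 0.02 * (df["age"] - 65) - 0.3 * df["post"] + 0.1 * df["exposed"]
df["complication"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# 'post' absorbs the secular trend, 'exposed' absorbs baseline differences between
# targeted and non-targeted hospitals, and post:exposed is the policy effect.
model = smf.logit("complication ~ post * exposed + age", data=df).fit(disp=0)
print(model.params["post:exposed"])
print(model.summary())
```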

4.3 Where Can I Find Collaborators for Health Policy Research?

For policy evaluation, the most natural collaborators are often health economists, whom you may find in the medical school's health services research department, in a health management and policy department in a school of public health, or in the undergraduate department of economics. Collaborators from political science, policy analysis, and health policy will provide the policy context and advice about looking for the impact of unintended consequences, and will often also have quantitative skills in econometrics.

4.4 Where Should I Get Started in Health Policy Research?

The best place to start is to keep up with health policy. Follow changes in health policy by reading the newspaper. The New York Times, Wall Street Journal, and Washington Post provide robust coverage of health policy. If you want information early, you may have to visit the Federal Register or read the legislation directly to find the important details of a policy. The most relevant journals include Health Affairs, NEJM, and JAMA. Health Affairs is entirely focused on health policy and innovation in health care delivery, and has a health policy blog that is a good source of information. NEJM and JAMA
often publish policy-relevant articles, including editorials (Perspectives in NEJM and Viewpoints in JAMA). NEJM also has a special section of its website dedicated to health policy and reform. Other journals that demonstrate rigorous methodology for policy evaluation but tend to have a broader focus include Health Services Research and Medical Care. As discussed in other chapters, the key to developing a career in any research discipline is to find a mentor. Mentors who do health policy research in surgery are rare, and you may find greater success looking for non-surgeons (maybe even non-physicians).

4.5 Conclusion

Health policy is often invisible but shapes every detail of how we work. Because of unsustainable growth in health care expenditures, the pace of policy change is accelerating. The careers of surgeons training today will be characterized by constant innovation in our delivery system, particularly in how we get paid by Medicare and private payers. Sophisticated research on the effectiveness of these policy changes is needed to help policymakers make evidence-based decisions. Despite the importance of this area of research, there are very few surgeons involved in rigorous policy evaluation, which provides a great opportunity for young surgeons to fill this void.

Further Reading

1. Dimick JB, Nicholas LH, Ryan AM, Thumma JR, Birkmeyer JD. Bariatric surgery complications before vs. after implementation of a national policy restricting coverage to centers of excellence. JAMA. 2013;309:792–9.
2. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med. 2012;366:1606–15.
3. Volpp KG, Rosen AK, Rosenbaum PR, Romano PS, Even-Shoshan O, Wang Y, Bellini L, Behringer T, Silber JH. Mortality among hospitalized Medicare beneficiaries in the first 2 years following ACGME resident duty hour reform. JAMA. 2007;298:975–83.
4. Volpp KG, Rosen AK, Rosenbaum PR, Romano PS, Even-Shoshan O, Canamucio A, Bellini L, Behringer T, Silber JH. Mortality among patients in VA hospitals in the first 2 years following ACGME resident duty hour reform. JAMA. 2007;298:984–92.
5. Paquette IM, Zuckerman R, Finlayson SR. Perforated appendicitis among rural and urban patients: implications of access to care. Ann Surg. 2011;253:534–8.

Chapter 5

Studying Surgical Disparities: It's Not All Black and White

Diane Schwartz and Adil Haider

Abstract Over the past several years, there has been an explosion of publications describing disparities in virtually all aspects of our health care system. Surgery is no exception, with recent studies relating inequities in surgical care and outcomes based on race, gender, age, socioeconomics, education, and even where an individual lives. Progress, however, has remained slow, and a major reason for this has been the lack of elucidation of the mechanisms that lead to disparities. Without understanding these, it is impossible to create effective solutions. Since most people who will be reading this chapter are surgeons, we strongly recommend that you focus on identifying the causes of disparities or on creating and testing innovative programs to reduce them.

Keywords Race • Disparities • Outcomes • Socioeconomic

5.1 Introduction

Over the past several years, there has been an explosion of publications describing disparities in virtually all aspects of our health care system. Surgery is no exception, with recent studies relating inequities in surgical care and outcomes based on race, gender, age, socioeconomics, education, and even where an individual lives.


Given this surge in papers, one would think that this is a relatively new field with lots to explore. Although there is a lot to explore, this is not a new field at all. In fact, there is a paper in the New England Journal of Medicine from 1977 stating that black patients at Johns Hopkins were four times more likely than white patients to be operated on by residents without appropriate supervision, while white patients undergoing similar operations were cared for more directly by attending surgeons [1]. The first policy work on disparities in health care dates to the Secretary of Health and Human Services' report in the 1980s, when the concept was first introduced at the national level. Since then, there has been significant policy progress at the federal level, and eradicating disparities has become a national priority with several programs and laws focused on this problem. Congress even passed a law to create the National Institute on Minority Health and Health Disparities (NIMHD), whose mission involves the elimination of health care disparities for minorities. Progress, however, has remained slow, and a major reason for this has been the lack of elucidation of the mechanisms that lead to disparities. Without understanding these, it is impossible to create effective solutions. Since most people who will be reading this chapter are surgeons, we strongly recommend that you focus on identifying causes of disparities or on creating and testing innovative programs to reduce them. In this chapter we will give a brief overview of what is known about disparities (so that you don't have to repeat these studies), followed by thoughts regarding the mechanisms at play (which is where you can achieve success as an academic surgeon). We will then present a new framework for solving a public health problem in six steps, hoping to collectively apply this concept to the elimination of disparities.

5.2 What Kind of Disparities Have Been Defined in Surgery?

Attempts to mitigate health care disparities must begin with an understanding of inequities in the at-risk populations. In this section, we summarize the different types of disparities that have previously been described.

5.2.1 Race

Disparities in surgical outcomes have been extensively described for multiple races, across a spectrum of procedures. The most compelling evidence documents worse outcomes for black patients versus their white counterparts, with higher rates of morbidity, mortality, and disease recurrence, and lower rates of operative care. For example, a large national study of appendectomy, gastric fundoplication, and gastric bypass surgery found that black patients were twice as likely to die, and up to four times as likely to suffer from a complication/surgical misadventure, as white patients [2]. Similarly disparate outcomes have been reported
for a variety of procedures, worsening with higher-risk operations, such as spine surgery, craniotomies, solid organ transplants, limb revascularizations, and carotid endarterectomies. While differences in undiagnosed comorbidities may potentially confound the aforementioned accounts, the existence of disparities in trauma, often considered the "great equalizer," provides conclusive proof of this health care inequity [3]. A recent meta-analysis estimated that injured black patients were nearly 20 % more likely to die than injured white patients [4]. The evidence regarding other non-white demographics, such as Hispanics and Asians, is largely equivocal and conflicting, warranting further research to more clearly elucidate the presence of disparities.

5.2.2 Gender

Inequitable access to care constitutes the most prominent feature of gender-based disparities. Mota et al. report that, despite need and demand equal to that of men, women are less likely to receive total hip or knee arthroplasty [5]. Similarly, women are less likely to undergo life-saving cardiac procedures or to be recommended for heart and kidney transplants. Other reports suggest that women are more likely to have worse outcomes following diabetes-related lower extremity amputations and coronary interventions, and better outcomes following sepsis and traumatic shock; although these differences may have biological bases, undertones of true gender-based disparities cannot be completely disregarded.

5.2.3 Age

Health care disparities occur at all ages. However, due to their inherent vulnerabilities, these effects are magnified in the pediatric and geriatric populations, often overlapping with other contributing factors. For example, Chavers et al. report that kidney transplant recipients aged 14–16 are at greatest risk of graft failure, with outcomes worse for black adolescents [6]. Additionally, an analysis of the National Pediatric Trauma Registry found black children with traumatic brain injury to have worse clinical and functional outcomes compared with similarly injured white children [7]. Multiple national studies evaluating disparities in arthritis-related hip and knee surgeries report lower utilization rates for older black versus white patients (age 65 and older). Similarly, many studies evaluating outcomes for geriatric trauma demonstrate higher morbidity and mortality in this population, even with relatively minor injuries. Trunkey et al. report that withdrawal of trauma care for the elderly is often arbitrarily decided with suboptimal documentation [8]. Recent evidence also suggests that outcome disparities are mitigated for black compared with white geriatric trauma patients, perhaps due to better universal health coverage or advanced age being the overwhelming driver of outcomes.


5.2.4 Disabilities

People with physical or mental disabilities often have worse outcomes. A recent US Public Health Service publication, Closing the Gap: A National Blueprint to Improve the Health of Persons with Mental Retardation, states that persons with mental disabilities are more likely to receive either no care or inappropriate and inadequate treatment [9]. Consequently, this unmet health care need translates into a higher documented incidence of non-communicable diseases, early death, and complications. Although few studies specifically evaluate surgical outcomes in this population, a recent study by Dhiman et al. demonstrates that patients with cerebral palsy (CP) face four times the odds of any complication after appendectomy compared with non-CP patients [10]. It is likely that in the course of future research, similar patterns will emerge for other surgical outcomes in these patients.

5.2.5 Geographic Location

Remoteness from care and clustering of health care quality are two key elements of geographically driven health care disparities. Regional differences have been described for implantable cardioverter-defibrillator (ICD), stroke care, and solid organ transplant utilization rates [11, 12]. Additionally, discrepant regional outcomes have been described for spine surgery, prostate cancer, colorectal cancer, vascular, transplant, and trauma surgery. A large nationwide study demonstrates that Northeastern states have a 70 % higher risk of mortality compared with Southern states following anterior cervical spine procedures [13]. Another study by Lian et al. reports a 20 % higher colorectal cancer-specific risk of death in the most socioeconomically disadvantaged neighborhoods [14]. Similarly, Herbert and colleagues report that black mothers were nearly 60 % less likely to deliver at top-tier hospitals compared with white mothers [15]. In trauma, urban inner city hospitals are known to have worse outcomes. A recent evaluation of trauma center performance suggests that minority patients cluster at low quality hospitals, with predominantly minority trauma centers twice as likely as predominantly white centers to be high-mortality centers [16]. While the reasons are certainly more complex, these regional differences identify communities and locales at higher risk for disparities.

5.3 What Are the Underlying Mechanisms?

Disparities in the myriad of vulnerable populations can be explained by common underlying mechanisms. In this section we explore these mechanisms at the patient, provider, and systemic levels. An understanding of these is critical before any improvement initiative can be designed or implemented.


5.3.1 Patient Factors

5.3.1.1 Insurance

Insurance status is known to affect outcomes after elective and emergent surgery, with the uninsured facing higher complication and mortality rates than privately insured patients. Although race and socioeconomic status are known to interact with insurance status, multiple studies demonstrate poor outcomes for the uninsured even after controlling for such effects. It is not all about access to hospitals, though, as insurance status is associated with increased mortality after trauma as well [17]. Given that federal law mandates emergent treatment of trauma and emergency patients regardless of their ability to pay, this finding demonstrates the complexity of merely looking at insurance status.

5.3.1.2 Socioeconomic Status

Socioeconomic status (SES), used to quantify an individual's wealth status, has been shown to independently affect surgical outcomes. Multiple studies using the Nationwide Inpatient Sample (NIS) have consistently demonstrated worse outcomes for patients from lower socioeconomic areas, using the median household income of the patient's zip code as a proxy for SES [18, 19]. However, using the same dataset, Ricciardi et al. found that SES was not a significant predictor of mortality after appendectomy, gastric fundoplication, and gastric bypass [20]. These differences may reflect variation across disease severity and procedure type. While the evidence is often conflicted in describing the association between SES and higher morbidity and mortality rates, lower SES is an important determinant in the provision of surgical care. Axelrod and colleagues demonstrate that patients in the highest SES strata were nearly 80 % more likely to receive kidney transplants than those in the lowest SES strata in their analysis of the Organ Procurement and Transplant Network (OPTN) data [21]. Another study found that total knee arthroplasties were less commonly performed for patients with lower household incomes.

Description (Table 8.1, continued):
Explains the factors and processes contributing to how, why, and how quickly new advances are spread and adopted in societies over time.
Assists with the planning, evaluation, reporting, and review involved in translating research into practice.
Diagnoses factors contributing to the health of the target population to assist in planning health programs (PRECEDE) and provides measures to evaluate the implementation process, impact, and outcome (PROCEED).
Evaluates how the inter-relationships between the external environment, the implementation and sustainability infrastructure, the intervention, and the recipients influence implementation.
Describes successful implementation as a function of the evidence, the context in which the evidence is being introduced, and the facilitation strategies utilized.
Provides a taxonomy consisting of five domains (characteristics of individuals, intervention characteristics, inner setting, outer setting, and process) and multiple constructs that influence implementation; unifies key constructs from multiple published theories in order to build implementation knowledge across settings and studies.

Examples and classification as dissemination or implementation models taken from Tabak et al. [4]

vary in their focus (dissemination, implementation, or both); the flexibility of their constructs; and the levels at which they operate (individual, organization, community, and system). These models also vary in their terminology and the degree of validation. Nonetheless, for researchers interested in D&I, there are a variety of models from which to choose (see Table 8.1 for examples). Whether the researcher selects an existing model without modification, adapts the model, or creates a new
model should depend upon factors such as the research focus, the intervention, the target population, and the setting. The researcher should understand the applicability of the model to the intended research, the strengths and limitations of the selected model, and the ability to integrate resulting research findings using that model into the broader D&I knowledge base.

8.2 QI and Implementation Interventions (or Strategies)

Changing practice through QI efforts or implementation of clinical interventions (e.g., peri-operative antibiotic or venous thromboembolism prophylaxis) can be challenging. There are multiple QI and implementation interventions or strategies that have been utilized either alone or in combination to facilitate or compel change. These interventions may target multiple levels, including patients, providers, organizations, and/or communities. Examples of QI or implementation interventions include patient or provider incentives, financial penalties, audit and feedback, educational initiatives, computerized reminders, collaborative involvement, and community-based QI teams. Systematic reviews suggest that most of these interventions have resulted in at least a modest improvement in performance, although the quality of the evidence is poor. Although these interventions are often complex and target multiple levels, there is no clear evidence to support the superiority of multi-faceted versus single interventions. Ultimately, there is no single or multi-faceted intervention that is effective across all settings, nor is a single intervention necessarily the only effective method for facilitating change within a specific setting. Multiple approaches may be equally successful in producing the same outcome, a concept known as equifinality. Further research is necessary to determine which QI or implementation interventions are most effective in which settings.

Continuous quality improvement (CQI): A widely used QI strategy is CQI, also known as Total Quality Management (TQM). CQI is derived from a strategy used by the manufacturing industry, which evaluates a process over time to determine whether variation is present that causes the process to be unstable and unpredictable. If a process is out of control, iterative cycles of small changes are made to address the problem. These small-scale changes are referred to as Plan-Do-Study-Act (PDSA) or Plan-Do-Check-Act (PDCA) cycles (Fig. 8.1a). Several tools assist in the performance of CQI. Variation is evaluated using statistical process control (SPC) charts (Fig. 8.1b). Special-cause variation exists (i.e., a process is out of control) if the process is outside the upper and lower control limits, or in excess of three standard errors of the process mean in either direction. SPC charts can be used to both monitor and manage change. If special-cause variation or a problem is detected, various tools can be used to diagnose the contributing factors. Pareto charts (Fig. 8.1c) depict the frequency of each factor in descending order as well as the cumulative frequency. They are based on the principle that 80 % of the problem results from 20 % of the contributing factors.


Fig. 8.1 Tools of continuous quality improvement (CQI). (a) Plan-Do-Study-Act (PDSA) cycles test a series of small-scale changes whereby each new change is informed by data from previous cycles of change. (b) Statistical process control (SPC) charts evaluate for special-cause variation, identified by outliers beyond three standard deviations from the mean, represented by the upper and lower control limits. (c) Pareto charts show the relative frequency of factors contributing to the problem in descending order (represented by the bars) as well as the cumulative percentage of contribution of the sum of the factors (represented by the line). (d) Fishbone diagrams (or cause-and-effect or Ishikawa diagrams) demonstrate the causes and sub-causes of the problem. (e) Flowcharts demonstrate the steps in a process (e.g., in preparing a patient for surgery and ensuring compliance with peri-operative antibiotic and venous thromboembolism, VTE, prophylaxis measures)


Identification of the major factors contributing to the problem can guide initial QI efforts and maximize their impact. Fishbone diagrams (also called cause-and-effect or Ishikawa diagrams, Fig. 8.1d) are also used to systematically identify and categorize all of the contributing factors to a problem. The "spine" represents the problem, while the major branches or "bones" depict the major causes of the problem. Minor branches represent sub-causes. Flowcharts (Fig. 8.1e) are used to depict the steps of a process to identify where changes are necessary. Other tools such as check sheets, scatter diagrams, and histograms are also used. More details about CQI and related tools can be found in textbooks and courses (see Resources).
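To make the control-limit rule concrete, the sketch below computes a simple p-chart, one common form of SPC chart, for a monthly complication proportion. The counts are invented, and the choice of measure and the Python implementation are assumptions for illustration; limits are set at three standard errors of the overall proportion, so months falling outside them suggest special-cause variation.

```python
# Minimal sketch (invented counts): a p-chart for a monthly surgical site
# infection (SSI) proportion, with limits at three standard errors of the mean.
import numpy as np

cases = np.array([48, 52, 50, 47, 55, 51, 49, 53, 50, 46, 52, 54])  # operations per month
ssis = np.array([4, 5, 3, 4, 6, 5, 4, 12, 5, 3, 4, 5])              # SSIs per month

rates = ssis / cases
p_bar = ssis.sum() / cases.sum()           # center line: overall SSI proportion
se = np.sqrt(p_bar * (1 - p_bar) / cases)  # standard error varies with monthly volume
ucl = p_bar + 3 * se                       # upper control limit
lcl = np.clip(p_bar - 3 * se, 0, None)     # lower control limit, floored at zero

for month, (rate, lo, hi) in enumerate(zip(rates, lcl, ucl), start=1):
    label = "special-cause" if (rate > hi or rate < lo) else "common-cause"
    print(f"month {month:2d}: rate={rate:.3f} limits=({lo:.3f}, {hi:.3f}) {label}")
```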

8.3 Research Designs

There are several questions to consider in designing QI or implementation research: What is the quality of the evidence and strength of recommendation for the clinical intervention? Has the intervention been tested in real world conditions? Has the intervention been tested in the particular setting or population of interest? What are the optimal strategies for facilitating uptake or adoption of the clinical intervention in that particular setting or population?

Strong versus weak recommendations for an intervention: There are multiple tools and resources for evaluating the level of evidence and strength of recommendation for an intervention; one system that is frequently used in translating evidence into guidelines is GRADE (Grading of Recommendations Assessment, Development and Evaluation). The quality of the evidence is determined by the study design; sources of bias due to methodological limitations; and the magnitude, consistency, and precision of the estimate of treatment effect. The strength of the recommendation accounts for the overall benefits versus risks of an intervention, burdens, costs, and patient and provider values. The quality of evidence and strength of recommendation for an intervention or guideline affect the implementation process and evaluation. For example, guidelines based on high quality evidence may result in greater stakeholder acceptance and ease in implementation; measurement of adoption of the guidelines by the providers may be adequate to ensure success. On the other hand, guidelines based on only moderate evidence may be harder to implement and require more rigorous assessment of their effect on patient outcomes.

Efficacy versus effectiveness: Efficacy or explanatory trials test an intervention under tightly controlled or "ideal" circumstances in order to isolate its effect in a small, homogeneous, highly compliant patient population. Effectiveness or pragmatic trials test an intervention in the "real world" in a large, heterogeneous patient population. Efficacy trials focus on internal validity, or minimization of bias, while effectiveness trials focus on external validity, or generalizability. The PRECIS (Pragmatic-Explanatory Continuum Indicator Summary) framework is a tool that researchers can use to place a trial along the continuum. The ten dimensions are depicted as spokes on a wheel, with the explanatory pole at the hub and the pragmatic pole at the rim (Fig. 8.2).


Fig. 8.2 The pragmatic-explanatory continuum indicator summary (PRECIS) tool. (a) The indicators are at the periphery of the PRECIS wheel, which represents the pragmatic end of the continuum. (b) The indicators are at the center of the PRECIS wheel, which represents the explanatory end of the continuum


Trials may fall along a continuum between efficacy and effectiveness, but trials towards the effectiveness end of the spectrum tend to be more amenable to translation into practice.

Effectiveness versus implementation: Effectiveness and implementation trials differ in their interventions, units of randomization and analysis, and outcomes; they also differ in the research methodologies used to assess these outcomes. Effectiveness trials evaluate clinical interventions (e.g., drugs or procedures), whereas implementation trials evaluate whether the intervention works when applied to a new patient population or in a different setting. Implementation trials may utilize one or more strategies aimed at promoting uptake of a clinical intervention as described above (e.g., audit and feedback). Effectiveness trials focus on the patient and individual health outcomes, whereas implementation trials focus on providers, units, or systems and proximal outcomes such as the degree of adoption of a process measure. Rather than perform effectiveness and implementation trials sequentially, hybrid effectiveness-implementation designs have been proposed in order to minimize delays in translating evidence into practice. There are three types of hybrid designs, ranging from a primary focus on effectiveness (Type I) to a primary focus on implementation (Type III). In general, these designs are intended to test interventions with minimal safety concerns and preliminary evidence for effectiveness, including at least indirect evidence of applicability to the new population or setting.

To randomize or not to randomize? Randomized controlled trials (RCTs) are considered the gold standard for traditional clinical interventional studies because they minimize imbalances in baseline characteristics between treatment arms. However, RCTs are infrequently performed in QI research for a variety of reasons: costs, resources, and time; desire for rapid change; perceived favorable risk-benefit ratio of the intervention; and lack of trial expertise. Nonetheless, even QI or implementation interventions may not be effective in the real world or may have unintended consequences. As with other therapies, QI interventions tested in RCTs have been demonstrated to be ineffective or even harmful, such as a QI intervention to increase use of total mesorectal excision in rectal cancer [1], collaboratives to improve antibiotic prophylaxis compliance [2], and a bundle of evidence-based practices to prevent surgical site infections [3]. RCTs may be appropriate when the quality of the evidence for an intervention is poor or when there is significant risk associated with the intervention. Randomization may occur at the individual or unit level. There are advantages and disadvantages to each (Table 8.2). Randomizing individuals may lead to contamination, whereby the intervention is taken up by the control group, thus resulting in underestimation of the treatment effect. On the other hand, cluster RCTs (e.g., where the hospital is the unit of randomization) require larger sample sizes and more resources. RCTs are not always feasible or practical, in which case alternative designs should be considered (Table 8.2). If possible, a control group should be included, as there are multiple reasons that biased results may occur in an uncontrolled before-and-after study.


Table 8.2 Advantages and disadvantages of study designs

Single center RCT (control group: yes). Advantages: minimizes bias due to differences in baseline characteristics between groups. Disadvantages: potential for contamination; may have sample size limitations; cannot change the intervention during the trial (must be fully vetted).
Multi-center cluster RCT (control group: yes). Advantages: minimizes bias due to differences in baseline characteristics between groups; less risk of contamination than a single center RCT. Disadvantages: increased sample size requirements due to intra-class correlation; increased cost and resources; cannot change the intervention during the trial (must be fully vetted).
Uncontrolled pre- and post-intervention cohort study (control group: no). Advantages: may be the most feasible design for a single center study. Disadvantages: weak study design due to biases such as confounding by unrelated temporal trends or changes in the patient population, regression to the mean, etc.; unable to determine cause and effect, even in the presence of a statistically significant change.
Controlled pre- and post-intervention cohort study (control group: yes). Advantages: improved ability over an uncontrolled pre- and post- study to adjust for temporal trends. Disadvantages: may be confounded by differences between the control and the intervention groups.
Interrupted time series design (control group: no/yes). Advantages: improved ability over a pre- and post- study to determine temporal trends and make causal interpretations regarding the intervention. Disadvantages: may need a large number of measurements depending upon the stability of the trend before and after the intervention; does not eliminate bias due to other non-temporal confounders.
Switching replications time series design (control group: yes). Advantages: improved ability over an interrupted time series design to identify trends due to external factors other than the intervention. Disadvantages: may need a large number of measurements depending upon the stability of the trend before and after the intervention; does not completely eliminate bias due to other non-temporal confounders.

Modified from Kao et al. [5]
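One disadvantage noted for the multi-center cluster RCT in Table 8.2, the increased sample size required by intra-class correlation, can be quantified with the standard design effect, 1 + (m - 1) * ICC, where m is the average cluster size. The sketch below uses purely illustrative numbers, not figures from any particular trial.

```python
# Minimal sketch (illustrative numbers only): inflating an individually randomized
# sample size for cluster randomization using the design effect, 1 + (m - 1) * ICC.
def cluster_sample_size(n_individual: int, cluster_size: int, icc: float) -> int:
    """Total sample size needed when randomizing clusters instead of individual patients."""
    design_effect = 1 + (cluster_size - 1) * icc
    return int(round(n_individual * design_effect))

n_individual = 800  # patients needed if individual patients were randomized
for icc in (0.01, 0.05, 0.10):
    n_cluster = cluster_sample_size(n_individual, cluster_size=50, icc=icc)
    print(f"ICC={icc:.2f}: {n_cluster} patients needed across clusters of 50")
```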

Reasons that biased results may occur in an uncontrolled before-and-after study include, but are not limited to, regression to the mean (whereby outlying values move towards the average over time); temporal trends, such as additional interventions occurring during the same time period; changes in the patients or providers over the time period; and lack of pre-specification of a stopping point. If adequate data are available over multiple time periods, an interrupted time series design can be used to analyze temporal trends before and after an intervention.


Fig. 8.3 Stepped wedge design: interventions are randomly started at different time periods

If there are multiple units undergoing the intervention, a stepped wedge design could be considered. For this design, units are randomly assigned to begin the intervention at different time periods, thus allowing each unit to serve as a control for itself and the other units (Fig. 8.3).

Quantitative versus qualitative research: Mixed-methods research approaches, which include both quantitative and qualitative methods, are frequently used in QI and implementation research. Quantitative research uses the study designs described above to test hypotheses by modeling and analyzing numerical data. Qualitative research uses methods such as focus groups, interviews, and observations to understand phenomena. Qualitative and quantitative research can be used simultaneously or sequentially; one may be used to confirm, complement, or expand upon the findings of the other method. Hybrid effectiveness-implementation trials use mixed-methods approaches. Quantitative study designs such as randomized trials may be used to determine if a clinical intervention is effective, while qualitative research methods may be used for process evaluation to determine why the intervention succeeded or failed. Additionally, one research method may be used
to develop a tool or identify a sample of participants for use by the other method. For example, site surveys may be used to develop a tool for measuring contextual factors influencing implementation; the tool may then be used in a trial to measure the impact of an intervention to change context to improve implementation of an evidence-based practice.

8.4 Role of Context

The methodological design of a study contributes to its internal validity, or the minimization of bias. The generalizability of the results of a study reflects its external validity. Interventions that are effective in one setting are rarely effective across all settings; in other words, the success of QI and implementation efforts depends upon the local setting, that is, it is context-sensitive. Context refers to all of the factors that are not part of the QI or implementation intervention itself. A recent survey of experts identified four high priority contexts in evaluating implementation of patient safety practices: structural organizational characteristics such as size and financial status; external factors such as regulations or financial incentives; patient safety culture, teamwork, and leadership at the level of the unit; and availability of resources for implementation, such as opportunities for provider education and training. Despite its importance, context is often not adequately described in published QI and implementation research.

8.5 Measurement of Constructs and Outcomes

Ultimately, QI and implementation research aims to improve outcomes at individual and population levels. These outcomes may include patient-centered outcomes such as health-related quality of life as well as clinical outcomes such as morbidity and mortality. Proctor and colleagues suggested a framework that defined two additional, proximal levels of outcomes: D&I outcomes and service system outcomes. Implementation outcomes measure the effects of implementation strategies, while service system outcomes measure the six domains of quality care described by the Institute of Medicine (effectiveness, safety, patient-centeredness, timeliness, efficiency, and equity). Many of the commonly measured implementation outcomes are derived from the RE-AIM framework (described briefly in Table 8.1). Reach refers to the extent of individual participation by the target population. Adoption refers to the intent to use or the actual uptake of the evidence-based practice by the target providers, settings, or institutions. Implementation in the RE-AIM framework includes measurement of fidelity and costs. Fidelity assesses the degree to which the intervention was delivered as intended, while implementation costs measure the incremental resources necessary to carry out the intervention. Implementation costs may include labor, time, and material costs. Maintenance, or sustainability, refers
to the robustness of the intervention over time. Other implementation outcomes include acceptability, appropriateness, feasibility, and penetration. These outcomes can be evaluated either directly by observation or indirectly through surveys, focus groups, interviews, and administrative data. In addition to outcomes, measures of the factors contributing to the success and failure of QI and implementation efforts should be assessed. The constructs upon which such measures are based are derived from the multiple theories, frameworks, and models previously described. Development of standardized measures would facilitate comparisons between and synthesis across studies. However, multiple issues exist including but not limited to inconsistency and redundancy in terminology between different theories, models, and frameworks; lack of a “gold standard” and psychometric validation for many of the measures; heterogeneity in the levels of intervention targets or timeframe in the implementation process at which measurements are occurring; and differences in local resources for feasibly and practically collecting measurements. Collaborative efforts have been developed to promote the harmonization of constructs and measures. These efforts have led to research conferences and the development of the interactive Grid-Enabled Measures (GEM) website (see Resources). Measurement of implementation outcomes and constructs should be incorporated into QI and implementation research study designs. For example, a multi-center cluster randomized trial compared participation in a QI collaborative in addition to audit and feedback to audit and feedback alone on compliance with perioperative antibiotic prophylaxis measures [2]. No overall differences in compliance were observed; single-center changes were not reported. Lack of adoption of all of the components of the collaborative intervention (i.e., participation in the monthly conference calls or attendance at the in-person meetings) may have contributed to the lack of effectiveness. Alternatively, organizational or provider-level factors may have contributed to the failure of the intervention.
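To illustrate how some of these proximal outcomes reduce to simple proportions in practice, the sketch below computes reach, adoption, and fidelity from hypothetical counts for an imagined prophylaxis protocol rollout; the counts and the proportion-based definitions are assumptions for illustration, and a real evaluation would derive them from the observational, survey, or administrative sources described above.

```python
# Minimal sketch (hypothetical counts): proportion-based RE-AIM-style implementation
# outcomes for an imagined peri-operative prophylaxis protocol rollout.
eligible_patients, participating_patients = 1200, 900  # target population vs. reached
target_units, adopting_units = 10, 7                   # units targeted vs. actually adopting
steps_intended, steps_delivered = 3600, 3150           # protocol steps planned vs. delivered

reach = participating_patients / eligible_patients     # extent of individual participation
adoption = adopting_units / target_units               # uptake by target settings
fidelity = steps_delivered / steps_intended            # degree delivered as intended

print(f"reach={reach:.2f}, adoption={adoption:.2f}, fidelity={fidelity:.2f}")
```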

8.6 Getting Published and Funded in QI and Implementation Research

QI and implementation research are both publishable and fundable. QI and implementation research may be published in surgical and subspecialty journals or in journals focused on quality (e.g., BMJ Quality & Safety in Health Care and Journal of Healthcare Quality) or on implementation (e.g., Implementation Science). Consensus guidelines exist for reporting studies of QI interventions: the Standards for Quality Improvement Reporting Excellence (SQUIRE); these guidelines recommend reporting of methodological issues pertaining to internal validity as well as of details about the setting and intervention relating to external validity. QI and implementation efforts evaluated in a randomized trial should be reported using the Consolidated Standards of Reporting Trials (CONSORT) statement; a
modification has been proposed that incorporates the RE-AIM framework. For example, the traditional CONSORT diagram depicts the flow of excluded, enrolled, randomized, and analyzed patients. The extended CONSORT diagram also details the number and percentage of settings included and excluded; the extent to which the treatment components were delivered as intended; and the characteristics of patients, providers, and settings that participated and that dropped out. Reporting of non-randomized studies should adhere to guidelines for observational studies. Although not explicitly included in any of the guidelines, studies should provide details about the implementation context, focusing on the high priority contextual factors listed previously. In January 2013, the National Institutes of Health (NIH) reissued a trans-disciplinary funding opportunity announcement specifically for D&I research (R03, R21, and R01 mechanisms). Other agencies that fund D&I research include the Patient-Centered Outcomes Research Institute (PCORI), the Agency for Healthcare Research and Quality (AHRQ), and the National Science Foundation (NSF).

8.7 Summary

QI and implementation research build on comparative effectiveness research, focusing on the translation of the evidence generated from such research into routine practice to improve the quality of care. QI and implementation research have challenges distinct from those of comparative effectiveness research in terms of dealing with complex and often multi-faceted interventions, identifying an underlying framework upon which to base the interventions, balancing the impetus for rapid change with the need for methodological rigor, accounting for and measuring the role of context in the effectiveness of the interventions, and standardizing the measurement of outcomes and constructs across frameworks and studies. However, there is also significant opportunity for scientific inquiry and for advancing the science of patient safety to improve the quality of care.

8.8 Resources

Listed below are examples of agencies and programs that support and fund QI and implementation research, conferences and training opportunities in implementation science, websites offering webinars on and tools for QI and implementation, and resources for developing evidence-based guidelines and for publishing QI research.
Agency for Healthcare Research and Quality (AHRQ): governmental agency whose mission is to improve the quality, safety, efficiency, and effectiveness of health care for all Americans; commissioned reports such as Closing the Quality Gap: A Critical Analysis of Quality Improvement Strategies and Assessing the Evidence for Context-Sensitive Effectiveness and Safety, as well as other resources, are available on their website; funding opportunities for D&I research also exist (http://www.ahrq.gov)

Dissemination and Implementation in Health Listserv: monthly listserv that provides updates on publications, conferences, meetings, funding opportunities, etcetera in D&I in health care and public health (http://cancercontrol.cancer.gov/di_listserv)

Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group: website provides publications, webinars, and tools, including free software, on how to evaluate the quality of evidence and strength of recommendations (i.e., in the translation of evidence into guidelines) (http://www.gradeworkinggroup.org)

Grid-Enabled Measures (GEM) Database: collaborative repository of constructs and measures used in behavioral, social science, and other scientific research, housed on an interactive website based on a wiki platform (http://cancercontrol.cancer.gov/brp/gem.html)

Institute for Healthcare Improvement (IHI) Open School: independent, not-for-profit organization that partners with health care entities to improve quality of care; website offers multiple resources, including free online courses on QI and patient safety (http://www.ihi.org)

Patient-Centered Outcomes Research Institute (PCORI): organization that promotes the generation of high-quality evidence to guide patients in making informed health care decisions; funds patient-centered outcomes research (http://pcori.org)

Quality Enhancement Research Initiative (QUERI): Veterans Affairs based program to improve quality of care for high-cost conditions among veterans; the QUERI website includes a QI toolkit and implementation guide among other resources (http://www.queri.research.va.gov)

Reach Effectiveness Adoption Implementation Maintenance (RE-AIM): framework for planning, guiding, and evaluating implementation efforts; website provides publications, presentations, and tools for applying the RE-AIM framework to implementation efforts (http://www.re-aim.org)

Seattle Implementation Research Conference (SIRC): conference series that fosters communication and collaboration between implementation researchers (http://www.seattleimplementation.org)

Standards for Quality Improvement Reporting Excellence (SQUIRE): guidelines for authors, reviewers, and editors for reporting QI research in healthcare and for evaluating the quality of QI research articles (http://squire-statement.org)

Training Institute for Dissemination and Implementation Research (TIDIRH): annual NIH-sponsored five-day training institute on dissemination and implementation research


References

1. Simunovic M, Coates A, Goldsmith CH, Thabane L, Reeson D, Smith A, McLeod RS, DeNardi F, Whelan TJ, Levine MN. The cluster-randomized Quality Initiative in Rectal Cancer trial: evaluating a quality improvement strategy in surgery. CMAJ. 2010;182(12):1301–6.
2. Kritchevsky SB, Braun BI, Bush AJ, Bozikis MR, Kusek L, Burke JP, Wong ES, Jernigan J, Davis CC, Simmons B. The effect of a quality improvement collaborative to improve antimicrobial prophylaxis in surgical patients: a randomized trial. Ann Intern Med. 2008;149(7):472–80.
3. Anthony T, Murray BW, Sum-Ping JT, Lenkovsky F, Vornik VD, Parker BJ, McFarline JE, Hartless K, Huerta S. Evaluating an evidence-based bundle for preventing surgical site infection: a randomized trial. Arch Surg. 2011;146(3):263–9.
4. Tabak RG, Khoong EC, Chambers DA, Brownson RC. Bridging research and practice: models for dissemination and implementation research. Am J Prev Med. 2012;43(3):337–50. Copyright 2012, with permission from Elsevier.
5. Kao LS, Lally KP, Thomas EJ, Tyson JE. Improving quality improvement: a methodologic framework for evaluating effectiveness of surgical quality improvement. J Am Coll Surg. 2009;208(4):894–901.

Recommended Readings

• Brownson RC, Colditz GA, Proctor EK. Dissemination and implementation research in health: translating science to practice. New York: Oxford University Press; 2012.
• Chaudoir SR, Dugan AG, Barr CH. Measuring factors affecting implementation of health innovations: a systematic review of structural, organizational, provider, patient, and innovation level measures. Implement Sci. 2013;8:22.
• Curran GM, Bauer M, Mittman B, Pyne JM, Stetler C. Effectiveness-implementation hybrid designs: combining elements of clinical effectiveness and implementation research to enhance public health impact. Med Care. 2012;50(3):217–26.
• Grimshaw J, Eccles M, Thomas R, MacLennan G, Ramsay C, Fraser C, Vale L. Toward evidence-based quality improvement. Evidence (and its limitations) of the effectiveness of guideline dissemination and implementation strategies 1966–1998. J Gen Intern Med. 2006;21 Suppl 2:S14–20.
• Proctor EK, Landsverk J, Aarons G, Chambers D, Glisson C, Mittman C. Implementation research in mental health services: an emerging science with conceptual, methodological, and training challenges. Adm Policy Ment Health. 2009;36(1):24–34.
• Tabak RG, Khoong EC, Chambers DA, Brownson RC. Bridging research and practice: models for dissemination and implementation research. Am J Prev Med. 2012;43(3):337–50.
• Taylor SL, Dy S, Foy R, Hempel S, McDonald KM, Ovretveit J, Pronovost PJ, Rubenstein LV, Wachter RM, Shekelle PG. What context features might be important determinants of the effectiveness of patient safety practice interventions? BMJ Qual Saf. 2011;20(7):611–7.
• Thorpe KE, Zwarenstein M, Oxman AD, Treweek S, Furberg CD, Altman DG, Tunis S, Bergel E, Harvey I, Magid DJ, Chalkidou K. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. CMAJ. 2009;180(1):E47–57.

Chapter 9

Understanding and Changing Organizational Culture in Surgery Amir A. Ghaferi

Abstract Culture is about using a “similarity of approach, outlook, and priorities” to reach a common goal. That universal goal in healthcare and surgery is clear— provide quality, safe, efficient, patient-centered care. How best to achieve this goal, however, remains unclear. This chapter will outline the differences between organizational climate and culture, the current methods for measuring safety climate and culture, and finally ways to expand our current understanding of this vital facet of surgical care in order to ultimately enact change. Keywords Surgical safety • Safety culture • Safety climate • Organizational culture • Surgical outcomes

9.1 Introduction Healthcare has increasingly complex and competing interests and priorities. Financial pressures—from declining reimbursement to penalties incurred for “poor quality” care—are forcing health systems to do more with less. Although this stress is often seen as a threat, it also represents an opportunity for physicians, especially surgeons interested in health services research, to lead the way in creating a more efficient and efficacious health care delivery system. While health care in the United States accounts for nearly 20 % of the gross domestic product, we lag behind every other industry in both efficiency and safety. The fractured state of the system may account for this observation, but even countries with more unified and cohesive health systems (e.g., the United Kingdom) still strive to achieve the level of proficiency and security seen in industries such as


aviation and nuclear energy. Something inherent to these so-called “high reliability organizations” is their ability and willingness to deal with the five pillars of highly reliable organizations—failure, simplification, operations, resilience, and expertise [1]. The business world embraced these concepts to evaluate, shape, and change its culture years ago. The earliest discussions of organizational culture came from Barry Turner’s work in the 1970s. He was a pioneer in organizational studies and research who clearly articulated the importance of culture for organizations: Part of the effectiveness of organizations lies in the way in which they are able to bring together large numbers of people and imbue them for a sufficient time with a sufficient similarity of approach, outlook and priorities to enable them to achieve collective, sustained responses which would be impossible if a group of unorganized individuals were to face the same problem. However, this very property also brings with it the dangers of a collective blindness to important issues, the danger that some vital factors may be left outside the bounds of organizational perception [2].

Thus, culture is about using a “similarity of approach, outlook, and priorities” to reach a common goal. That universal goal in healthcare and surgery is clear— provide quality, safe, efficient, patient-centered care. How best to achieve this goal, however, remains unclear. In this chapter, I will outline the differences between organizational climate and culture, the current methods for measuring safety climate and culture, and finally ways to expand our current understanding of this vital facet of surgical care in order to ultimately enact change.

9.2 Climate vs. Culture Everyone can agree that organizational culture and climate are important to the safety of patients and providers in the surgical workplace. However, the two terms are often used loosely and interchangeably: what, exactly, is the difference between culture and climate? Culture refers to the shared, often unconscious attitudes and standards that govern behavior, especially in crisis situations that lack clearly defined pathways or processes. Climate, on the other hand, refers more to the principal, usually consciously held, perception of leadership’s priority for safety. Therefore, culture is ingrained in the fabric of an organization—it develops over time and thus takes longer to change. Conversely, climate is the immediate perception of leadership’s attention to concerns about safety and can change more rapidly. Palmieri et al. summarize these observations with their Safety Hierarchy Model, whereby safety culture is the stable organizational attribute, climate is the more malleable group attribute, and attitude is the impressionable individual attribute. They introduce the idea of a safety standard that represents the industry-wide safety culture phenomena cited in other industries. This standard results when the vast majority of firms exhibit safety culture as an organizational property and associated entities are also impacted by safety culture. Unfortunately, the healthcare industry has not established such a safety standard (Fig. 9.1 and Table 9.1) [6].


Fig. 9.1 Safety hierarchy model

Table 9.1 Safety constructs: levels of analysis, construct stability, and modifiability

Safety construct   Levels of analysis                      Concept stability/difficulty to modify
Safety attitude    Individuals and small work groups       Less stable—flexible
Safety climate     Units and departments                   More stable—semi-flexible
Safety culture     Corporate divisions and organizations   Very stable—inflexible
Safety standard    Industries                              Extremely stable—rigid

Source: Hofmann and Stetzer [3], Wiegmann et al. [4], and Zohar [5]

To further elaborate on the differences between culture and climate, one can think of culture as the forces in an organization that operate in the background—its “context”. These forces are shaped by the organization’s values, beliefs, traditions, norms, and even myths. These components are difficult, and at times nearly impossible, to measure, but they require thoughtful study and must be managed in order to change the organization. Traditionally, qualitative methods have been employed to study culture. This requires an appreciation for the unique aspects of individual social settings and the nuances of the interactions between the levels of hierarchy


over time. Alternatively, climate operates in the foreground and provides immediate and prominent signals regarding what is wanted and needed in the moment. Climate is a more tangible dimension of the work environment that can be measured with relative precision, therefore often employing quantitative methodology such as surveys. The factors that determine the climate of an organization include leadership style, organizational structure, standards of accountability, standards of behavior, communication, rewards, trust, commitment, vision and strategies, and organizational connectedness. Climate changes faster than culture, but when climate changes are sustained over time, culture can be reformed.

9.3 Measuring Safety Climate Most previous work on organizational safety climate focused on worker safety in manufacturing industries and passenger safety in the airline industry. There is an extensive literature in manufacturing industries outlining several key factors that affect the rate of worker injury. Dominant among these are supervisory systems and behaviors, including the individual supervisor’s attitudes, actions, expectations, and communications; inclusion of safety in the supervisor’s position responsibilities; involvement of senior management and workers in safety issues; and the attitudes and behaviors of the workers themselves as influenced by the system [7, 8]. Over the last two decades there has been increasing emphasis and interest in studying, measuring, and improving patient safety. Clearly, there is a big difference between the safety of patients and the safety of workers, as patients’ outcomes are largely out of their own control. As such, the safety climate of an organization or hospital unit that results from the attitudes and behaviors of healthcare workers functions as a surrogate for a patient’s individual safety. Therefore, patient safety mostly resembles passenger safety, which is why healthcare leaders have turned to the aviation industry for guidance and examples on how to measure and change safety climate. Since the Institute of Medicine (IOM) recommended that healthcare organizations enhance their patient safety culture, many surveys have been developed to measure safety climate in healthcare organizations [9]. A recent systematic review identified over a dozen safety climate surveys in use in health care settings [10]. These range from surveys designed for teams to complete together (e.g., Strategies for Leadership: An Organizational Approach to Patient Safety (SLOAPS)) to surveys focusing on individual hospital units (e.g., Safety Attitudes Questionnaire (SAQ)). The currently available surveys measuring safety climate in the healthcare setting vary considerably with regard to their general characteristics, dimensions covered, and the psychometrics performed in their development. Further, the IOM challenged providers to not only measure their safety climates, but to improve them with the ultimate goal of improving patient outcomes. However, few safety climate survey results have been directly compared to patient outcomes. Rather, these are generally used to compare safety climates


between and within institutions. Therefore, it remains vital to the improvement of patient safety that we continue to study the association between climate survey scores and patient outcomes. Taking for granted the association between these two measures may result in significantly misguided investments.
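As a concrete illustration of the kind of quantitative climate measurement described above, the sketch below scores a hypothetical set of Likert-type survey items for a single unit and checks their internal consistency with Cronbach's alpha. The item names, responses, and scoring conventions are invented for demonstration and are not drawn from any published instrument.

```python
# Minimal sketch, assuming hypothetical 5-point Likert items scored 1 (disagree)
# to 5 (agree); not an official scoring algorithm for any validated survey.

def cronbach_alpha(item_scores):
    """item_scores: list of equal-length lists, one list of responses per item."""
    k = len(item_scores)
    totals = [sum(per_respondent) for per_respondent in zip(*item_scores)]

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses from six clinicians on one unit, three climate items.
responses = {
    "leadership_prioritizes_safety": [4, 5, 3, 4, 4, 5],
    "comfortable_reporting_errors":  [3, 4, 3, 4, 5, 4],
    "staffing_adequate_for_safety":  [2, 4, 3, 3, 4, 4],
}

items = list(responses.values())
unit_mean = sum(sum(item) for item in items) / (len(items) * len(items[0]))
print(f"Unit-level mean climate score: {unit_mean:.2f}")
print(f"Cronbach's alpha across items: {cronbach_alpha(items):.2f}")
```

In a real study, unit-level scores like these would then be linked to unit-level patient outcomes rather than interpreted in isolation.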

9.4 Measuring Safety Culture Understanding and measuring safety culture can be difficult. The idea behind studying an organization’s safety culture is to place a finer point on the statement, “That’s how things are done around here.” As mentioned earlier, qualitative methods are traditionally used to study culture. Several qualitative methods have been used to measure safety culture, including interviews, focus groups, audits, and expert ratings. Many who study safety culture have generally rejected quantitative methods such as questionnaires as an inappropriate means of data collection. However, a hybrid approach has been employed in some studies where qualitative methods are used to investigate safety culture, and quantitative methods are subsequently developed on the basis of those results [11]. The concern with questionnaires is that the results are open to misinterpretation and researcher bias unless some form of follow-up is conducted with respondents. Surveys that attempt to elucidate the current state of safety culture may thus misguide administrators and managers, resulting in reactionary changes aimed at a so-called “quick fix,” with little regard for long-term solutions. Therefore, it remains difficult to study safety culture in isolation from safety climate for the reasons described above. The casual use of the term culture in lieu of climate in the literature creates confusion. Thus, most of the research into culture may actually be providing further insight into safety climate. Ultimately, the study of safety culture in surgery and healthcare as a whole remains a wide-open field that is fertile for further research.

9.5 Changing Safety Climate and Culture If measuring safety culture is difficult, changing it is equally, if not more, challenging. While culture seems far more important in determining who we are and why we behave in certain ways, climate can be seen as more of a reflection of what we are and what we do. As such, the ease of measuring safety climate and its inherent utility for change make climate more appealing in the absence of reliable methods of measuring culture that do not involve deep dives into qualitative methods such as interviews, focus groups, or ethnography. Further, there is some evidence that measures of safety climate (e.g., the Safety Attitudes Questionnaire) correlate with patient and safety outcomes, making climate a reliable target for improvement.


In order to change organizational safety culture and climate, healthcare leaders, practitioners, and researchers will need to move from the current culture of “blame and shame” in the face of medical errors to a progressive culture in which errors are recognized as learning opportunities to better the organization. Extensive research into approaches to supporting a positive safety culture has resulted in the following key organizational commitments—to construct reliable systems and processes, to support and encourage error reporting with an open and just culture, to embrace management practices and behaviors supportive of safety, and to detect and analyze errors and adverse events with robust investigation [12–14]. This final commitment lends itself to some well-established methods for quality improvement, such as failure mode and effect analysis or aggregate root cause analysis, that have begun to penetrate healthcare systems. Failure mode and effect analysis (FMEA) is a technique born from the engineering community used to proactively analyze vulnerabilities in the system before close calls occur. Health care FMEA (HFMEA) focuses on health care processes and was developed by the VA National Center for Patient Safety (NCPS) and Tenet HealthSystem in Dallas. It is a hybrid prospective analysis model that combines concepts found in FMEA and Hazard Analysis and Critical Control Point (HACCP) with tools and definitions from the VA’s root cause analysis process [15]. HFMEA is a 5-step process that convenes an interdisciplinary team to use process flow diagramming, a Hazard Scoring Matrix™, and the HFMEA Decision Tree™ to identify and assess potential vulnerabilities in a health care process. The HFMEA worksheet is used to record the team’s assessment, proposed actions, and outcome measures. HFMEA then includes testing to ensure that the system functions effectively and new vulnerabilities have not been introduced elsewhere in the system. While the success and utility of such techniques are largely dependent on local leadership buy-in and support, implementation of such rigorous evaluation and action-taking systems can boost safety attitudes, climate, and ultimately culture.
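To make the hazard-assessment step more concrete, the sketch below scores a few hypothetical operating room failure modes by multiplying severity and probability ratings and flags those above a cutoff. The 1-4 scales, the threshold, and the failure modes are illustrative assumptions only; they are not the proprietary Hazard Scoring Matrix™ or HFMEA Decision Tree™.

```python
# Illustrative FMEA-style hazard scoring; scales and threshold are hypothetical.

failure_modes = [
    # (process step, failure mode, severity 1-4, probability 1-4)
    ("specimen labeling", "label applied to wrong container", 4, 2),
    ("antibiotic timing",  "prophylaxis given after incision", 3, 3),
    ("instrument count",   "count reconciled incorrectly",     4, 1),
]

ACTION_THRESHOLD = 8  # hypothetical cutoff for failure modes needing action

for step, mode, severity, probability in failure_modes:
    hazard_score = severity * probability  # higher = more concerning
    flag = "ACTION NEEDED" if hazard_score >= ACTION_THRESHOLD else "monitor"
    print(f"{step:20s} | {mode:35s} | score {hazard_score:2d} | {flag}")
```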

9.6 Conclusion Payers, patients, and policymakers continue to pay closer attention to patient safety. While patient outcomes are the ultimate measure of safety, increased scrutiny of how care is delivered and experienced by patients places more attention on healthcare delivery teams and systems. As we focus on organizational safety climates and culture in surgical care, it will be vital for surgeons to remain actively involved in measuring, evaluating, and improving safety culture. While the aviation industry has provided an excellent foundation for guidance in this field, moving forward we can learn immensely from the work of Kathleen Sutcliffe and Karl Weick on high reliability organizations [1]. Inherent to the care of high-risk surgical patients is the ability to recognize and appropriately treat complications as they arise (i.e., rescuing a patient from a major complication)—a concept our group has written extensively about (Fig. 9.2) [16–19]. Sutcliffe and Weick describe the phenomenon of rescuing


Fig. 9.2 Conceptual framework for understanding the role of hospital resources, safety attitudes, and safety behaviors in the rescue of patients from major complications

patients as a measure of an organization’s resilience, which is “something a system does, not something a system has.” [1] The organization’s ability to bounce back from a complication is grounded in the processes of mindful organizing. Mindful organizing is specified as five complementary processes that improve ongoing recognition and action taking—preoccupation with failure, avoidance of simplifying interpretations, sensitivity to operations, commitments to resilience, and flexible decision structures that defer to expertise. Ongoing research into the current state of surgical safety culture along with carefully incorporating the processes of mindful organizing may help pave the path toward improving surgical outcomes.

References
1. Weick KE, Sutcliffe KM. Managing the unexpected: resilient performance in an age of uncertainty. 2nd ed. San Francisco: Jossey-Bass; 2007.
2. Turner BA, Pidgeon NF. Man-made disasters. 2nd ed. Boston: Butterworth-Heinemann; 1997.
3. Hofmann DA, Stetzer A. A cross-level investigation of factors influencing unsafe behaviors and accidents. Pers Psychol. 1996;49(2):307–39.
4. Wiegmann DA, Zhang H, Von Thaden TL, Sharma G, Gibbons AM. Safety culture: an integrative review. Int J Aviat Psychol. 2004;14(2):117–34.
5. Zohar D, Tenne-Gazit O. Transformational leadership and group interaction as climate antecedents: a social network analysis. J Appl Psychol. 2008;93(4):744.
6. Palmieri PA, Peterson LT, Pesta BJ, Flit MA, Saettone DM. Safety culture as a contemporary healthcare construct: theoretical review, research assessment, and translation to human resource management. Adv Health Care Manage. 2010;9:97–133.


7. Zohar D. A group-level model of safety climate: testing the effect of group climate on microaccidents in manufacturing jobs. J Appl Psychol. 2000;85:587–96.
8. Varonen U, Mattila M. The safety climate and its relationship to safety practices, safety of the work environment and occupational accidents in eight wood-processing companies. Accid Anal Prev. 2000;32:761–9.
9. Kohn LT, Corrigan JM, Donaldson MS, editors. To err is human: building a safer health system, vol. 627. Washington, DC: National Academies Press; 2000.
10. Colla JB, Bracken AC, Kinney LM, Weeks WB. Measuring patient safety climate: a review of surveys. Qual Saf Health Care. 2005;14:364–6.
11. Lee T. Assessment of safety culture at a nuclear reprocessing plant. Work & Stress. 1998;12:217–37.
12. Kohn LT, Corrigan JM, Donaldson MS. To err is human: building a safer health system. A report of the Committee on Quality of Health Care in America, Institute of Medicine. Washington, DC: National Academy Press; 2000.
13. Weick KE. The reduction of medical errors through mindful interdependence. Medical error: what do we know. 2002:177–99.
14. Page AE, editor. Keeping patients safe: transforming the work environment of nurses. Washington, DC: National Academies Press; 2004.
15. DeRosier J, Stalhandske E, Bagian JP, Nudell T. Using health care failure mode and effect analysis: the VA National Center for Patient Safety’s Prospective Risk Analysis System. J Comm J Qual Patient Saf. 2002;28:248–67.
16. Ghaferi AA, Birkmeyer JD, Dimick JB. Variation in hospital mortality associated with inpatient surgery. N Engl J Med. 2009;361:1368–75.
17. Ghaferi AA, Birkmeyer JD, Dimick JB. Complications, failure to rescue, and mortality with major inpatient surgery in medicare patients. Ann Surg. 2009;250:1029–34.
18. Ghaferi AA, Osborne NH, Birkmeyer JD, Dimick JB. Hospital characteristics associated with failure to rescue from complications after pancreatectomy. J Am Coll Surg. 2010;211:325–30.
19. Ghaferi AA, Birkmeyer JD, Dimick JB. Hospital volume and failure to rescue with high-risk surgery. Med Care. 2011;49:1076–81.

Chapter 10

Assessing Patient-Centered Outcomes Arden M. Morris

Abstract With the maturation of traditional quality research, patient advocates and policymakers increasingly are keen to expand and apply new findings in ways that measurably benefit individual patients. Over the past decade, therefore, we have seen an accelerating paradigm shift from “objective” provider-centered care toward a more subjective patient-centered model of care, which prioritizes individual patient autonomy and holistic well-being. The basic concepts of patient-centeredness describe recognition of the person as an individual with unique preferences, needs, and values; transparency and coordination of treatment in partnership with the patient and their caregivers; and bi-directional patient-provider communication as a means for accomplishing treatment. Keywords Patient-centered care • Stakeholder • Shared decision making • Quality measurement

10.1 A Paradigm Shift Traditionally, clinical quality research has focused firmly on structure, process, and outcomes from an “objective” provider and health care system perspective. This provider-centered paradigm places responsibility for improving the quality of care


Fig. 10.1 Patient-centered outcomes: a paradigm shift (provider centered → patient centered): provider knowledge and skill; patient preferences, needs, and values; the patient-provider relationship (informative, supportive, and relationship-building communication, shared decision making, involvement of caregivers); health behaviors; treatment (access to care, coordination of care, physical comfort, transition and continuity); clinical outcomes (survival, mortality, morbidity); and patient reported outcomes (satisfaction, quality of life, functional status)

upon providers and systems. A natural outgrowth of such work is the establishment of evidence based guidelines for care, which define standards and ostensibly reduce variation in care. Once evidence is synthesized and translated into guidelines, the guidelines may go on to become performance or quality measures. Currently, one of the most common methods for enacting large scale quality improvement in the United States is to review and endorse performance measures for which hospitals or health systems are held accountable. The historical provider-centered perspective also implies that the patients are intrinsically passive in their care and uniform in their needs. However, with the maturation of traditional quality research, patient advocates and policymakers increasingly are keen to expand and apply new findings in ways that measurably benefit individual patients. Over the past decade, therefore, we have seen an accelerating paradigm shift from “objective” provider-centered care toward a more subjective patient-centered model of care, which prioritizes individual patient autonomy and holistic well-being (Fig. 10.1). In 2001, patient centered care was formally described by the Institute of Medicine as care that encompasses “qualities of compassion, empathy, and responsiveness to needs, values, and expressed preferences of the individual patient” [1] and was acknowledged as one of the six tenets of quality of care. Most recently,


the importance of patient centered outcomes to quality and quality improvement has been reinforced with the creation of the Patient Centered Outcomes Research Institute (PCORI). In spite of the current momentum, however, the definition of patient-centeredness is not entirely clear and many may take an “I know it when I see it” approach. For example, Stewart defined patient centeredness in the negative, stating “It may be most commonly understood for what it is not—technology centered, doctor centered, hospital centered, disease centered” [2]. This imprecision results in difficulty comparing relative patient centeredness or even determining how to assess patient centeredness. Moreover, the relationship between patient centered care and clinical outcomes or health status is poorly understood. Thus significant fundamental work remains in order to understand how to measure and operationalize patient centered care and outcomes in the quality improvement process.

10.2 Principles of Patient Centered Care The basic concepts of patient centeredness have been described for nearly 50 years [3] and ultimately have been refined into various models by research and policy groups. In common, they describe recognition of the whole patient as an individual with unique preferences, needs, and values; transparency and coordination of treatment in partnership with the patient and their caregivers; and bi-directional patient-provider communication as a means for accomplishing treatment. Perhaps the most comprehensive model of patient centered care is described by the Picker Institute and Commonwealth Fund and is centered on eight principles (Table 10.1) [4]:

Table 10.1 Picker Institute/Commonwealth Fund Principles of Patient Centered Care
Respect for patients’ values, preferences and expressed needs
Coordination and integration of care
Information, communication and education
Physical comfort
Emotional support and alleviation of fear and anxiety
Involvement of family and friends
Transition and continuity*
Access to care*
*Not included as a principle of patient centered care in the Institute of Medicine report Crossing the Quality Chasm (Ref. [1])

Respect for patients’ values, preferences and expressed needs. The patient-provider encounter should include discussion of the patient’s quality of life or subjective sense of well-being, expectations of treatment, and desired involvement in decision making, including desired caregiver involvement. The encounter should incorporate attention to patient autonomy and cultural values.
Coordination and integration of care. Care across services should be coordinated with clear designation of points of contact among services.


Information, communication and education. The patient and family should be consistently and reliably informed about clinical status, clinical progress, prognosis, and processes of care in order to facilitate autonomy and the care partnership.
Physical comfort. Pain management should be a priority with attention to assistance with activities of daily living and a clean, comfortable environment within the healthcare setting.
Emotional support and alleviation of fear and anxiety. The provider should elicit and discuss patients’ concerns and anxieties especially as they pertain to clinical status, treatment, prognosis, and the impact of illness on self and family.
Involvement of family and friends. The care encounter should include identification of key caregivers and acknowledgment of their impact on the clinical course, including decision making as advocate, proxy, or surrogate, and recognition of family caregivers’ roles and needs.
Transition and continuity. Transitions in the setting of care should include information that will help patients care for themselves away from a clinical setting, as well as coordination among providers and support of the patient and family to ease transitions.
Access to care. Access to care should include not only availability of appropriate care but also minimization of time spent waiting for admission or an appointment in the outpatient setting.
Given the breadth of patient centered care principles, distinguishing patient-centered outcomes from “usual care” or traditional clinical outcomes presents a challenge. From a pragmatic perspective, an outcome that results from implementation of any of the principles of patient centered care is a patient centered outcome. Therefore, the family of patient centered outcomes comprises a heterogeneous group of variables that may be difficult to compare across studies.

10.3 Defining Patient-Centered Outcomes In a recent systematic review of randomized trials to improve patient centered care, Dwamena and colleagues organized outcomes into four major categories: consultation processes, health behavior, satisfaction, and health status (Table 10.2) [5]. Although it was not required for inclusion, the randomized intervention for every study that met criteria was a patient-centered care training program directed at providers. Notably, the review authors found that interventions to transfer patient centered care skills to providers were effective at improving provider communication but that this did not necessarily result in improvement of other outcomes. Consultation processes describe the effect of interventions on the process of the patient-provider encounter—in this case, the process is the outcome to be assessed. Outcomes in this category include provider communication skills as well as consultation process measures. Provider communication skills may be subjectively reported by the patient, generally indicated by the designation “perceived”, or they may be objectively measured through audio- or video-taping.


Table 10.2 Measuring patient centered outcomes

Care processes
  Types of variables: Patient-provider communication; Shared decision making; Administrative processes (timeliness, transparency, coordination)
  Examples of variables: Elicitation of patient preferences; Time from diagnosis to surgery; Appropriate post-discharge follow-up
  Data sources: Medical records; Administrative records; Observational or recorded data; Patient reported (survey, interview)

Health behavior
  Types of variables: Health related habits; Adherence to recommendations; Utilization of care
  Examples of variables: Routine exercise; Smoking cessation; Adherence to medication
  Data sources: Medical records (physiologic or laboratory tests, provider notes); Patient reported (survey, interview)

Satisfaction
  Types of variables: Satisfaction; Regret
  Examples of variables: Global satisfaction; Satisfaction with specific aspects of care; Decisional regret
  Data sources: Patient reported (survey, interview)

Health status
  Types of variables: Physiologic; Clinical assessment; Patient reported
  Examples of variables: Patient weight; Nutritional status; Well-being or quality of life; Pain; Functional status
  Data sources: Medical records; Patient reported (survey, interview)

Adapted from Dwamena et al. [5] and Oliver and Greenberg [6]

While traditional outcomes research would favor objective measures, the principles of patient centered care prioritize the patient experience and perceptions. Patient advocates would argue that if the communication is not understood by patients, it is not useful to patients [7]. Therefore, assessing communication in this context should include “subjective” patient-reported data, such as perceived empathy or attentiveness, or objective evidence of bi-directionality, such as a recording of provider elicitation of patients’ concerns and beliefs. Shared decision making is a complex consultation process that includes several aspects of communication between the patient and provider. It begins with a two-way information exchange between the provider and the patient, followed by discussion of treatment preferences by both parties until they reach consensus on a treatment decision. In keeping with the principles of patient centeredness, the importance of engaging caregivers or other relevant stakeholders is increasingly recognized, especially within ethnic minority communities [8].


Other patient centered outcomes under the rubric of consultation processes include administrative aspects of care such as patient waiting time, coordination and consistency of care, ease of access to health records, and smooth transitions of care. Essentially these outcomes reflect efforts to align not just providers but also the health system with patients’ preferences, needs, and values. Health behavior reflects patient actions and, like consultation processes, is often considered a mediating variable between patient-centered care and clinical outcomes [9]. While most interventions to improve patient centered outcomes focus on changing provider behaviors or processes, the fundamental concept of a partnership between the patient and physician includes engagement on the part of the patient. Assessing health behaviors such as adherence to recommendations or utilization of care attributes some responsibility for clinical outcomes or health status to patients themselves. Satisfaction refers to the patient’s emotional or cognitive evaluation of the healthcare encounter [6] and is a broad subjective measure that can be based on almost any aspect of the encounter or the patient’s reaction to the encounter. Global satisfaction is perhaps the most commonly used patient centered outcome measure. However, the family of patient satisfaction measures also includes measures of satisfaction related to specific aspects of the encounter, decisional satisfaction, and regret. Health status, along with satisfaction, is considered a set of end outcomes of care, in contrast to clinical processes or health behaviors, which are intermediate outcomes. The health status category encompasses traditional health services and clinical research measures—physiologic outcomes (e.g., decreased body mass index (BMI)) and clinical assessments (e.g., nutritional status)—provided they are the result of patient-centered care. In patient centered outcomes research, this category also includes patient reported health status (e.g., pain, quality of life, physical functioning). Outcomes that do not qualify as patient-centered are simply outcomes that do not result from patient-centered care. In the setting of a specified intervention, say a randomized trial, the outcomes measured would not be considered patient centered unless the preceding intervention specifically included patient centeredness principles or some aspect of patient reported outcomes was included.

10.4 Areas for Further Development It is worth noting three fundamental tensions in the effort to improve patient-centeredness. First, achieving balance in patient centeredness is critical but tenuous. While patient-centered care heralds the value of individualized care, patient centered outcomes research strives to standardize definitions for quantitative comparison. Addressing the needs of many individuals requires systems of care, and systems mandate routinized structure and processes. How can both assessment and improvement be realized? How can efforts toward individual and population


health be balanced? How can researchers and policymakers compare the extremely heterogeneous measures used across studies? To address these questions, researchers and policymakers must agree upon standardized definitions and measures that fulfill both individual and cohort needs. One way to approach this complex task could be the creation of standardized composite variables, which have previously shown high predictive accuracy in clinical outcomes research [10]. For example, a set of variables that collectively indicate the alignment of health services with individual patients’ needs and expectations would more precisely reflect patient satisfaction than a simple global score. Comparing the effect size of alignment could obviate the heterogeneity of interventions and outcomes. Rather than relying on a single measure such as time from diagnosis to surgery, a composite scale that also includes aspects of patient-provider communication, patient preferences, and health behaviors would provide a more accurate assessment of the consultation process (a minimal illustration of one such composite appears at the end of this section). Incorporation of the principles of implementation science, already underway among many projects funded by the Patient Centered Outcomes Research Institute (PCORI), will help to bridge the gap between assessing patient centered outcomes and actually improving patient centered care [11]. To develop genuinely effective interventions and to determine the value of implementation, ad hoc engagement of patients and other stakeholders in every aspect of planning is necessary—as is a rigorous post-hoc assessment. Although it may be more expensive up front, a well-constructed composite variable that pairs patient preferences with post-intervention functional status will ultimately be more useful and less wasteful than an intervention based on a single assessment of clinical status alone. Health information technology innovations and translation of lean concepts from industry will also play a major role in identifying measures that reflect individual needs, for example desired levels of transparency, among large patient populations. A second major tension concerns the unclear relationship between patient-centered outcomes and clinical outcomes. Berwick argues that ideally these concepts would be maintained separately—and that patient centeredness takes precedence [7]. Fortunately, many randomized trials that have assessed the clinical impact of patient-centered care interventions indicate a positive or neutral effect on health status [5]. However, a rigorously conducted trial in 30 dementia residential care facilities demonstrated fewer episodes of agitation but significantly more falls among patients in facilities randomized to the patient-centered care intervention arm [12]. Another study of 43 general practices reported that patients whose providers were randomized to the intervention arm had significantly increased satisfaction and quality of life but also had significantly increased BMI and serum triglycerides compared to the usual care arm [13]. Therefore, planning the allocation of limited resources should include explicit acknowledgment that improving patient-centeredness may result in reduced resources or attention toward technical aspects of care and consequently lead to worse clinical outcomes. Third, while communication and respect for individual autonomy are each considered a fundamental good and together form the basis of patient-centeredness, there is no clear line between providers informing/educating patients and providers


respecting patients’ decision making autonomy. Who determines whether enough information has been given and understood to constitute informed consent? How long should the communication process continue? At what point during communication to foster adherence or obtain consent is patient decisional autonomy threatened? Shared decision making is our best means to ensure autonomy; however, patients’ preferred level of involvement may vary—some may prefer provider-based decision making—and the shared decision model does not yet incorporate caregivers or strategies for cultural congruence. No doubt these pervasive issues will be addressed as the patient centered outcomes movement and decision sciences continue to evolve.
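The sketch below illustrates the kind of standardized composite variable proposed earlier in this section: each component measure is converted to a z-score, and the scores are averaged into a single patient-centeredness summary per patient. The component measures, their directions, and the equal weighting are hypothetical assumptions, not a validated instrument.

```python
# Minimal sketch of a standardized composite variable; all data are invented.

def z_scores(values):
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

# Hypothetical per-patient measures (higher = more patient-centered, so
# days from diagnosis to surgery is negated before standardizing).
communication_score    = [4.2, 3.8, 4.9, 2.7, 4.0]  # survey-based, 1-5
preference_concordance = [1, 1, 1, 0, 1]             # treatment matched stated preference
days_to_surgery        = [14, 30, 10, 45, 21]

components = [
    z_scores(communication_score),
    z_scores(preference_concordance),
    z_scores([-d for d in days_to_surgery]),          # fewer days is better
]

composite = [sum(vals) / len(vals) for vals in zip(*components)]
for i, score in enumerate(composite, start=1):
    print(f"Patient {i}: composite patient-centeredness z = {score:+.2f}")
```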

References
1. Institute of Medicine. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: National Academies Press; 2001.
2. Stewart M. Towards a global definition of patient centred care. BMJ. 2001;322(7284):444–5.
3. Balint E. The possibilities of patient-centered medicine. J R Coll Gen Pract. 1969;17(82):269–76.
4. Gerteis M, Edgman-Levitan S, Daley J, Delbanco TL. Through the patients’ eyes: understanding and promoting patient-centered care. San Francisco: Jossey-Bass; 1993.
5. Dwamena F, Holmes-Rovner M, Gaulden CM, Jorgenson S, Sadigh G, Sikorskii A, Lewin S, Smith RC, Coffey J, Olomu A. Interventions for providers to promote a patient-centred approach in clinical consultations. Cochrane Database Syst Rev. 2012;12, CD003267.
6. Oliver A, Greenberg CC. Measuring outcomes in oncology treatment: the importance of patient-centered outcomes. Surg Clin North Am. 2009;89(1):17–25, vii.
7. Berwick DM. What ‘patient-centered’ should mean: confessions of an extremist. Health Aff (Millwood). 2009;28(4):w555–65.
8. Mead EL, Doorenbos AZ, Javid SH, Haozous EA, Alvord LA, Flum DR, Morris AM. Shared decision-making for cancer care among racial and ethnic minorities: a systematic review. Am J Public Health. 2013;103(12):e15–29.
9. Rathert C, Wyrwich MD, Boren SA. Patient-centered care and outcomes: a systematic review of the literature. Med Care Res Rev. 2013;70(4):351–79.
10. Dimick JB, Staiger DO, Osborne NH, Nicholas LH, Birkmeyer JD. Composite measures for rating hospital quality with major surgery. Health Serv Res. 2012;47(5):1861–79.
11. Fleurence R, Selby JV, Odom-Walker K, Hunt G, Meltzer D, Slutsky JR, Yancy C. How the patient-centered outcomes research institute is engaging patients and others in shaping its research agenda. Health Aff (Millwood). 2013;32(2):393–400.
12. Chenoweth L, King MT, Jeon YH, Brodaty H, Stein-Parbury J, Norman R, Haas M, Luscombe G. Caring for Aged Dementia Care Resident Study (CADRES) of person-centred care, dementia-care mapping, and usual care in dementia: a cluster-randomised trial. Lancet Neurol. 2009;8(4):317–25.
13. Kinmonth AL, Woodcock A, Griffin S, Spiegal N, Campbell MJ. Randomised controlled trial of patient centred care of diabetes in general practice: impact on current wellbeing and future disease risk. The Diabetes Care From Diagnosis Research Team. BMJ. 1998;317(7167):1202–8.


Landmark Papers
• Berwick DM. What ’patient-centered’ should mean: confessions of an extremist. Health Aff (Millwood). 2009;28(4):w555–65.
• Dwamena F, Holmes-Rovner M, Gaulden CM, Jorgenson S, Sadigh G, Sikorskii A, Lewin S, Smith RC, Coffey J, Olomu A. Interventions for providers to promote a patient-centred approach in clinical consultations. Cochrane Database of Systematic Reviews 2012; Issue 12. Art. No.: CD003267. doi:10.1002/14651858.CD003267.pub2.
• Fleurence R, Selby JV, Odom-Walker K, Hunt G, Meltzer D, Slutsky JR, Yancy C. How the patient-centered outcomes research institute is engaging patients and others in shaping its research agenda. Health Aff. 2013;32(2):393–400.
• Rathert C, Wyrwich MD, Boren SA. Patient-centered care and outcomes: a systematic review of the literature. Med Care Res Rev. 2013;70(4):351–79.

Chapter 11

Studying What Happens in the OR Lane Frasier and Caprice C. Greenberg

Abstract Despite significant attention from both the healthcare community and the population at large, limited improvements have been made in patient safety over the last decade. Given the frequency with which adverse events occur in surgery, and in the operating room in particular, this is a critical area to target for improvements. Traditional quantitative retrospective approaches to research are limited in their ability to advance this field. For this reason, we must expand our armamentarium to include research at the point of care. In this chapter, we will present an overview of the available approaches to data collection and analysis as well as the critical steps to performing this type of research. We will also discuss several representative research studies focusing on point-of-care research in the operating room. Keywords Patient safety • Performance improvement • Systems engineering • Human factors engineering • Observational field studies • Focus groups and interviews

11.1 Introduction It has been over a decade since the landmark Institute of Medicine reports To Err is Human and Crossing the Quality Chasm catapulted issues of quality and safety to center stage in healthcare. With the discovery that 44,000–98,000 patients die as a result of preventable medical errors each year in the United States, major emphasis has been placed on improving the quality and safety of health care.


Health-care adverse events cost our nation an estimated $393 billion to $958 billion annually and represent up to 45 % of our health care expenditures [1]. For this reason, one approach has been to use financial incentives to decrease adverse events. For example, the Centers for Medicare and Medicaid Services (CMS) no longer reimburse hospitals for the additional costs related to “never events”, specific hospital-acquired conditions thought to represent an implicit error in the delivery of health care. Examples of CMS never events include retained foreign objects after surgery, catheter-associated blood stream infections, and stage III and IV decubitus ulcers. Given the voluminous literature documenting a relationship between procedural volume and outcome, other policy initiatives aim to channel certain patients or procedures to high volume centers or designated centers of excellence that meet a defined set of criteria, with the aim of creating health care systems better suited to caring for complex conditions. Regional initiatives such as the Pennsylvania Patient Safety Authority, which mandates reporting of adverse events across the state while providing protections against liability for reporters, have increased reporting and subsequently provided data available for analysis of the extent of the problem. Finally, databases have been established to track quality-related outcomes for various patient populations. In 2004, the American College of Surgeons established the National Surgical Quality Improvement Program (ACS NSQIP), originally developed for the Veterans Affairs system, to track risk-adjusted outcomes and give participating hospitals reports on their performance relative to similar hospitals nationally. The goal is to use such bench-marked data to identify areas in quality and safety to target for quality improvement initiatives. Unfortunately, evidence suggests that significant advances in patient safety remain unrealized despite these initiatives. An article published by Landrigan et al. failed to detect an improvement in the rate of adverse events or preventable adverse events across hospitals in North Carolina from 2002 to 2007 [2].
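As a toy illustration of the bench-marked, risk-adjusted reporting described above, the sketch below computes an observed-to-expected (O/E) complication ratio for a single hypothetical hospital. In practice the patient-level predicted risks would come from a multivariable model fit on registry data; here both the outcomes and the predictions are invented for demonstration and do not reflect the actual ACS NSQIP methodology.

```python
# Hypothetical O/E ratio sketch; data and risk predictions are made up.

patients = [
    # (observed complication: 1/0, model-predicted probability of complication)
    (0, 0.05), (1, 0.20), (0, 0.10), (0, 0.08), (1, 0.35),
    (0, 0.12), (0, 0.04), (1, 0.25), (0, 0.07), (0, 0.15),
]

observed = sum(outcome for outcome, _ in patients)
expected = sum(predicted for _, predicted in patients)
oe_ratio = observed / expected

print(f"Observed complications: {observed}")
print(f"Expected complications: {expected:.2f}")
print(f"O/E ratio: {oe_ratio:.2f} "
      f"({'worse' if oe_ratio > 1 else 'better or as expected'} given case mix)")
```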


Another analytic approach relies on retrospective analysis of adverse events and near misses, and seeks to identify contributing factors which, if addressed, could prevent future events. This type of retrospective or root cause analysis can be performed for a series of incidents identified from safety reporting systems and malpractice claims series, or for single, ‘sentinel’ events. While this method can predict both future problems and solutions, its retrospective nature can introduce significant recall bias and requires both the occurrence and detection of an adverse outcome. This reliance limits the researcher’s ability to detect, investigate, and better understand successful compensatory strategies and ‘near misses.’ Traditionally, very little health care research has taken place in real time at the point of care. Point-of-care work, also referred to as fieldwork due to the frontline nature of its investigation, offers an exciting and under-utilized approach to improving the delivery of healthcare. In this chapter, we will provide an overview of research at the point of care, beginning with a discussion of various conceptual frameworks with an emphasis on the Systems Engineering Initiative for Patient Safety (SEIPS) model [3]. We will explore the importance of strong collaboration; methodological approaches to data collection, including field observations, video analysis, and interviews; approaches to data analysis; and methods for optimal presentation and dissemination of research findings. Finally, we will provide examples of several seminal studies in surgery, with an emphasis on point-of-care research in the operating room (OR) and an aim to highlight the critical role of multi-disciplinary care in this type of research.

11.2 Key Steps in Point-of-Care Research To successfully execute a point-of-care research project, the researcher must consider several aspects of project development. Careful attention to team development, identification of a conceptual model, and forethought regarding optimal presentation and dissemination of results will allow the researcher not only to collect relevant data but also to have a context in which to enact relevant change.

Six Key Steps in Point-of-Care Research
1. Identify collaborators
2. Develop or adapt a conceptual framework
3. Decide on data collection and sampling strategies
4. Determine approach to analysis
5. Consider optimal presentation of results
6. Disseminate results and implement change


11.2.1 Identify Collaborators Other disciplines, including education, business, psychology, and engineering, have long been studying issues of quality and safety, and have therefore developed conceptual models and methodological tools useful for studying these problems in healthcare. By identifying collaborators in other fields and utilizing tools already developed for research purposes, researchers interested in studying quality and safety in healthcare can increase the pace of research and reduce redundancy and the need to ‘reinvent the wheel.’ Two closely-tied disciplines worth mentioning specifically are human factors and systems engineering and cognitive and organizational psychology. Human factors engineering seeks to optimize system performance (http://www.iea.cc/) and is commonly involved in safety projects. Human factors engineering principles have guided safety-reliability studies in other high-risk fields including aviation, transportation, and nuclear science, resulting in improvements in safety. Cognitive psychology focuses on aspects of human attention, memory, multi-tasking, and problem-solving and can be utilized to better understand how healthcare providers function within a work system, deal with competing responsibilities, and make decisions under high-stress circumstances. Many human factors and cognitive psychology analyses involve observational studies, and result in more qualitative/descriptive data than physicians may be accustomed to, but can provide critical information about how healthcare providers and/or patients function within a healthcare system. Additionally, collaboration with key administrative and front-line personnel within the healthcare organization is vital to the success of a point-of-care research project. Early discussions with frontline personnel can identify practical concerns and potential flaws in a research plan which might lead to workflow disruptions and personnel/patient inconvenience. Identification of frontline ‘champions’ can help researchers deliver key information regarding the goals and importance of a proposed project. Buy-in from such stakeholders will provide support and legitimacy to a research project, and is vital to ensuring adequate provider participation.

11.2.2 Develop or Adapt a Conceptual Framework In order to successfully understand the interactions between patients, providers, and the healthcare environment, we as practitioners must broaden our conceptualization of how care is delivered. To study the operating room using point-of-care research, we need to adjust the way we view the operating room: it must be thought of as a system. It is a complex assembly of people, information, resources, equipment and procedures working toward a common goal. In comparison, the traditional medical mindset has been that a patient’s physician is the sole determinant


of outcome, and this culture has been slow to change. While we have been taught to feel personal responsibility for all aspects of care, it is critical to understand that even the best individual provider cannot provide totally safe care within a flawed system. All aspects of the healthcare system must be evaluated and understood in order to identify vulnerabilities. A number of theories have been published to describe the function, quality and safety of the healthcare system. In James Reason’s Swiss Cheese Model of Error, aspects of a system align to allow an error to pass through several layers of organizational and behavioral safeguards before causing patient harm. This model also develops the concepts of active and latent failures: active failures are behaviors and decisions performed at the delivery end of a system, while latent failures are characteristics of an organization which predispose it to error. Another well-known framework was developed by Avedis Donabedian in the 1960s. Donabedian’s model is based on the triad of structure (the context in which healthcare is delivered), process (the interactions between patient and providers), and outcome (effects of care on the patient and populations as a whole) and is still widely used as a conceptual model in the field of healthcare quality. The Systems Engineering Initiative for Patient Safety (SEIPS) model, developed in collaboration between human factors engineers and medical care providers, expands Donabedian’s model by providing a more detailed focus on the system/structure component. The SEIPS model specifically examines five domains and their interactions within the system: tools/technology; organization; tasks; environment; and people. It also expands on the feedback effects of process and outcome measures relative to the structure/system component of the model.

11.2.3 Decide on Data Collection and Sampling Strategies 11.2.3.1 Types of Data There are two general categories of data: quantitative and qualitative. Quantitative data is generally more familiar to physicians and includes traditional endpoints seen in randomized controlled trials such as survival and complication rates. It can also represent simpler data including counts and frequencies. Qualitative data provides information about the qualities, processes, and meanings of things and is generally more descriptive. In a point-of-care study, an example of quantitative data collection might be counting the number of times nurses must leave a room while providing patient care due to competing responsibilities. Qualitative data might include identifying and classifying the types of those competing responsibilities and tracing the workflow for patient care. Research involving both types of data and analyses is called mixed methods, and is commonly encountered in point-of-care research.
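As a small, hypothetical illustration of the mixed-methods example above, the sketch below tallies how often an observed nurse left the room (the quantitative count) while recording the coded reason for each interruption (the qualitative categories). The observation log and category labels are invented.

```python
# Illustrative tally of coded field observations; data are hypothetical.

from collections import Counter

observed_interruptions = [
    "missing supply", "phone call", "missing supply",
    "covering another patient", "phone call", "missing supply",
]

counts = Counter(observed_interruptions)
print(f"Total interruptions observed: {sum(counts.values())}")
for reason, n in counts.most_common():
    print(f"  {reason}: {n}")
```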


11.2.3.2 Sources of Data Common data sources in point of care research include various types of pre-existing documentation: floor plans, policies and procedures, and patient medical records. Frontline providers are another important source of information. Common methods of gathering data from health care providers include surveys, interviews, and focus groups, which can include open-ended and/or structured questions. Depending on the research question being investigated, one or more of these methods may be appropriate. Surveys represent a low-cost tool for obtaining answers to specific questions developed by the research team. Other advantages include the ability to question large numbers of subjects; subject anonymity, which may increase subjects’ honesty; and a low time commitment for researchers. Significant attention must be given to developing survey questions, as ambiguous or poorly worded questions will seriously detract from the data collected. Additionally, it can be difficult or impossible to ask follow-up questions of survey respondents if new themes or avenues of research develop. Interviews solicit information from individuals embedded in the clinical setting of interest. Deliberate selection of key informants can provide a rich source of information across a wide range of experiences. For example, interviewing patients, nurses, and physicians will provide the research team with markedly different perspectives. Additionally, the interview provides a venue for focusing on individual informants’ experiences and opinions in great detail, and the interaction between interviewer and interviewee allows for follow-up questions and exploration of new themes as the interview progresses. Focus groups are composed of several research subjects and seek ‘via the dynamic and interactive exchange among the participants, [to] produce multiple stories and diverse experiences’ [4]. A key aspect of this methodology lies in the interactions between focus group members, which can provide researchers with an in-depth perspective on the topic of interest. Additionally, because there are several participants, focus groups can be used to explore disparate opinions and experiences in a relatively short amount of time. However, focus groups may not be the best venue for exploring extremely personal topics, as subjects may be reluctant to disclose these experiences in the presence of others. Direct observation, a more time-intensive form of data collection, can be performed by on-site research personnel. Direct field observation can provide significant advantages over reliance on provider accounts, which may be hampered by recall bias or limitations in memory, but it is itself constrained by observer availability, space limitations, and the difficulty of monitoring and recording all significant events during simultaneous or fast-paced interactions. Observation using an audio-video (AV) system allows the researcher to record real-time relationships between events. It provides temporal data for analysis and the ability for repeated viewing, minimizing lost interactions. AV recording does have its disadvantages: it can be expensive, analysis is time-consuming, and the recording of patient interactions and procedures requires a cultural shift and management of issues including privacy, data storage, and medico-legal discoverability.


Regardless of the type of data collection, careful consideration must be given to the tools used to collect data. Initial descriptive or exploratory studies may closely resemble an ethnographic description of culture, without pre-established hypotheses about what will occur or which behaviors will be observed. Alternatively, the researcher may have pre-established ideas about which behaviors or events are relevant, and may therefore design an observation tool to better analyze these specific behaviors or events of interest. In the development of survey, interview, and focus group tools, both open-ended and structured questions have their uses. Open-ended questions allow the respondent(s) significant freedom in their answers and can be used to explore poorly defined topics, identify new themes, and encourage a wide range of responses. In contrast, structured questions seek to focus subjects' responses on a particular area of interest and are commonly used when attempting to explore previously described themes, behaviors, and beliefs.
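
One way to make the distinction between structured and open-ended items concrete is to define the observation record up front. The sketch below is a hypothetical example only; the field names and categories would in practice be driven by the study's conceptual framework.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ObservationRecord:
    # A minimal sketch of a structured observation record; all field names are illustrative.
    case_id: str
    observer: str
    event_type: str              # structured: chosen from a pre-defined code list
    phase_of_care: str           # structured: e.g., "induction", "incision", "closure"
    compensated: Optional[bool]  # structured: was the event mitigated during the case?
    free_text_notes: str = ""    # open-ended: new themes the coder did not anticipate

record = ObservationRecord(
    case_id="CASE-001",
    observer="obs-A",
    event_type="communication breakdown",
    phase_of_care="incision",
    compensated=True,
    free_text_notes="Circulating nurse clarified instrument request without prompting.",
)
print(record)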

11.2.3.3 Sampling Strategies

The sampling strategy will depend on the nature and goals of the research project as well as its intended future audience. Studies attempting to provide purely descriptive results may be able to justify less rigorous strategies – for example, convenience sampling – and may only require a handful of subjects. An initial investigation may choose to focus on a small subset of the subject population to begin describing key characteristics, themes, or behaviors; this type of homogeneous sampling may reduce generalizability but can provide some context for your findings [5] and provide background knowledge for larger, future investigative efforts. In contrast, maximum variation sampling 'seeks to obtain the broadest range of information and perspectives on the subject of study' [5] and can be used to highlight key similarities and differences between cases. Research attempting to identify trends, obtain information for modeling or prediction, or determine differences between groups may require a more controlled approach to minimize bias and ensure that group differences are accounted for. The use of random purposeful sampling attempts to control for these factors while still focusing on the key subjects of interest. Similarly, stratified purposeful sampling allows each subgroup within a population of interest to be sampled separately to allow comparisons and illustrate relevant differences [5].
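
A minimal sketch of stratified sampling, assuming a hypothetical roster of informants grouped by role. A real stratified purposeful protocol would select within each stratum according to explicit criteria rather than purely at random, but the structure of the bookkeeping is the same.

import random

# Hypothetical sampling frame of key informants, grouped by role (stratum).
roster = {
    "surgeon":    ["S1", "S2", "S3", "S4"],
    "nurse":      ["N1", "N2", "N3", "N4", "N5", "N6"],
    "anesthesia": ["A1", "A2", "A3"],
}

def stratified_sample(frame, per_stratum, seed=42):
    """Draw the same number of informants from each stratum so that
    subgroups can be described and compared separately."""
    rng = random.Random(seed)
    return {role: rng.sample(people, min(per_stratum, len(people)))
            for role, people in frame.items()}

print(stratified_sample(roster, per_stratum=2))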

11.2.4 Determine Your Approach to Data Analysis

Multiple analytic approaches are available, and their use will depend on the nature of the data collected and the relevant research question. Examples include systems analyses, task analyses, time and motion analyses, root cause analyses, and failure modes and effects analyses. The analytic approach will depend on whether the researcher is studying behaviors or attitudes, whether the data is qualitative or quantitative, and the size of the population under investigation.


There are many software programs that can aid in these analyses, but it is critical to understand that these are just tools to carry out your analysis and not analytic techniques themselves. In other words, it is vital to remember that software can only manage data, which must be put into the context of the clinical question under investigation and the conceptual framework already established by the research team. Additional details regarding these types of approaches are provided elsewhere.
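
To underline the point that software mainly manages the data, the sketch below reduces hypothetical timestamped observations to a time-per-task summary, one small building block of a time and motion analysis. Deciding which task categories matter, and interpreting why time is distributed this way, still rests on the study's conceptual framework.

from collections import defaultdict

# Hypothetical timestamped observations: (task_category, start_minute, end_minute).
events = [
    ("direct patient care",    0.0, 12.5),
    ("documentation",         12.5, 20.0),
    ("searching for supplies", 20.0, 24.0),
    ("direct patient care",    24.0, 33.0),
    ("documentation",          33.0, 38.0),
]

minutes_by_task = defaultdict(float)
for task, start, end in events:
    minutes_by_task[task] += end - start

total = sum(minutes_by_task.values())
for task, minutes in sorted(minutes_by_task.items(), key=lambda kv: -kv[1]):
    print(f"{task:25s} {minutes:5.1f} min ({100 * minutes / total:4.1f} %)")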

11.2.5 Consider Optimal Presentation of Results

After data analysis, some attention to optimal presentation of results will maximize impact and improve communication between the researcher and their intended audience. Quantitative results from field observation studies are often descriptive statistics, such as counts or frequencies, and can be presented in traditional tables or graphics. Qualitative results, on the other hand, often rely on illustrative examples, quotations, and schematic representations. Work-flow studies, decision trees, and other descriptive results may be best represented in figures. Examples of both qualitative and quantitative results from several studies will be discussed below.

11.2.6 Disseminate Results and Implement Change

As with all research, it is critical to consider how the researcher will relay their new knowledge to those who can enact change. Providing feedback to front-line stakeholders, especially those who championed the project, will validate their efforts and identify potential improvements at the most basic level. Additionally, presentation of results at the local or regional level can be a low-cost way to share insights with nearby facilities and research teams who may be facing similar problems. Finally, dissemination at the national/international level through peer-reviewed publications, presentations, and forums can target larger audiences and share results with other research teams. The more one can engage stakeholders throughout the process, the more likely change will ultimately occur as a result of the research project.

11.3 Examples of Point-of-Care Studies in the Operating Room

We will now review several examples of point-of-care research in surgery, focusing on the OR. These examples will serve as an introduction to some of the potential uses of observational studies, AV recordings, focus groups, interviews, and surveys, and demonstrate multiple approaches to understanding patient safety in the OR.


1. Hu YY, Arriaga AF, Roth EM, Peyre SE, Corso KA, Swanson RS, Osteen RT, Schmitt P, Bader AM, Zinner MJ, Greenberg CC. Protecting patients from an unsafe system: the etiology and recovery of intraoperative deviations in care. Annals of Surgery (2012);256(2):203–10.

Background: The research team sought to understand the influences of 'human, team, and organizational/environmental factors' on intraoperative patient safety. Ten high-acuity surgical cases were video-recorded and transcribed, representing 43.7 h of patient care. Deviations were identified, defined as 'delays and/or episodes of decreased patient safety', and factors that contributed to and/or mitigated each deviation were identified and ascribed to the patient, health care team, or environment/organization.

Results: The team identified 33 deviations over ten operative cases, averaging approximately one every 80 min. Communication and/or organizational structure contributed to the majority of deviations. The team concluded that unanticipated events are common in the OR, are frequently the result of organizational/system factors, and that caregivers were often the source of resolution of a deviation.

Discussion: This paper represents an example of surgeon-directed research, in collaboration with educational psychology and human factors engineering. Given the sensitive nature of recording operative cases, the research team describes working closely with their institutional risk management group to arrive at a protocol for data security and protection of human subjects. An anesthesiologist and OR nurse were also part of the research team and participated in classifying deviations as delays, safety compromises, or both. The team's conceptual model, in this case of safety compromise, is described with a figure and captions, including a discussion of intra-operative events which increase or decrease safety. AV data was analyzed using open-source software to identify and catalogue deviations and source(s) of recovery; data were then described using counts, frequencies, and descriptive examples.

2. Pisano GP, Bohmer RMJ, Edmondson AC. Organizational differences in rates of learning: evidence from the adoption of minimally invasive cardiac surgery. Management Science (2001);47(6):752–68.

Background: The research team sought to better understand the factors which might contribute to different learning curves and improved performance among health care teams adopting new technology. Data were analyzed from 660 patients undergoing minimally invasive cardiac surgery immediately after each institution adopted this technology. Relative contributions of various factors, including hospital operative volume, patient-specific factors, operative complexity, and type of cardiac operation performed, were identified, and these were used to create predictions about future case operative time as a proxy for team performance. The research team also conducted interviews with health care providers at all participating institutions, providing illustrative examples of various approaches to adoption of the new technology. Two hospitals, representing a fast- and a slow-learning institution, were then specifically discussed.


Results: Despite receiving identical training from the company which designed the technique, significant differences in both initial operative time and rate of change in operative time were identified across hospitals. Hospital M, despite starting with a longer-than-average procedure time, demonstrated significant improvements after the introduction of the new technology. In analysis of interviews, several themes were identified: the team sent for initial training was hand-picked based on a history of being able to work together, and team composition was intentionally kept constant for the first 30 cases. Prior to institution of the new procedure, several preoperative meetings were held to standardize intra-operative terminology and ensure good understanding of individual roles. Pre-operative briefings and post-operative de-briefings were instituted for the first 10 and 20 cases, respectively. Finally, it was noted that the cardiac surgeon at this institution appeared to encourage 'a high degree of cooperation among members of the team'. In comparison, Hospital R demonstrated a significantly slower rate of procedure time improvement, and differences in approach to technology integration were identified. Team assignment was based on staffing availability and was not kept constant for the first six cases. There was an absence of pre-operative team preparation, and little advance notice for members of the operative team that they would be participating in one of the new minimally invasive cardiac cases. Finally, the surgeon at Hospital R, while considered respectful of other team members, indicated that his primary focus was on 'mastering the technical aspects of the operation rather than managing the overall adoption process'.

Discussion: This project was executed by three business professors at an Ivy League institution, one of whom is also a physician. Aspects of collaboration with the various hospitals under study are not explicitly discussed, but are evident in the research team's ability to interview multiple operative team members at each institution. A substantial portion of the paper is spent discussing the conceptual framework and learning theories behind the clinical question under investigation, drawing from learning-curve studies in business and economics, the medical literature relating case volume, clinical experience, and clinical outcomes, and organizational learning theories. Quantitative data were modeled using computer software and regression techniques. Trends in these data were then further investigated using qualitative analysis of interviews at institutions of interest.

3. deLeval MR, Carthey J, Wright DJ, Farewell VT, Reason JT. Human factors and cardiac surgery: a multicenter study. Journal of Thoracic and Cardiovascular Surgery (2000);119(4 Pt 1):661–72.

Background: Researchers sought to understand the role of human factors in surgical outcomes, focusing on a highly complex pediatric cardiac procedure, the arterial switch operation. Quantitative data from a total of 243 operations across 16 institutions were collected, and clinical outcomes were categorized and measured for individual surgeons. Qualitatively, self-assessment questionnaires were completed post-operatively by the surgeon, surgical assistants, anesthesia team, perfusionist, and scrub nurse. Finally, a majority of cases (173) were observed and analyzed from anesthesia induction to ICU admission by a human factors researcher.


Cases were analyzed for the occurrence of errors, which were categorized as major or minor events; additionally, the observer determined whether each event was compensated for or uncompensated during surgery. These 173 cases were used to create logistic regression models for two endpoints: probability of death, and probability of death and/or near miss.

Results: There were 16 in-hospital deaths (6.6 % mortality). The patient's coronary artery pattern (CAP), length of time on bypass, aortic cross-clamp time, and use of specific vasopressor agents were found to be statistically significant predictors of death and/or near miss. The number of major events was the dominant predictor of in-hospital death and/or near miss, and the associated risk was worse if these events were uncompensated. In contrast, the number of minor events per case was also predictive of death and/or near miss, but intra-operative compensation did not appear to alleviate this risk. The post-operative questionnaires did not identify statistically significant characteristics which altered the prediction model.

Discussion: Here we have another study executed as a collaboration between a surgeon and human factors engineers/cognitive psychologists. Discussion of collaboration between the research team and the various participating hospitals is, as in the prior paper, largely omitted from the paper but is evident in the amount of coordination required to observe every non-simultaneous arterial switch operation in the United Kingdom over a period of 18 months. Development of the team's conceptual model is also not discussed; instead, the majority of the paper is spent detailing development of the risk prediction model and incorporation of the human factors reports into this model. In their discussion section, the authors do note that their questionnaire results may have been affected by post-procedure bias. They also note that one human factors observer 'never gained sufficient knowledge about the procedure' and that those ten cases had to be discarded from the final analysis, highlighting some of the difficulties associated with field observations.

4. Wadhera RK, Parker SH, Burkhart HM, Greason KL, Neal JR, Levenick KM, Wiegmann DA, Sundt TM. Is the 'sterile cockpit' concept applicable to cardiovascular surgery critical intervals or critical events? The impact of protocol-driven communication during cardiopulmonary bypass. Journal of Thoracic and Cardiovascular Surgery (2010);139:312–19.

Background: In aviation, safety protocols mandate cessation of all non-essential communication during critical intervals of flight, including takeoff and landing. The research team attempted to identify parallel critical intervals during cardiac surgery which might benefit from protocol-driven communication. They therefore performed an assessment of cognitive workload using the National Aeronautics and Space Administration Task Load Index, a validated tool to measure mental workload, across all phases of surgery and from the perspectives of various members of the cardiac surgery team. They also used semi-structured focus groups to determine critical stages of a typical run on cardiopulmonary bypass. Based on eight critical events common to almost all cardiac surgery cases, a protocol for standardized communication was developed and implemented.


Live observation before and after implementation of this communication protocol was directed at communication exchanges between the surgeon and perfusionist, which were categorized as having no issue or as exhibiting one of several types of communication breakdown. Rates and frequencies of various communication events were then compared pre- and post-implementation.

Results: Cognitive workloads varied widely by specialty throughout the case (for example, during induction and intubation, workload was high for anesthesia but low for all other team members). Because of the inability to identify one or two 'critical intervals', the team instead focused on identification of 'critical events', focusing on interactions between the surgeon and perfusionist related to cardiopulmonary bypass. Statistically significant decreases in non-verbalized critical actions, need for repeat communication, and total communication breakdowns were identified post-implementation, although post-implementation communication breakdowns still averaged 7.3 per case.

Discussion: This project was a collaboration between health care providers and human factors engineers. Their conceptual framework is based on theories of crew resource management in aviation, specifically the sterile cockpit and protocol-driven communication. Quantitative data included counts of communication events and analysis of cognitive workload using a validated tool developed in another discipline; qualitative data revolved around focus group findings identifying critical themes for a specific aspect of cardiac surgery, cardiopulmonary bypass. Cases were observed using convenience sampling. The authors note that convenience sampling placed them in ORs with teams more likely to participate in the proposed protocol and that universal acceptance/utilization may be lower. Additionally, this protocol only addresses interactions between the surgeon and perfusionist, limiting its overall applicability to the rest of the cardiac surgery team. A schematic sketch of this kind of pre-/post-implementation rate comparison is shown below.
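
The sketch below illustrates, with invented counts rather than the published figures, the kind of pre-/post-implementation comparison of communication breakdowns per case described in the last example. The crude Poisson-based test used here is only one of several reasonable analytic choices.

import math

# Hypothetical observation counts (not the published figures):
# total communication breakdowns and number of cases observed, before and
# after a standardized communication protocol is introduced.
pre_breakdowns, pre_cases   = 120, 10
post_breakdowns, post_cases =  73, 10

pre_rate  = pre_breakdowns / pre_cases      # breakdowns per case, pre-protocol
post_rate = post_breakdowns / post_cases    # breakdowns per case, post-protocol
rate_ratio = post_rate / pre_rate

# Crude normal-approximation test on the log rate ratio, treating breakdown
# counts as Poisson; a real analysis would justify the model more carefully.
se_log_rr = math.sqrt(1 / pre_breakdowns + 1 / post_breakdowns)
z = math.log(rate_ratio) / se_log_rr

print(f"Pre:  {pre_rate:.1f} breakdowns/case   Post: {post_rate:.1f} breakdowns/case")
print(f"Rate ratio {rate_ratio:.2f}, z = {z:.2f}")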

11.4 Conclusions

This chapter has provided an introduction to research carried out at the point of care. We started with an overview of the important steps to consider in designing this type of study, with an emphasis on the importance of multi-disciplinary collaboration. We have provided several studies that illustrate such collaboration as well as a mixed methods approach to analysis. While we have focused on the operating room to illustrate our points, this type of research can be performed across the care continuum and represents an important and under-utilized approach to research. We anticipate that work in this area will continue to increase over the next decade as we continue to focus on improving the quality and safety of surgical care by optimizing the performance of individuals, teams, and systems in the operating room and beyond.


References

1. Goodman JC, et al. The social cost of adverse medical events, and what we can do about it. Health Aff. 2011;30(4):590–5.
2. Landrigan CP, et al. Temporal trends in rates of patient harm resulting from medical care. N Engl J Med. 2010;363:2124–34.
3. Carayon P, et al. Work system design for patient safety: the SEIPS model. Qual Saf Health Care. 2006;15(Suppl 1):i50–8.
4. Brown JB. The use of focus groups in clinical research. In: Crabtree B, Miller W, editors. Doing qualitative research. Thousand Oaks: Sage Publications Inc.; 1999. p. 109–24.
5. Kuzel AJ. Sampling in qualitative inquiry. In: Crabtree B, Miller W, editors. Doing qualitative research. Thousand Oaks: Sage Publications Inc.; 1999. p. 33–45.

Chapter 12

Collaborative Quality Improvement

Jonathan F. Finks

Abstract Implementing change at a system level requires a broad, comprehensive approach to quality improvement that engages multiple stakeholders, encourages a culture of knowledge sharing, and takes into account differences in local contexts. It is in this regard that collaborative quality improvement (CQI) efforts are most effective. CQI involves multi-institutional teams who meet to share data through centralized clinical registries and work collectively to identify best practices, which are then evaluated in the local setting of their home institutions. The aim of these collaborative efforts is to improve care by reducing variation among hospitals and providers, minimizing the lag between changes in knowledge and changes in practice, and evaluating care strategies in real-world settings.

Keywords Quality • Collaborative • Outcomes • Process • Improvement • Surgery

12.1 Introduction

12.1.1 Need for Improvement

Improving surgical care at US hospitals has become a major focus of payers, policy makers and professional societies, driven by a heightened awareness that many patients receive care that is not evidence-based, are harmed by preventable medical errors, and are treated in hospitals which vary widely on measures of quality. Complications of medical care are not only harmful to patients but also substantially increase the cost of health care.



With some operations, avoidable complications may account for up to 20 % of the total cost of inpatient care, with per-patient costs exceeding $10,000 [1, 2]. In two widely influential reports, the Institute of Medicine made the case for failures in quality and urged a critical rethinking of our health care systems [3, 4].

12.1.2 Current Strategies in Quality Improvement – The Top-Down Approach

In this context, there have been a number of different efforts by payers and policy makers to promote quality improvement, with decidedly mixed results. Incentive-based models, the Pay for Performance (P4P) programs, aim to reward hospitals for the use of specific evidence-based practices, such as perioperative antimicrobial use and antithrombotic prophylaxis. More punitive approaches include the non-payment policy of the Centers for Medicare and Medicaid Services for complications such as catheter-associated urinary tract and blood stream infections. Other initiatives, such as Center of Excellence models and public reporting of hospital performance data, have focused on steering patients toward high-quality hospitals.

Despite the large-scale nature of many of these initiatives, their impact has been somewhat modest. For example, hospital adherence to Medicare's SCIP measures, which is publicly reported, has not been shown to reduce rates of postoperative infection [5, 6], and Medicare's Premier Hospital Quality Incentive Demonstration, a P4P initiative, was not found to reduce 30-day mortality with coronary artery bypass graft surgery, myocardial infarction, congestive heart failure or pneumonia [7]. Similarly, an evaluation of Medicare's policy of nonpayment for catheter-associated blood stream and urinary tract infections demonstrated no measurable effect on infection rates in US hospitals [8]. Moreover, an examination of Medicare's policy to restrict bariatric surgery coverage to hospitals designated as Centers of Excellence found no difference in adjusted rates of complications and reoperations in the time before and after the coverage decision [9]. Furthermore, large systematic reviews of both public reporting [10, 11] and P4P programs [12] have failed to demonstrate evidence that they improve care.

There are several potential explanations for the limited success of these quality improvement (QI) initiatives. First, individual process measures are one small component of the factors contributing to outcomes with surgical procedures. Other local factors, such as technical variation with operations, surgeon skill and judgment, and the operative environment, are likely to have a greater impact on patient outcomes. Yet it is difficult to account for these factors with the administrative data used for most P4P and other QI programs. Furthermore, provider-specific measures are limited by small sample sizes and a lack of clinically rich data sources for adequate risk adjustment [13].


There is also the problem of unintended consequences. Public reporting, P4P programs, and non-payment policies may encourage providers to avoid sicker patients [14] and can lead to a decline in the reliability of the administrative data on which they are based, as hospitals modify their billing data to enhance apparent performance [15]. Finally, the one-size-fits-all approach of many of these efforts fails to account for institutional differences in resources and culture, both of which can impact the implementation of QI changes.

12.2 Collaborative Quality Improvement

12.2.1 Defining Collaborative Quality Improvement

Making and evaluating changes at a system level clearly requires a broader, more comprehensive approach to QI that engages multiple stakeholders, encourages a culture of knowledge sharing, and takes into account differences in local contexts. It is in this regard that collaborative quality improvement (CQI) efforts are most effective. CQI involves multi-institutional teams who meet to share data through centralized clinical registries and work collectively to identify best practices, which are then evaluated in the local setting of their home institutions [16]. The aim of these collaborative efforts is to improve care by reducing variation among hospitals and providers, minimizing the lag between changes in knowledge and changes in practice, and evaluating care strategies in real-world settings [17].

QI collaboratives are generally centered on a clinical registry containing detailed information on patient demographics and comorbidities, as well as provider and hospital characteristics, processes of care, and outcomes. Performance data is fed back to participating hospitals to allow for benchmarking against other programs. Members of the collaborative meet on a regular basis to evaluate the data, identify best practices, and develop targeted interventions focused on specific clinical problems [2].

Regional QI collaboratives incorporate principles of evidence-based medicine, industrial quality management science, and organizational theory to generate improvements in health care across multiple institutions. With the collaborative QI approach, multi-disciplinary groups from participating institutions focus on a particular clinical problem, such as prevention of venous thromboembolism after bariatric surgery. Best practices are identified not only from published evidence but also through the sharing of knowledge and experience that occurs at regular meetings and other activities, such as local site visits and conference calls. Through an iterative process, practice changes are made and evaluated rapidly through frequent reporting of data, with analysis and dissemination of results throughout the collaborative. This cycle of intervention, evaluation and adjustment allows for an accelerated process of quality improvement [18].
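
A minimal sketch of the benchmarking feedback loop described above: given a registry extract in which each record carries a predicted risk from some previously fitted risk-adjustment model (all hospital names and values here are hypothetical), observed-to-expected ratios can be computed per hospital and fed back to members.

from collections import defaultdict

# Hypothetical registry extract: (hospital, had_complication, predicted_risk).
# The predicted risk is assumed to come from an existing risk-adjustment model.
registry = [
    ("Hospital A", 1, 0.12), ("Hospital A", 0, 0.05), ("Hospital A", 0, 0.20),
    ("Hospital B", 0, 0.08), ("Hospital B", 1, 0.30), ("Hospital B", 1, 0.25),
    ("Hospital C", 0, 0.10), ("Hospital C", 0, 0.07), ("Hospital C", 0, 0.15),
]

observed = defaultdict(int)
expected = defaultdict(float)
for hospital, complication, risk in registry:
    observed[hospital] += complication   # observed complication count
    expected[hospital] += risk           # expected count = sum of predicted risks

for hospital in sorted(observed):
    oe = observed[hospital] / expected[hospital]
    print(f"{hospital}: observed={observed[hospital]}, "
          f"expected={expected[hospital]:.2f}, O/E={oe:.2f}")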


12.2.2 Advantages of Collaborative Quality Improvement

What makes CQI efforts unique is that they achieve their results through social interaction and collective learning among a network of people focused on a particular area of interest. This is important because improving quality involves changing clinician behavior, and this is unlikely to occur simply by decree or as a direct consequence of across-the-board behavior-changing strategies. Behavior is strongly influenced by the social networks in which people participate, and changing clinician behavior is more likely to succeed when it is part of a social process. Indeed, evidence suggests that clinicians are more likely to be influenced by knowledge gained from peers and their own experience than by that obtained through more traditional approaches such as lectures. Furthermore, it appears that social pressure is more effective at directing clinician behavior than the threat of legal or other hierarchical sanction. Finally, collaborative decision-making improves the process of adapting strategies to local institutional contexts [19–22].

Another distinct advantage of QI collaboratives is the large sample size of their clinical registries. This provides statistical power for a more robust evaluation of the association between processes and outcomes, and of the impact of QI initiatives, than would be possible with most other intervention studies, including randomized clinical trials [2]. Collaborative size also allows investigators to conduct studies sufficiently powered to identify risk factors for infrequent complications, such as leak after colorectal resection [23]. Furthermore, data from collaboratives can be used to supplement randomized controlled trials with regard to subgroups which may be underrepresented in the trial [24]. Of course, the large size of collaboratives also ensures that QI initiatives target greater numbers of patients across an entire system or region [25].
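
A rough illustration of why registry scale matters for infrequent complications: using the standard two-proportion sample-size approximation (the rates below are illustrative, not figures from any cited study), detecting even a one-percentage-point change in an uncommon event requires thousands of patients per group.

import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per group to detect a change from p1 to p2 in a
    binary outcome (two-sided alpha = 0.05, power = 0.80 by default), using the
    standard two-proportion normal-approximation formula."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Detecting a drop in an uncommon complication, say from 3 % to 2 %, takes
# thousands of patients per group -- feasible for a multi-hospital registry,
# rarely for a single institution.
print(n_per_group(0.03, 0.02))   # roughly 3,800 patients per group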

12.2.3 MHA Keystone Center ICU Project

QI collaboratives have been used in health-care-related fields for over two decades in several countries and in disciplines as disparate as patient safety, health care disparities, chronic medical care, pediatrics, and primary care [26]. One of the most successful and well-known examples of CQI is the Michigan Health and Hospital Association (MHA) Keystone Center's Intensive Care Unit (ICU) Project. This collaborative involved 67 hospitals and was focused on reducing rates of catheter-related blood stream infections (CRBSI) in ICUs across Michigan [27].

The Keystone ICU project began with interventions designed to improve teamwork and communication and to enhance the patient safety culture. These steps included a daily goals sheet and a comprehensive unit-based safety program. These were followed up with a bundle of evidence-based technical interventions focused on reducing rates of CRBSI. Checklists were used to ensure adherence to infection control practices, and ICU performance data was rapidly reported back at regular intervals to each hospital to allow for benchmarking.


Fig. 12.1 Catheter related bloodstream infection as a function of time following a collaborative intervention in 103 intensive care units in Michigan. Circles represent mean infection per quarter; thick blue line represents estimated mean rate of infection; thin red lines represent changes in observed infection rates over time within a random sample of 50 intensive care units [28]

Each ICU had physician and nurse champions who were instructed in the science of safety, data collection, and the specific interventions. They were trained through a program of conference calls, individual coaching and statewide meetings. The champions then partnered with local hospital-based infection control to obtain hospital data on CRBSI. The Keystone ICU project resulted in a significant and sustained decline in rates of CRBSI, with a similar degree of improvement at hospitals across the entire collaborative (Fig. 12.1) [28].

There are likely several factors that contributed to the remarkable success of the Keystone ICU project. First, the initiative paired a limited number of technical process changes with a broader program designed to influence provider behavior through improved teamwork and communication and an enhanced focus on patient safety. In addition, the agents of change were frontline clinicians within each ICU, thus ensuring that participating ICUs could provide input on the intervention and that the interventions would be optimally adapted for the local environment. Finally, standardization of data collection and the timely feedback of comparative performance data helped maintain team engagement and garner the support of hospital leadership [25].

12.3 Collaborative Quality Improvement in Surgery

There is a long history of successful CQI initiatives in surgery as well as in medical disciplines, but the focus is somewhat different. In medicine, there exist numerous evidence-based processes, which are often compiled into consensus guidelines designed to improve the care of patients with diabetes, asthma, acute MI and other acute and chronic illnesses [29–31].


Improving adherence to published guidelines is a primary objective of many medicine-based collaboratives. The same is not true with surgical collaboratives, as evidence-based guidelines are generally lacking in surgery [13]. Therefore, surgical collaboratives often focus on determining the drivers of patient outcomes and identifying best practices to optimize those outcomes [13, 32].

12.3.1 Northern New England Cardiovascular Disease Study Group

The first major surgical QI collaborative, and the one on which most subsequent CQI efforts have been based, was the Northern New England Cardiovascular Disease Study Group (NNECDSG). Founded in 1987 as a response to government-mandated public reporting with coronary artery bypass graft (CABG) surgery, the NNECDSG is a voluntary consortium of clinicians, administrators and researchers that represents all hospitals performing CABG procedures in Maine, New Hampshire and Vermont. Their goal upon launch was to foster continuous improvement in the quality, safety, effectiveness and cost involved with the management of cardiovascular disease. Their approach to reducing mortality after CABG surgery provides a beautiful illustration of the way in which regional collaboration can dramatically affect patient care across multiple institutions and settings [33–35].

In 1991, the NNECDSG examined in-hospital mortality following CABG surgery in the region and found substantial variation among hospitals and surgeons that could not be explained by patient factors alone [36]. They reported a 2.5-fold difference in adjusted mortality between the best and worst hospitals and a 4.2-fold difference among surgeons. They concluded that these differences in mortality most likely represented variation in unmeasured aspects of patient care. This important finding led to the group's first major intervention to reduce CABG mortality across the region.

The intervention had three components. The first component was to provide continuous performance feedback to the participating centers. This step allowed for ongoing self-assessment and benchmarking at each participating center. The second part involved extensive training courses in the techniques of continuous QI for both the collaborative leadership and the general members. The third component was a series of round-robin site visits to all centers, with visiting teams consisting of industrial engineers, surgeons, nurses and perfusion staff. These benchmarking visits allowed the clinical teams from each hospital to learn from each other and ultimately resulted in practice changes that involved technical and organizational aspects of patient care, as well as methods used for evaluating patients. These changes led to a 24 % reduction in in-hospital mortality in the post-intervention period, with significant improvement at all of the participating institutions and across all patient subgroups [33, 35].


From there the group launched an effort to identify the factors that led to mortality after CABG surgery through an analysis of all deaths in the region over a 2-year period. They found that low-output cardiac failure was not only the most common mode of death throughout the region but also accounted for 80 % of the difference in mortality between low-risk and high-risk surgeons [37]. This discovery then led to an in-depth investigation into processes that could bear upon low-output failure. With further site visits and the inclusion of additional perioperative variables, the group identified four process variables that were associated with a reduced risk for mortality from low-output failure: continuation of preoperative aspirin [38], use of the left internal mammary artery as a bypass conduit [39], avoidance of anemia while on cardiopulmonary bypass [40], and adequacy of beta-blockade-induced heart rate control before induction of anesthesia [41]. Individualized care protocols, based on a patient's predicted risk for low-output failure, were instituted, and over the following 3 years the incidence of fatal low-output failure declined across the region from 1.24 to 0.72 % [34].

12.3.2 Surgical Care and Outcomes Assessment Program

CQI projects are no longer limited to cardiovascular surgery. The Surgical Care and Outcomes Assessment Program (SCOAP) was developed in Washington State in 2006 as a consortium of surgeons, hospital QI leaders and health services researchers focused on improving outcomes with general surgical procedures. SCOAP has since broadened its mission to other disciplines (e.g. vascular surgery and interventional radiology) and currently collects patient data from 60 of the 65 hospitals that perform at least 2 colon resections per year [42, 43].

As with other CQI programs, SCOAP has a standardized clinical data collection platform that contains information on patient characteristics, process measures and procedure-specific outcomes for all patients undergoing the selected procedures at participating hospitals. A strong emphasis was placed on tracking optimal processes of care, and SCOAP now tracks over 50 different process measures, some linked to evidence and others determined by consensus, as well as several measures considered to be under evaluation. Over time, these processes are then evaluated in the context of the collaborative. These quality metrics include processes such as avoiding transfusion when the hemoglobin is above 7 g/dL, continuing beta-blocker use in the perioperative period, routine intraoperative leak testing after colorectal resection, using diagnostic imaging in patients with presumed appendicitis, employing nutritional supplements for malnourished patients scheduled for elective operations, obtaining glycemic control during colorectal operations, and using appropriate neoadjuvant therapy for patients with rectal cancer. Adherence to these surgeon-determined process measures is reinforced by means of operating room checklists, preprinted order sets, educational interventions, e-newsletters and regional meetings.


Fig. 12.2 Negative appendectomy rates, by calendar quarters, among hospitals participating in Washington State’s Surgical Care and Outcomes Assessment Program [43]

Efforts to correct under-performance occur largely through education and peer support/pressure, often with peer-led interventions focusing on sharing best practices and creating behavior change around quality metrics.

Since its launch, SCOAP has registered a number of achievements. Surgeons in all participating centers now use standardized order sets and a SCOAP OR checklist that addresses several areas of under-performance [43]. The rate of negative appendectomies has steadily declined through efforts to encourage the use of preoperative imaging among high-risk patients (Fig. 12.2) [44]. Adverse events have declined with elective colorectal resection (Fig. 12.3), coincident with increased adherence to processes such as leak testing and glycemic control measures. Finally, when compared to non-SCOAP institutions, hospitals participating in SCOAP have significantly reduced the costs associated with appendectomy, as well as colorectal and bariatric operations (Fig. 12.4) [43].
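
A sketch of how a collaborative might track process adherence and an outcome metric by quarter from its registry. The records, field names, and rates below are hypothetical and simply illustrate the bookkeeping behind figures such as Fig. 12.2.

from collections import defaultdict

# Hypothetical registry rows: (quarter, had_preoperative_imaging, pathology_negative).
cases = [
    ("2024Q1", True,  False), ("2024Q1", False, True),  ("2024Q1", True,  False),
    ("2024Q2", True,  False), ("2024Q2", True,  False), ("2024Q2", False, False),
    ("2024Q3", True,  False), ("2024Q3", True,  True),  ("2024Q3", True,  False),
]

totals = defaultdict(lambda: {"n": 0, "imaging": 0, "negative": 0})
for quarter, imaging, negative in cases:
    t = totals[quarter]
    t["n"] += 1
    t["imaging"] += imaging     # process measure: preoperative imaging obtained
    t["negative"] += negative   # outcome measure: negative appendectomy

for quarter in sorted(totals):
    t = totals[quarter]
    print(f"{quarter}: imaging adherence {100 * t['imaging'] / t['n']:.0f} %, "
          f"negative appendectomy rate {100 * t['negative'] / t['n']:.0f} %")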

12.3.3 Partnering with Payers – The Michigan Plan

One of the major challenges for CQI efforts is funding. There are significant costs associated with participation in CQI programs, particularly with regard to data collection, which can create a substantial financial burden, especially for smaller hospitals. At the same time, complications are very expensive and their cost is borne largely by payers [45]. In states with at least one dominant payer, therefore, there is a strong business case to be made for payer-supported CQI programs, since even a small reduction in complications can result in substantial cost savings for the payer [46].


Fig. 12.3 Rates of operative complications in elective colorectal operations in sites (n = 6) that eventually joined the Surgical Care and Outcomes Assessment Program [43]

Fig. 12.4 Average cost per case for appendectomy, colorectal and bariatric operations by calendar year, among hospitals participating in the Surgical Care and Outcomes Assessment Program [43]

This model of quality improvement has been in place in Michigan for nearly a decade.


Since 2004, following earlier success with a pilot CQI program in percutaneous coronary interventions, Blue Cross and Blue Shield of Michigan/Blue Care Network (BCBSM/BCN) has partnered with Michigan providers and hospitals to support statewide registry-based CQI programs in a number of different disciplines. The insurer currently invests over $30 million annually to fund 16 programs, which collectively encompass the care of well over 200,000 patients each year. The programs focus on clinical conditions and procedures that are common and associated with high episode costs. The targeted procedures tend to be technically complex, rapidly evolving and associated with wide variation in hospital practice and outcomes. Current collaboratives include general and vascular surgery, thoracic and cardiovascular surgery, bariatric surgery, trauma, urology, breast oncology, interventional cardiology and others [2].

In this Pay for Participation model, most of the costs for administering the collaborative programs are in the form of payments to hospitals, based on a fixed percentage of each hospital's total payments from BCBSM/BCN. In 2007, annual payments to hospitals participating in at least 1 regional collaborative ranged from $11,000 to over $1 million. In exchange for these supplemental payments, hospitals are expected to submit timely, accurate data to the coordinating center and to allow regular site visits from data auditors. Each hospital is also required to send a physician champion and program coordinator to the quarterly meetings held by each collaborative and is expected to participate actively in regional quality improvement interventions [2].

The coordinating center for each collaborative maintains a clinical registry containing high quality clinical outcomes data, including information on patient characteristics necessary for risk adjustment, procedure-specific processes of care and relevant outcomes. The data are prospectively collected by trained abstractors using standardized definitions and are externally audited annually to ensure accuracy and completeness. Hospitals and surgeons are provided with timely feedback on their performance, benchmarked against the other providers in the collaborative. That performance data is not publicly reported and is not released to the payer. Rather, these data are used to drive QI initiatives that are implemented at all participating hospitals under the direction of local program coordinators. The interventions are then evaluated and discussed at quarterly meetings and then further refined [13].

Over the last several years, the Michigan CQI programs have resulted in improvements across a wide range of clinical conditions and have led to reduced costs in a number of important areas. One example is the Michigan Surgery Quality Collaborative (MSQC), the largest of the programs and one that focuses on general and vascular surgery procedures. Given the broad range of procedures, the QI activities of the MSQC tend to focus on aspects of perioperative care, including specific practices designed to reduce venous thromboembolism and surgical site infections. In a study designed to evaluate the added value of the CQI model, hospitals participating in the MSQC were compared to non-Michigan hospitals participating in the American College of Surgeons' National Surgical Quality Improvement Program (NSQIP).


Fig. 12.5 Risk-adjusted morbidity with general and vascular surgery: Hospitals in Michigan versus hospitals outside of Michigan, 2005–2009 (Source: Michigan Surgical Quality Collaborative and National Surgical Quality Improvement Program registries, 2005–2009 [2])

In the period between 2005 and 2009, risk-adjusted complication rates at MSQC hospitals fell from 13.1 to 10.5 % (p < 0.001), while the complication rate at non-Michigan NSQIP hospitals remained relatively flat between 2005 and 2008, with a modest decline in 2009 (Fig. 12.5). The 2.6 % decline in morbidity observed in the MSQC hospitals represents approximately 2,500 fewer patients with surgical complications annually, or an annual cost savings of roughly $20 million, far more than the $5 million annual cost of administering the MSQC [2].

In some instances, the cost savings have come not only from reducing rates of adverse outcomes but also from reducing unnecessary procedures. The Michigan Bariatric Surgery Collaborative (MBSC) launched in 2006 and now collects data on over 95 % of patients undergoing bariatric surgery in the state. As they began to collect data, it became apparent that almost 10 % of patients undergoing gastric bypass in Michigan hospitals had IVC filters placed preoperatively to prevent postoperative venous thromboembolism. IVC filter use varied widely, from 0 to 35 % across the 20 hospitals participating in the MBSC at that time. Analysis of data from the MBSC revealed that IVC filter use did not reduce risk for VTE or other complications and was itself a source of complications, such as filter migration [47]. Feedback of this data prompted a QI initiative leading to implementation of statewide guidelines for risk-stratified VTE prophylaxis. Within a year of implementation, IVC filter use had dropped to less than 2 % of patients. Given an estimated average cost for IVC filter placement of $13,000, the decline in this unnecessary procedure resulted in an estimated annual cost savings of $4 million. At the same time, implementation of VTE guidelines was associated with a decline in VTE-related mortality. Furthermore, between 2007 and 2009, 30-day mortality rates after bariatric surgery in Michigan hospitals declined at a faster rate than in non-Michigan hospitals participating in the NSQIP program (p = .045) (Fig. 12.6) [2].
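
The savings figures quoted above can be reproduced with back-of-envelope arithmetic. In the sketch below, the annual case volumes and the per-complication cost are assumptions chosen only to show how such estimates are built; the complication-rate decline and the $13,000 filter cost come from the text.

# Back-of-envelope estimates in the spirit of the Michigan figures. The case
# volumes and per-complication cost are illustrative assumptions; the rates and
# the $13,000 filter cost are taken from the chapter text.

annual_surgical_cases = 96_000              # assumed MSQC-scale annual case volume
absolute_risk_reduction = 0.131 - 0.105     # 13.1 % -> 10.5 % complication rate
cost_per_complication = 8_000               # assumed average excess cost per complication ($)

complications_avoided = annual_surgical_cases * absolute_risk_reduction
print(f"~{complications_avoided:,.0f} complications avoided per year, "
      f"~${complications_avoided * cost_per_complication / 1e6:.0f} million saved")

annual_bariatric_cases = 4_000              # assumed statewide bariatric case volume
filter_use_drop = 0.10 - 0.02               # roughly 10 % -> under 2 % of patients
cost_per_filter = 13_000                    # cost per IVC filter placement ($), from the text
print(f"~${annual_bariatric_cases * filter_use_drop * cost_per_filter / 1e6:.1f} "
      f"million saved per year from avoided IVC filters")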


Fig. 12.6 Thirty-day mortality after bariatric surgery: Hospitals in Michigan versus hospitals outside of Michigan, 2007–2009 (Source: Michigan Bariatric Surgery Collaborative and National Surgical Quality Improvement Program registries, 2007–2009 [2])

Quality improvement interventions resulting in improved care and/or reduced cost have occurred across the spectrum of the Michigan collaboratives. An intervention focused around implementation of practice guidelines and the use of bedside tools for risk assessment with percutaneous coronary interventions led to reductions in contrast-induced nephropathy, transfusions, strokes and vascular complications associated with this procedure [48]. Furthermore, a series of specific, focused interventions in cardiac surgery resulted in an increase in the use of internal mammary grafts for coronary artery bypass and a reduction in the use of two expensive therapies: intra-aortic balloon pumps and prolonged mechanical ventilation [49]. Finally, after implementation of a QI initiative centered on comparative performance feedback and dissemination of practice guidelines, Michigan urologists improved adherence with recommended staging practices, resulting in a decline in the use of expensive bone and computerized tomography scans for surveillance of prostate cancer in low and intermediate risk tumors [50].

As these examples illustrate, partnerships between surgeons, hospitals and payers can be a win for all. Payers have better access to capital resources and can provide infrastructure to support collaboration among surgeons and hospitals, as well as the resources necessary for interventions involving large numbers of patients and hospitals. Furthermore, payers may have the political influence with hospitals to ensure broad participation in CQI efforts [13, 51]. The payers then see returns in terms of a reduction in costs from adverse events and unnecessary tests and procedures. Hospitals receive compensation for their participation, as well as the assurance that their outcomes will not be publicly reported. Surgeons and other health care providers benefit from the professional satisfaction that comes from collaborative learning and interaction with their colleagues. Most important of all, patients receive better care.


12.4 Challenges to Regional Collaborative Quality Improvement

CQI efforts face a number of challenges. Regional centers often compete with one another, and this can pose a problem when trying to create the sense of community that is required in order to build a successful collaborative. The process of engagement and development of mutual trust can take time, considerable effort and strong, effective clinical leadership. Furthermore, the financial incentives may be poorly aligned, as the time required to attend meetings and implement QI initiatives takes away from a provider's practice and there is often no financial compensation for these activities. Finally, all CQI efforts rely on having a clinically rich and accurate patient registry housed on one data platform. The process of data abstraction can be expensive, and without a funding source, such as a state or federal agency or payer, the cost for data entry falls to the hospitals, which may have limited ability to pay for it [43].

Given these constraints, studies of CQI projects across healthcare settings demonstrate varying degrees of success. Some CQI reviews suggest that only about 50–60 % of participating centers ever fully implement the recommended changes or achieve the desired outcome [52, 53]. Other studies have shown that even among centers that successfully implement recommended changes, only about 60 % will exhibit sustained improvements over time [52]. While it is not always known where the breakdown occurs, the variability in success with QI initiatives often reflects differences in the degree of implementation, a measure influenced by local leadership, attitudes, culture, commitment and resources [30].

12.5 Keys to Success with Collaborative Quality Improvement

Although the hallmark of QI collaboratives is a bottom-up, participatory approach to decision making, a major contributor to success with CQI efforts is strong central leadership. The collaboratives represent a clinical community whose members are often drawn from diverse professional backgrounds. It is the leader's responsibility to ensure the cohesiveness of the community and coordinate the group's efforts [19]. The leader must also be able to interpret the collected data and use it to identify targets for improvement, while ensuring that the chosen interventions are based on current best evidence [25, 54]. And she must be able to communicate a clear, uniting vision of where the collaborative is headed and what can be achieved by individual interventions [55].

For a number of reasons, leaders of QI collaboratives must be viewed as credible, authoritative and worthy of trust by the members of the collaborative.


Members need to feel that the leadership shares their values and goals and is not driven by other objectives (research, commercial, political, etc.). Leaders also need the support of the group to challenge practices once thought to be routine and must do so without alienating large segments of the membership [19]. Taking on the routine use of preoperative IVC filters in bariatric surgery patients exemplifies just such a challenge for the leadership of the MBSC.

Another key role of leadership within collaboratives is to help build consensus and foster a sense of community. The expectations and goals of individual groups within a collaborative (e.g. physicians, nurses, administrators, etc.) may be quite disparate. Failure to obtain consensus regarding the goals and objectives of the collaborative can lead to disappointment and declining morale among some groups and a reduction in coordinated, effective action. Time spent on inclusive debate to build consensus around aims for the collaborative as a whole and for individual interventions will enhance the sense of community among all of the members and encourage individual groups to marshal their own resources in support of the collective interests of the community. Furthermore, clinicians are far more likely to change their behavior as part of a QI intervention if they participated in designing the intervention than if it were dictated to them by a third party (payer, government, hospital, etc.). The task of ensuring that all voices are heard falls on the shoulders of leaders within the collaborative [19].

Although it may seem obvious, selection of the data to be collected and the manner in which they are used are critically important to the success of CQI efforts and require substantial thought and planning before the collection process begins. The data should be linked to interventions, outcomes and quality issues in order to be credible. At the same time, it should be limited in order to reduce the burden of collection. The data should also be adequate for identifying problem areas, assessing the impact of interventions and evaluating individual provider performance. Regular and timely feedback of performance data, benchmarked against other centers in the collaborative, is an important motivator, as it allows hospitals and providers to track their own progress [19, 25, 55].

Finally, one of the most important influences on the success of a CQI initiative is an understanding of local contextual factors that may bear upon adoption of a particular intervention. The existing culture, relationships and resources within an individual organization will all affect the outcome of a given strategy or approach. To some extent, the impact of these factors will be mitigated through the process of building consensus around development of the intervention, as local issues will help inform this process. But some centers may require additional resources or support, which may include extra educational materials, peer site visits and/or team training interventions [27] for centers that are falling behind. There may also be a role for the collaborative leadership in helping clinical champions influence their local organization's leadership to provide support for QI initiatives. At the very least, open discussion of local barriers and enablers will assist in refining the interventions over time [19].


12.6 Conclusion

Collaborative quality improvement has emerged as an efficient and effective way of sharing knowledge and advancing innovation through a process of collective learning. Access to rich clinical data from a large patient sample drawn from multiple institutions allows for detection of problem areas where unwanted variation is high, as well as a robust assessment of the relationship between processes and outcomes. These large registries also enable an evaluation of QI interventions across multiple centers and facilitate a rapid iterative process to refine the interventions. At the same time, interventions reach large numbers of patients at once and allow individual hospitals an opportunity to improve more quickly than they could on their own. Collaboratives create a culture that fosters sharing of knowledge and collective learning as part of a community of practice. Because interventions are designed through participatory discussions, they are more likely to be adopted and are often more adaptable to local contexts. Finally, partnerships with payers, government agencies and national societies may prove critical to the long-term success of these initiatives.

There remain a number of areas that will require further investigation if CQI efforts are to reach their potential for dramatic and long-lasting improvement. Understanding why some hospitals are more successful than others will be important when developing strategies to get struggling hospitals on board. Further study is also needed to determine which organizational attributes are required to ensure the sustainability of improvements made through CQI-driven interventions over time. Finally, CQI efforts need to be carefully compared with other QI efforts to determine the areas where CQI initiatives are likely to be most effective.

References

1. Dimick JB, Chen SL, Taheri PA, Henderson WG, Khuri SF, Campbell Jr DA. Hospital costs associated with surgical complications: a report from the private-sector National Surgical Quality Improvement Program. J Am Coll Surg. 2004;199(4):531–7. doi:10.1016/j.jamcollsurg.2004.05.276.
2. Share DA, Campbell DA, Birkmeyer N, Prager RL, Gurm HS, Moscucci M, et al. How a regional collaborative of hospitals and physicians in Michigan cut costs and improved the quality of care. Health Aff (Millwood). 2011;30(4):636–45. doi:10.1377/hlthaff.2010.0526.
3. Kohn LT, Corrigan JM, Donaldson MS (Institute of Medicine). To err is human: building a safer health system. Washington, D.C.: National Academy Press; 2000.
4. Institute of Medicine. Crossing the quality chasm: a new health system for the 21st century. Washington, D.C.: National Academy Press; 2001.
5. Hawn MT, Vick CC, Richman J, Holman W, Deierhoi RJ, Graham LA, et al. Surgical site infection prevention: time to move beyond the surgical care improvement program. Ann Surg. 2011;254(3):494–9; discussion 9–501. doi:10.1097/SLA.0b013e31822c6929.
6. Stulberg JJ, Delaney CP, Neuhauser DV, Aron DC, Fu P, Koroukian SM. Adherence to surgical care improvement project measures and the association with postoperative infections. JAMA. 2010;303(24):2479–85. doi:10.1001/jama.2010.841.


7. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med. 2012;366(17):1606–15. doi:10.1056/NEJMsa1112351. 8. Lee GM, Kleinman K, Soumerai SB, Tse A, Cole D, Fridkin SK, et al. Effect of nonpayment for preventable infections in U.S. hospitals. N Engl J Med. 2012;367(15):1428–37. 9. Dimick JB, Nicholas LH, Ryan AM, Thumma JR, Birkmeyer JD. Bariatric surgery complications before vs after implementation of a national policy restricting coverage to centers of excellence. JAMA. 2013;309(8):792–9. doi:10.1001/jama.2013.755. 10. Fung CH, Lim YW, Mattke S, Damberg C, Shekelle PG. Systematic review: the evidence that publishing patient care performance data improves quality of care. Ann Intern Med. 2008;148(2):111–23. 11. Ketelaar NA, Faber MJ, Flottorp S, Rygh LH, Deane KH, Eccles MP. Public release of performance data in changing the behaviour of healthcare consumers, professionals or organisations. Cochrane Database Syst Rev. 2011(11):CD004538. doi:10.1002/14651858.CD004538.pub2. 12. Houle SK, McAlister FA, Jackevicius CA, Chuck AW, Tsuyuki RT. Does performance-based remuneration for individual health care practitioners affect patient care? A systematic review. Ann Intern Med. 2012;157(12):889–99. doi:10.7326/0003-4819-157-12-201212180-00009. 13. Birkmeyer NJ, Birkmeyer JD. Strategies for improving surgical quality–should payers reward excellence or effort? N Engl J Med. 2006;354(8):864–70. doi:10.1056/NEJMsb053364. 14. Werner RM, Asch DA. The unintended consequences of publicly reporting quality information. JAMA. 2005;293(10):1239–44. doi:10.1001/jama.293.10.1239. 15. Farmer SA, Black B, Bonow RO. Tension between quality measurement, public quality reporting, and pay for performance. JAMA. 2013;309(4):349–50. doi:10.1001/jama.2012.191276. 16. Eppstein MJ, Horbar JD, Buzas JS, Kauffman SA. Searching the clinical fitness landscape. PLoS One. 2012;7(11):e49901. doi:10.1371/journal.pone.0049901. 17. Kilo CM. A framework for collaborative improvement: lessons from the Institute for Healthcare Improvement’s Breakthrough Series. Qual Manag Health Care. 1998;6(4):1–13. 18. Benn J, Burnett S, Parand A, Pinto A, Vincent C. Factors predicting change in hospital safety climate and capability in a multi-site patient safety collaborative: a longitudinal survey study. BMJ Qual Saf. 2012;21(7):559–68. doi:10.1136/bmjqs-2011-000286. 19. Aveling EL, Martin G, Armstrong N, Banerjee J, Dixon-Woods M. Quality improvement through clinical communities: eight lessons for practice. J Health Organ Manag. 2012;26(2):158–74. 20. Parboosingh JT. Physician communities of practice: where learning and practice are inseparable. J Contin Educ Health Prof. 2002;22(4):230–6. doi:10.1002/chp.1340220407. 21. Shaw EK, Chase SM, Howard J, Nutting PA, Crabtree BF. More black box to explore: how quality improvement collaboratives shape practice change. J Am Board Fam Med. 2012;25(2):149–57. doi:10.3122/jabfm.2012.02.110090. 22. Stoopendaal A, Bal R. Conferences, tablecloths and cupboards: how to understand the situatedness of quality improvements in long-term care. Soc Sci Med. 2013;78:78–85. doi:10.1016/j.socscimed.2012.11.037. 23. Matthews JB. Risky business? Collaborative databases and quality improvement. Arch Surg. 2012;147(7):605–6. doi:10.1001/archsurg.2012.288. 24. Gurm HS, Smith DE, Berwanger O, Share D, Schreiber T, Moscucci M, et al. 
Contemporary use and effectiveness of N-acetylcysteine in preventing contrast-induced nephropathy among patients undergoing percutaneous coronary intervention. JACC Cardiovasc Interv. 2012;5(1):98–104. doi:10.1016/j.jcin.2011.09.019. 25. Watson SR, Scales DC. Improving intensive care unit quality using collaborative networks. Crit Care Clin. 2013;29(1):77–89. doi:10.1016/j.ccc.2012.10.008. 26. Livingood W, Marshall N, Peden A, Gonzalez K, Shah GH, Alexander D, et al. Health districts as quality improvement collaboratives and multijurisdictional entities. J Public Health Manag Pract. 2012;18(6):561–70. doi:10.1097/PHH.0b013e31825b89fd. 27. Pronovost P, Needham D, Berenholtz S, Sinopoli D, Chu H, Cosgrove S, et al. An intervention to decrease catheter-related bloodstream infections in the ICU. N Engl J Med. 2006;355(26):2725–32. doi:10.1056/NEJMoa061115.


28. Pronovost PJ, Goeschel CA, Colantuoni E, Watson S, Lubomski LH, Berenholtz SM, et al. Sustaining reductions in catheter related bloodstream infections in Michigan intensive care units: observational study. BMJ. 2010;340:c309. doi:10.1136/bmj.c309. 29. Carlhed R, Bojestig M, Peterson A, Aberg C, Garmo H, Lindahl B. Improved clinical outcome after acute myocardial infarction in hospitals participating in a Swedish quality improvement initiative. Circ Cardiovasc Qual Outcomes. 2009;2(5):458–64. doi:10.1161/CIRCOUTCOMES.108.842146. 30. Crandall WV, Margolis PA, Kappelman MD, King EC, Pratt JM, Boyle BM, et al. Improved outcomes in a quality improvement collaborative for pediatric inflammatory bowel disease. Pediatrics. 2012;129(4):e1030–41. doi:10.1542/peds.2011-1700. 31. Powell AA, Nugent S, Ordin DL, Noorbaloochi S, Partin MR. Evaluation of a VHA collaborative to improve follow-up after a positive colorectal cancer screening test. Med Care. 2011;49(10):897–903. doi:10.1097/MLR.0b013e3182204944. 32. Cross RR, Harahsheh AS, McCarter R, Martin GR. Identified mortality risk factors associated with presentation, initial hospitalisation, and interstage period for the Norwood operation in a multi-centre registry: a report from the National Pediatric Cardiology-Quality Improvement Collaborative. Cardiol Young. 2014;24(2):253–62. doi:10.1017/S1047951113000127. 33. Likosky DS, Nugent WC, Ross CS. Improving outcomes of cardiac surgery through cooperative efforts: the northern new England experience. Semin Cardiothorac Vasc Anesth. 2005;9(2):119–21. 34. Nugent WC. Building and supporting sustainable improvement in cardiac surgery: the Northern New England experience. Semin Cardiothorac Vasc Anesth. 2005;9(2):115–18. 35. O’Connor GT, Plume SK, Olmstead EM, Morton JR, Maloney CT, Nugent WC, et al. A regional intervention to improve the hospital mortality associated with coronary artery bypass graft surgery. The Northern New England Cardiovascular Disease Study Group. JAMA. 1996;275(11):841–6. 36. O’Connor GT, Plume SK, Olmstead EM, Coffin LH, Morton JR, Maloney CT, et al. A regional prospective study of in-hospital mortality associated with coronary artery bypass grafting. The Northern New England Cardiovascular Disease Study Group. JAMA. 1991;266(6):803–9. 37. O’Connor GT, Birkmeyer JD, Dacey LJ, Quinton HB, Marrin CA, Birkmeyer NJ, et al. Results of a regional study of modes of death associated with coronary artery bypass grafting. Northern New England Cardiovascular Disease Study Group. Ann Thorac Surg. 1998;66(4):1323–8. 38. Dacey LJ, Munoz JJ, Johnson ER, Leavitt BJ, Maloney CT, Morton JR, et al. Effect of preoperative aspirin use on mortality in coronary artery bypass grafting patients. Ann Thorac Surg. 2000;70(6):1986–90. 39. Leavitt BJ, O’Connor GT, Olmstead EM, Morton JR, Maloney CT, Dacey LJ, et al. Use of the internal mammary artery graft and in-hospital mortality and other adverse outcomes associated with coronary artery bypass surgery. Circulation. 2001;103(4):507–12. 40. DeFoe GR, Ross CS, Olmstead EM, Surgenor SD, Fillinger MP, Groom RC, et al. Lowest hematocrit on bypass and adverse outcomes associated with coronary artery bypass grafting. Northern New England Cardiovascular Disease Study Group. Ann Thorac Surg. 2001;71(3):769–76. 41. Fillinger MP, Surgenor SD, Hartman GS, Clark C, Dodds TM, Rassias AJ, et al. The association between heart rate and in-hospital mortality after coronary artery bypass graft surgery. Anesth Analg. 2002;95(6):1483–8, table of contents. 42. 
Flum DR, Fisher N, Thompson J, Marcus-Smith M, Florence M, Pellegrini CA. Washington State’s approach to variability in surgical processes/outcomes: Surgical Clinical Outcomes Assessment Program (SCOAP). Surgery. 2005;138(5):821–8. doi:10.1016/j.surg.2005.07.026. 43. Kwon S, Florence M, Grigas P, Horton M, Horvath K, Johnson M, et al. Creating a learning healthcare system in surgery: Washington State’s Surgical Care and Outcomes Assessment Program (SCOAP) at 5 years. Surgery. 2012;151(2):146–52. doi:10.1016/j.surg.2011.08.015. 44. Cuschieri J, Florence M, Flum DR, Jurkovich GJ, Lin P, Steele SR, et al. Negative appendectomy and imaging accuracy in the Washington State Surgical Care and Outcomes Assessment Program. Ann Surg. 2008;248(4):557–63. doi:10.1097/SLA.0b013e318187aeca.


45. Dimick JB, Weeks WB, Karia RJ, Das S, Campbell Jr DA. Who pays for poor surgical quality? Building a business case for quality improvement. J Am Coll Surg. 2006;202(6):933–7. doi:10.1016/j.jamcollsurg.2006.02.015. 46. Englesbe MJ, Dimick JB, Sonnenday CJ, Share DA, Campbell Jr DA. The Michigan surgical quality collaborative: will a statewide quality improvement initiative pay for itself? Ann Surg. 2007;246(6):1100–3. doi:10.1097/SLA.0b013e31815c3fe5. 47. Birkmeyer NJ, Share D, Baser O, Carlin AM, Finks JF, Pesta CM, et al. Preoperative placement of inferior vena cava filters and outcomes after gastric bypass surgery. Ann Surg. 2010;252(2):313–18. doi:10.1097/SLA.0b013e3181e61e4f. 48. Moscucci M, Rogers EK, Montoye C, Smith DE, Share D, O’Donnell M, et al. Association of a continuous quality improvement initiative with practice and outcome variations of contemporary percutaneous coronary interventions. Circulation. 2006;113(6):814–22. doi:10.1161/CIRCULATIONAHA.105.541995. 49. Prager RL, Armenti FR, Bassett JS, Bell GF, Drake D, Hanson EC, et al. Cardiac surgeons and the quality movement: the Michigan experience. Semin Thorac Cardiovasc Surg. 2009;21(1):20–7. doi:10.1053/j.semtcvs.2009.03.008. 50. Miller DC, Murtagh DS, Suh RS, Knapp PM, Schuster TG, Dunn RL, et al. Regional collaboration to improve radiographic staging practices among men with early stage prostate cancer. J Urol. 2011;186(3):844–9. doi:10.1016/j.juro.2011.04.078. 51. Scales DC. Partnering with health care payers to advance the science of quality improvement: lessons from the field. Am J Respir Crit Care Med. 2011;184(9):987–8. doi:10.1164/rccm.201107-1238ED. 52. Glasgow JM, Davies ML, Kaboli PJ. Findings from a national improvement collaborative: are improvements sustained? BMJ Qual Saf. 2012;21(8):663–9. doi:10.1136/bmjqs-2011-000243. 53. Leape LL, Rogers G, Hanna D, Griswold P, Federico F, Fenn CA, et al. Developing and implementing new safe practices: voluntary adoption through statewide collaboratives. Qual Saf Health Care. 2006;15(4):289–95. doi:10.1136/qshc.2005.017632. 54. Palmer C, Bycroft J, Healey K, Field A, Ghafel M. Can formal Collaborative methodologies improve quality in primary health care in New Zealand? Insights from the EQUIPPED Auckland Collaborative. J Prim Health Care. 2012;4(4):328–36. 55. Harris Y, Kwon L, Berrian A, Calvo A. Redesigning the system from the bottom up: lessons learned from a decade of federal quality improvement collaboratives. J Health Care Poor Underserved. 2012;23(3 Suppl):11–20. doi:10.1353/hpu.2012.0145.

Part III

Tools of the Trade

Chapter 13

Large Databases Used for Outcomes Research

Terry Shih and Justin B. Dimick

Abstract Health services researchers often focus on population-based assessment of the effectiveness of health care interventions, evaluation of broad-based delivery system reforms, or variation in use of services across regions. These analyses require not only large sample sizes but also diverse practice settings. There are two main sources of secondary datasets, administrative databases and clinical registries, each with its advantages and disadvantages. Administrative databases, such as the national Medicare Provider Analysis and Review (MEDPAR) file, are compiled primarily for billing purposes. Large clinical registries, such as the Society of Thoracic Surgeons (STS) National Database and the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP), were created to facilitate quality improvement. The purpose of this chapter is to provide an overview of secondary databases.

Keywords Database • Quality • Outcomes • Surgery • Medicare

13.1 Introduction

Health services researchers often focus on population-based assessment of the effectiveness of health care interventions, evaluation of broad-based delivery system reforms, or variation in use of services across regions. These analyses require not only large sample sizes but also diverse practice settings. Because randomized controlled trials are not technically feasible for all these settings due to cost and sample size considerations, we often turn to large existing databases.



These databases have the advantage of a large sample size to answer clinical questions about infrequently performed procedures (e.g., the Whipple procedure or esophagectomy) or to answer large population-based questions. There are two main sources of secondary datasets, administrative databases and clinical registries, each with its advantages and disadvantages. Administrative databases, such as the national Medicare Provider Analysis and Review (MEDPAR) file, are compiled primarily for billing purposes. Large clinical registries, such as the Society of Thoracic Surgeons (STS) National Database and the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP), were created to facilitate quality improvement. Both administrative databases and clinical registries have inherent strengths and limitations when used for research, which will be discussed further in this chapter. The purpose of this chapter is to provide an overview of secondary databases. We will begin by discussing the types of research questions for which large databases are frequently used, with examples of landmark studies. We will then differentiate between administrative databases and clinical registries, discussing their advantages and disadvantages as well as situations in which one may be better than the other. Finally, we will give a brief overview of frequently used databases.

13.2 Common Research Questions That Require Large Database Analysis

13.2.1 Studying Rare Diagnoses, Procedures, or Complications

Health services researchers can leverage the advantages of large secondary databases to answer many clinical questions (Table 13.1). One natural advantage is the large sample size of these databases, allowing meaningful conclusions to be drawn for rare diagnoses, procedures, or complications. For example, Sheffield et al. used state-wide Medicare data in Texas to explore the rate of common bile duct injury during cholecystectomy [1]. Because this complication is quite rare (0.3–0.5 %), single-institution studies are often inadequately powered to draw any significant inferences. Previous studies performed with both administrative and clinical registry data have illustrated a significant controversy regarding the role of intraoperative cholangiography in the prevention of common bile duct injury during cholecystectomy. Using an instrumental variable analysis (a technique described elsewhere in this book) to adjust for unmeasured confounding, Sheffield et al. demonstrated no statistically significant association between intraoperative cholangiography and common duct injury. This study was also able to link patient-level data to both hospital- and surgeon-level characteristics, allowing for exploration of how factors at multiple levels influence patient outcomes.
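A quick power calculation shows why single-institution series are underpowered for an event this rare. The sketch below is a minimal example using statsmodels, with illustrative event rates of 0.3 % and 0.5 % drawn from the range quoted above; it is not the study's actual analysis.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

# Illustrative rates only: common bile duct injury of ~0.5% vs ~0.3%
p1, p2 = 0.005, 0.003

# Cohen's h effect size for comparing two proportions
effect_size = proportion_effectsize(p1, p2)

# Sample size per group for 80% power at alpha = 0.05 (two-sided)
analysis = NormalIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size, power=0.80,
                                   alpha=0.05, alternative="two-sided")
print(f"Approximately {n_per_group:,.0f} patients per group are needed")
# Roughly 7,000-8,000 patients per group (over 15,000 cholecystectomies in total),
# far more than most single institutions perform.
```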


Table 13.1 Common research questions that utilize large databases

Type of question: Examination of rare conditions, procedures, or complications
Example: Sheffield et al. Association between cholecystectomy with vs without intraoperative cholangiography and risk of common duct injury. JAMA 2013
Description: Used Texas Medicare data to explore a rare complication, demonstrating no statistically significant association between intraoperative cholangiography and common duct injury

Type of question: Trends over time in utilization or outcomes
Example: Santry et al. Trends in bariatric surgical procedures. JAMA 2005
Description: Used the Nationwide Inpatient Sample to demonstrate a national increasing trend in the use of bariatric surgical procedures

Type of question: Regional variations in utilization
Example: Dartmouth Atlas of Health Care
Description: Used national Medicare data to publish numerous studies exploring large regional variation in healthcare spending and outcomes

Type of question: “Real world” efficacy vs. effectiveness
Example: Wennberg et al. Variation in carotid endarterectomy mortality in the Medicare population. JAMA 1998
Description: Used national Medicare data to demonstrate that carotid endarterectomy mortality was significantly higher than reported in randomized controlled trials

Type of question: Relationship of hospital or surgeon factors to patient outcomes
Example: Birkmeyer et al. Hospital volume and surgical mortality in the United States. NEJM 2002
Description: Used national Medicare data to demonstrate an inverse relationship between hospital volume and surgical mortality among 14 major cardiovascular and cancer operations
Example: Ghaferi et al. Variation in hospital mortality associated with inpatient surgery. NEJM 2009
Description: Used the NSQIP database to demonstrate that low-mortality hospitals in general and vascular surgery had complication rates similar to high-mortality hospitals but a superior ability to rescue patients from complications

Type of question: Policy evaluation
Example: Dimick et al. Bariatric surgery complications before vs. after implementation of a national policy restricting coverage to centers of excellence. JAMA 2013
Description: Used the State Inpatient Database to demonstrate no benefit in complication or reoperation rates after enactment of a policy restricting bariatric surgery to designated centers of excellence

13.2.2 Defining Temporal Trends or Regional Differences in Utilization

Secondary databases are also often used to explore the utilization rates of specific surgeries to discern temporal trends or regional differences. In 2005, Santry et al. used the Nationwide Inpatient Sample (NIS) to document a significant increasing trend in the use of bariatric surgical procedures over the years 1998–2002 [2].


Regional variation in rates of procedures has been described as early as 1970. The Dartmouth Atlas of Health Care has used national Medicare data to publish numerous studies exploring large regional variations in healthcare spending and outcomes [3].

13.2.3 Examining Surgical Outcomes in the “Real World”: Efficacy vs. Effectiveness

Another natural application of secondary databases is the design of “real world” comparative effectiveness studies, as opposed to efficacy studies. Efficacy studies are randomized controlled trials, generally regarded as the gold standard of evidence-based medicine. These clinical trials are often performed in a narrow patient population with strict inclusion and exclusion criteria. They are also performed under ideal clinical conditions with close follow-up and ample hospital resources. These stringent criteria and ideal conditions can reduce type 1 error in randomized controlled trials, but they threaten the external validity of the results. The large survival benefit of an intervention may not hold up in the real world, where patients and practice conditions are far less controlled. Because secondary databases tend to be population based, they allow for assessment of outcomes in the real world. Wennberg et al. demonstrated this efficacy vs. effectiveness distinction by examining carotid endarterectomy mortality using national Medicare data [4]. They reported that mortality among Medicare beneficiaries undergoing carotid endarterectomy, a large heterogeneous cohort, was appreciably higher at low-volume hospitals (2.5 %) than at the higher-volume hospitals (1.4 %) participating in two large, well-designed randomized controlled trials: the North American Symptomatic Carotid Endarterectomy Trial (NASCET) and the Asymptomatic Carotid Atherosclerosis Study (ACAS).

13.2.4 Studying Outcomes Across Diverse Practice Settings

Health services researchers also use secondary databases to explore variation in outcomes across diverse practice settings. Studies can be designed to explore the effect of hospital volume or patient demographics (age, gender, race, socioeconomic status) on outcomes. Birkmeyer et al. used national Medicare data to definitively establish the inverse relationship between hospital volume and surgical mortality among six major cardiovascular procedures and eight major cancer resections [5]. Most case reports on complex operations like pancreatic resection are from high-volume institutions; lower-volume centers would not be represented in the literature without large database studies. This study has since inspired a multitude of other studies exploring the volume-outcome relationship in many other procedures.


Studies using secondary datasets have also examined surgeon and/or hospital characteristics and their impact on patient outcomes. Ghaferi et al. used the NSQIP database to explore the variation in hospital mortality associated with inpatient general and vascular surgery [6]. To study variations in mortality and other patient outcomes, a broad sample of hospitals is required; without secondary data, it would not be feasible to compare outcomes across hospitals and draw meaningful inferences. Using a large clinical registry, Ghaferi et al. were able to compare low-mortality and high-mortality hospitals and discovered similar complication rates. The distinction of low-mortality hospitals was their superior ability to prevent mortality in patients experiencing complications, that is, to rescue them.
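The comparison Ghaferi et al. drew rests on three simple quantities: the complication rate, the overall mortality rate, and failure to rescue (death among patients who suffered a complication). A minimal sketch of that arithmetic on hypothetical patient-level flags:

```python
import pandas as pd

# Hypothetical admissions from two hospitals with binary outcome flags.
df = pd.DataFrame({
    "hospital": ["X"] * 1000 + ["Y"] * 1000,
    "complication": [1] * 180 + [0] * 820 + [1] * 175 + [0] * 825,
    "died": [1] * 20 + [0] * 160 + [0] * 820 + [1] * 40 + [0] * 135 + [0] * 825,
})

# Hospital-level complication and mortality rates.
summary = df.groupby("hospital").agg(
    complication_rate=("complication", "mean"),
    mortality=("died", "mean"),
)

# Failure to rescue: mortality restricted to patients who had a complication.
summary["failure_to_rescue"] = (
    df[df["complication"] == 1].groupby("hospital")["died"].mean()
)

# The two hospitals have similar complication rates but very different
# mortality, driven by their ability to rescue complicated patients.
print(summary.round(3))
```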

13.2.5 Evaluating Health Care Policy

As discussed above, there are many instances in which performing a randomized controlled trial of an intervention would be impractical, and a large secondary database is the only option for assessing outcomes. One example is the evaluation of the impact of large-scale policy changes, such as Medicare’s decision to restrict coverage of bariatric surgery to hospitals designated as centers of excellence. With such a large-scale policy decision, it would be infeasible to conduct a randomized controlled trial to evaluate the policy. Dimick et al. used the State Inpatient Database (SID) from 12 states to compare center of excellence hospitals and non-center of excellence hospitals using a difference-in-differences analytic approach that accounts for existing time trends, demonstrating no benefit in rates of complications or reoperation after enactment of this policy [7]. This approach is discussed in detail elsewhere in this book.
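A difference-in-differences analysis can be expressed as a regression containing group, period, and interaction terms, where the interaction coefficient estimates the policy effect over and above time trends shared by both groups. The sketch below is a hypothetical illustration on simulated data with invented variable names (coe, post, complication); it is not the actual analysis from the study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical admissions: coe = 1 if center of excellence, post = 1 if after the policy.
df = pd.DataFrame({
    "coe": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})

# Simulate complication risk with a secular decline over time but no true policy effect.
p = 0.08 - 0.015 * df["post"] + 0.005 * df["coe"]
df["complication"] = rng.binomial(1, p)

# Difference-in-differences with a linear probability model:
# the coe:post coefficient is the policy effect net of shared time trends.
model = smf.ols("complication ~ coe * post", data=df).fit(cov_type="HC1")
print(model.summary().tables[1])
```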

13.3 Administrative Databases vs. Clinical Registries

13.3.1 Strengths and Weaknesses of Administrative Data

Administrative databases offer several advantages in health services research (Table 13.2). These data sets are population based, allowing for the examination of time trends or regional differences as well as real-world outcomes, and have a large sample size, allowing for the study of rare conditions, as discussed above. These datasets are also relatively inexpensive and readily available; administrative data are therefore a good source of preliminary data for a grant. Another benefit of administrative data is the ability to link patients across episodes of care and institutions. For example, this allows for more accurate tracking of readmissions than clinical data, which often rely on hospitals to keep track of their own readmissions.


Table 13.2 Strengths and weaknesses of administrative databases and clinical registries

Administrative databases
Strengths: Population-based; Inexpensive, readily available; Linkable across episodes; Large sample size
Weaknesses: Lack clinical detail; Inaccurate/variable coding; Lags in availability; Collected for billing purposes

Clinical registries
Strengths: Clinical granularity; Large sample size
Weaknesses: Lags in availability; Reliance on abstractors for reliability; Resource-intensive

This may be easy enough if a patient is readmitted to the same surgeon or service, but more complicated if a patient is readmitted to a different provider or service within the same hospital, or to a completely different hospital altogether.

The primary weaknesses of administrative data lie in the inaccuracy and variability of coding. Administrative databases were developed primarily for billing purposes. As a result, primary procedure and diagnosis codes, demographics, length of stay, and outcomes such as mortality and readmission are recorded with good accuracy. However, administrative data often lack clinical granularity beyond these data points. Patient comorbidities, often used to adequately risk-adjust patients, rely on the varying quality of secondary diagnosis coding. Furthermore, the coding of complications, an important outcome measure in surgery, has also been criticized for clinical inaccuracy. The surgical health services researcher must have effective strategies to address these weaknesses, as discussed below.
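Mechanically, linking admissions across hospitals amounts to sorting claims by a persistent patient identifier and comparing each admission date with the prior discharge date. The sketch below uses invented columns (patient_id, hospital, admit_date, discharge_date) to flag 30-day readmissions, including one that occurs at a different hospital.

```python
import pandas as pd

claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 3],
    "hospital":   ["A", "B", "A", "A", "C"],
    "admit_date": pd.to_datetime(
        ["2013-01-05", "2013-01-20", "2013-02-01", "2013-05-10", "2013-03-03"]),
    "discharge_date": pd.to_datetime(
        ["2013-01-10", "2013-01-25", "2013-02-05", "2013-05-15", "2013-03-08"]),
})

# Sort each patient's admissions chronologically, then compare each admission
# to that patient's prior discharge date.
claims = claims.sort_values(["patient_id", "admit_date"]).reset_index(drop=True)
prev_discharge = claims.groupby("patient_id")["discharge_date"].shift(1)
days_since_discharge = (claims["admit_date"] - prev_discharge).dt.days
claims["readmission_30d"] = days_since_discharge.le(30)

print(claims)
# Patient 1's readmission is captured even though it occurred at a different hospital.
```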

13.3.2 Identifying Comorbidities in Administrative Data

Popular methods for comorbidity risk adjustment include the Charlson Index and the Elixhauser method [8, 9]. The Charlson Index assigns a specific point value to certain comorbidities to predict 10-year survival. It has been adapted for use with International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes by various groups, with the most popular adaptation proposed by Deyo et al. [10]. The Elixhauser method uses a list of 30 comorbidities that can be identified by ICD-9-CM codes. These comorbidities are not simplified into an index, as each comorbidity can affect outcomes differently among different patient groups. Rather, each comorbidity is used as an indicator variable in logistic regression when performing risk adjustment. Both methods for risk adjustment are widely used and have been validated for use with administrative data. However, limitations still exist: though both the Elixhauser method and the Charlson Index can discern the presence of comorbidities, the severity of those comorbidities is not discernable.


Also, in situations where confounding by indication based on clinical severity is important, administrative data may not adequately identify patients with higher clinical severity. Administrative data will not yield the clinical detail needed to perform rigorous comparative effectiveness studies without a more advanced method for causal inference.
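In practice, either approach boils down to mapping ICD-9-CM codes to condition indicators and entering those indicators into a regression model. The sketch below is a toy illustration of the Elixhauser-style workflow: the code map is deliberately tiny and non-exhaustive (real implementations use the full published code lists), and the outcome data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy, non-exhaustive map of ICD-9-CM code prefixes to comorbidity flags.
COMORBIDITY_PREFIXES = {
    "chf": ("428",),               # congestive heart failure
    "diabetes": ("250",),
    "renal_failure": ("585",),
    "copd": ("491", "492", "496"),
}

def flag_comorbidities(dx_codes):
    """Return 0/1 indicator flags for a patient's list of ICD-9-CM codes."""
    return {name: int(any(code.startswith(p) for code in dx_codes for p in prefixes))
            for name, prefixes in COMORBIDITY_PREFIXES.items()}

print(flag_comorbidities(["4280", "25000", "V4581"]))
# {'chf': 1, 'diabetes': 1, 'renal_failure': 0, 'copd': 0}

# In the Elixhauser approach each flag enters the outcome model as its own
# indicator variable rather than being collapsed into a single score.
rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({name: rng.integers(0, 2, n) for name in COMORBIDITY_PREFIXES})
logit_p = -3 + 0.8 * df["chf"] + 0.5 * df["renal_failure"]
df["died"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

model = smf.logit("died ~ chf + diabetes + renal_failure + copd", data=df).fit(disp=0)
print(model.params.round(2))
```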

13.3.3 Identifying Complications in Administrative Data

The Complications Screening Program developed by Iezzoni et al. is commonly used to identify complications from administrative data [11]. Using specific ICD-9-CM codes, the following postoperative complications can be identified: pulmonary failure, pneumonia, myocardial infarction, deep venous thrombosis/pulmonary embolism, acute renal failure, hemorrhage, surgical site infection, and gastrointestinal hemorrhage. This method has been validated by chart review and shown to be in good agreement. Again, however, we caution the reader in using these methods in studies for which complications not targeted by the Complications Screening Program (e.g., anastomotic leak, urinary tract infection, central line-associated bloodstream infection) may be important outcome measures. Furthermore, the severity of these complications cannot be ascertained with administrative data.

13.3.4 Clinical Registries

In contrast, clinical registries are developed expressly for research or quality improvement. As a result, they contain more clinical granularity than administrative data. Severity of disease, intricacy of the procedure, and complexity of postoperative care are all examples of clinical detail that can be obtained from clinical registries but not from administrative databases. For example, through the Society of Thoracic Surgeons Adult Cardiac Surgery Database a researcher can find information on preoperative ejection fraction, cardiopulmonary bypass time, and postoperative transfusions: all data that would not be available through claims data. There are also disadvantages to clinical registries. Participation often requires full-time data abstractors at each site who review the medical record and enter data into a large data warehouse. Though research has suggested that participation in clinical registries may lead to quality improvement with a return on investment, hospital administrators may not be able or willing to support the resource-intensive costs required for participation. Additionally, clinical registry data are never perfect. Some outcomes, such as long-term out-of-hospital survival or readmissions, may not be adequately captured in clinical registries, depending on how these outcomes are reported and verified. Health services researchers must also realize that although clinical registry data improve greatly on the granularity of clinical detail, there are still limitations in the data that can be collected.


Observational comparative effectiveness studies performed with clinical registries may still be biased by confounding factors that remain unobserved.

13.4 Example Datasets

13.4.1 Administrative Data

Large administrative databases are produced and maintained by many sources. Several of the most commonly used databases are listed below (Table 13.3).

Table 13.3 Examples of administrative databases

Centers for Medicare and Medicaid Services
Medicare Provider Analysis and Review (MEDPAR): Medicare Part A claims for inpatient hospitals and skilled nursing facilities
Part B claims data: Medicare Part B claims for physician fees and hospital outpatient care
Surveillance, Epidemiology and End Results (SEER)-Medicare linked data: Clinical and demographic data from population-based cancer registries linked to Medicare claims data

Veterans Affairs databases
Patient Treatment File (PTF): VA inpatient hospital discharge claims
National Patient Care Database (NPCD): VA outpatient visits at VA-based clinics

Healthcare Cost and Utilization Project
Nationwide Inpatient Sample (NIS): 20 % stratified sample of hospital discharge data from all payers
Kids’ Inpatient Database (KID): Sample of all pediatric inpatient discharges from 44 states
State Inpatient Databases (SID): State-specific inpatient discharges from 45 states; individual state databases available for purchase
State Ambulatory Surgery Databases (SASD): State-specific data from 30 states for same-day procedures

Marketscan Commercial Claims and Encounter Database (CCAE): Claims submitted to >100 health plans that contract with large private employers, public agencies, and public organizations in the United States

Provider-level data for linkage
American Medical Association (AMA) Physician Masterfile: Provider-level database containing information on education, training, and professional certification
American Hospital Association (AHA) Annual Survey Database: Hospital-level database containing information on hospital demographics, organizational structure, facilities and services, utilization, expenses, and staffing


13.4.1.1 Medicare

Perhaps the most widely used database for surgical outcomes research is the Medicare Provider Analysis and Review (MEDPAR) file [12]. It contains Medicare Part A claims for services provided to fee-for-service beneficiaries admitted to Medicare-certified inpatient hospitals and skilled nursing facilities. The Centers for Medicare and Medicaid Services also maintains research files for Medicare Part B claims submitted by physicians or hospitals for outpatient care. The Chronic Condition Warehouse (CCW) is a 5 % sample of Medicare patients that provides claims across the care continuum and can be used to answer questions that require a variety of claims files.

13.4.1.2 SEER-Medicare

Medicare claims data can be augmented by linkage to clinical registries. Commonly, Medicare data are linked to the Social Security Death Index to assess long-term survival. Medicare data have also been linked with the Surveillance, Epidemiology and End Results (SEER) registry to obtain clinical and demographic data from population-based cancer registries in 18 SEER regions, representing approximately 28 % of the US population [13]. The linked SEER-Medicare database can be used to examine variation in cancer-directed surgery and long-term outcomes after cancer surgery.
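At its simplest, this kind of linkage is a merge on a patient identifier followed by a survival-time calculation. The sketch below uses invented identifiers and dates purely to show the mechanics of combining a registry extract with a linked death file.

```python
import pandas as pd

registry = pd.DataFrame({
    "patient_id": [101, 102, 103],
    "cancer_site": ["pancreas", "colon", "lung"],
    "surgery_date": pd.to_datetime(["2010-03-01", "2011-06-15", "2012-01-20"]),
})

death_index = pd.DataFrame({
    "patient_id": [101, 103],
    "death_date": pd.to_datetime(["2011-09-01", "2012-07-04"]),
})

# Link registry records to the death file on the shared patient identifier.
linked = registry.merge(death_index, on="patient_id", how="left")

# Follow-up ends at death or at an administrative censoring date.
censor_date = pd.Timestamp("2013-12-31")
linked["event"] = linked["death_date"].notna().astype(int)
linked["followup_days"] = (
    linked["death_date"].fillna(censor_date) - linked["surgery_date"]
).dt.days

print(linked[["patient_id", "cancer_site", "event", "followup_days"]])
```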

13.4.1.3 Veterans Affairs Hospitals

The federal government also gathers data for patients receiving care from the Department of Veterans Affairs. The Patient Treatment File (PTF) contains hospital discharge abstracts for inpatient care, while the National Patient Care Database (NPCD) contains outpatient visits at VA-based clinics [14].

13.4.1.4 Healthcare Cost and Utilization Project

The Healthcare Cost and Utilization Project (HCUP) is a family of healthcare databases developed through a federal-state-industry partnership sponsored by the Agency for Healthcare Research and Quality (AHRQ) [15]. It is the largest collection of longitudinal hospital care data with all-payer, encounter-level information, beginning in 1988. The Nationwide Inpatient Sample (NIS) is a large, national database containing hospital discharge data for all payers, though it does not contain 100 % of all discharges. It is a 20 % stratified sample of all US non-federal hospitals, containing data from >1,000 hospitals in 45 states. Hospitals are selected to represent five strata of hospital characteristics: ownership/control, bed size, teaching status, rural-urban location, and geographic region. Weights based on sampling probabilities for each stratum are used in analysis so that the sample hospitals are representative of all US hospitals.
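Producing national estimates from the NIS therefore means weighting each sampled discharge rather than simply counting rows. Below is a minimal sketch with invented records and weights (in HCUP files the discharge weight is typically named DISCWT); proper variance estimation would additionally account for the stratified, clustered sampling design, not just the weights.

```python
import pandas as pd

# Hypothetical NIS-style extract: each row is a sampled discharge with a weight.
nis = pd.DataFrame({
    "procedure": ["gastric_bypass"] * 4 + ["colectomy"] * 3,
    "died": [0, 0, 1, 0, 0, 1, 0],
    "discwt": [5.1, 4.8, 5.0, 5.2, 4.9, 5.0, 5.1],
})

# Weighted counts estimate national volumes; weighted means estimate national rates.
nis["wtd_deaths"] = nis["died"] * nis["discwt"]
summary = nis.groupby("procedure")[["discwt", "wtd_deaths"]].sum()
summary["national_mortality"] = summary["wtd_deaths"] / summary["discwt"]
summary = summary.rename(columns={"discwt": "national_volume"})

print(summary[["national_volume", "national_mortality"]].round(3))
```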


State-specific databases are available through the State Inpatient Databases (SID). They contain all inpatient discharge abstracts in 46 states, and the resultant databases represent approximately 97 % of annual discharges. A number of states make SID files available for purchase through HCUP. These databases are maintained in a uniform format, allowing for easy comparison of data between different states to examine geographic variation in utilization, access, charges, and outcomes. The State Ambulatory Surgery Databases (SASD) capture data from 30 participating states for same-day procedures. The Kids’ Inpatient Database (KID), similar to the NIS, is an all-payer inpatient care database for children 20 years of age and younger from 44 states in the US. Unlike the NIS, the KID does not involve sampling of hospitals. Instead, the KID is a sample of pediatric patients from all hospitals in the sampling frame. For the sampling, pediatric discharges in all participating states are stratified by uncomplicated in-hospital birth, complicated in-hospital birth, and all other pediatric cases.

13.4.1.5 AMA Masterfile and AHA Annual Survey

Provider-level and hospital-level information can be obtained through the American Medical Association (AMA) and the American Hospital Association (AHA). These can be linked to existing HCUP and Medicare data to add provider-level variables to data sets. The AMA Physician Masterfile contains information about education, training, and professional certification for nearly all physicians in the US [16]. The AHA Annual Survey Database includes data from >6,000 hospitals with detailed information regarding hospital demographics, organizational structure, facilities and services, utilization, expenses, and staffing [17].

13.4.1.6 Marketscan

The Marketscan Commercial Claims and Encounter (CCAE) database is compiled from claims submitted to more than 100 health plans that contract with large private employers, public agencies, and public organizations in the United States [18]. The database is available for purchase through Truven Health Analytics. Health plan types included in this database are employer-sponsored, private, fee-for-service, and capitated insurance for employees and covered dependents. This longitudinal database tracks all patient-level inpatient and outpatient claims for as long as employees remain with their employers.

13.4.2 Clinical Registries

Clinical registries were designed for clinical research and quality improvement. Many of these arose in response to the increasing use of administrative data, which many viewed as suboptimal. With improved data accuracy and clinical granularity, these registries are ideal for questions that require in-depth data on disease severity, comorbidities, details of the operation, and complexity of postoperative care (Table 13.4).


Table 13.4 Examples of clinical registries

Society of Thoracic Surgeons (STS) National Database: National database divided into three components (Adult Cardiac, General Thoracic, and Congenital Heart Surgery) with >90 % participation by cardiothoracic surgery programs in the United States

American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP): Nationally validated, risk-adjusted, outcomes-based program to measure and improve surgical quality across surgical subspecialties

National Trauma Data Bank (NTDB): National database for >400 level I-IV trauma centers across the United States, maintained by the American College of Surgeons, with trauma patient data including injury profiles, injury severity scores, and mechanism of injury

National Cancer Data Base (NCDB): National database for >1,500 Commission on Cancer-accredited cancer programs in the United States with detailed clinical, pathological, and demographic data on approximately 70 % of all incident cancer cases

13.4.2.1 STS National Database

The Society of Thoracic Surgeons (STS) established a National Database in 1989 as an initiative for quality improvement and patient safety [19]. The database is divided into three components: Adult Cardiac, General Thoracic, and Congenital Heart Surgery, with Anesthesiology participation within the Congenital Heart Surgery Database. The Adult Cardiac Database is the largest, with over 1,000 participating sites and data for over 4.9 million surgical procedures. Analyses suggest that the STS Database has enrollment from more than 90 % of cardiothoracic surgery programs in the United States. The STS National Database has been used to derive risk calculators for seven individual procedures. Recently, the STS has developed a composite star rating for hospital or group quality for isolated coronary artery bypass grafting and isolated aortic valve replacement, and the organization is encouraging its members to allow these star ratings to be made available to the public. Requests for data are reviewed by the STS Access & Publications Task Force five times per year through an online data request form.


13.4.2.2 ACS NSQIP

The National Surgical Quality Improvement Program (NSQIP) began in the Department of Veterans Affairs and was brought into the private sector by the American College of Surgeons (ACS) in 2004 [20]. It is the first nationally validated, risk-adjusted, outcomes-based program to measure surgical quality across surgical specialties in the private sector. Participation by member hospitals requires a Surgical Clinical Reviewer (SCR) who collects clinical variables including preoperative risk factors, intraoperative variables, and 30-day postoperative mortality and morbidity outcomes for patients undergoing major and minor surgical procedures. The ACS NSQIP database has become a valuable tool for participating institutions for quality improvement and clinical studies. All ACS NSQIP participants may access the database by requesting the Participant Use Data File (PUF) through the ACS NSQIP website. This file contains Health Insurance Portability and Accountability Act (HIPAA)-compliant, patient-level data aggregated across participating hospitals, and it does not identify hospitals, healthcare providers, or patients. The PUF is provided at no additional cost to employees of ACS NSQIP participating hospitals. Many system-wide and regional collaboratives participate in the ACS NSQIP database. However, some collaboratives maintain their own databases that may be used for research purposes; these include the Michigan Bariatric Surgery Collaborative (MBSC) [21] and the Michigan Surgical Quality Collaborative (MSQC) [22]. Finding regional collaboratives in your area may provide a unique opportunity to collect and analyze detail-rich data that may not exist in other databases.

13.4.2.3 National Trauma Data Bank

Created by the ACS to serve as the principal national repository for trauma center registry data, the National Trauma Data Bank (NTDB) is composed of de-identified, HIPAA-compliant data from >400 level I-IV trauma centers across the United States. It includes information on patient demographics, vital signs, diagnoses, Injury Severity Scores (ISS), injury profiles, mechanism of injury (based on ICD-9-CM codes), procedures, complications, and in-hospital mortality [23]. To gain access to NTDB data, researchers must submit requests through an online application process.

13.4.2.4 National Cancer Data Base

The National Cancer Data Base (NCDB) is jointly managed by the American College of Surgeons’ Commission on Cancer (COC) and the American Cancer Society. Created in 1988, it is a nationwide database for more than 1,500 COC-accredited cancer programs in the United States [24]. The NCDB contains detailed clinical, pathological, and demographic data on approximately 70 % of all US incident cancer cases. The NCDB Participant Use Data File (PUF) provides HIPAA-compliant, de-identified, patient-level data that do not identify hospitals or healthcare providers.


Investigators at COC-accredited cancer programs must apply for access through an online application process.

13.5 Conclusion

This chapter has provided an overview of the secondary data sources available to health services researchers and the scope of questions these databases can explore. Administrative data and clinical registries are both available, with their respective strengths and weaknesses. Administrative data are relatively inexpensive and readily available for use, though they lack clinical granularity. Clinical registries improve greatly on the level of clinical detail available; however, they may be costly and do not exist for all diseases and populations. We have reviewed several available secondary databases, but ultimately the choice of data must be tailored to the specific research question, and the analytical strategies discussed elsewhere in this book need to be employed to ensure that sound conclusions are drawn.

References 1. Sheffield KM, Riall TS, Han Y, Kuo YF, Townsend Jr CM, Goodwin JS. Association between cholecystectomy with vs without intraoperative cholangiography and risk of common duct injury. JAMA. 2013;310(8):812–20. 2. Santry HP, Gillen DL, Lauderdale DS. Trends in bariatric surgical procedures. JAMA. 2005;294(15):1909–17. 3. The Dartmouth Atlas of Health Care. http://www.dartmouthatlas.org 4. Wennberg DE, Lucas FL, Birkmeyer JD, Bredenberg CE, Fisher ES. Variation in carotid endarterectomy mortality in the Medicare population: trial hospitals, volume, and patient characteristics. JAMA. 1998;279(16):1278–81. 5. Birkmeyer JD, Siewers AE, Finlayson EVA, et al. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346(15):1128–37. 6. Ghaferi AA, Birkmeyer JD, Dimick JB. Variation in hospital mortality associated with inpatient surgery. N Engl J Med. 2009;361(14):1368–75. 7. Dimick JB, Nicholas LH, Ryan AM, Thumma JR, Birkmeyer JD. Bariatric surgery complications before vs after implementation of a national policy restricting coverage to centers of excellence. JAMA. 2013;309(8):792–9. 8. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83. 9. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8–27. 10. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9CM administrative databases. J Clin Epidemiol. 1992;45(6):613–9. 11. Iezzoni L, Daley J, Heeren T, et al. Identifying complications of care using administrative data. Med Care. 1994;32(7):700–15.


12. Medicare Provider Analysis and Review (MEDPAR) File. http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MedicareFeeforSvcPartsAB/MEDPAR.html 13. SEER-Medicare Linked Database. http://healthservices.cancer.gov/seermedicare/ 14. VA Utilization Files. http://www.herc.research.va.gov/data/util.asp 15. Healthcare Cost and Utilization Project (HCUP). http://www.ahrq.gov/research/data/hcup/index.html 16. AMA Physician Masterfile. http://www.ama-assn.org/ama/pub/about-ama/physician-data-resources/physician-masterfile.page? 17. AHA Data and Directories. http://www.aha.org/research/rc/stat-studies/data-and-directories.shtml 18. MarketScan Research Databases. http://www.truvenhealth.com/your_healthcare_focus/pharmaceutical_and_medical_device/data_databases_and_online_tools.aspx 19. STS National Database. http://www.sts.org/national-database 20. Welcome to ACS NSQIP. http://site.acsnsqip.org 21. Michigan Bariatric Surgery Collaborative. https://michiganbsc.org/Registry/ 22. Michigan Surgical Quality Collaborative. http://www.msqc.org/index.php 23. National Trauma Data Bank. http://www.facs.org/trauma/ntdb/index.html 24. National Cancer Data Base. http://www.facs.org/cancer/ncdb/index.html

Chapter 14

Methods for Enhancing Causal Inference in Observational Studies

Kristin M. Sheffield and Taylor S. Riall

Abstract When making health care decisions, patients, their providers, and health care policymakers need evidence on the benefits and harms of different treatment options. Many questions in medicine are not amenable to randomized controlled trials (RCT). The use of observational data, such as insurance claims data, tumor registry data, quality collaborative databases, and clinical registries, to evaluate the comparative effectiveness of various treatment strategies is an attractive alternative. However, causal inference is more difficult in observational studies than in RCTs because patients are not randomly assigned to treatment groups. The objectives of this chapter are to discuss the challenges with using observational data to assess treatment effectiveness and to review methods for enhancing causal inference. Investigators need to identify potential threats to the validity of their results including selection bias, confounding, and measurement bias. Studies that do not account for such threats to validity can produce biased effect estimates that contribute to inappropriate treatment and policy decisions. In this chapter, we focus on careful study design and encourage intimate knowledge of the observational dataset, especially when using administrative data. Finally, we review statistical methods including multivariate analysis, propensity score analysis, and instrumental variable analysis that may be used to adjust for bias and strengthen causal inference. Several clinical examples are provided throughout to demonstrate threats to validity and the statistical methods used to address them.

Keywords Causal inference • Observational studies • Propensity scores • Instrumental variables • Outcomes research • Selection bias



Abbreviations

RCT: randomized controlled trial
SEER: Surveillance, Epidemiology and End Results
NSQIP: National Surgical Quality Improvement Program
INR: International Normalized Ratio
FEV1: forced expiratory volume
IV: instrumental variable
MI: myocardial infarction

14.1 Using Observational Data for Health Services and Comparative Effectiveness Research

When making health care decisions, patients, their providers, and health care policymakers need evidence on the benefits and harms of different treatment options. Randomized controlled trials (RCT) are considered the most valid methodology, or ‘gold standard,’ for evaluating treatment effects. However, RCTs are expensive and many medications, treatments, tests, surgical procedures, and health care delivery methods have not been and cannot be evaluated in RCTs. High-risk groups, such as older and sicker adults, are not well represented in RCTs of medical interventions and procedures. Similarly, rare conditions are not easily studied in an RCT. While RCTs indicate how a treatment performs in a controlled trial setting (known as treatment efficacy), they have limited generalizability to community populations and settings. It is important to evaluate effectiveness—or how a treatment performs when used by regular doctors treating real patients in the community—even when a treatment has demonstrated efficacy in a randomized trial. A treatment might lack effectiveness in the community because of broadening of indications, comorbidities that interfere with treatment, poorer treatment adherence, or differences in the age and health status of the treated population. Increasingly, investigators are using non-randomized studies to evaluate the effectiveness and safety of medical interventions in real-world community practice settings where RCTs are not feasible. Insurance claims data (Medicare), tumor registry data (Surveillance Epidemiology and End Results, or SEER), hospital discharge data (Nationwide Inpatient Sample), the National Surgical Quality Improvement Program (NSQIP), and other observational datasets, many not collected for research purposes, can be used to conduct comparative effectiveness research. For example, observational datasets have been used to evaluate the effectiveness of emergent cardiac catheterization in patients with acute myocardial infarction [1], laparoscopic vs. open appendectomy [2], antecolic vs. retrocolic gastrojejunostomy in the prevention of internal hernia during gastric bypass [3], long-term survival after endovascular and open repair [4, 5], and observation vs. active treatment (surgery or radiation) for early stage prostate cancer [6, 7].


14.2 Causal Inference with Observational Data

Causal inference is more difficult in observational studies than in RCTs because patients are not randomly assigned to treatment groups. In everyday practice, treatment decisions are influenced by clinician judgment and preference, patient characteristics, processes of care, access to care, and the presence of multiple concurrent interventions or problems. This process of individualized treatment based on non-random clinical judgment about risk/benefit creates systematic differences between treatment groups. If the reasons for receiving a given treatment are associated with predictors of the outcome of interest, then measures of treatment effectiveness may be biased. Furthermore, in observational studies the information that patients and providers use to select treatment is often inaccessible to the investigator. This is especially true when investigators use administrative data, which are not collected for research purposes. Measured factors (e.g., age, race) that influence treatment decisions can be accounted for by inclusion in multivariate models. However, unmeasured factors (e.g., frailty, functional status) that are correlated with both the treatment and the outcome will lead to biased estimates of the treatment effect. In observational studies, it can be very difficult to determine whether differences in outcomes between treatment groups should be attributed to a treatment effect or to unmeasured patient differences.
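A small simulation makes the distinction concrete: adjusting for a measured confounder (age) moves the estimate toward the truth, but an unmeasured one (frailty) still biases it. This is a hypothetical sketch with simulated data, not an analysis of any real cohort.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 10_000

# Simulated cohort: older, frailer patients are less likely to be treated
# and more likely to die. Age is measured; frailty is not.
age = rng.normal(72, 8, n)
frailty = rng.normal(0, 1, n)
p_treat = 1 / (1 + np.exp(-(3.5 - 0.05 * age - 0.8 * frailty)))
treated = rng.binomial(1, p_treat)
p_death = 1 / (1 + np.exp(-(-6 + 0.06 * age + 0.9 * frailty)))  # no true treatment effect
died = rng.binomial(1, p_death)
df = pd.DataFrame({"age": age, "treated": treated, "died": died})

crude = smf.logit("died ~ treated", data=df).fit(disp=0)
adjusted = smf.logit("died ~ treated + age", data=df).fit(disp=0)
print("crude OR:   ", np.exp(crude.params["treated"]).round(2))
print("adjusted OR:", np.exp(adjusted.params["treated"]).round(2))
# Adjusting for measured age moves the estimate toward the null, but unmeasured
# frailty still biases the 'adjusted' odds ratio away from the true value of 1.
```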

14.2.1 Threats to Validity

Prior to conducting an observational study, investigators need to identify potential threats to the validity of their results. Below, we review the most common issues for comparative effectiveness research.

14.2.1.1 Selection Bias

Selection bias refers to two processes: the selection of patients into the study sample and the allocation of patients into treatment groups. In the first process, selection bias occurs when the selection of patients into the study produces a sample that is not representative of the population of interest. In the second process, selection bias occurs when non-random factors that influence treatment lead to allocation of patients into treatment groups that are systematically different. Treatment selection can be influenced by patient factors such as demographics, severity of illness, functional status, comorbid conditions, exposure to the health care system, socioeconomic status, and concurrent treatment, as well as by other factors associated with the provider, health care system, and environment.


Selection bias is common in observational studies evaluating the effectiveness of surgical procedures because patients who undergo surgery are selected for good underlying health, while patients who are not fit enough for surgery are included in the no-surgery group.

14.2.1.2 Confounding

Confounding occurs when the set of variables that determine treatment selection are also related to the outcome. There are several sources of selection bias or confounding in observational studies of treatment effectiveness [8]. Confounding by indication or disease severity occurs when the sickest patients are more likely to receive a new drug or type of therapy, such as adjuvant therapy for pancreatic cancer or a new thrombolytic for myocardial infarction [8]. As a result, the intervention is associated with increased risk estimates and appears to cause the outcome it is intended to prevent. Another source of confounding is selective prescribing or treatment discontinuation in very sick patients [8]. Patients who are disabled, frail, cognitively impaired, or in otherwise poor health may be less likely to receive treatment (particularly preventive medication) and more likely to have poor health outcomes, which exaggerates the estimated treatment effect. Conversely, the healthy user/adherer bias occurs because patients who initiate a preventive medication or who adhere to treatment may be more likely to engage in other healthy behaviors and to seek out preventive healthcare services [8]. This can exaggerate the estimated treatment effect as well as produce spurious associations between the treatment and other health outcomes. Table 14.1 lists some of the types of selection bias and confounding that might operate in studies of treatment outcomes using SEER-Medicare data. These selection biases occur even though the dataset contains information, such as tumor size, stage, and histologic grade, that might be expected to control for selection effects.
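A tiny numerical example of confounding by indication, with invented counts: because the sickest patients preferentially receive the therapy, the crude comparison makes treatment look harmful even though the severity-stratified comparisons favor it.

```python
import pandas as pd

# Invented counts: the therapy is given mostly to high-severity patients.
strata = pd.DataFrame({
    "severity":   ["high", "high", "low", "low"],
    "treated":    [1, 0, 1, 0],
    "n_patients": [800, 200, 200, 800],
    "n_deaths":   [160, 50, 10, 48],
})
strata["mortality"] = strata["n_deaths"] / strata["n_patients"]

# Crude comparison ignores severity: treatment looks harmful (17.0% vs 9.8%).
crude = strata.groupby("treated")[["n_deaths", "n_patients"]].sum()
crude["mortality"] = crude["n_deaths"] / crude["n_patients"]
print(crude["mortality"].round(3))

# Within each severity stratum, the treated group actually does better.
print(strata.pivot(index="severity", columns="treated", values="mortality").round(3))
```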

14.2.1.3 Measurement Bias

Measurement bias involves systematic error in measuring the exposure, outcome, or covariates in a study. Measurement error is a major concern for observational studies, particularly those using administrative data such as Medicare claims. The likelihood of measurement error differs based on the type of intervention, outcome, or covariates being measured. For example, it is fairly straightforward to define and identify surgical procedures in Medicare claims data, and costly procedures tend to be accurately coded in billing data. In contrast, it can be very difficult to identify medication use, define an exposure period, and classify patients as treated or untreated. An outcome such as survival is less likely to have measurement error than outcomes such as postoperative complications or incident disease. Similarly, comorbid conditions or risk factors such as smoking may be more difficult to measure. This is particularly true with claims data, because diagnosis codes are subject to considerable error and the use of a particular diagnosis code on a claim depends on the presence of the condition, a provider’s use of the code, and the presence of other, more serious conditions.

Variable association with tumor prognosis depending on site Healthier patients treated Variable association with tumor prognosis depending on site XRT for healthier patients XRT for more extensive tumor

Tumor prognosis

Tumor prognosis

Tumor prognosis

General health

General health

Note: XRT indicates radiation therapy

Surgery & XRT vs. surgery alone

Any treatment vs. no treatment

General health

Chemo prescribed only for patients with worse tumor prognosis Surgery for healthier patients

Tumor prognosis

Surgery vs. XRT in several clinical scenarios

Chemo prescribed only for healthier patients

General health

Chemo vs. no in several clinical scenarios

Expected selection bias

Selection factor

Treatment choice

XRT associated with decreased non-cancer mortality XRT associated with increased cancer mortality

Treatment associated with decreased non-cancer mortality Variable depending on site

Variable depending on site

Chemo patients have lower non-cancer mortality Chemo associated with increased cancer mortality Surgery associated with lower non-cancer mortality

Expected results from biases alone

Table 14.1 Outline of overall types and direction of selection biases involved in choice of cancer treatment

CC

C

C

CC

CC

CC

CC

CCC

Estimated strength of effect

14 Methods for Enhancing Causal Inference in Observational Studies 171

172

K.M. Sheffield and T.S. Riall

subject to considerable error and the use of a particular diagnosis code on a claim depends on the presence of the condition, a provider’s use of the code, and the presence of other, more serious conditions. One way to estimate the validity of the exposure and outcome variables in an observational study is to compare them with a gold standard such as patient self-report or the medical record.

14.2.2 Unmeasured Confounding

Even very rich datasets such as medical records lack complete information on factors influencing selection of treatment. Perhaps the best example of the prognostic strength of missing variables is self-rated health. In most cohort studies, self-rated health is the strongest predictor of survival (after age and gender). More importantly, self-rated health remains a strong predictor in studies that include a rich array of medical, physiologic, social, and psychological variables, such as the Cardiovascular Health Study. This means that there is a factor known by the patient and easily accessible to the physician ("How are you feeling?") that clearly influences prognosis, that would likely influence treatment choice, and that is invisible in almost all comparative research studies using observational data.

Causal inference relies on the assumption of no unmeasured confounding; however, there is no way to test whether that assumption is correct, which makes causal inference risky in observational studies. Investigators must do their best to identify, measure, and adjust for all potential confounders. Studies that do not use appropriate methodology to account for observed and unobserved sources of bias and confounding produce biased effect estimates that can contribute to inappropriate treatment and policy decisions. In the next section, we discuss methods of controlling for bias in observational studies, including steps investigators can take during the design and analysis phases to minimize unmeasured confounding.

14.3 Controlling for Bias in Observational Studies

Careful study design and research methods are key for causal inference with observational data. No amount of sophisticated statistical analysis can compensate for poor study design. A helpful exercise when designing an observational study is to describe the randomized experiment the investigator would like to conduct but cannot, and then attempt to design an observational study that emulates that experiment. Investigators should also collaborate with statisticians, methodologists, and clinicians with relevant subject-matter knowledge during the study design and analysis process. These collaborators can provide expert input to identify issues with the research question and approach. They can also help to identify confounding variables—determinants of treatment that are also independent outcome predictors—and other potential sources of bias, and determine the expected strength and direction of the anticipated bias.


14.3.1 Study Design

The ideal way to minimize bias in observational studies of treatment effectiveness is to collect comprehensive patient, treatment, and outcome data suggested by relevant clinicians and methodologists. This is ideal for primary research studies; however, it is not an option for secondary analysis of existing observational data sets. Investigators who use existing datasets cannot control how patients were identified and selected, and the analysis is limited to the available variables and the way they were measured at the time of data collection. Therefore, it is critical for investigators using secondary data to consider the full range of potential patient, provider, and process-of-care factors, and the challenges they pose to causal inference, in order to evaluate whether the research question can feasibly be answered with the available data. We review several research practices for secondary data analysis that will help to improve causal inference.

Prior to designing a study and analyzing the data, investigators must familiarize themselves with the dataset they will be using, including how the sample was selected, how the data were collected, what variables are included and how they were defined, and the potential limitations. For example, hospital discharge data such as the Nationwide Inpatient Sample represent hospital discharges and not individual persons. Patients with multiple hospitalizations will be counted multiple times, and the dataset does not contain unique patient identifiers that allow follow-up after discharge. Administrative claims data such as Medicare data were not collected for research purposes and do not contain direct clinical information; rather, clinical information has to be inferred from diagnosis and procedure claims. In addition, diagnosis codes listed on claims were designed for reimbursement rather than surveillance purposes, and conditions may be included based on reimbursement rather than clinical importance. Finally, sometimes a dataset contains inadequate information to investigate a particular question. For example, we wanted to use SEER data to examine breast cancer outcomes in patients with positive sentinel nodes who underwent sentinel lymph node biopsy alone or in combination with axillary lymph node dissection. After careful review of the SEER documentation for staging, lymph node status, and dissection variables, we discovered that SEER does not record the pathology status of sentinel lymph nodes separately from that of axillary lymph nodes. Investigators who are not familiar with their dataset may make incorrect assumptions about how data were collected or how variables were defined that could jeopardize the results of their study and preclude causal inference.

Investigators need to explicitly define the intervention or treatment of interest. A well-defined causal effect is necessary for meaningful causal inference. For many interventions, there are a number of ways to define or measure exposure, which could lead to very different estimates of effectiveness. For example, in a study evaluating the effectiveness of chemotherapy, the definition of exposure to chemotherapy could specify a certain number of doses within a certain time frame, or it could require only one dose at any time point after diagnosis. These definitions are likely to produce different results. Investigators using administrative claims data have to infer receipt of treatment based on claims for services and often have to develop surrogate measures of an intervention. This requires the investigator to make assumptions, and he or she must consider how the results may be affected.

Another prerequisite for causal inference is a well-characterized target population. Investigators need to explicitly define the subset of the population in which the effect is being estimated and the population to whom the results may be generalized. The investigator should carefully select the study cohort, construct the treatment comparison groups, and define the observation period in which outcomes will be monitored. Cohort selection criteria should be specified to construct a 'clean' patient sample. For example, in a recent study evaluating overuse of cardiac stress testing before elective noncardiac surgery, the cohort was restricted to patients with no active cardiac conditions or clinical risk factors [9]. Cardiac stress testing was clearly not indicated in such patients; therefore, the investigators could label testing as overuse. When defining an observation period, investigators must consider the length of time that is appropriate for the research question and the study outcome. For example, 2-year survival would be an adequate amount of time to assess the effectiveness of interventions for pancreatic cancer, but not for breast or prostate cancer. Finally, investigators must determine the extent to which the potential confounders identified by the research team are observable, measurable, or proxied by existing variables in the observational dataset.

Table 14.2 Statistical methods to reduce confounding in observational studies

Statistical method | Purpose/use
Multivariate regression | Estimate the conditional expectation of the dependent variable given the independent variables
Propensity score analysis (stratification, matching, inverse probability weighting, regression adjustment) | Reduce imbalance in treatment and control groups based on observed variables
Instrumental variable analysis | Adjust for unobserved confounding

14.3.2 Statistical Techniques

There are a number of statistical methods aimed at strengthening causal inference in observational studies of the comparative effectiveness of different treatments. Table 14.2 shows the most common statistical methods used to adjust for bias. Below, we briefly discuss these methods with regard to their contributions to causal inference for observational data. A detailed description of these methods is beyond the scope of this chapter.


14.3.2.1 Multivariate Regression

Multivariate regression is the conventional method of data analysis in observational studies. Regression models may take many forms, depending on the distribution of the response variable and the structure of the dataset. The most commonly used regression models include linear regression for continuous outcomes (e.g., the effect of age on FEV1), logistic regression for categorical outcomes (e.g., the effect of intraoperative cholangiography on bile duct injury), Cox proportional hazards models for time-to-event outcomes (e.g., the effect of adjuvant chemotherapy on survival), and Poisson regression for count data (e.g., the effect of INR level on ischemic stroke rates). Regression analysis is used to disentangle the effect of the relationship of interest from the contributions of covariates that may also affect the outcome. Regression can control for differences between treatment groups by providing estimates of the treatment effect with the other covariates held fixed. However, in order to control for a covariate, it must be measured in the observational dataset; therefore, multivariate regression is unable to control for the effects of unmeasured confounders.
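As a concrete illustration (not drawn from the chapter; all variable names and data-generating values below are hypothetical), the following sketch simulates confounding by indication and compares an unadjusted logistic regression with one that adjusts for the measured confounders, assuming Python with numpy, pandas, and statsmodels.

```python
# Minimal sketch: adjusting a treatment effect for measured covariates with
# logistic regression, using simulated data with a true null treatment effect.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
age = rng.normal(72, 8, n)
comorbidity = rng.poisson(1.5, n)
# Healthier (younger, fewer comorbidities) patients are more likely to be treated.
p_treat = 1 / (1 + np.exp(-(3.0 - 0.04 * age - 0.3 * comorbidity)))
treated = rng.binomial(1, p_treat)
# The outcome (death) depends on age and comorbidity but not on treatment.
p_death = 1 / (1 + np.exp(-(-6.0 + 0.06 * age + 0.4 * comorbidity)))
death = rng.binomial(1, p_death)
df = pd.DataFrame({"death": death, "treated": treated,
                   "age": age, "comorbidity": comorbidity})

# Unadjusted model: treatment looks protective only because of selection.
print(smf.logit("death ~ treated", data=df).fit(disp=0).params["treated"])
# Adjusted model: conditioning on the measured confounders moves the
# treatment coefficient toward its true value of zero.
print(smf.logit("death ~ treated + age + comorbidity", data=df).fit(disp=0).params["treated"])
```

Because the confounders in this toy example are fully measured, adjustment recovers the null effect; with an unmeasured confounder, it would not.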

14.3.2.2 Stratification or Restriction Prior to Multivariate Regression

Stratification may be used as a method to adjust for a measurable prognostic factor that differs systematically between treatment groups, that is, a potential confounder. Patients are grouped into strata of the prognostic variable, and the treatment effect is estimated by comparing treated and untreated patients within each stratum. This method yields effect measures for each stratum of the prognostic variable, known as conditional effect measures; they do not indicate the average treatment effect in the entire population. Sometimes investigators estimate the treatment effect in only some of the strata defined by the prognostic factor, a form of stratification known as restriction. Stratification and restriction create subgroups that are more homogeneous, sometimes enabling the investigator to identify the presence of confounding.

For example, a study assessing the short-term outcomes of incidental appendectomy during open cholecystectomy used restriction to evaluate the consistency and plausibility of its results [10]. Table 14.3 shows the unadjusted and adjusted associations between incidental appendectomy and adverse outcomes in the overall cohort and in restricted subgroups. Unadjusted comparisons showed paradoxical reductions in mortality and length of stay associated with incidental appendectomy. Multivariate models adjusting for potential confounders, such as comorbidity and nonelective surgery, showed an increased risk of nonfatal complications with incidental appendectomy but no differences in mortality or length of stay. The investigators believed that unmeasured differences between the appendectomy and no appendectomy groups were more likely to exist in high-risk patients, confounding the estimates for the overall sample. After restricting the analysis to subgroups of patients with low surgical risk, incidental appendectomy was consistently associated with a small but definite increase in adverse postoperative outcomes.

Table 14.3 Outcomes of patients undergoing open cholecystectomy with vs. without incidental appendectomy for the overall patient cohort and low-risk subgroups

Patient group | In-hospital death, OR (95 % CI) | Complications, OR (95 % CI) | Length of hospital stay, adjusted difference (95 % CI)
Overall cohort, unadjusted | 0.37 (0.23, 0.57) | 1.07 (0.98, 1.17) | 0.46 (0.38, 0.54)
Overall cohort, adjusted (a) | 0.98 (0.62, 1.56) | 1.53 (1.39, 1.68) | 0.05 (0.02, 0.12)
Low-risk subgroup: age <70 and elective surgery, adjusted (a) | 2.65 (1.25, 5.64) | 1.49 (1.32, 1.69) | 0.12 (0.05, 0.19)
Low-risk subgroup: elective surgery and no comorbidity, adjusted (a) | 2.20 (0.95, 5.10) | 1.53 (1.35, 1.73) | 0.11 (0.04, 0.18)

Note: OR indicates odds ratio, CI confidence interval
(a) Analyses adjusted for patients' age, sex, primary diagnosis, comorbidity, and admission category, hospital teaching status and bed size, and year of surgery
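As a hypothetical illustration of the same idea (simulated data, not the appendectomy study), the sketch below computes a crude odds ratio and stratum-specific odds ratios when a binary prognostic factor both drives treatment selection and predicts the outcome, assuming Python with numpy and pandas.

```python
# Minimal sketch: crude vs. stratum-specific odds ratios when a binary risk
# factor confounds the treatment-outcome association.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 20000
high_risk = rng.binomial(1, 0.3, n)                 # prognostic factor
# Low-risk patients are more likely to receive the incidental procedure.
treated = rng.binomial(1, np.where(high_risk == 1, 0.2, 0.6))
# The outcome depends strongly on risk status and only weakly on treatment.
p_event = 0.02 + 0.15 * high_risk + 0.01 * treated
event = rng.binomial(1, p_event)
df = pd.DataFrame({"high_risk": high_risk, "treated": treated, "event": event})

def odds_ratio(d):
    t = pd.crosstab(d["treated"], d["event"])
    return (t.loc[1, 1] * t.loc[0, 0]) / (t.loc[1, 0] * t.loc[0, 1])

print("Crude OR:", round(odds_ratio(df), 2))        # biased toward 'protective'
for level, grp in df.groupby("high_risk"):          # stratified estimates
    print(f"OR within high_risk={level}:", round(odds_ratio(grp), 2))
```

In this toy example the crude estimate suggests a protective effect simply because treated patients are disproportionately low risk, while the stratum-specific estimates do not.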

14.3.2.3 Propensity Score Analysis

A propensity score is the conditional probability that a patient will be assigned to a treatment group given a set of covariates, for example, the probability that a patient will undergo incidental appendectomy given his age, health status, primary diagnosis, and other factors. Propensity scores are generated using a logistic regression model of treatment receipt, and each patient is assigned a propensity score based on his or her individual characteristics. Propensity score analysis is appropriate when a large number of variables influence treatment choice. It enables the investigator to balance treatment groups according to the distributions of measured covariates. An implicit assumption of propensity score analysis is that balancing the observed patient characteristics minimizes the potential bias from unobserved patient characteristics.

There are four general strategies for balancing patient characteristics using propensity scores: stratifying patients into groups (e.g., quartiles or quintiles) on the basis of the propensity score; matching patients with similar propensity scores across treatment groups; covariate adjustment using the propensity score in multivariate analyses; and weighting patients on the basis of their propensity score, also known as inverse probability of treatment weighting. To determine whether the propensity score model has adequately balanced the treatment groups, an investigator can compare the distributions of measured covariates between treatment groups in the propensity score matched sample, within strata of the propensity score, or within the weighted sample [11]. Once satisfied that balance has been achieved, the investigator can directly estimate the effect of treatment on the outcome in the matched, stratified, or weighted sample. If the investigator is using covariate adjustment with the propensity score, then a regression model relating the outcome to treatment status and the propensity score must be specified. Studies comparing these four strategies have demonstrated that propensity score matching is the most effective at removing systematic differences in baseline characteristics between treated and untreated patients [11].

Let's review an example from the literature of a study that used propensity score analysis to compare lung cancer-specific survival between patients who underwent either wedge resection or segmentectomy. This study used SEER registry data to identify 3,525 patients with stage IA non-small cell lung cancer [12]. A logistic regression model was used to estimate propensity scores for patients undergoing segmentectomy based on age, sex, race/ethnicity, marital status, and tumor characteristics. Baseline characteristics were balanced across the two treatment groups after adjusting for the estimated propensity scores. The investigators used three propensity score methods to estimate the association between segmentectomy and survival: adjusting for the propensity score in a Cox regression analysis; stratifying by propensity score quintiles and estimating a Cox model within the five strata; and matching based on propensity scores and using a Cox model to compare survival between matched groups. Table 14.4 shows the results for each method. Segmentectomy was associated with a significant improvement in survival in all models, though propensity score matching resulted in slightly stronger associations.

Table 14.4 Results of Cox models comparing lung cancer-specific survival of patients treated with segmentectomy vs. wedge resection, by propensity score method (a)

Propensity score method | Lung cancer survival HR (95 % CI)
Adjustment for propensity score as covariate | 0.76 (0.61, 0.94)
Stratifying by propensity score quintiles | 0.76 (0.61, 0.94)
Matching based on propensity scores | 0.72 (0.60, 0.86)

Note: HR indicates hazard ratio, CI confidence interval
(a) All Cox models adjusted for number of lymph nodes evaluated during surgery

Propensity score analysis has several advantages over multivariate regression. First, it often allows the investigator to adjust for more covariates than can be included in a conventional multivariate model. When a study outcome is uncommon, investigators are limited in the number of covariates that may be included in the regression model (there should be at least 10 outcome events for every covariate, so a model with 10 covariates should have at least 100 patients who experienced the outcome of interest). The propensity score model, on the other hand, models receipt of treatment and can typically include many more covariates. Another advantage over conventional multivariate regression is that propensity score analysis allows the investigator to explicitly examine the degree of overlap in the distribution of baseline covariates between treated and untreated patients. Sparse overlap is evident when few patients can be matched on their propensity scores or when strata contain primarily either treated patients or untreated patients [11]. The investigator may choose to restrict the analysis to patients who have similar covariate distributions, or, if the overlap is too sparse, may conclude that the treated and untreated patients are so different that their outcomes cannot be compared and discontinue the analysis.
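A minimal sketch of the mechanics, using simulated data rather than any of the cited studies: it fits a logistic propensity score model, forms stabilized inverse probability of treatment weights, and checks covariate balance with standardized mean differences. The variable names and data-generating values are assumptions for illustration only, assuming Python with numpy, pandas, and statsmodels.

```python
# Minimal sketch: propensity scores by logistic regression, stabilized IPTW,
# and a balance check with standardized mean differences (SMDs).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 10000
age = rng.normal(70, 9, n)
stage = rng.integers(1, 4, n)
p_treat = 1 / (1 + np.exp(-(4 - 0.06 * age - 0.3 * stage)))
treated = rng.binomial(1, p_treat)
df = pd.DataFrame({"treated": treated, "age": age, "stage": stage})

# Propensity score model: probability of treatment given measured covariates.
X = sm.add_constant(df[["age", "stage"]])
ps = sm.Logit(df["treated"], X).fit(disp=0).predict(X)
# Stabilized inverse probability of treatment weights.
p_t = df["treated"].mean()
df["w"] = np.where(df["treated"] == 1, p_t / ps, (1 - p_t) / (1 - ps))

def smd(col, weights=None):
    """Standardized mean difference between treated and untreated patients."""
    w = np.ones(len(df)) if weights is None else weights
    t, u = df["treated"] == 1, df["treated"] == 0
    m1 = np.average(df.loc[t, col], weights=w[t])
    m0 = np.average(df.loc[u, col], weights=w[u])
    v1 = np.average((df.loc[t, col] - m1) ** 2, weights=w[t])
    v0 = np.average((df.loc[u, col] - m0) ** 2, weights=w[u])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

for col in ["age", "stage"]:
    print(col, "SMD before:", round(smd(col), 2),
          "after weighting:", round(smd(col, df["w"].to_numpy()), 2))
```

Weighting here shrinks the standardized differences in the measured covariates toward zero; as the chapter stresses, it cannot do the same for unmeasured characteristics.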

14.3.2.4 Instrumental Variable (IV) Analysis

Instrumental variable (IV) analysis provides a powerful means to eliminate confounding arising from both measured and unmeasured factors in observational studies of treatment effectiveness. The IV method employs a variable called an instrument (hereafter, IV) that is predictive of treatment but has no effect on the outcome except through its influence on treatment assignment. The most familiar illustration of an IV is random assignment to treatment groups in RCTs: the random assignment is unrelated to patient characteristics and has no direct effect on the study outcome. A good IV creates an allocation that is similar to randomization. As a result, the comparison groups can be expected to have similar distributions of measured and unmeasured characteristics. In IV analysis, patients are compared based on their likelihood of receiving treatment according to the IV, rather than their actual receipt of treatment. That is, one would report the difference in outcomes between patients who were likely to receive treatment based on the IV and those who were unlikely to receive treatment based on the IV. This is similar to an intention-to-treat estimator in randomized trials, where one might report the difference in outcomes between assigned treatment arms.

Let's consider the example of a drug's availability on a hospital formulary, described by Brookhart and colleagues [13]. A new thrombolytic medication, Drug X, recently became available for acute myocardial infarction (MI). Drug X is believed to be more effective than existing medications, but side effects are rare and its safety cannot be studied in smaller cohort studies. Administrative data would provide a large enough sample size but inadequate adjustment for severity of coronary artery disease; because the medication is used preferentially in the sickest patients, outcomes may appear to be worse in patients treated with Drug X. The drug has been added to some hospital formularies but not others, due to its cost. The drug's availability on the hospital formulary could therefore be used as an IV to examine the effectiveness and safety of Drug X. Availability on the hospital formulary is clearly related to receipt of Drug X, but it should not affect outcomes except through receipt of the drug (provided the hospital formulary is not associated with hospital quality, etc.). Hospital formulary status (and consequently, receipt of Drug X) is effectively randomized to patients, who have no foreknowledge of the formulary status. Outcomes would be compared between patients with acute MI admitted to hospitals with Drug X on the formulary and patients with acute MI admitted to hospitals without Drug X on the formulary.
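A minimal, hypothetical sketch of the IV idea in code (not taken from the chapter or from the Brookhart example's data): a binary "formulary" instrument shifts treatment, an unmeasured severity variable confounds the naive comparison, and a hand-rolled two-stage least squares estimate recovers an effect close to the truth. All names and numbers are made up for illustration, assuming Python with numpy and statsmodels; a production analysis would use a dedicated IV routine with proper standard errors.

```python
# Minimal sketch: two-stage least squares (2SLS) with a binary instrument
# analogous to a hospital-formulary indicator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50000
unmeasured_severity = rng.normal(0, 1, n)           # confounder invisible to the analyst
on_formulary = rng.binomial(1, 0.5, n)              # instrument: affects treatment only
# Sicker patients are more likely to receive the drug; formulary status also matters.
p_drug = 1 / (1 + np.exp(-(-1.5 + 1.2 * on_formulary + 1.0 * unmeasured_severity)))
drug = rng.binomial(1, p_drug)
# True causal effect of the drug on the outcome is -0.2; severity worsens the outcome.
outcome = -0.2 * drug + 0.8 * unmeasured_severity + rng.normal(0, 1, n)

# Naive regression is badly confounded by unmeasured severity.
naive = sm.OLS(outcome, sm.add_constant(drug)).fit()
# Stage 1: predict treatment from the instrument.
stage1 = sm.OLS(drug, sm.add_constant(on_formulary)).fit()
drug_hat = stage1.predict(sm.add_constant(on_formulary))
# Stage 2: regress the outcome on the predicted treatment.
stage2 = sm.OLS(outcome, sm.add_constant(drug_hat)).fit()
print("naive estimate:", round(naive.params[1], 3))
print("2SLS estimate:", round(stage2.params[1], 3))
```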


Table 14.5 Instrumental variables used in cancer outcomes research studies

Category of instrument | Examples
Availability of a key medical resource | Number of oncologists per capita in the hospital referral region (HRR); distance to nearest radiology facility; travel distance to surgical provider
Geographic variation in intensity of treatment utilization or provider practice patterns | Regional prevalence in the use of chemotherapy, androgen deprivation therapy, breast conserving surgery, or radiation therapy; surgeon's preceding patient's receipt of adjuvant chemotherapy; proportion of patients treated with multiple cycles of chemotherapy in a center
Economic incentives to provider and/or cost to patient of alternative treatments | Medicare's average physician fees for breast conserving surgery and mastectomy
Secular trends and/or changes in treatment patterns | Year of surgery

Table 14.5 shows some examples of IVs that have been used in studies evaluating outcomes of cancer therapy.

Finally, let's review an example from the literature that used geographic variation in treatment utilization as an IV. This study used Medicare data linked with prognostic variables from baseline chart reviews to assess the association of cardiac catheterization with long-term mortality from acute MI in a national cohort of Medicare enrollees hospitalized with acute MI (n = 122,124) [1]. IV analysis was performed to adjust for suspected differences in unmeasured risk factors between patients who underwent cardiac catheterization and those who did not. The regional catheterization rate was selected as the IV; mean catheterization rates ranged from 29 to 82 % across regions. A summary measure of acute MI severity, mean predicted 1-year mortality, was similar across regions that differed dramatically in catheterization rates, and all measured risk factors were balanced across regions. This demonstrated that the regional catheterization rate could serve as an effective IV and that the distribution of other, unmeasured risk factors was likely balanced across regions as well. The authors compared results across several risk-adjustment methods. Table 14.6 shows relative mortality rates and absolute mortality differences by risk-adjustment method. In patients receiving cardiac catheterization, unadjusted and multivariate-adjusted 4-year mortality was 33.9 and 20.7 % lower, respectively, than in patients not receiving catheterization. However, IV analysis showed that 4-year mortality was only 9.7 % lower in patients receiving catheterization, corresponding to a relative mortality rate of 0.84. Other risk-adjustment methods, including propensity score matching, produced relative mortality rates of about 0.50. The IV estimate of survival benefit was more consistent with results from RCTs than the estimates from other risk-adjustment methods.


Table 14.6 Mortality rates associated with receipt of cardiac catheterization among patients with acute myocardial infarction, by risk-adjustment method

Risk-adjustment method | Relative mortality rate (95 % CI) | Absolute mortality difference at 4 years (SE)
Unadjusted Cox model | 0.36 (0.36, 0.37) | –
Adjusted Cox model (a) | 0.51 (0.50, 0.52) | –
Cox model in propensity-based matching cohort (b) | 0.54 (0.52, 0.56) | –
Unadjusted linear regression (c) | 0.45 (0.44, 0.46) | 0.339 (0.003)
Adjusted linear regression (a, c) | 0.67 (0.66, 0.68) | 0.207 (0.003)
Instrumental variable analysis (a, c) | 0.84 (0.79, 0.90) | 0.097 (0.016)

Note: CI indicates confidence interval, SE standard error
(a) Adjusted for 65 patient, hospital, and zip code characteristics associated with post-acute myocardial infarction mortality
(b) Propensity score estimated based on 65 patient, hospital, and zip code covariates
(c) Models use 4-year mortality (binary variable) as the dependent variable

14.3.3 Limits of Advanced Statistical Techniques

The application of advanced statistical methods can leave investigators and practitioners with a false sense of security in the results. Propensity score analysis will leave residual confounding if there is imbalance across treatment groups in unobserved factors that influence health. IV analysis can eliminate confounding arising from observed and unobserved factors; however, it requires the identification of a strong and logically justifiable instrument, which can be very difficult.

To critically evaluate their results, investigators can compare the magnitude and direction of the predicted treatment "effect" across several different adjustment methods. Investigators should also be on the lookout for implausible outcomes—outcomes that would be expected because of the factors influencing treatment choice, but that could not plausibly be related to an actual effect of treatment. The existence of implausible outcomes in a study should be a warning sign that the effect measures are biased. For example, incidental appendectomy during open cholecystectomy could not plausibly improve mortality or shorten length of stay in the hospital; the association was a result of lower-risk surgical patients undergoing incidental appendectomy. The figure below shows implausible results from an analysis evaluating the effect of active therapy versus observation on survival of men with localized prostate cancer [7]. Giordano and colleagues replicated a prior SEER-Medicare study that reported an overall survival benefit associated with active therapy after stratifying by propensity score quintiles to adjust for confounders [6]. Giordano and colleagues used identical analytic methods, but they also examined cause-specific mortality. Active therapy was strongly (and implausibly) associated with a reduction in mortality from heart disease, diabetes, and other causes of death, indicating a strong remaining bias (Fig. 14.1).


Fig. 14.1 Plot of hazard ratio for death comparing active therapy versus observation in men with localized prostate cancer. CVD indicates cardiovascular disease, COPD chronic obstructive pulmonary disease, DM diabetes mellitus (Credit: Figure reprinted from Giordano et al. [7], with permission from John Wiley and Sons)

14.4 Conclusions

Bias and confounding are major issues in studies that assess treatment effectiveness based on observational data, making causal inference difficult. Investigators must conduct a rigorous assessment of threats to the validity of their findings and estimate the strength and direction of suspected bias. Advanced statistical methods are available to adjust for confounding and improve causal inference. However, investigators must carefully consider the limitations of their data, because sometimes confounding cannot be overcome with statistical methods, and some comparative effectiveness questions cannot be answered with currently available observational data. Researchers must carefully map the boundaries of comparative effectiveness research using observational data. There is important information to be learned from observational studies, especially population-based cohorts that include patients/providers who are unlikely to participate in randomized clinical trials.

References

1. Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007;297(3):278–85.


2. Hemmila MR, Birkmeyer NJ, Arbabi S, Osborne NH, Wahl WL, Dimick JB. Introduction to propensity scores: a case study on the comparative effectiveness of laparoscopic vs open appendectomy. Arch Surg. 2010;145(10):939–45.
3. Steele KE, Prokopowicz GP, Magnuson T, Lidor A, Schweitzer M. Laparoscopic antecolic Roux-En-Y gastric bypass with closure of internal defects leads to fewer internal hernias than the retrocolic approach. Surg Endosc Other Intervent Tech. 2008;22(9):2056–61.
4. Jackson RS, Chang DC, Freischlag JA. Comparison of long-term survival after open vs endovascular repair of intact abdominal aortic aneurysm among medicare beneficiaries. JAMA. 2012;307(15):1621–8.
5. Lee HG, Clair DG, Ouriel K. Ten-year comparison of all-cause mortality after endovascular or open repair of abdominal aortic aneurysms: a propensity score analysis. World J Surg. 2013;37(3):680–7.
6. Wong YN, Mitra N, Hudes G, Localio R, Schwartz JS, Wan F, et al. Survival associated with treatment vs observation of localized prostate cancer in elderly men. JAMA. 2006;296(22):2683–93.
7. Giordano SH, Kuo YF, Duan Z, Hortobagyi GN, Freeman J, Goodwin JS. Limits of observational data in determining outcomes from cancer therapy. Cancer. 2008;112(11):2456–66.
8. Brookhart MA, Sturmer T, Glynn RJ, Rassen J, Schneeweiss S. Confounding control in healthcare database research: challenges and potential approaches. Med Care. 2010;48(6 Suppl):S114–20.
9. Sheffield KM, McAdams PS, Benarroch-Gampel J, Goodwin JS, Boyd CA, Zhang D, et al. Overuse of preoperative cardiac stress testing in medicare patients undergoing elective noncardiac surgery. Ann Surg. 2013;257(1):73–80.
10. Wen SW, Hernandez R, Naylor CD. Pitfalls in nonrandomized outcomes studies. The case of incidental appendectomy with open cholecystectomy. JAMA. 1995;274(21):1687–91.
11. Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivar Behav Res. 2011;46(3):399–424.
12. Smith CB, Swanson SJ, Mhango G, Wisnivesky JP. Survival after segmentectomy and wedge resection in stage I non-small-cell lung cancer. J Thorac Oncol. 2013;8(1):73–8.
13. Brookhart MA, Rassen JA, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19(6):537–54.

Landmark Papers to Recommend to Readers

• Giordano SH, Kuo YF, Duan Z, Hortobagyi GN, Freeman J, Goodwin JS. Limits of observational data in determining outcomes from cancer therapy. Cancer. 2008;112(11):2456–66.
• Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007;297(3):278–85.
• Wen SW, Hernandez R, Naylor CD. Pitfalls in nonrandomized outcomes studies. The case of incidental appendectomy with open cholecystectomy. JAMA. 1995;274(21):1687–91.

Chapter 15

Systematic Review and Meta-analysis: A Clinical Exercise

Melinda A. Gibbons

M.A. Gibbons, M.D., M.S.H.S.
Department of Surgery, David Geffen School of Medicine at University of California, 72-215 CHS, 10833 Le Conte Avenue, Los Angeles, CA 90095, USA
Department of Surgery, Olive View UCLA Medical Center, Sylmar, CA, USA
Department of Surgery, Greater Los Angeles VA Medical Center, Los Angeles, CA, USA
e-mail: [email protected]

Abstract The growth of new clinical knowledge continues to challenge how surgery is practiced, and several types of literature reviews attempt to consolidate this expansion of information. Systematic reviews and meta-analyses are common methodologies that integrate findings on the same subject collected from different studies. Unlike a systematic review, a meta-analysis arrives at a conclusion backed by quantitative analysis. This review provides an overview of the principles, application, and limitations of these methods, which is fundamental to interpreting and critiquing their results.

Keywords Meta-analysis • Systematic review • Forest plot • Risk of bias • Odds ratio • Relative risk

15.1 Introduction

Systematic reviews and meta-analyses are two approaches that combine results on the same subject obtained from different studies [1–4]. As clinical data continue to expand and practitioners are challenged to change treatment paradigms, such rigorous summaries of the existing literature are increasingly critical. Narrative or quantitative summaries combine results from individual studies, either by presenting the data together in one place or by pooling the data, to determine whether a difference in outcomes exists between two treatments or study arms. Results can resolve controversies within the literature and ultimately shape and change clinical practice.

The first rule of a systematic review or meta-analysis is that it should be a clinical exercise. The statistics are mechanical and can be performed on any group of data. The only thing that gives them meaning is the thoughtful selection of which studies and which outcomes to pool, which is why the clinical judgment of "what makes sense" is the most important factor in summarizing or combining results. This overview introduces the rationale behind performing meta-analysis, including systematic reviews, reviews the methodology, and discusses its strengths and limitations.

15.2 Systematic Reviews

Non-systematic reviews and systematic reviews both summarize the results of previously published literature without the formal statistical pooling that defines a meta-analysis. A traditional "review of the literature" condenses past studies in narrative form and offers generalized conclusions; however, non-systematic reviews lack structure in how the studies were selected and do not statistically summarize the data. Such a review is a convenience sample of the studies that the authors thought were important to discuss, and without a structured method to identify and select the appropriate articles, there is little way to reproduce the findings. First, the studies selected may not represent all the available evidence, because rigorous selection criteria are not employed; the summation may therefore support the reviewer's bias. Second, a narrative review does not quantitatively combine or weight the data, which can create misleading interpretations [4]. Finally, non-systematic reviews tend to lack transparency in how they addressed heterogeneity in the study populations, interventions, or outcomes. Thus non-systematic reviews provide more of a snapshot of a sample of studies on a particular topic, and strong conclusions should not be drawn from them [3, 5].

A systematic review is a higher level of analysis, as it follows specific methods to define a research question, criteria for study inclusion, and the method of data collection [3, 5]. Systematic reviews overcome some limitations of a non-systematic review, as they tend to have less bias in study design and selection of studies and therefore offer a less subjective or biased conclusion. However, systematic reviews do not formally combine (i.e., pool) data or weight the relative contribution of each study based on sample size, but they can provide a useful summary of results on a common topic.

15.3 Meta-analysis

A meta-analysis follows strict methods to define the study question, establish study inclusion criteria, and quantitatively pool the data – when appropriate. The goal is to reach an objective conclusion based on the available evidence [3, 5]. One of the main strengths of a meta-analysis is its ability to quantify results from multiple studies by weighting and pooling the contribution of each study, which is not done in a non-systematic or systematic review. Meta-analyses can offer conclusions where data are conflicting or inconclusive [3–5].

While randomized controlled trials (RCTs) are the gold standard of evidence-based medicine, it is common for the results of individual trials to vary. Meta-analysis can help synthesize the results in a number of scenarios: when the individual studies (1) show no effect due to small sample size, (2) vary in the direction of effect, or (3) conflict regarding effect versus no significant effect. For example, a meta-analysis by the Cochrane Collaboration (2004) compared laparoscopic versus open appendectomy [6]. The investigators identified a number of RCTs; however, none definitively established the benefits. The meta-analysis found that the laparoscopic approach was associated with a lower wound infection rate, decreased incidence of intra-abdominal abscess, less post-operative pain, and shorter duration of hospital stay. A meta-analysis also helped demonstrate the benefit of radiation for early breast cancer, which was not clear because many of the individual studies had small sample sizes. The pooled analysis demonstrated that radiation (after lumpectomy) had a threefold lower local recurrence rate than surgery alone, with no difference in mortality [7]. Meta-analysis combines smaller studies when large definitive trials are not available or where controversy between studies exists, thus adding to the strength and generalizability of the findings.

15.3.1 Methodology

Meta-analysis follows an approach similar to that used for primary data collection research [3, 4, 8]. The steps include: (1) define the research question, (2) establish study selection criteria, (3) perform a literature search, (4) abstract study variables and outcomes, and (5) analyze the data and present the results.

The first step defines a specific research question and a formal protocol detailing objectives and hypotheses. This critical phase forms the backbone of the work. The research question should address the type of patients, intervention, comparison group, and clinical outcomes of interest [9].

The second step defines the inclusion and exclusion criteria for identifying eligible studies. The criteria need to detail the type of study design (e.g., randomized controlled trial, observational cohort study), patients (e.g., age, gender, presence of medical conditions), data publication (e.g., abstract only or non-published data), language (e.g., inclusion of non-English studies), and time period [3, 4]; for example, there may be an evolution or change in clinical management over time. The degree of criteria specificity can affect the results, as broad inclusion criteria tend to increase heterogeneity among studies, while narrow inclusion criteria limit the possibilities for subgroup analysis [10].


Third, a literature search is performed across databases such as Medline, the Cochrane Library, Current Contents, and EMBASE to obtain all relevant studies that meet the inclusion criteria. Utilizing multiple databases helps ensure that pertinent publications are not omitted; articles written in languages other than English should be included and translated when appropriate [3, 4, 8]. Scanning the bibliographies of the retrieved articles (referred to as reference mining) and asking experts in the field will identify additional publications [3]. Articles with duplicate or previously published data should be excluded.

The fourth step involves abstraction of study features, characteristics of the patient population, and outcomes onto a standard data collection form [8]. The quality of studies should be evaluated with regard to randomization, blinding, and explanation of dropouts and withdrawals, which addresses internal validity (minimization of bias) and external validity (generalizability) [11]. To maintain accuracy, two independent researchers should extract the data, and the degree of agreement between the reviewers should be calculated (e.g., with a kappa statistic) [8]. A formal process for resolving discrepancies must be established. Blinding researchers to the study authors and other identifying information may decrease the chance of bias, but this is not routinely done.

The fifth step of a meta-analysis involves data analysis and presentation of results. The type of analysis depends on whether the outcome variable is continuous (e.g., length of hospital stay) or dichotomous (e.g., occurrence of an adverse event). For continuous endpoints, the mean difference between the two groups (e.g., control and treatment groups) is recorded [8]. Data must be translated to a common scale to allow for comparison. For example, a recent meta-analysis on bariatric surgery found that the majority of articles reported preoperative weight in kilograms, while some articles reported preoperative weight in pounds [12]. Transforming data into a common scale allows for maximal inclusion of data.

If the endpoint is dichotomous (effect versus no effect), the odds ratio (OR) or relative risk (RR) is calculated [8]. Odds is defined as the ratio of events to non-events, and the odds ratio is defined as the odds in one group (e.g., the treatment group) divided by the odds in a second group (e.g., the control group). An OR greater than one means that the event is more likely in the treatment group, and therefore the treatment group is favored if the "event" is desirable (e.g., survival). Risk is defined as the number of patients with an event divided by the total number of patients, and the risk ratio is defined as the risk in one group (e.g., the treatment group) divided by the risk in a second group (e.g., the control group). An RR less than one favors the treatment group if the "event" is not desirable (e.g., reoperation). Relative risk tends to be the easier concept to understand when considering clinical outcomes – it is the ratio of the risks of an event in the two groups. The interpretation of the odds ratio is harder – it is the ratio of the odds in the two groups. Of note, odds ratios do not translate directly to relative risks, especially as the effect size increases; if an odds ratio is incorrectly interpreted as a relative risk, the effect size will be overestimated. In general, ORs are reported for retrospective (case–control) studies and RRs are reported for prospective cohort studies.

Review Manager (RevMan) is commonly used software that allows researchers to enter the data for the included studies and generates pooled results and graphical presentations [13].
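As a simple worked example (hypothetical counts, not drawn from any cited study), the sketch below computes the OR, RR, risk difference, and NNT from a 2 x 2 table, assuming Python.

```python
# Minimal sketch: effect measures from a hypothetical 2x2 table of events by arm.
events_treat, n_treat = 12, 150      # hypothetical treatment arm
events_ctrl, n_ctrl = 30, 145        # hypothetical control arm

risk_treat = events_treat / n_treat
risk_ctrl = events_ctrl / n_ctrl
odds_treat = events_treat / (n_treat - events_treat)
odds_ctrl = events_ctrl / (n_ctrl - events_ctrl)

rr = risk_treat / risk_ctrl          # risk in treated divided by risk in controls
or_ = odds_treat / odds_ctrl         # odds in treated divided by odds in controls
rd = risk_treat - risk_ctrl          # risk difference (negative favors treatment)
nnt = 1 / abs(rd)                    # patients treated to prevent (or cause) one event

print(f"RR={rr:.2f}  OR={or_:.2f}  RD={rd:.3f}  NNT={nnt:.1f}")
```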


Fig. 15.1 Forest plot depicting pooled random effects meta-analysis and subgroup estimates according to dual versus single ring structure of wound protector

When reporting a meta-analysis, the combined study effect (i.e., the difference between the study arms) is presented graphically along with the results of the individual studies, as demonstrated by the forest plot example in Fig. 15.1 [14]. In this example, the first column lists the six studies included in the pooled analysis of wound protectors for reducing surgical site infections. The number of infections in each study is displayed for each "arm" – with and without the wound protector. The vertical line represents the point (i.e., an OR of 1) where there is no difference in event rate. The OR for each study is represented by a square, while the 95 % CI is depicted as a horizontal line; if the CI includes 1, there is no statistically significant difference between the procedures. Under the weight column, a percentage quantifies each study's contribution, and the corresponding OR and 95 % CI are reported. The diamond-shaped symbol represents the pooled result: the midpoint corresponds to the pooled estimate and the horizontal spread to the pooled 95 % CI. If the horizontal spread of the diamond does not cross the vertical line, there is a statistically significant difference between the treatments; had it traversed the vertical OR line, there would be no difference. The test for overall effect determines the statistical significance of the meta-analysis by generating the z-value along with the p-value [2]. The forest plot also provides information about heterogeneity through the plots of the individual study effects and the chi-square test.
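The sketch below draws a bare-bones forest plot for a handful of hypothetical studies, assuming Python with numpy and matplotlib; it is meant only to illustrate the layout described above (squares for point estimates, horizontal lines for CIs, and a vertical line of no effect), not to reproduce Fig. 15.1.

```python
# Minimal sketch: a basic forest plot for hypothetical studies.
import numpy as np
import matplotlib.pyplot as plt

studies = ["Study A", "Study B", "Study C", "Study D"]
or_ = np.array([0.55, 0.80, 1.10, 0.70])
ci_lo = np.array([0.30, 0.55, 0.70, 0.45])
ci_hi = np.array([1.00, 1.15, 1.75, 1.10])

y = np.arange(len(studies))[::-1]                 # first study at the top
plt.hlines(y, ci_lo, ci_hi)                       # 95 % confidence intervals
plt.scatter(or_, y, marker="s")                   # point estimates as squares
plt.axvline(1.0, linestyle="--")                  # line of no effect (OR = 1)
plt.xscale("log")                                 # ORs are usually plotted on a log scale
plt.yticks(y, studies)
plt.xlabel("Odds ratio (log scale)")
plt.title("Forest plot of hypothetical studies")
plt.show()
```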


If the CIs of the studies do not overlap, it suggests substantial variation not accounted for by chance. If the chi-square test's p-value is less than 0.10, or the chi-square value is greater than the degrees of freedom, then study heterogeneity is likely present [2].

For RCTs, absolute measures, such as the risk difference (RD), also called the absolute risk reduction (ARR), and the number of patients needed to treat (NNT), can be calculated. The RD is defined as the risk in the treatment group minus the risk in the control group, which quantifies the absolute change in risk due to the treatment. In general, a negative risk difference favors the treatment group. The NNT is defined as the inverse of the risk difference and is the number of patients who need to be treated with the intervention to prevent one event. In the case where the risk difference is positive (does not favor the treatment group), the inverse provides the number needed to harm (NNH) [2]. For example, a meta-analysis comparing stent treatments of infragenicular vessels for chronic lower limb ischemia found that primary patency was significantly higher with drug-eluting stents than with bare metal stents (OR 4.51, 95 % CI 2.90 to 7.02, and NNT 3.5) [15]. The drug-eluting stent increased the odds of vessel patency 4.5-fold compared with the bare metal stent, and 3.5 patients had to receive a drug-eluting stent to prevent loss of patency in one patient in the control arm.

The meta-analysis technique for combining data also uses a weighted average of the results. Larger trials are given more weight, since the results of smaller trials are more likely to be affected by chance [4, 8]. Either the fixed-effects or the random-effects model can be used to determine the overall effect. The fixed-effects model assumes that all studies are estimating the same common treatment effect; therefore, if each study were infinitely large, an identical treatment effect could be calculated. The random-effects model assumes that each study is estimating a different treatment effect and hence yields wider confidence intervals (CI). A meta-analysis should be analyzed using both models. If there is no difference between the models, then the studies are unlikely to have significant statistical heterogeneity. If there is a considerable difference between the two models, then the most conservative estimate should be reported, which is usually the random-effects model [2].

There are additional types of meta-analysis. One approach is to run the analysis based on individual patient data. While this method requires a greater amount of resources, it lessens the degree of publication and selection bias, thus potentially producing more accurate results. Another example is the cumulative meta-analysis, which involves repeating the meta-analysis as new study findings become available and allows for the accrual of data over time [5]. It can also retrospectively pinpoint the time when a treatment effect achieved statistical significance.

Meta-analysis can be performed using both observational and RCT data. Ideally, limiting the meta-analysis to RCT data will produce results with a higher level of scientific evidence, because randomized data are less likely to carry significant selection bias or other confounding. Pooling non-randomized data has many limitations that must be considered in the final assessment of the results.
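To illustrate the fixed-effects versus random-effects distinction numerically, the sketch below pools hypothetical log odds ratios by inverse-variance weighting and by a DerSimonian-Laird random-effects model; the study estimates and variances are invented for illustration, assuming Python with numpy.

```python
# Minimal sketch: inverse-variance pooling of log odds ratios under fixed-effect
# and DerSimonian-Laird random-effects models, for five hypothetical studies.
import numpy as np

log_or = np.array([-0.80, -0.10, -0.60, 0.30, -0.45])   # per-study log odds ratios
var = np.array([0.10, 0.05, 0.20, 0.08, 0.12])           # per-study variances

def pool(y, v):
    w = 1 / v
    est = np.sum(w * y) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))
    return est, se

# Fixed-effect model: every study estimates the same common effect.
fe, fe_se = pool(log_or, var)

# DerSimonian-Laird estimate of the between-study variance (tau^2).
w = 1 / var
q = np.sum(w * (log_or - fe) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - (len(log_or) - 1)) / c)

# Random-effects model: weights incorporate the between-study variance,
# which widens the pooled confidence interval.
re, re_se = pool(log_or, var + tau2)

for name, est, se in [("fixed", fe, fe_se), ("random", re, re_se)]:
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"{name}-effects pooled OR {np.exp(est):.2f} "
          f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")
```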


Furthermore, a general rule of thumb is that observational data should not be combined with randomized data within an analysis.

While in general a meta-analysis produces an overall conclusion with more power than looking at the individual studies, results must be interpreted with consideration of the study question, selection criteria, method of data collection, and statistical analysis [4]. The main limitation of a meta-analysis is the potential for multiple types of bias. Pooling data from different sources unavoidably includes the biases of the individual studies [16, 17]. Moreover, despite the establishment of study selection criteria, authors may tend to incorporate studies that support their view, leading to selection bias [3, 4, 16–19]. There is also potential for bias in the identification of studies because they are often selected by investigators familiar with the field who have individual opinions [16]. Language bias may exist when literature searches fail to include foreign studies, because significant results are more likely to be published in English [4, 16]. Studies with significant findings tend to be cited and published more frequently, and those with negative or non-significant findings are less likely to be published, resulting in possible citation bias and publication bias [4, 16]. Since studies with significant results are more likely to be indexed in literature databases, database bias is another concern [4, 16]. Studies that have not been published in traditional journals, such as a dissertation or a book chapter, are referred to as "fugitive" literature and are less likely to be identified through a traditional database search. Finally, multiple publication bias can occur if several publications are generated from a multi-center trial or from a large trial reporting on a variety of outcomes: if the same set of patients is included twice in the meta-analysis, the treatment effect can be overestimated [16]. These potential biases can affect the conclusions and must be considered during interpretation of the results.

To combat these sources of bias, several tools are available. First, a sensitivity analysis can help examine for bias by exploring the robustness of the findings under different assumptions [16]. Exclusion of studies based on specified criteria (e.g., low quality, small sample size, or studies stopped early because of an interim analysis) should not substantially change the overall effect if the results of the meta-analysis are not heavily influenced by these studies. Second, the degree of study heterogeneity is another major limitation, and the random-effects model should be used when appropriate [2]. A third approach to measuring potential bias is the funnel plot, a scatter plot illustrating each study's effect in relation to its sample size. The underlying principle is that as the sample size of individual studies increases, the precision of the overall estimate or effect difference improves. Graphically, smaller studies scatter widely while the spread of large studies should be narrow, so the plot should show a symmetrical inverted funnel if there is minimal or no bias, as demonstrated in Fig. 15.2. By the same logic, the plot will be asymmetrical and skewed when bias exists.

Fig. 15.2 Illustration of a funnel plot for a hypothetical meta-analysis comparing hernia recurrence incidence following Procedure X versus Procedure Y. The y-axis reflects each study's precision as the standard error of the log effect estimate, SE (log OR). The x-axis represents the odds ratio (OR) for each study. The symmetry of the plot distribution suggests absence of publication bias

One standardized method of assessing and reporting the potential for bias is the Cochrane Risk of Bias tool [20]. RCTs included in a meta-analysis are evaluated for seven potential sources of bias: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective reporting, and other bias. Each item is scored as high, indeterminate, or low risk of bias. Reporting this potential for bias helps the reader assess the overall level of bias for the selected studies of interest (Fig. 15.3).

Fig. 15.3 Cochrane risk of bias for hypothetical bariatric surgery randomized controlled trials. Two trials are depicted with ratings of high, indeterminate, or low for each of the seven bias categories
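The sketch below draws a simple funnel plot for simulated studies, assuming Python with numpy and matplotlib; because no publication bias is built into the simulation, the points scatter symmetrically around the reference line, as described above.

```python
# Minimal sketch: a funnel plot of hypothetical studies, plotting each study's
# log odds ratio against its standard error, with the y-axis inverted so the
# most precise (largest) studies sit near the top of the funnel.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
true_log_or = -0.3
se = rng.uniform(0.05, 0.5, 40)                  # hypothetical study precisions
log_or = rng.normal(true_log_or, se)             # effects scatter more as SE grows

plt.scatter(log_or, se, s=20)
plt.axvline(true_log_or, linestyle="--")         # pooled / true effect reference line
plt.gca().invert_yaxis()                         # most precise studies at the top
plt.xlabel("log odds ratio")
plt.ylabel("standard error")
plt.title("Funnel plot (symmetry suggests little publication bias)")
plt.show()
```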


Fig. 15.4 Cumulative meta-analysis detailing the evolution of the pooled effect estimate with addition of subsequent available trial data

Other areas of criticism involve the interpretation of meta-analysis results. One potential problem occurs when a meta-analyst neglects to consider important covariates, which can lead to misinterpretation of the results [18, 19]. For example, in a study of cerebrospinal fluid drainage (CSFD) in thoracic and thoracoabdominal aortic surgical repair, the expertise of the surgical team varied among the included studies and could be a critical factor in the outcome of interest – prevention of paraplegia [21]. Some argue that the inherent degree of study heterogeneity does not permit the pooling of data to produce a valid conclusion [16, 17]. The strength and precision of a meta-analysis are also questioned when the results contradict a large, well-performed RCT [16, 17], and the results of an individual study or trial may be overlooked in favor of the pooled results. However, it is arguable that findings falling outside the group mean are likely a product of chance and may not reflect the true effect difference, which provides the rationale for formally pooling similar studies. Even if a real difference exists in an individual trial, the result for the group will likely be the best overall estimate (also known as Stein's paradox) [22]. Lastly, caution should be exercised when employing subgroup analysis to make decisions about individual patients. Meta-analysis approximates the overall effect of a treatment in a wide range of subjects, and subgroup analyses are thus susceptible to bias. Figure 15.1 shows two subgroup analyses of single versus double ring wound protectors, suggesting a greater effect of the double ring protector at reducing surgical site infections. However, a cumulative meta-analysis (Fig. 15.4) suggests that other changes that have occurred over time may also play a role [14].


Fig. 15.5 Pearls for conducting meta-analyses

• First, the design must ask a good clinical question
• Statistics are mechanical but must be based in clinical knowledge
• Ideally constructed of only RCTs with little heterogeneity
• Evidence tables will allow the reader to judge the appropriateness of combining the studies
• Combining outcomes with different lengths of follow-up must be justified
• Risk of bias for RCTs should be assessed and reported

Clinicians should consider the risks and co-morbidities of the studied population in comparison with their own patients to help decide whether the findings are clinically applicable. A recent review of meta-analyses in the general surgical literature by Dixon and colleagues found many inadequacies in the quality of these studies [23]. Overall, the majority of the meta-analyses had major methodological flaws – a median score of 3.3 on a scale from 1 to 7. Areas of weakness included errors in validity assessment, selection bias of patient populations, poor reporting of search strategies, and improper pooling of data. They found that meta-analyses of poorer quality tended to report a greater effect difference than higher quality ones. These results emphasize the importance of performing meta-analyses using rigorous, high-quality methodology. General suggestions to follow when conducting meta-analyses are outlined in Fig. 15.5.

15.4 Conclusions Like primary research, meta-analysis involves a step-wise approach to arrive at statistically justifiable conclusions. Identifying the appropriate clinical question is critical to the success of the meta-analysis. It has the potential to provide an accurate appraisal of the literature along with quantitative summation and ultimately can help resolve clinical controversies. Meta-analysis overcomes the subjective problem of narrative reviews and provides a more transparent appraisal of the data. It also provides the quantitative analysis lacking in a systematic review. However, there are limitations and biases in meta-analysis methodology that must be acknowledged and minimized. The ability of a treatment to affect an individual patient cannot be predicted, and the decision to use an intervention must rely on the discretion of the clinician. The number of meta-analyses published in recent years has increased substantially [3, 24]. It is imperative that surgeons not only understand the strengths


and weaknesses of this methodology, but also have the ability to critically judge the findings since the results of a meta-analysis have the potential to influence clinical practice.

References
1. Glass GV. Primary, secondary, and meta-analysis of research. Educ Res. 1976;5:3–8.
2. The Cochrane Collaboration Web site. Available at: http://www.cochrane.org. Accessed 11 Nov 2013.
3. Kelly GA. Meta-analysis: an introduction. Available at: http://www.pitt.edu/~super1/lecture/lec3221/index.htm. Accessed 11 Nov 2013.
4. Wolf FM. Introduction to systematic reviews and meta-analysis. Available at: http://depts.washington.edu/k30/Meta-analysis/Meta-analysis%20clinical%20research%200603_files/frame.htm. Accessed 11 Nov 2013.
5. Egger M, Smith GD. Potentials and promise. BMJ. 1997;315:1371–4.
6. Sauerland S, Lefering R, Neugebauer E. Laparoscopic versus open surgery for suspected appendicitis. Cochrane Database Syst Rev. 2004;(4):CD001546.
7. Early Breast Cancer Trialists’ Collaborative Group. Effects of radiotherapy and surgery in early breast cancer – an overview of the randomized trials. N Engl J Med. 1995;333:1444–55. [Erratum, N Engl J Med 1996;334:1003.]
8. Egger M, Smith GD, Phillips AN. Meta-analysis: principles and procedures. BMJ. 1997;315:1533–7.
9. Meade MO. Selecting and appraising studies for a systematic review. Ann Int Med. 1997;127:531–7.
10. Counsell C. Formulating questions and locating primary studies for inclusion in systematic reviews. Ann Int Med. 1997;127:380–7.
11. Moher D, Pham B, Jones A, et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352(9128):609–13.
12. Buchwald H, Avidor Y, Braunwald E, et al. Bariatric surgery: a systematic review and meta-analysis. JAMA. 2004;292(14):1724–37.
13. http://tech.cochrane.org/Revman. Accessed 21 Apr 2014.
14. Edwards JP, Ho AL, Tee MC, Dixon E, Ball CG. Wound protectors reduce surgical site infection: a meta-analysis of randomized controlled trials. Ann Surg. 2012;256(1):53–9.
15. Antoniou GA, Chalmers N, Kanesalingham K, Antoniou SA, Schiro A, Serracino-Inglott F, Smyth JV, Murray D. Meta-analysis of outcomes of endovascular treatment of infrapopliteal occlusive disease with drug-eluting stents. J Endovasc Ther. 2013;20(2):131–44.
16. Egger M, Smith GD. Bias in location and selection of studies. BMJ. 1998;316:61–6.
17. LeLorier J, Grégoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med. 1997;337:536–42.
18. Bailar III JC. The practice of meta-analysis. J Clin Epidemiol. 1995;48:149–57.
19. Bailar JC. The promise and problems of meta-analysis. N Engl J Med. 1997;337:559.
20. Higgins JPT, Altman DG, Sterne JAC. Cochrane handbook for systematic reviews of interventions version 5.1.0. In: Higgins JPT, Green S, editors. Oxford: The Cochrane Collaboration; 2011.
21. Cina CS, Abouzahr L, Arena GO, Lagana A, Devereaux PJ, Farrokhyar F. Cerebrospinal fluid drainage to prevent paraplegia during thoracic and thoracoabdominal aortic aneurysm surgery: a systematic review and meta-analysis. J Vasc Surg. 2004;40(1):36–44.
22. Efron B, Morris C. Stein’s paradox in statistics. Sci Am. 1977;236:119–27.


23. Dixon E, Hameed M, Sutherland F, Cook DJ, Doig C. Evaluating meta-analyses in the general surgical literature: a critical appraisal. Ann Surg. 2005;241(3):450–9.
24. Davey Smith G, Egger M. Meta-analysis. Unresolved issues and future developments. BMJ. 1998;316(7126):221–5.

Landmark References
• Bailar JC. The promise and problems of meta-analysis. N Engl J Med. 1997;337:559.
• Counsell C. Formulating questions and locating primary studies for inclusion in systematic reviews. Ann Int Med. 1997;127:380–7.
• Egger M, Smith GD. Bias in location and selection of studies. BMJ. 1998;316:61–6.
• Egger M, Smith GD, Phillips AN. Meta-analysis: principles and procedures. BMJ. 1997;315:1533–7.
• Meade MO. Selecting and appraising studies for a systematic review. Ann Int Med. 1997;127:531–7.

Chapter 16

Medical Decision-Making Research in Surgery
Clara N. Lee and Carrie C. Lubitz

Abstract As surgeons, we make highly challenging decisions every day about whether or not to operate on a patient, and which operation to do. Some of these decisions are challenging because they involve tradeoffs among high-stake risks and benefits, such as risk of stroke or restoration of bowel continuity. Other decisions are challenging because they involve deeply personal issues for the patient, such as continence or breast appearance. The ability of surgeons and patients to make these decisions has important implications for health outcomes, including quality of life and health care resource utilization. The science of evaluating, facilitating, and intervening on medical decisions is a relatively young field, which has evolved from other disciplines, including psychology, economics, health behavior, and engineering. Decision sciences encompass a broad range of research – from investigating the process of patient decision-making, to the development of patient decision-aids and informing provider decision-making with simulation disease modeling. This chapter describes the theory behind this growing field as well as applications in surgical research. It first describes decision-making from the patient perspective and then explains how decision-making by a provider or payer can be informed by decision analysis and comparative-effectiveness research. Keywords Decision-science • Decision-analysis • Comparative-effectiveness • Modeling • Shared decision making

C.N. Lee, M.D., M.P.P., F.A.C.S. () University of North Carolina, CB 7195, Chapel Hill, NC 27599-7195, USA e-mail: [email protected] C.C. Lubitz, M.D., M.P.H. Department of Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Yawkey 7B, Boston, MA 02114-3117, USA e-mail: [email protected] J.B. Dimick and C.C. Greenberg (eds.), Success in Academic Surgery: Health Services Research, Success in Academic Surgery, DOI 10.1007/978-1-4471-4718-3__16, © Springer-Verlag London 2014



16.1 Clinical Versus Preference Sensitive Decisions The approach to studying decisions depends on the nature of the decision and the perspective of the decision-maker. Many decisions have clear clinical indications based on a reasonable amount of evidence. For example, the indications for appendectomy are based almost exclusively on medical criteria and not on patient preferences, because very little variability exists among patients’ preferences about dying from appendicitis. In other cases, however, the decision about whether or not to operate depends on patient preference, because patients differ in how they feel about the potential outcomes. For example, some patients would be willing to lose their entire breast in order to reduce the risk of recurrence from breast cancer, while other patients would feel that the risk difference was not worth losing their breast. Another example in which patient preference and approach to risk are essential factors is the difficult choice among surveillance, radiation, and surgery for early stage prostate cancer. These latter examples are “preference sensitive” decisions.

16.2 Evaluating Preference Sensitive Decisions One can use process measures or outcome measures to evaluate surgical decisions. Considerable debate has taken place over which is more appropriate. The right approach for your research study will depend on your aims.

16.2.1 Process Measures Measures of the decision making process are usually designed to elicit elements of communication between a patient and provider during a consultation. For example, OPTION is an instrument that is used during direct observation of patient-provider communication [1]. The evaluator completes the scale while observing an actual visit or, alternatively, a video or audio recording of a visit. Other process measures rely on patient or provider report of what took place, rather than direct observation. Measures that use patient or provider report are easier to implement than direct observation, but the results are less reliable secondary to recall bias. The Control Preference Scale asks the patient how much involvement she/he prefers to have in the decision making process [2].

16.2.2 Outcome Measures Most measures of decisional outcomes are patient-reported outcome measures that evaluate a patient state during or after a decision. The Decisional Conflict Scale consists of 16 questions about perceived uncertainty, factors affecting uncertainty, and perceived effectiveness in decision making [3]. The Decision Regret Scale is


Fig. 16.1 Example of questions from the Decision Quality Instrument © for breast cancer surgery

a five-item scale that measures regret after a decision [4]. The Satisfaction with Decision Scale is a six-item scale that measures patient satisfaction with the decision itself, as opposed to satisfaction with care or with outcomes of care [5]. Another approach to measuring the outcome of decisions is to consider the choice itself. This approach is particularly appropriate for studies that seek to understand how decision-making affects utilization and practice variations. Measuring the choice itself has been less common than measuring patient states but may increase as more studies of surgical practice variations are conducted. A recent approach to measuring decisional outcomes is the development of decision quality measures. Many definitions of decision quality exist, but a recent international consensus process among experts in decision science and health care quality concluded with the following definition: the degree to which the decision is informed and consistent with patient preferences [6]. Based on this definition, decision quality instruments for various surgical decisions have been developed (Fig. 16.1). Each measure includes a knowledge scale and a scale to measure patient preferences or values. Each scale is specific to the clinical decision in question, such as breast cancer surgery or hip replacement.


16.3 Interventions to Improve Decisions 16.3.1 Why Intervene? Reports of large geographic practice variations in surgery have raised questions about the quality of decisions about those procedures. For example, the rate of breast reconstruction varies fivefold across the country, raising the question of how patients and their surgeons decide who gets breast reconstruction. For some conditions, patients have reported high levels of regret about surgery, and for others, patients have reported wishing they had known more about their options. In an effort to reduce unwarranted practice variations and improve decision making, patient decision aids have been developed.

16.3.2 Decision Aids A patient decision aid is defined by three primary components – provision of information, clarification of patient values, and preparation of the patient for interaction with the provider. The most common decision aid format is video, but a decision aid can consist of a piece of paper, a booklet, poster board, or website (Fig. 16.2). Decision aids are generally intended to be used by the patient prior to the provider visit. They can be used at home or in the health care setting just before the visit. They are intended to serve as an adjunct to, and not a replacement for, patient-provider communication.

Fig. 16.2 Video and paper decision aids from the Informed Medical Decisions Foundation


Over 80 randomized controlled trials of decision aids have been conducted, including many trials of decision aids for surgical decisions. Decision aid trials have found that decision aids were associated with higher patient knowledge, reduced decisional conflict, and greater satisfaction with decisions. In the subset of trials that measured treatment choice as a primary outcome, decision aids were associated with fewer invasive treatments and less surgery. Most studies of decision aids have been efficacy trials and not effectiveness studies, in which decision aids would be evaluated in clinical practice or the “real world” setting. A recent population-based study of decision aid implementation in a large HMO found lower rates of hip and knee replacement surgery [7]. Recent state-level policies encouraging the use of decision aids may facilitate opportunities for studying decision aid effectiveness.

16.4 Criteria to Evaluate Decision Aids The International Patient Decision Aid Standards (IPDAS) Collaboration has developed and published standards for evaluating the quality of decision aids. These include criteria for: the development process, how probabilities are presented, the use of patient testimonials, how decision aids are disseminated on the internet, and addressing health literacy. The IPDAS criteria have generally met wide acceptance in the medical decision-making research community and should be considered in any study to evaluate the quality of decision aids.

16.5 Opportunities for Surgeon Scientists Surgeons have unique opportunities to contribute to medical decision-making research. Unlike many medical decisions, most surgical decisions take place at discrete times that are readily identifiable. For example, if you wanted to study satisfaction with decisions in patients undergoing surgery for bladder cancer, it would be feasible to identify dates of surgical consultations and dates of surgery, which could be opportunities for objective measurement or intervention. Similarly, the treatment options and potential outcomes for surgical decisions tend to be discrete, making them amenable to study. Few surgical decisions have been well-studied, leaving major opportunities for junior investigators to make a contribution and develop their research expertise. The ideal decision for a young surgeon to investigate is one for which some evidence exists, but for which clinicians disagree about ideal management or patients vary in preference for the procedure. For example, the decision about sphincter preservation in rectal cancer surgery draws on a growing body of evidence about efficacy. Surgeons differ in their judgment, however, about who is a candidate for sphincter preservation, and patients differ in how they feel about living with an ostomy. Many medical decision-making researchers who have methodological


expertise are actively seeking opportunities to collaborate with surgeons who have front-line experience, clinical insight, and access to patients. Surgeon investigators who develop their own skills in medical decision-making research methods could build their academic career by creating a unique niche in surgical decision-making research.

16.6 What Is Decision-Analytic Modeling? Decision science is used throughout many disciplines and is focused on assigning value and degree of uncertainty about a given choice. Based on defined assumptions, the best available evidence, and specified outcomes, you can identify the “optimal” strategy using decision analysis, which is essentially computer-simulated decision-making or “modeling”. A key strength of decision analysis is the ability to apply logic to complex decisions, making it ideally suited for health care decision-making. In addition to assessing traditional medical endpoints, such as mortality or recurrence, decision analytic models can also factor in additional outcomes like cost, patient preference, quality of life, and quality of care. For instance, you can assess whether integrating a decision aid into clinical care affects clinical or quality of life outcomes. Modeling allows a provider to identify and quantify the trade-offs incurred with a specific intervention. There are a number of types of analyses utilized depending on the perspective (i.e. patient, health-care provider, or payer) and the value (i.e. quality of life, survival, cost). One useful method is cost-effectiveness analysis (CEA). CEA is a form of comparative effectiveness research that assigns costs or resources to each competing strategy. As shown in Fig. 16.3, strategies that have a comparatively low cost and better outcome (bottom right) are preferred. It differs from a cost-benefit analysis in that the benefits or health consequences are not strictly monetary. Computer modeling or simulation is used to perform these, often complex, analyses.

Fig. 16.3 Cost-effectiveness map, showing trade-offs between cost (y-axis) and effectiveness (x-axis)
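As a minimal, hypothetical illustration of the cost-effectiveness comparison summarized in Fig. 16.3, the Python sketch below computes an incremental cost-effectiveness ratio (ICER) for a new strategy versus usual care; all numbers are invented for illustration, not drawn from the chapter.

```python
# Hedged sketch of an incremental cost-effectiveness ratio (ICER) calculation.
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost per QALY gained of the new strategy versus the old."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical inputs: usual care versus a new surgical strategy
usual_cost, usual_qaly = 12_000.0, 6.2
new_cost, new_qaly = 18_500.0, 6.7

ratio = icer(new_cost, new_qaly, usual_cost, usual_qaly)
print(f"ICER: ${ratio:,.0f} per QALY gained")  # compare with a willingness-to-pay threshold
```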


16.7 Why Use Decision Analysis in Surgical Research? 16.7.1 Challenges in Surgical Research The body of surgical literature has been criticized for being of poor quality – both in data and study design. In fact, few “surgical” publications are considered level I evidence [8]. Many studies are underpowered, uncontrolled, and biased. This is due in part to the fact that (1) surgical diseases are less common and have heterogeneous populations, (2) there is variability in surgeon technique and institutional practices – making generalizability challenging, and (3) there is a general lack of formal epidemiologic and statistical training among surgeons. Historically, a surgeon’s “effort” was primarily clinical and surgical research was primarily focused on basic science. Until recently, surgeons were not supported in pursuing advanced training in statistics and epidemiology – vital to the development of a quality research program – and often do not have the time or funding to run large-scale trials.

16.7.2 Advantages of Decision Analytic Modeling Simulation disease modeling provides an alternative approach when data on traditional endpoints or “evidence-based” data are lacking. Additionally, modeling provides structure and logic to complex decisions. In contrast to a clinical trial, wherein only one aspect of treatment is tested, with all others controlled by definition, a model is broad, allowing comparisons of multiple competing objectives. One can perform a comprehensive synthesis of the best available evidence (Chap. 15) to identify the preferred approach and areas in need of further research. Some of the advantages of a model are the flexibility, expandability, and efficiency of the method. Threshold analyses evaluate key input parameters for which data are lacking or conflicting. For example, you can assess a wide range of costs for a drug to determine whether the optimal strategy changes when a key variable is adjusted – to test a “what-if” scenario. Given the small and heterogeneous patient populations as well as the lack of controlled trial data, assimilation, synthesis, and rigorous testing of the best available evidence, along with the ability to quantify the uncertainty in the outcome, are essential. Not only is modeling a comprehensive and feasible “next best” alternative to a clinical trial, one can also simulate “no treatment” options where it would be unethical to do so in reality. For instance, a natural history model can be developed to simulate a patient’s course without treatment to quantify the effect of treatment (and potential over-treatment).


16.8 The Fundamentals of a Decision Analytic Model The basic framework of a decision analysis includes the following steps: (1) identify the key question you are trying to answer, including a no-intervention alternative, (2) create a temporal and logical framework (i.e. how a patient would chronologically proceed through treatment), (3) synthesize the best available data to enter into the model, (4) perform the “base-case” analysis using your best estimate of the inputs, and (5) rigorously test your conclusions with sensitivity analyses (e.g. What would the best strategy be if the recurrence rate were really 10 %?) [9]. Depending on the complexity and nature of the problem, the perspective, and the available knowledge, different types of models can be used. In an attempt to standardize CEA, the Panel on Cost-Effectiveness in Health and Medicine made a number of recommendations, including the application of health-related quality of life (HRQoL) measures (i.e. QALY – quality-adjusted life-years). General guidelines from the Panel recommend that the base-case analysis be performed from the societal perspective, that the cohort be broad with an appropriate time-line, that each intervention be compared to the status quo, and that patient-reported, preference-based HRQoL with an interval scale (i.e. utility) be used to adjust life-years [10]. Lastly, the strength of the results (i.e. the uncertainty) should be assessed with sensitivity analyses.
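To make steps (4) and (5) concrete, the sketch below works through a toy two-branch decision (operate versus observe) with hypothetical probabilities and utilities, computing the base-case expected utility of surgery and then varying the complication probability in a simple one-way sensitivity analysis. It illustrates the general framework only; it is not a model from the chapter.

```python
# Minimal sketch of a base-case analysis and one-way sensitivity analysis
# for a two-branch decision tree (operate vs. observe); all inputs are hypothetical.
def expected_utility(p_complication, u_well, u_complication):
    """Expected utility of surgery given a complication probability."""
    return (1 - p_complication) * u_well + p_complication * u_complication

u_well, u_complication, u_observe = 0.95, 0.60, 0.85

# Base case
base = expected_utility(p_complication=0.10, u_well=u_well, u_complication=u_complication)
print(f"Surgery: {base:.3f}  Observation: {u_observe:.3f}")

# One-way sensitivity analysis: at what complication rate does the preferred strategy change?
for p in [0.05, 0.10, 0.20, 0.30, 0.40]:
    eu = expected_utility(p, u_well, u_complication)
    best = "surgery" if eu > u_observe else "observation"
    print(f"p(complication) = {p:.2f}: EU(surgery) = {eu:.3f} -> prefer {best}")
```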

16.9 How Do I Make a Model? A decision model, not to be confused with the popularized regression model used in statistical analyses, can be built using various computer programs/languages. Commonly used programs include TreeAge (Williamstown, MA: TreeAge Software, Inc.) and Visual C++ (Microsoft software). Models vary from very simple to very complex. One can simulate a single decision with a fixed time frame using a decision tree (Fig. 16.4) or a life-time course of recurring events using state-transition “Markov process” modeling which can be evaluated as a cohort of “Monte

Fig. 16.4 Basic structure of a decision-tree, illustrating decision, chance, and terminal nodes


Fig. 16.5 Basic structure of a recursive Markov process

Carlo” microsimulation (Fig. 16.5). In the case of a decision tree, you can model the probability of events at chance nodes – or the likelihood of heading down one branch of the tree; while in the recursive Markov model, you enter the probabilities of transitioning between mutually exclusive “states” (i.e. alive, sick, dead). A model can simulate observable events in a “shallow” model or simulate the underlying biological processes in a “deep” model (i.e. a natural history model) [11]. By modeling unobservable events and “no treat” cohorts, it is possible to estimate the degree of key factors such as lead-time bias and over-diagnosis. Modeling consortia have developed both to foster collaboration and to assess the validity and generalizability of complex disease simulations. One such group is the Cancer Intervention and Surveillance Modeling Network (CISNET), a group of National Cancer Institute-sponsored investigators who have combined empiric and biologic models in multi-scale models of cancer [12]. The mission of CISNET is to improve the understanding of cancer screening and treatment and their effects on tangible outcomes. Work through this modeling group and others has helped shape practice guidelines and the direction of research priorities. On a practical level, modeling is an iterative process that requires rigorous and detailed analysis of current evidence. It is a powerful tool for the analysis of complex medical problems leading to objective decision-making and hypothesis generation.
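The following sketch shows, using invented transition probabilities, how a simple three-state Markov cohort model of the kind described above can be advanced cycle by cycle; a Monte Carlo microsimulation would instead track individual simulated patients through the same transition matrix.

```python
# Minimal sketch of a three-state Markov cohort model (Well -> Sick -> Dead)
# with yearly cycles; all transition probabilities are hypothetical.
import numpy as np

# Annual transition probabilities between mutually exclusive states
#            Well   Sick   Dead
P = np.array([[0.90, 0.07, 0.03],   # from Well
              [0.00, 0.80, 0.20],   # from Sick
              [0.00, 0.00, 1.00]])  # Dead is absorbing

cohort = np.array([1.0, 0.0, 0.0])  # everyone starts in the Well state
life_years = 0.0
for cycle in range(40):                   # 40 one-year cycles
    life_years += cohort[0] + cohort[1]   # credit a year to everyone alive this cycle
    cohort = cohort @ P                   # advance the cohort one cycle

print(f"Undiscounted life expectancy: {life_years:.2f} years")
```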

16.10 Conclusion The decision sciences encompass a broad range of disciplines and content areas. For surgical decisions that are preference-sensitive, many opportunities exist to evaluate the quality of decisions and to develop and test interventions to improve decisions. Decision analysis and cost-effectiveness analysis are powerful tools that can help determine optimal approaches to complex surgical decisions, even when evidence is uncertain. They can also be used to identify where new evidence is most needed. Medical decision-making research offers many opportunities for surgeons who are beginning their research careers to develop an area of expertise and make an impact.


References
1. Elwyn G, Hutchings H, Edwards A, et al. The OPTION scale: measuring the extent that clinicians involve patients in decision-making tasks. Health Expect. 2005;8:34–42.
2. Degner LF, Sloan JA, Venkatesh P. The control preferences scale. Can J Nurs Res. 1997;29:21–43.
3. O’Connor AM. Validation of a decisional conflict scale. Med Decis Making. 1995;15:25–30.
4. Brehaut J, O’Connor A, Wood T, et al. Validation of a decision regret scale. Med Decis Making. 2003;23:281–92.
5. Holmes-Rovner M, Kroll J, Schmitt N, et al. Patient satisfaction with health care decisions: the satisfaction with decision scale. Med Decis Making. 1996;16:58–64.
6. Sepucha KR, Fowler FJ Jr, Mulley AG Jr. Policy support for patient-centered care: the need for measurable improvements in decision quality. Health Aff (Project Hope). 2004;Suppl Web Exclusive:VAR54–62.
7. Arterburn D, Wellman R, Westbrook E, et al. Introducing decision aids at group health was linked to sharply lower hip and knee surgery rates and costs. Health Aff (Project Hope). 2012;31:2094–104.
8. Wente MN, Seiler CM, Uhl W, Buchler MW. Perspectives of evidence-based surgery. Dig Surg. 2003;20:263–9.
9. Hunink M, Glasziou P, Siegel J, Weeks J, Pliskin J, Elstein A, Weinstein M. Decision making in health and medicine. New York: Cambridge University Press; 2009.
10. Weinstein MC, Siegel JE, Gold MR, Kamlet MS, Russell LB. Recommendations of the panel on cost-effectiveness in health and medicine. JAMA. 1996;276:1253–8.
11. Knudsen AB, McMahon PM, Gazelle GS. Use of modeling to evaluate the cost-effectiveness of cancer screening programs. J Clin Oncol. 2007;25:203–8.
12. Cancer Intervention and Surveillance Modeling Network. http://www.cisnet.cancer.gov/ (2013). Accessed 25 Jan 2013.

Landmark Papers
• Braddock CH, Edwards KA, Hasenberg NM, Laidley T, Levinson W. Informed decision making in outpatient practice: time to get back to basics. JAMA. 1999;282:2313–20.
• Charles C, Gafni A, Whelan T. Decision-making in the patient-physician encounter: revisiting the shared treatment decision-making model. Soc Sci Med. 1999;49:651–61.
• Elwyn G, O’Connor A, Stacey D, et al. Developing a quality criteria framework for patient decision aids: online international Delphi consensus process. BMJ. 2006;333:417.
• Hunink M, Glasziou P, Siegel J, Weeks J, Pliskin J, Elstein A, Weinstein M. Decision making in health and medicine. New York: Cambridge University Press; 2009.
• Sepucha KR, Fowler FJ Jr, Mulley AG Jr. Policy support for patient-centered care: the need for measurable improvements in decision quality. Health Aff (Project Hope). 2004;Suppl Web Exclusive:VAR54–62.
• Weinstein MC, Siegel JE, Gold MR, Kamlet MS, Russell LB. Recommendations of the panel on cost-effectiveness in health and medicine. JAMA. 1996;276:1253–8.

Chapter 17

Survey Research
Karen J. Brasel

Abstract Surveys, and survey research, have become ubiquitous; as such, the value of survey research in the eyes of many has diminished. However, there are certain things that are best studied via survey, including beliefs and attitudes. Importantly, quality of life must be ascertained by survey. This chapter highlights the elements of high-quality survey research, focusing on instrument development, mode of administration, response burden, response rate, nonresponse bias, and reporting survey results. Specific strategies for developing good questions and increasing response rates are outlined. Specific analytic techniques related to nonrandom sampling frames, statistical packages, and Likert-type questions are reviewed. Keywords Nonresponse bias • Sampling frame • Pilot testing • Question development • Response burden • Response rate

17.1 Introduction It seems to happen on an almost daily basis—an email invitation to respond to a survey. Evaluations of Grand Rounds, departmental events, CME activities, and so on. Due to the plethora of survey requests, and the ubiquitous nature of those that are poorly performed, survey research has become somewhat trivialized. Some of this is due to the use of surveys for market research, continuing medical education activities, and solicitation of opinions; for many of these surveys, scientific rigor is neither desired nor required. However, there are certain research questions that are best answered by survey, and it is possible to perform high-quality survey research.

K.J. Brasel, M.D., M.P.H. () Division of Trauma/Critical Care, Medical College of Wisconsin, Milwaukee, WI 53226, USA e-mail: [email protected] J.B. Dimick and C.C. Greenberg (eds.), Success in Academic Surgery: Health Services Research, Success in Academic Surgery, DOI 10.1007/978-1-4471-4718-3__17, © Springer-Verlag London 2014



As with any other type of scientific inquiry, poorly done survey research will lead to bad science, which translates into potentially misleading, if not dangerous, information and implications [16].

17.2 When to Use a Survey Surveys can easily answer questions about knowledge, and best answer questions about attitudes and beliefs. They may also be useful to gather information on behaviors and practices, although this is dependent on the type of behavior or practice. For sensitive subjects, a survey may be more likely to reveal truth. For other subjects, data sources such as medical records may provide more reliable information.

17.3 Developing Questions The first and most important step in survey research is deciding what your research question is. This will allow you to determine whether you can use a previously validated survey or whether you must develop one on your own. Whenever possible, it is preferable to use a previously validated instrument without modification. This ensures that the questionnaire is reliable, valid, and responsive to change. Modifying a validated instrument reduces both its power and validity [1]. If you are unable to use a validated survey to answer your question, plan appropriately so you can spend time developing your survey instrument. This most important step is the aspect of survey research that is most often neglected. The quality of survey data will only be as good as the questions asked on the survey. This is true regardless of the ultimate mode of survey administration, and the techniques described below work for all modes. Often, both qualitative and quantitative techniques must be employed to create the optimal research tool. Clearly, the investigator and the research team have an idea of what questions they would like to include on a particular survey. Employing qualitative techniques using focus groups provides additional information about the topic area being investigated, and helps examine the assumptions brought by the research team during initial drafting of questions. Focus groups also help with how specific terms and vocabulary are understood by the population to be surveyed [4]. As an example, in developing a post-injury quality of life survey, you might ask a group of trauma patients “We are interested in the factors that have affected your quality of life since your injury. Can you tell us some of the things that have affected your quality of life both positively and negatively?” You would also want to ensure that each member of the group has an understanding of the term or concept quality of life. Each question that you include must be critical. Avoid including questions “just because you’re interested” or “because it would be nice to know”. All questions


should be clear and without bias. Avoid questions with two possible answers and leading questions that have a socially desirable answer. Normalizing statements, such as “It can be difficult to …”, prior to asking about a sensitive item increase the likelihood of an honest answer. Questions that require a closed response are the easiest to analyze, so if at all possible try to frame the question to require a closed response [4, 8, 11].

Acquiescence is the tendency to endorse any assertion made in a question, regardless of content. Using declarative statements lessens this effect, although when level of agreement is what needs to be measured it is impossible to avoid this effect completely. The magnitude of the acquiescence effect is approximately 10 %; approximately 52 % of respondents agreed with an assertion, while 42 % of respondents disagreed with its opposite [11]. There are two approaches to minimizing the effect of acquiescence—the first is to ask all questions in one direction (usually positive), and the second is to use two questions, one asked positively and one negatively, to check on the magnitude of the effect in a specific survey.

Missing data tends to be more prevalent at the end of surveys, likely related to response burden. Therefore, ask the most important questions at the beginning, leaving the demographic questions to the end. Questions should be grouped thematically, as respondents fatigue if they must keep returning to a particular topic. In addition, questions should progress either from the general to the specific (funneling) or from the specific to the general (reverse funneling). If appropriate to the population, have your survey tested for language level and target the lowest likely educational level in your sample.

Pilot testing your survey will provide valuable information that can help you improve the quality of data in addition to your response rate. There are two options for initial pilot testing, with and without observation. Observing a small number of people answering your survey will give you information about which questions take respondents longest to answer, suggesting that they may be awkwardly or ambiguously worded. You are also able to debrief this small group about areas of concern and get their ideas for improvement. Field testing a pilot group without observation relies on their willingness to answer each question as well as provide written feedback for each question. Both methods should ask this pilot group to give feedback on the aesthetics of the survey, the ease with which it was completed, and their interest in completing the entire questionnaire. The completed questionnaire should then be tested with a final pilot group, which allows calculation of the psychometric properties of the questionnaire. The pilot groups should be similar to your intended population, but should not be included in your final sample.

17.4 Population For very small populations, it is desirable to obtain information from the entire population. However, for most surveys this is both impractical and expensive and therefore a sample of the population must be surveyed. The sampling frame is


a list of the entire population. When qualitative surveys, such as focus groups, are performed, non-random sampling may be appropriate. However, for all other surveys, random sampling provides the most generalizable information. This can be accomplished using a random number generator or using a more systematic approach, selecting those to be surveyed from the sampling frame starting from a random point on the list at equal intervals (every fifth person on the list, for instance) [10]. Obtaining the sampling frame can be quite problematic for surveying health professionals, as many organizations have specific policies about handling of membership lists that specifically prohibit contact for survey research. Some are specific for email contact, allowing mail-based survey research to proceed. Many lists contain out-of-date information, and much effort must be spent cleaning a “dirty” or inaccurate list.

Although data from a simple random sample is optimal, it is often both extremely difficult and expensive to collect. From a practical standpoint, probability samples are often used to obtain data more cost-efficiently using complex sample designs. Probability samples divide the sampling frame into strata, and often into clusters within strata from which the sample is subsequently drawn. Complex sample designs are likely to result in unequal probability of selection for individual units of analysis, lack of independence of individual units within randomly sampled clusters, and variable effect on estimates of precision. As a result, this approach, which simplifies survey administration and data collection on the front end, requires more complex statistical analysis [22].

The size of the sample necessary for a particular survey depends on the statistical analysis that will be performed, highlighting the importance of a well-thought-out analytic plan. The sample size can be calculated using a number of different computer packages or by consulting a statistician. Estimated non-response rates must be included in the sample size calculations.
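A hedged sketch of two of the practical steps described above — inflating the target sample size for anticipated nonresponse and drawing a systematic random sample from a membership list — is shown below; the frame, target size, and expected response rate are all hypothetical.

```python
# Minimal sketch: adjust sample size for expected nonresponse and draw a
# systematic random sample from a sampling frame (hypothetical inputs).
import math
import random

def inflate_for_nonresponse(n_needed, expected_response_rate):
    """Number of surveys to send so that roughly n_needed are returned."""
    return math.ceil(n_needed / expected_response_rate)

def systematic_sample(frame, n):
    """Every k-th member of the frame, starting from a random point."""
    k = len(frame) // n
    start = random.randrange(k)
    return frame[start::k][:n]

frame = [f"member_{i}" for i in range(5000)]   # hypothetical membership list
n_analyzable = 400                              # from a power calculation
to_send = inflate_for_nonresponse(n_analyzable, expected_response_rate=0.60)
sample = systematic_sample(frame, to_send)
print(f"Send {to_send} surveys; first selected: {sample[0]}")
```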

17.5 Method of Administration 17.5.1 Face to Face Face to face surveys are important for qualitative research, and are the primary method used for conducting focus groups. For quantitative surveys, complex questions can be asked and visual aids can be used. Response rates are generally higher than other methods, as many people find it harder to refuse a face-to-face request. Disadvantages include inefficiency, the need for training interviewers, and prohibitive costs for a sample of any reasonable size [8, 9].


17.5.2 Phone Phone surveys allow a two-way interaction between interviewer and respondent similar to face-to-face surveys. This allows the use of complex questions, and also allows the interviewer to probe for explanatory answers. The refusal rate is higher than with face-to-face administration, but the costs are much less. In general, refusal rate is lower than with mail or internet surveys. Generating a sampling frame may be problematic, as telephone numbers may be more difficult to obtain than mail or email addresses. Random digit dialing surveys may or may not include cell phone numbers, limiting the representativeness of the population sampled [8, 9].

17.5.3 Mail Mail surveys are similar to internet surveys in that they are self-administered, without interaction between interviewer and respondent. A basic mail survey includes the survey tool and a return envelope—this is likely to achieve a response rate around 20 %. The Dillman approach, or tailored design method (described below), outlines the optimal process for mailed surveys in order to achieve the highest response rate, which should be greater than 60 %. Mailed surveys are the mode of choice when surveying physicians in order to balance response rates and efficiency. They also allow the inclusion of an incentive, one of the factors with the greatest impact on response rate [8, 9].

17.5.4 Internet There are several advantages to email or internet survey administration. It is the least costly method of survey administration. In general, Internet surveys produce the highest quality data for the least amount of effort because the data can be logic-checked during survey administration [8]. Costs of an internet-based survey are approximately 20 % of a similar survey administered by mail. In addition, multimedia can be incorporated to enhance interest and engagement [10]. Incentives can be used, although they require additional effort to be expended to track and deliver. However, there are also several disadvantages to this approach. In order to get an appropriate sampling frame, correct email addresses must be obtained for all potential respondents. Particularly for physicians, response rates to internet surveys are significantly lower than response rates to mail-only surveys [10].


17.5.5 Mixed-Mode Mixed mode surveys, which primarily include both mail and internet options for response, have some promise in terms of response rates. This is particularly true when electronic options are followed by standard mail. However, it is not yet clear whether the representativeness of the respondents in a mixed-mode approach differs by mode of response [12].

17.6 Response Burden Response burden is related to response rate, with increased perceived burden related to diminished response rate. Burden is most directly related to time required to complete the survey, with several factors contributing to time. Most easily measured is length, with shorter surveys having a decreased response burden. Other factors contributing to response burden include number of pages or internet screens, poorly worded questions, difficult questions, internet screens that are difficult to navigate and other technical difficulties accessing or responding to the survey [10].

17.7 Response Rates The response rate is simply the number of completed surveys received divided by the number of surveys sent to eligible respondents. The American Association for Public Opinion Research suggests that a complete survey contain responses to a minimum of 80 % of the questions [18]. An alternative definition requires 80 % of the questions of interest, allowing for nonresponse to demographic questions. Ineligible respondents include those with a wrong mail or email address and those surveys returned to sender. In order to estimate the true number of ineligible respondents, addresses should be checked on a sample of both respondents and non-respondents. Expressed mathematically, ARR = R/(R + e[T − R − NE]), where ARR = adjusted response rate, R = eligible respondents, e = the proportion of non-respondents estimated to be ineligible, T = total number of surveys, and NE = ineligible respondents (including return to sender).

Response rate is a critically important aspect of survey research, as low response rates can introduce significant bias. For instance, if 50 % of respondents respond in a particular way to a specific survey item, the true percentage is between 45 and 55 % if the survey has a response rate of 90 %, but ranges between 5 and 95 % if the response rate is only 10 %. Higher response rates also provide greater statistical power, reducing the chances of a type II error. Finally, higher response rates allow for greater generalizability to the population the respondents represent [3].
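The adjusted response rate formula above translates directly into code; the sketch below mirrors the formula and definitions as given in the text and uses purely illustrative numbers.

```python
# Direct transcription of the adjusted response rate formula in the text:
# ARR = R / (R + e*(T - R - NE)), with hypothetical inputs.
def adjusted_response_rate(R, T, NE, e):
    """R = eligible respondents, T = total number of surveys sent,
    NE = ineligible respondents (e.g. returned to sender),
    e = estimated proportion of non-respondents who are ineligible (per the text)."""
    return R / (R + e * (T - R - NE))

# Hypothetical mailing: 1,000 surveys sent, 520 completed, 60 returned to sender
arr = adjusted_response_rate(R=520, T=1000, NE=60, e=0.10)
print(f"Adjusted response rate: {arr:.1%}")
```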


Given the critical importance of response rates, there are a variety of strategies available to maximize return. The Dillman approach, or tailored design method (TDM), is considered standard for mail questionnaires regardless of the population being studied. The first element of the Dillman approach is to make the questionnaire respondent-friendly. This includes a survey that is easy to read, makes use of bolded sentences and indentation, has clear and unambiguous questions, and is relatively short. Other elements involve the cover letter and number of contacts, all by first-class mail. The first contact may be a postcard or letter letting the potential respondent know of the upcoming survey, or it may be the initial survey with a personalized cover letter and return envelope with a real first-class stamp. The next contact is a reminder postcard 2 weeks later, and the last contact is a resending of the survey with a return envelope [6, 7].

Response rates for physicians are generally below response rates of the general public. Specific strategies to increase response rates for physician questionnaires include use of a phone contact or registered mail contact after the three first-class mail contacts and the use of an unconditional monetary incentive sent with the initial survey. These strategies can increase response rates up to 20 % beyond that achieved when using the TDM approach alone [10, 19]. The amount of monetary incentive resulting in optimal response rates is unclear, with some studies showing amount directly related to response rate and others unable to establish a direct relationship between incentive amount and response rate. Often even a small monetary incentive ($5) achieves significant improvements in response rates [20]. If a monetary incentive is not possible, a nonmonetary incentive such as a pen, lottery ticket, or laser pointer can be used, although the effect may not be quite as great [14]. CME credit does not appear to be a worthwhile incentive [21].

17.8 Nonresponse Bias A response rate of 60 % or greater is required by many journals in order for survey research to be considered for publication [13]. This figure was originally targeted in order to minimize nonresponse bias, and it is certainly true up to a point that greater response rates minimize nonresponse bias. However, recent research questions the strength of the relationship between nonresponse rate and nonresponse bias. This is because the entire population may be viewed as having both a propensity to respond or not respond based on specific characteristics of the survey, the mode of administration, and personal characteristics that may vary over time. Key to determining how close the link is between nonresponse rates and nonresponse bias is how strongly correlated the survey variable of interest is with the likelihood of responding. Some analyses suggest that efforts to improve response rates above a particular target may actually worsen the quality of the data, as non-responders who are converted or coerced into responding may provide inaccurate information. Effects of tools to improve response rates on nonresponse bias remain unclear [15].


Nonetheless, calculation of nonresponse bias is essential and can be performed in a number of ways. The simplest way to do this is to compare respondents and nonrespondents by demographic information, and provide estimates of response and nonresponse for key subgroups within the target population. Similar demographic profiles and response rates support a lack of nonresponse bias, although this relies on the simplistic assumption that the subgroup variables are the only possible causes of propensity to respond [7]. “Wave response” analysis compares early and late wave respondents (based on whether they respond to the initial, second, or third request) in terms of demographics, response rates, and responses. This analysis assumes a continuum of response, suggesting that late wave respondents are more similar to non-respondents than early wave respondents. Its weakness is that there is no direct information provided about the non-respondents [18]. If an external source exists with which to compare information, respondent survey data can be compared to this external source. Although this does not provide direct information about the non-respondents, it allows an estimation of whether nonresponse bias has influenced the results. The availability of external benchmarks for health-related survey research is fairly rare. Other methods to assess nonresponse bias include collecting auxiliary variables on respondents and non-respondents to guide attempts to balance response rates, and the use of post-survey adjustments to test the sensitivity of the responses obtained. Whenever possible, multiple approaches to assess nonresponse bias should be used [7].
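As one hypothetical illustration of a “wave response” check, the sketch below compares a single respondent characteristic across the first, second, and third mailing waves with a chi-square test; the counts are invented and the chosen characteristic (practice setting) is an assumption for illustration only.

```python
# Hedged sketch of a "wave response" nonresponse bias check with hypothetical counts.
import numpy as np
from scipy.stats import chi2_contingency

# Rows = response wave (1st, 2nd, 3rd request); columns = academic vs. community practice
counts = np.array([[180, 120],
                   [ 60,  55],
                   [ 25,  30]])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
# A significant difference across waves suggests later (more reluctant) respondents
# differ from early ones, raising concern for nonresponse bias.
```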

17.9 Likert Scales A Likert scale is most commonly used to measure agreement with a particular statement. Most commonly a 5-point scale is used. One common scale has two levels of agreement, a neutral option, and two levels of disagreement (strongly agree, agree, neutral, disagree, and strongly disagree). If you want to force either agreement or disagreement, the neutral option can be eliminated leaving a 4-point scale. Data reliability is not affected by offering a neutral or no opinion option, and there is some evidence that this option is most often chosen to avoid the cognitive work to generate an optimal answer. Often the levels of agreement and disagreement are collapsed in the analysis phase, leaving a functional 2- or 3-point scale. Another common scale begins at one extreme, increasing at approximately equal intervals to the opposite extreme (poor, fair, good, very good, excellent). Most questions that use a Likert scale are actually Likert-type questions, which are single items that use the Likert response alternatives. A Likert scale is a series of four or more Likert-type items combined into a single score during data analysis. The importance of this difference is that single Likert-type items are ordinal scale observations. These observations are ranked in that each successive response is greater than the one before, but how much greater is not specified and there is no


assumption of equal distance between one observation and another. Ordinal refers to the position in a list. Likert scale items, in contrast, are interval scale items, which have a consistent relative distance between points on a scale that does not have an absolute zero. Ordinal and interval scales are analyzed differently from each other as well as differently from nominal data (named categories without any position ranking or relative distance) [2, 5, 22].

17.10 Analysis Survey data are often presented using descriptive statistics, measures of central tendency, estimates of parameters, and procedures to estimate relationships between multiple variables. Before using any parametric statistical test, look again at the population surveyed and the sampling frame. If the survey population was a complex sample design rather than a random sample, nonparametric statistics and other statistical approaches must be used. Complex sample designs require the use of sampling weights, in order to reduce the potential sources of bias introduced by the use of probability rather than true random samples. Sampling weight is usually included as a weight variable in addition to the stratum and cluster variables used in generating the probability sample from the sampling frame [22]. Independent of sampling weight, ordinal Likert-type data should be presented using median or mode for central tendency, frequencies to describe variability, and Kendall’s tau to analyze associations. Chi-square analysis may also be appropriate. Means can be used for interval Likert scale data, with standard deviation used for variability, and Pearson’s r to describe associations. Other analyses that may be appropriate include ANOVA, t-test, and regression [2]. There are several software packages that are available to analyze survey data. These include SAS, SPSS using the separately purchased complex samples add-on module, Stata, and SUDAAN. Stata and SUDAAN offer the greatest flexibility and variety of options for analysis [22].
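A minimal sketch of the ordinal approach described above — median, mode, frequencies, and Kendall's tau for single Likert-type items — is shown below with hypothetical responses; a true Likert scale (a summed interval score) could instead be described with means, standard deviations, and Pearson's r.

```python
# Minimal sketch of descriptive and associational statistics for ordinal
# Likert-type items, using hypothetical responses coded 1-5.
import numpy as np
from collections import Counter
from scipy.stats import kendalltau

# Responses coded 1 = strongly disagree ... 5 = strongly agree
item_a = np.array([4, 5, 3, 4, 2, 5, 4, 3, 4, 5, 1, 4])
item_b = np.array([3, 5, 3, 4, 2, 4, 4, 2, 5, 5, 1, 3])

print("Median:", np.median(item_a))
print("Mode:", Counter(item_a.tolist()).most_common(1)[0][0])
print("Frequencies:", dict(sorted(Counter(item_a.tolist()).items())))

tau, p = kendalltau(item_a, item_b)   # association between two ordinal items
print(f"Kendall's tau = {tau:.2f} (p = {p:.3f})")
```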

17.11 Reporting Results The key points in reporting survey research begin with explaining the purpose of the research and explicitly identifying the research question. Unfortunately, reporting of survey data in the medical field is extremely inconsistent, compromising both the transparency and reproducibility of the results [17]. As with most research, the methods section is extremely important and will be the basis on which your readers will determine whether they are able to generalize your results. The research tool or questionnaire must be described. If an existing tool is used without modification this section can be brief; if a new tool was used a detailed section on how the tool was developed and tested is important. Description of the sample includes how the


potential subjects were identified, how and how many times they were contacted, how many agreed to participate, how the non-responders differed, and what the response rate was. The analytic plan is followed by the results.

17.12 Conclusions Obtaining high-quality data, particularly information on knowledge, beliefs, and attitudes, is possible via survey methodology. The survey must use a well-designed and thoroughly tested instrument, a representative sample from an appropriate sampling frame, and minimize nonresponse bias. Choice of survey method balances costs, effort, and response rates. Analytic methods must account for nonrandom sampling and nonresponse.

References
1. Alderman AK, Salem B. Survey research. Plast Reconstr Surg. 2010;126:1381–9.
2. Boone HN, Boone DA. Analyzing Likert data. J Ext. 2012;50:2TOT2.
3. Draugalis JR, Plaza CM. Best practices for survey research reports revisited: implications of target population, probability sampling, and response rate. Am Pharm Educ. 2009;73:1–3.
4. Fowler FJ. Improving survey questions: design and evaluation. Thousand Oaks: Sage; 1995.
5. Gob R, McCollin C, Ramalhoto MF. Ordinal methodology in the analysis of Likert scales. Qual Quant. 2007;41:601–26.
6. Gore-Felton C, Koopman C, Bridges E, et al. An example of maximizing survey return rates: methodological issues for health professionals. Eval Health Prof. 2002;25:152–68.
7. Groves RM. Nonresponse rates and nonresponse bias in household surveys. Public Opin Q. 2006;70:646–75.
8. Jones TL, Baxter MAJ, Khanduja V. A quick guide to survey research. Ann R Coll Surg Engl. 2013;95:5–7.
9. Kelley K, Clark B, Brown V, Sitzia J. Good practice in the conduct and reporting of survey research. Int J Qual Health Care. 2003;15:261–6.
10. Klabunde CN, Willis GB, McLeod CC, et al. Improving the quality of surveys of physicians and medical groups: a research agenda. Eval Health Prof. 2012;35:477–506.
11. Krosnick JA. Survey research. Annu Rev Psychol. 1999;50:537–67.
12. Kroth PJ, McPherson L, Leverence R, et al. Combining web-based and mail surveys improves response rates: a PBRN study from PRIME Net. Ann Fam Med. 2009;7:245–8.
13. Livingston EH, Wislar JS. Minimum response rates for survey research. Arch Surg. 2012;147:110.
14. Olsen F, Abelsen B, Olsen JA. Improving response rate and quality of survey data with a scratch lottery ticket incentive. BMC Med Res Methodol. 2012;12:52–62.
15. Olson K. Survey participation, nonresponse bias, measurement error bias, and total bias. Public Opin Q. 2006;70:737–58.
16. Scholle SH, Pincus HA. Survey research: think … think again. Acad Psychiatry. 2003;27:114–6.
17. Story DA, Gin V, na Ranong V, et al. Inconsistent survey reporting in anesthesia journals. Anesth Analg. 2011;113:591–5.


18. The American Association for Public Opinion Research. Standard definitions: final dispositions of case codes and outcome rates for surveys. 7th ed. Ann Arbor: AAPOR; 2011.
19. Thorpe C, Ryan B, McLean SL, et al. How to obtain excellent response rates when surveying physicians. Fam Pract. 2009;26:65–8.
20. Ulrich CM, Danis M, Koziol D, et al. Does it pay to pay? Nurs Res. 2005;54:178–83.
21. Viera AJ, Edwards T. Does an offer for a free on-line continuing medical education (CME) activity increase physician survey response rate? A randomized trial. BMC Res Notes. 2012;5:129.
22. West BT. Statistical and methodological issues in the analysis of complex sample survey data: practical guidance for trauma researchers. J Trauma Stress. 2008;21:440–7.

Chapter 18

Qualitative Research Methods

Margaret L. Schwarze

Abstract This is a short introduction to the field of qualitative investigation. There are many methodologies and methods that support rigorous qualitative research; however, there is considerable controversy about the "right way" to do qualitative research and how it should be defined and judged. In this space, it is impossible to do more than provide a general overview of study design, instruments for qualitative data collection, and introductory guidance for analytic processes. Many of the references cited provide excellent examples of rigorous qualitative work in the medical literature and will expose the reader to multiple options for future study design and execution. Keywords Theoretical sampling • Triangulation • Theoretical saturation • Resonance • Reflexivity • Constant comparison • Member checking

“Not everything that can be counted counts. Not everything that counts can be counted.” William Bruce Cameron, “Informal Sociology: A Casual Introduction to Sociological Thinking”

Imagine you have ten blind men who know nothing about elephants. You place them in a circle around an elephant and ask them to examine it briefly with their hands. If you use quantitative analysis to synthesize their sampling, you might conclude that, in general, the elephant has rough, hard skin and short, spiky hair. This conclusion about the elephant would be accurate but also incomplete. If, instead, you asked two blind men to examine the elephant's trunk for an extended period of time, after a few hours they might be able to tell you that the elephant has a long and unusual appendage. This appendage can pick things up off the ground,
blow water out the end, and explore the world around it. Qualitative analysis of these new data would also lead to an accurate description of the elephant, one that would resonate with an outside observer who has actually seen an elephant. However, the observation would not be generalizable; that is, you could not conclude that each man sampling the elephant would observe a trunk, nor would it be accurate to say that the elephant is covered with trunks. Nonetheless, qualitative analysis tells us something that is quintessentially important for the description of an elephant. In this chapter we introduce the technique of qualitative analysis. For health services researchers, qualitative methods provide an essential adjunct to many quantitative endeavors and have robust power as a stand-alone methodology, provided the study design and execution are performed with rigor.

18.1 When to Use Qualitative Analysis

While quantitative analysis may provide a mile-high or bird's-eye view of the population being studied, qualitative analysis starts from the ground and moves upward. Qualitative analysis is ideally suited for examining processes or interactions between people within a specific context (for example, doctors, nurses, and technicians in an operating room [1]) and can be particularly helpful in identifying subtle and critical distinctions that are not appreciable using quantitative analysis. Qualitative analysis is useful for describing social constructs, for example the (now outdated) taboo against disclosure of medical errors to patients [2], and can be instrumental for outcomes research given its power to identify latent or non-obvious processes or issues at high or low performing institutions [3]. Furthermore, qualitative analysis is a critical tool for policy creation and evaluation, as it enables investigators to examine perspectives and interactions among different stakeholders [4]. Qualitative analysis can be hypothesis generating. It is a good method to start with when your question is "What is going on here?" as it allows the investigator to be open to theories or constructs that arise from the data, as opposed to using the data to test a preexisting theory or hypothesis. The flexibility of the methodology helps the researcher avoid the problem of seeing only what he is looking for [5]. Finally, many investigators will use qualitative methods as a first step for survey design, both to identify important questions to ask respondents and to ensure the internal validity of survey questions through the use of cognitive interviews [6, 7].

18.2 Formulating a Research Question

Developing a concise, important, and feasible research question is a challenge for all investigations and is particularly important for a qualitative study. To start, the investigator must acknowledge his theoretical assumptions and use these
assumptions to focus the boundaries around the case to be sampled. In order to study a population or phenomenon in depth, the sample size for a qualitative study is, by necessity, typically small. As such, the investigator is confined to studying only some actors, in some contexts, dealing with some decisions [8]. To define these boundaries, the investigator posits his theoretical assumptions outright in order to determine the case to be studied, the aspects in which variability is desired, and the dimensions where homogeneity is important. In a quantitative investigation, it is critical that the sample is representative of the population studied in order to generalize the results. As such, the sampling mechanism for a diverse population should ensure inclusion of a range of ages, socioeconomic statuses, and racial and cultural backgrounds (if this diversity is present in the population). For a qualitative investigation, the goal is not to achieve generalizability but rather to capture the phenomenon as it exists at a certain point in time for a particular group (further investigation may or may not demonstrate variability between groups). As such, the investigator needs to explicitly state his theoretical assumptions up front. For example, he might state either that he does not believe age, race, or socioeconomic status will have an appreciable effect on the results (and give supporting evidence) or that these effects are unknown but, due to study constraints, are not the focus of the investigation at this time. In essence, the investigator is not ignoring a specific segment of the population, but is explicitly acknowledging the choices made, as well as the theoretical assumptions behind these choices, in order to develop an appropriate study design to answer a specific and discrete research question. Although boundaries are described from the start, because much of the research strategy and sampling methods (often called theoretical sampling) are grounded in the data, these boundaries need to be flexible or emerging [8]. This has led to the pejorative characterization of qualitative research as "make it up as you go along" research, because this strategy is distinctly different from those used for quantitative investigations. However, the iterative process involved allows the investigator to shift the sampling frame in order to follow and expand important findings as they emerge from the data. Some examples of questions that are ideally suited for qualitative investigation include "What processes are used to decrease mortality from gastric bypass in high performing centers?", "What are the drivers of robotic surgery?", and "How do policies to improve access to general surgery in underserved populations impact different stakeholder groups?"

18.3 Sampling Strategy

Once the research question is determined, the next step is to select a sampling strategy that reflects the theoretical assumptions and enables the desired analysis. Typically this is called purposeful sampling, where the selected respondents or observation units (hospital wards, operating rooms, texts [9], critical documents
[10]) are specifically chosen to reflect the case that you desire to study. The sample is usually small so that it can be studied in depth, and typically the investigator does not set a predetermined sample size. Instead, respondents are sampled until the investigation reaches theoretical saturation, a point in the analysis where the themes and trends encountered occur with a degree of regularity. This can pose problems with the IRB, with grant applications, and for study staff. To avoid this, investigators often generate an estimate of their sample size but should offer a large enough range to allow sampling beyond initial estimates if needed. In addition, researchers more familiar with quantitative methods may bristle when, in a qualitative study, it is necessary for the analysis to proceed before all of the data have been collected. This step is required to determine whether additional respondents are needed, to allow for interrogation of unanticipated results emerging in the data, and to determine whether theoretical saturation has been achieved. Because the sampling strategy is theory driven, it is important to state clearly the reasoning behind the selection of the sampling method used, as the rigor of the study will be judged on whether the data collection process aligns with the study purpose. There are a large number of sampling strategies, well described in chapter 2 of "Qualitative Data Analysis" by Miles and Huberman [8]. Some examples include maximum variation sampling, where respondents are selected to include high variability in order to identify common patterns; contrasting case sampling, where respondents or units are analyzed against each other in order to demonstrate differences [3]; and snowball sampling, where respondents with a unique or distinct trait are used to identify subsequent respondents for in-depth investigation of an atypical point of view or phenomenon [11].
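As one illustration of how a team might monitor theoretical saturation during ongoing analysis, the short sketch below keeps a running tally of how many new codes each successive transcript contributes. This is not a procedure prescribed by the chapter; the transcripts, codes, and informal stopping signal are hypothetical assumptions, and such a tally supplements rather than replaces analytic judgment.

    # Illustrative sketch (hypothetical data): counting newly emerging codes per
    # successive interview transcript as one rough signal of theoretical saturation.
    coded_transcripts = [
        {"trust", "fear of complications", "family pressure"},
        {"trust", "surgeon reputation", "fear of complications"},
        {"surgeon reputation", "cost", "trust"},
        {"trust", "cost"},
        {"fear of complications", "trust"},
    ]

    seen = set()
    for i, codes in enumerate(coded_transcripts, start=1):
        new_codes = codes - seen        # codes not applied to any earlier transcript
        seen |= codes
        print(f"Transcript {i}: {len(new_codes)} new code(s) {sorted(new_codes)}")

    # Several consecutive transcripts contributing no new codes suggest saturation
    # may be approaching; the final judgment remains with the analytic team.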

18.4 Structured vs. Unstructured Data Collection

In addition to selecting a sampling strategy, the investigator will need to select an approach for data collection. The approach can range from a highly structured instrument using open-ended interviewing to a completely unstructured method, for example, participant observation.

18.4.1 Focus Groups

One example of a highly structured approach is the use of focus groups. Focus groups, as in market research, are ideally suited to obtaining feedback on actual practices or proposed interventions [12]. Respondents are chosen to meet specific characteristics, frequently homogeneous on some levels and heterogeneous on others, and are studied in a group to capture important interactions between respondents.
The groups are typically small enough for all participants to become engaged (a range of 4–12 participants), and investigators will typically use more than one focus group per investigation. The focus group is often formally moderated and carefully scripted with predetermined questions. A helpful reference for focus group design is "Focus Groups: A Practical Guide for Applied Research" by Richard Krueger and Mary Anne Casey (2009).

18.4.2 Open-Ended Interviews

Open-ended interviewing is a less structured approach that still retains a large degree of structure. Although the interviewing process should be iterative, the process typically starts with a predetermined interview guide. The investigator designs open-ended questions with care to avoid questions for which a yes or no answer would be possible. Rather than asking the respondent, "Can you tell me why you are having surgery?" (a question that can easily be answered with a "no"), a better example of an open-ended question is, "Tell me the story about how you decided to have surgery." Instead of providing an "interview script," the pre-designed questions serve as a starting point, and suggested probes are supplied in order to direct the respondent to the salient issues. Analysis should be ongoing with data collection so that, during analysis, researchers can provide feedback to the interviewer to ensure that he or she will flesh out important themes or concepts in subsequent interviews. This iterative process allows for investigation of unexpected results and enables the investigator to explore concepts or themes in great depth. It is critical that the interviewer is well trained in this type of questioning and is intimately familiar with the research question and relevant background information. It is frequently suggested that the principal investigator perform these interviews because his background and understanding will have a significant impact on the direction of the interview. This may not always be possible given time constraints. At times, this also may not be desirable if respondents are familiar with the investigator or his background and are likely to provide socially desirable answers. Not surprisingly, evolving interview questions are problematic for the institutional review board. Rather than providing the IRB with a script that will be read verbatim, it can be more effective to present the IRB with a list of question domains, with sample questions and follow-up probes for each domain. This will help to avoid returning to the IRB after each iteration for approval of a new line of questioning. A helpful tool to consider for open-ended interviewing is the use of a vignette. Although it is challenging to design a clinical vignette that captures all of the complexities of clinical decision making, the presentation of a narrative account of a specific case can prompt a more instinctive and less abstract answer from the respondent.


18.4.3 Directed Observation

For directed observation, the researcher is embedded in the study environment but directed to study specific elements or constructs within the environment. Explicit acknowledgement of the investigator's theoretical assumptions upfront is critical for this type of investigation. Typically the researcher will perform prolonged observation of processes or events but enter the field with a list of predetermined elements to focus the observation. This has the advantage of facilitating rapid data accrual, at the cost of potentially missing an important issue or construct because it was not recognized a priori.

18.4.4 Ethnography

The least structured method for qualitative analysis is prolonged observation of participants in the field. This enables the investigator to share the daily environment of the study subjects, including social interactions, language, and habitual activities, which allows for a rich description of processes and constructs. This is often described as ethnography (note: ethnography also refers to a qualitative methodology, not a method, so this can be confusing for those with more qualitative experience) and is a method commonly used by anthropologists and sociologists. The method is extremely time-intensive, as the investigator is literally inserted into the daily routine of the population he is studying for a prolonged period of time, often on the order of months to years. The researcher is able to observe actions and counter-actions rather than simply eliciting the respondent's perspective about what he or she might do in a specific situation. A well-known ethnography about surgeons is "Forgive and Remember" by Charles Bosk [13]. For this research, Bosk, a sociologist, embedded himself on the surgical service at the University of Chicago Hospital for 18 months. The resulting text is a powerful description of the customs and practices that govern surgical care and are determined by surgical training. The rituals, culture, and normative behavior described will resonate with anyone who has experienced surgical residency. For a shorter reference, another well-done example of ethnography from the medical literature is Joan Cassell's "Surgeons, intensivists, and the covenant of care: administrative models and values affecting care at the end of life" [14].

18.5 Analysis

After defining the study population and study design, the next step is to analyze the data as they are collected. This process is particularly noxious to those familiar with quantitative methods, but it is critical for a robust qualitative study, as the
analysis is used to feed back into data collection to ensure that important themes and trends are examined in depth. For most qualitative investigations this next step requires coding the data. To this end, the investigator (or a team of investigators) will examine transcribed notes, transcripts of audio recordings, the audio recordings themselves, video recordings, or other media and code snippets of the data as events, processes, ideas, or concepts appear. The coding can proceed either deductively or inductively. For deductive coding, a specific theory is used to analyze the data. For example, the investigator might use the "theory of clinical inertia" [15] to analyze why surgeons fail to refer patients to high volume centers for pancreaticoduodenectomy, or the "Input-transformation-output" model of healthcare professional performance [16] to analyze the structures and processes that enhance safety in the operating room. Alternatively, the coding might proceed inductively, which helps to anchor the empirical structure of the study and is particularly useful when non-obvious or latent issues are suspected or not previously well described. To do this, the investigator will use a technique called constant comparison, where each new code is iteratively tested against previous uses of the code to ensure that the use of the code is consistent across the data set. Although inductive purity is frequently difficult to achieve, inductive analysis has the distinct advantage of allowing the investigator to discover new theories or constructs. Ultimately, whether the coding proceeds inductively or deductively, the process is used to develop a coding taxonomy that is applied to subsequent data as they are collected. This taxonomy is flexible and allows for inclusion of new codes as they emerge from the data, refinement of existing codes based on the in-process data analysis, and termination of a code if it cannot represent the phenomenon as it exists in the data. The ability to reformulate or refine the concept or data unit increases the accuracy of the coding scheme based on empiric data, an advantage over forcing the data into predefined and rigid categories. Coding can be performed by a single investigator but may be enhanced through the inclusion of multiple investigators, particularly if members of the coding group have different backgrounds. The use of multiple coders has two distinct advantages. First, it makes different perspectives immediately available for deriving meaning from the data. Second, the process of sorting through varied interpretations of the data can reveal assumptions based on each coder's background and allow the group to attend to biases throughout the analysis. Investigators will use different strategies to manage the variability that may result from multiple coders. Some investigators will retain the diversity of coding, as it may provide a critical signal for analysis of the data. Others will work to come to group consensus about the specific code, as the consensus building can be a gateway to higher-level analysis. Finally, some authors will have coders code all the data independently and report interrater reliability. Although some strategies might be more productive than others, it is important for the investigator both to choose the strategy that will enable him to answer the specific research question and to clearly describe and justify his reasoning for this choice.
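For teams that code independently and report interrater reliability, a chance-corrected agreement statistic such as Cohen's kappa is one common choice. The short sketch below is illustrative rather than drawn from the chapter: the two coders' assignments for ten excerpts are hypothetical, and a real study would likely use an established statistics package.

    # Illustrative sketch (hypothetical codes): Cohen's kappa for two coders who
    # each assigned one code per excerpt across the same ten excerpts.
    from collections import Counter

    coder_a = ["trust", "cost", "trust", "fear", "cost", "trust", "fear", "trust", "cost", "fear"]
    coder_b = ["trust", "cost", "fear", "fear", "cost", "trust", "fear", "cost", "cost", "fear"]

    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement computed from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(coder_a) | set(coder_b)) / n ** 2

    kappa = (observed - expected) / (1 - expected)
    print(f"Observed agreement {observed:.2f}, chance agreement {expected:.2f}, kappa {kappa:.2f}")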
There is typically a second level of data analysis, sometimes referred to as higher-level analysis or axial coding (which complements the first step of open coding). This is the process of making sense of large volumes of data, drawing
connections between concepts and processes and refining or developing theories and hypotheses. In contrast to quantitative analysis, where the goal is to condense and reduce the data for presentation, the goal of qualitative analysis is to expand the data and develop ideas [17]. This process can be particularly challenging, as the data are frequently voluminous and unwieldy, and the techniques for analysis are not standardized and have been described in myriad ways by leaders in the field. Simple diagrams mapping interactions and relationships can be useful. Miles and Huberman demonstrate multiple methods for higher-level analysis; for example, a context chart or matrix can be a helpful tool to ensure maximal fit and faithful data representation [8]. Given that the data produced in a qualitative investigation are unstructured and typically massive, they can be daunting to manage. There are several commercially available programs designed to catalogue qualitative data that can assist the investigator with organization. Programs such as NVivo (QSR International, Melbourne, Australia) have the capacity to maintain many different types of primary data, from simple word-processing documents to video, and enable the user to record and arrange the coded data. The computer programs do not actually perform the analysis, but they can certainly retain the data and associated codes in a manner that allows for future retrieval and higher-level analysis.
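Whatever software is used, the underlying bookkeeping is straightforward: each coded excerpt is stored with its source, location, and code so that it can be retrieved later by code. The sketch below is a hypothetical illustration of that bookkeeping, not a description of how NVivo or any other commercial program works internally; all sources, codes, and excerpts are invented.

    # Illustrative sketch (hypothetical data): storing coded excerpts so they can
    # be retrieved by code during higher-level analysis.
    from dataclasses import dataclass

    @dataclass
    class CodedExcerpt:
        source: str    # interview or field-note identifier
        location: str  # page, line range, or timestamp within the source
        code: str      # label from the coding taxonomy
        text: str      # the excerpt itself

    excerpts = [
        CodedExcerpt("Interview_03", "p2, lines 14-20", "surgeon reputation",
                     "I picked him because everyone at the clinic said he was the best."),
        CodedExcerpt("Interview_07", "p5, lines 2-6", "fear of complications",
                     "I kept thinking about what could go wrong on the table."),
    ]

    def retrieve(code):
        """Return every stored excerpt tagged with the given code."""
        return [e for e in excerpts if e.code == code]

    for e in retrieve("surgeon reputation"):
        print(e.source, "-", e.text)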

18.6 Ensuring Rigor

The standards for qualitative research are less familiar to most investigators and readers of the surgical literature. Although some may bristle at a direct comparison with the standards for quantitative methods, because the standards are not perfectly analogous, this structure may prove a useful introduction for judging qualitative research. Rigorous quantitative research has internal validity, external validity, reliability, and generalizability. In turn, though not perfectly in parallel, qualitative research should be judged by its credibility, dependability, confirmability, and transferability [18].

18.6.1 Credibility

Credibility refers to the internal consistency of the research, wherein the prolonged exposure of the investigator to study subjects allows for a thick and rich description that attends to culture, context, and setting. Credibility can be enhanced by reflexivity, which is the process of stating one's positions and biases upfront. Another technique common in qualitative research is called member checking, where, after analysis, the investigator returns to the study subjects with the results to see whether the analysis rings true to study participants [19].


18.6.2 Dependability

Dependability is enhanced by a clear and in-depth description of the processes and design choices used throughout the study. Many qualitative researchers will refer to an "audit trail" that enables the reader to fully understand the investigator's steps and assess the validity of the conclusions based on the choices made and the procedures presented. This includes a clear statement about the investigator's theoretical assumptions, a robust description of the iterative processes that influenced data collection, and a detailed explanation of the coding process and higher-level analysis.

18.6.3 Confirmability

Confirmability speaks to bias and perspective with respect to the investigator. Although such biases exist in quantitative analysis (for example, the goal is typically to confirm the hypothesis tested through rejection of the null), these biases are not often explicitly presented to the audience. In contrast, qualitative researchers explicitly state their biases upfront and the steps taken to manage these biases. One commonly used technique is triangulation. This refers to the incorporation of multiple perspectives in order to describe the studied phenomenon or resultant theory as objectively as possible. To triangulate, the investigator may use multiple different frames of reference to gather data on the study population. For example, Bradley and colleagues interviewed physicians, quality managers, and administrators from hospitals with both high and low use of beta-blockers after myocardial infarction [3]. This design enabled the investigators to provide a rich description of the hospital culture that determined practice, a result that might have been missed or inaccurate had they interviewed only physicians. Another form of triangulation is to construct a study team for analysis that represents multiple perspectives. Although no one person can be expected to represent an entire group, a mix of professional and personal identities can enable the investigator to use multiple perspectives to interpret the data.

18.6.4 Transferability

Where the goal of quantitative research is to make statements that are generalizable, generalizability is not typically within the power of qualitative research. Instead, qualitative researchers aim for resonance. Although an assessment of resonance is left to the reader, the goal is for the investigator to present enough information about the context, processes, and participants for the reader to judge how the results may transfer into other settings or domains [20]. While the goal is not to generate
universal statements about populations, the characterization of specific behaviors, rituals, and actions, and the conditions under which these occur, can illuminate and enlighten many health care practices.

18.7 Summary

Qualitative investigation is a powerful tool for health services researchers, as it can illuminate processes, concepts, trends, and constructs that are difficult to identify with quantitative methods. Although completing a qualitative study is quite time consuming, the results can have a significant impact.

References

1. Hu YY, Arriaga AF, et al. Protecting patients from an unsafe system: the etiology and recovery of intraoperative deviations in care. Ann Surg. 2012;256(2):203–10.
2. Gallagher TH, Waterman AD, et al. Patients' and physicians' attitudes regarding the disclosure of medical errors. JAMA. 2003;289(8):1001–7.
3. Bradley EH, Holmboe ES, et al. A qualitative study of increasing beta-blocker use after myocardial infarction: why do some hospitals succeed? JAMA. 2001;285(20):2604–11.
4. Robinson JC, Casalino LP. Vertical integration and organizational networks in health care. Health Aff (Millwood). 1996;15(1):7–22.
5. Johnson RB, Onwuegbuzie AJ. Mixed methods research: a research paradigm whose time has come. Educ Res. 2004;33(7):14–26.
6. Lee CN, Hultman CS, et al. What are patients' goals and concerns about breast reconstruction after mastectomy? Ann Plast Surg. 2010;64(5):567–9.
7. Schwarze ML, Bradley CT, et al. Surgical "buy-in": the contractual relationship between surgeons and patients that influences decisions regarding life-supporting therapy. Crit Care Med. 2010;38(3):843–8.
8. Miles MB, Huberman AM. Early steps in analysis. In: Qualitative data analysis. Thousand Oaks: SAGE; 1994. p. 50–89.
9. Neuman MD, Bosk CL. What we talk about when we talk about risk: refining surgery's hazards in medical thought. Milbank Q. 2012;90(1):135–59.
10. Steinman MA, Bero LA, et al. Narrative review: the promotion of gabapentin: an analysis of internal industry documents. Ann Intern Med. 2006;145(4):284–93.
11. Curlin FA, Dinner SN, et al. Of more than one mind: obstetrician-gynecologists' approaches to morally controversial decisions in sexual and reproductive healthcare. J Clin Ethics. 2008;19(1):11–21; discussion 22–13.
12. Frosch DL, May SG, et al. Authoritarian physicians and patients' fear of being labeled 'difficult' among key obstacles to shared decision making. Health Aff (Millwood). 2012;31(5):1030–8.
13. Bosk CL. Introduction. In: Forgive and remember. Chicago: The University of Chicago Press; 1979.
14. Cassell J, Buchman TG, et al. Surgeons, intensivists, and the covenant of care: administrative models and values affecting care at the end of life–updated. Crit Care Med. 2003;31(5):1551–7; discussion 1557–1559.
15. O'Connor PJ, Sperl-Hillen J, Johnson PE, Rush WA, Blitz G. Clinical inertia and outpatient medical errors. In: Henriksen K, Battles JB, Marks ES, Lewin DI, editors. Advances in patient safety: from research to implementation. Vol 2: Concepts and methodology. Rockville: Agency for Healthcare Research and Quality; 2005. p. 293–308.
16. Karsh BT, Holden RJ, et al. A human factors engineering paradigm for patient safety: designing to support the performance of the healthcare professional. Qual Saf Health Care. 2006;15 Suppl 1:i59–65.
17. Murphy E, Dingwall R, et al. Qualitative research methods in health technology assessment: a review of the literature. Health Technol Assess. 1998;2(16):iii–ix, 1–274.
18. Malterud K. Qualitative research: standards, challenges, and guidelines. Lancet. 2001;358(9280):483–8.
19. Morrow SL. Quality and trustworthiness in qualitative research in counseling psychology. J Couns Psychol. 2005;52(2):250–60.
20. Kuper A, Lingard L, et al. Critically appraising qualitative research. BMJ. 2008;337:a1035.

Landmark Papers

• Bradley EH, Holmboe ES, et al. A qualitative study of increasing beta-blocker use after myocardial infarction: why do some hospitals succeed? JAMA. 2001;285(20):2604–11.
• Cassell J, Buchman TG, et al. Surgeons, intensivists, and the covenant of care: administrative models and values affecting care at the end of life–updated. Crit Care Med. 2003;31(5):1551–7; discussion 1557–1559.
• Gallagher TH, Waterman AD, et al. Patients' and physicians' attitudes regarding the disclosure of medical errors. JAMA. 2003;289(8):1001–7.

Part IV

Career Development

Chapter 19

Engaging Students in Surgical Outcomes Research

Kyle H. Sheetz and Michael J. Englesbe

Abstract The introduction to the student research group brochure within our Department reads: "I cannot imagine there is a better job than being an academic surgeon. Each day brings new challenges, opportunities to make an impact on patients and disease, and engagement with inspired colleagues. Admittedly, becoming an academic surgeon is not for everyone, but if you think it is for you, then we want to foster your enthusiasm and facilitate your success."

Inspiring the next generation of successful academic surgeons is a core mission of any successful Department of Surgery and academic surgeon. This includes training students, residents, and fellows. Special considerations are necessary when developing an outcomes research program that successfully involves students. The introductory statement above highlights four core domains essential to engaging students in a successful research development program. These include enthusiasm, approachability, effort, and productivity (Fig. 19.1). This chapter focuses on these four components and their importance to both the faculty member and the student. The faculty perspective has been written by a faculty member with a robust and productive student research group. The student perspective has been written by a 4th year medical student who has had exceptional experience learning and doing health services research. Keywords Medical student research • Medical student mentorship


19.1 Enthusiasm and the Faculty Mentor

Enthusiasm is the most important characteristic of a successful student mentor. Enthusiasm cannot be faked. The faculty member who engages medical students and undergraduates in mentorship must love academic surgery. Ideally, one shares equal enthusiasm for clinical surgery, research, and teaching. Though most students are sophisticated, they are often unable to filter through caustic humor and sardonic wit. The cynical surgical mentor is better suited to train more senior individuals (senior surgical residents and young faculty). Enthusiasm for a clinical research project is rooted in the potential clinical implications of the work. It is the job of the faculty mentor to make the students understand and care about the project. Explaining the goals of the project and how this could affect surgical care drives productivity. Highly motivated students are created by great leadership. The primary fuel for their efforts is personal enthusiasm for the project.

19.2 Enthusiasm and the Student

Enthusiasm is crucial for students interested in surgical outcomes research. Students should make every effort to understand the motivations and clinical importance of research projects. This can be difficult, especially when clinical training is limited. Mentors understand this, and good ones will make every effort to convey the significance of the work. Students must meet them halfway by showing excitement, passion, and dedication to the project. In doing this, students develop a deeper understanding of the clinical problem being studied and how their efforts could impact patient care. Surgeons are busy people, and excited students resonate with faculty members more than anything during sporadic, brief meetings. However, there is a fine line between enthusiasm and aggressiveness. Students should respect the fact that surgeons juggle many obligations at the same time, most importantly patient care. That being said, it is the student's responsibility to make sure that their project does not fail on account of busy schedules or lack of direction. At times it can feel like you are working toward an unknown or unachievable goal. Enthusiastic students stay the course.

19.3 Approachability and the Faculty Mentor

Students are scared of surgeons. This is for several reasons, many of which are primarily of historical significance. A good faculty mentor facilitates a comfortable and informal working relationship. Discussing life as a surgeon and the balance of clinical medicine with research and family is among the most important topics for dialogue early in the relationship.

The more approachable surgeons make themselves, the more they will be approached by students. Knowing their students, facilitating their career development, and celebrating their successes must be a core commitment. Surgeons are busy and under increasing pressure to generate revenue. Mentoring students is done in a surgeon's free time. Before a surgeon decides to take on a large number of students, he/she must be sure that they are willing to commit a significant amount of time. For the dedicated teacher, the benefits are worth the effort.

19.4 Approachability and the Student

Approachability is relevant to the student in two ways. When selecting a potential mentor, students should seek out those individuals with a reputation for mentoring medical students and undergraduates. These are frequently more junior surgeons, as senior faculty primarily serve as mentors for residents, fellows, or other faculty. Students benefit from discussing potential mentors with other, more senior medical students, focusing on how their experiences were productive and valuable. Above all, the most desirable attribute in a potential mentor is a track record of supporting medical students in research and career development. Students participating in outcomes research must also be approachable, flexible, and adaptable. Mentors can have unpredictable schedules. Approachable students make it a priority to meet and engage faculty whenever possible. This requires some amount of sacrifice on the part of the student. However, it is important to recognize that your mentor sacrifices in a similar manner to provide a valuable learning experience for you. This mutual effort is what makes the mentor-mentee relationship thrive.

19.5 Effort by the Faculty Mentor

The key to a successful outcomes research program that provides opportunities for undergraduate and medical students is effort by the faculty mentor. This goes beyond a simple willingness to meet with the students. The faculty member must pick suitable projects that set the student up to succeed. These projects must empower the student to control their own success and not rely upon the goodwill and talents of other individuals within the research group. Complex data abstraction from the clinical record for a single-center retrospective study cannot be reliably done by an undergraduate student. Undergraduate students are well suited for prospective, observational clinical research studies in which they interact with patients. This seems counterintuitive, but with appropriate infrastructure, mature and reliable undergraduate students can enroll and engage patients better than a research coordinator. That being said, the data collection methods must be simple and well suited for the students to collect. Undergraduate students have an over 90 % enrollment
rate in one of our clinical research studies that involves frailty assessments on liver transplant patients. The full-time research coordinators have an enrollment rate of less than 50 %. Patients love young, soft-spoken, and engaging students. With appropriate oversight by a professional research coordinator, developing a clinical study that facilitates undergraduate interaction with patients can be a productive and rewarding experience. Most importantly, patient interactions for undergraduate students can be transformative for their career development. Medical students should be given clinical research studies of which they can take ownership. Along these lines, they should be expected to complete the entire study and write an abstract and manuscript as the first author. Within this context, medical students are poorly suited for patient enrollment in prospective, observational or interventional clinical research studies. These studies usually take years to complete, and medical students do not have much time to devote to a long-term project. Large database studies with biostatistical support, chart abstraction, image analysis, or patient phone survey studies are all well suited for medical student participation. This work is best done from the patient-level perspective, as policy-level research often requires too broad a scope for student research. Once again, the effort the mentor puts into designing a research study well suited for a medical student is more important than the effort the medical student puts into the project. Efforts to secure funding for research projects are critical for a successful student research program. The attending surgeon cannot be expected to provide day-to-day mentorship and oversight to the student. A clinical research infrastructure is necessary to provide this support. This includes administrative and technical support. Further, a biostatistician who can effectively communicate with, and is willing to mentor, students is a valuable resource. The best mentors for students are actually other students and trainees. Every senior medical student working on research with residents likely wants to be a surgical house officer. Similarly, every undergraduate wants to be a first-year medical student. A large team of trainees ranging from undergraduate students to junior faculty offers the ideal infrastructure for this type of mentorship (Fig. 19.2). This is similar to how patient care happens on surgical services: the medical student does not go directly to the attending surgeon; there is a clear chain of command within the house officer ranks. Re-creating such a hierarchy within a student research group improves efficiency and career development and reduces the amount of time the faculty must devote to nuanced project details. Other specific attending-level efforts should include establishing a career development portfolio for students and trainees within their research group. This includes a catalog of previous successful student grants, applications, and CVs. It is invaluable for a first-year medical student to see the level of productivity that a successful fourth-year medical student has achieved. This establishes a standard for excellence early in their medical school career. The faculty mentor should specifically focus on teaching presentation and writing skills. Few students do these tasks well, and modest effort can have a remarkable impact on the student's success and confidence.


Fig. 19.1 The four domains of a successful student research program

Fig. 19.2 Student mentorship is collaborative and involves trainees from all levels of experience

19.6 Effort by the Student

Students pursuing surgical outcomes research typically have little to no experience in the field. Good mentors recognize this and are able to structure projects that harness a blend of talents that are not teachable: intelligence and dedication. Within this context, it is important for students to develop their aptitude for research while
participating in research projects. This is analogous to on-the-job training, and it can produce unique skill sets that will be valuable for your entire career in academic surgery or medicine. To borrow a saying from one surgeon-mentor at our institution, "you should always be able to do the job of the person one level above you." This is an excellent framework for students to guide their research skill development. As an undergraduate or pre-clinical student, you should be critically assessing what senior medical students contribute to projects and how this is different from your current role. Similarly, senior medical students should use house officers as a metric for how their skills can improve. As a testament to the value of surgical outcomes research, these talents include research design, statistical analysis, scientific writing, and oral presentation. Medical students interested in devoting more time to surgical outcomes research have many options, including research fellowships and dual-degree programs. These programs carry risks and benefits that should be considered by interested students. These programs prolong training and the time to independent practice. However, many proponents of fellowships and dual-degree programs cite the acquisition of research design and analytic skills as true assets to the aspiring academic surgeon. Fellowships are offered by various local and national organizations that regularly fund faculty research (e.g., the NIH and the Howard Hughes Medical Institute). Students interested in dual-degree programs most commonly seek a Master of Public Health (MPH), though degrees in business (MBA) and public policy (MPP) are becoming more popular. The NIH also has dedicated funding for joint research/master's degree programs for pre-doctoral trainees, which potentially combine the benefits of fellowship (dedicated research time) and degree programs (education in statistics and research design). Any student interested in these programs should set concrete goals and expectations with their mentor prior to applying. Planning ahead will allow for maximum academic development and productivity. Though controversial, most mentors will advocate against year-out programs undertaken for the sole purpose of increasing competitiveness for residency. With the risks and benefits in mind, students ought to have clear goals that can only be met through a dual-degree or research fellowship program. Finally, advanced skills and experience do not substitute for true dedication. Students who wish to be academically productive and relevant should expect to work hard. This is particularly true when research is being conducted in addition to your coursework responsibilities. Nonetheless, surgical outcomes research is well suited for this and can inform and inspire career ambitions for the dedicated student.

19.7 Productivity and the Faculty Mentor

Student projects have to result in academic productivity: abstracts, presentations, and manuscripts. Focusing on higher-level research skills and personal development through experience is an appropriate goal for the undergraduate researcher. Their job
is to clarify their own desire to pursue a career in medicine or science. Conversely, medical students have to complete projects and show academic productivity. Spending a thousand hours working on a project that does not yield academic credit is unfair to the student and is a result of poor faculty mentorship. Establish an environment where the onus and the opportunity are centered on the student. Never push a student about completing the task; surgeons are too busy to do this. Creating such an environment is best done by a commitment to rapid response and turnaround for students. Letters of recommendation must be written quickly. Abstracts must be edited and returned within 24 h. The results section of a manuscript must be reviewed immediately and discussed with students on the research team in an effort to craft a narrative for the manuscript. The student should be given writing templates and a general outline of both the introduction and the discussion of the manuscript. No student research project should sit in the faculty inbox. Establish an environment where work gets done efficiently and undergoes continuous iteration. Edits and revisions are where students learn how to do research. In summary, productivity starts from the top; never be a hurdle to the productivity of your mentees. Students must be encouraged to send abstracts to a vast array of meetings. These include national meetings, but local and institutional research meetings are just as important. If a student wants to stay at a specific institution for their next level of training, presentations and exposure within that institution are more important than national presentations. Local and regional research meetings are a wonderful opportunity for students to develop presentation skills and increase evidence of academic productivity. Similarly, there is a broad range of award opportunities for students, and they should always be encouraged to apply.

19.8 Productivity and the Student

Being a productive student assumes that many of the mentor- and mentee-related attributes discussed above are met. A good mentor will set you up to succeed, but it is up to the student to follow through. Productivity can be measured in many ways. However, the volume of abstracts, papers, and presentations is often what determines whether a student is "productive" or not. Productivity requires active participation by the student. For example, one should never rest on data that could result in an abstract or manuscript: just write it. Good mentors will shape your work, but they need a starting point. It is up to the student to keep their own projects moving forward. As mentioned above, research meetings provide an excellent venue for exposure and improvement of presentation skills. Students should welcome the opportunity to present their work whenever possible. While often anxiety provoking, oral presentation skills only improve with practice and repetition. Similarly, writing can be difficult, but it will become easier and more fluid with experience. Students should also regularly apply for awards and honors. Regardless of the scale (institutional,
national, etc.), awards bolster one's resume and can improve a student's exposure within their chosen field. The most productive students have a tendency to seek out success and never turn down opportunities, regardless of scope. Finally, students need to document their success. Update your curriculum vitae (CV) regularly. A good mentor, in addition to older students, can help design a CV that highlights a specific student's strengths. A good CV is a critical first step in advertising one's productivity to future research mentors and residency program directors.

19.9 Summary

A successful clinical research program that includes robust opportunities for students requires specific considerations. The cornerstones of success for this program include enthusiasm, approachability, effort, and productivity on the part of the faculty mentor as well as the student. The importance of tangible elements such as administrative and analytical support cannot be overstated. However, the most successful student outcomes research programs thrive on the relationship between mentor and mentee. When all entities within a research program understand their specific roles, the strengths of each can be maximized.

Chapter 20

Finding a Mentor in Outcomes Research

Omar Hyder and Timothy M. Pawlik

Abstract Mentors play a key part in the learning process through their guidance and nurturing, thereby ensuring that the clinician-in-training will transition into an excellent, independent thinker and practitioner. Mentoring plays an important role in outcomes research. Unlike going from medical school to residency and subsequently to fellowship, which tends to be a linear, well-defined career path, success in outcomes research can depend more on identifying a mentor who can provide the mentee with a roadmap for success. In this way, success in outcomes research may rely more on an apprenticeship model of learning built on interactions between mentee and mentor. The ideal outcomes research mentor is someone who has a well-established research program, "technical" expertise in research methodology, an excellent track record of publications, independent research funding, and wide recognition in the field. Obviously, finding such a "perfect" mentor is challenging, if not impossible. It is therefore important to identify several different mentors who can fulfill each of these roles. Each aspect of this "ideal" mentor can substantially contribute to the mentee's experience and their ability to become a future successful, independent outcomes researcher. Success in outcomes research often depends on a robust apprenticeship with an experienced clinician-outcomes researcher who can guide a mentee's career. Keywords Mentorship • Mentoring • Mentee • Career development • Leadership • Training


20.1 Introduction

As clinicians, we undergo extensive training that enables us to transition from student to independent practitioner providing direct patient care. Mentors play a key part in this learning process through their guidance and nurturing, thereby ensuring that the clinician-in-training will transition into an excellent, independent thinker and practitioner. Mentoring plays a similar role in outcomes research. In fact, it may play an even greater role because, unlike going from medical school to residency and subsequently to fellowship, which tends to be a linear, well-defined career path, success in outcomes research can depend more on identifying a mentor who can provide the mentee with a roadmap for success. In this way, success in outcomes research may rely more on an apprenticeship model, i.e., learning built on interactions between mentee and mentor. Because everyone's path of training in outcomes research is unique and customizable, a good mentor acts as the "cartographer," helping chart with the mentee the academic "seas" that they plan to sail together to ensure that the mentee finds their way, learns from the mentor's experience, and eventually begins to chart their own course. The term 'mentor' has varying definitions. Classically, the term derives from Homer's Odyssey, in which "Mentor" educated Telemachus through his transformative years. Sometimes, a mentor can be an individual who does not bear primary responsibility for the mentee's work and productivity. In this way, the mentor can provide unbiased advice on the mentee's professional, and at times personal, life and goals. Other times, mentors may be supervisors, preceptors, and advisors, those individuals with whom we work closely and who provide advice and counsel on a regular basis. In its purest sense, however, a mentor is someone who will provide the support, knowledge, and expertise to cultivate your professional growth over a lifetime. Mentors can fulfill a variety of roles in professional development, and frequently a mentee should seek out multiple mentors. In fact, mentorship will often be most successful when adopting a hybrid approach, in which the mentee seeks different mentors to fulfill a variety of roles. For example, a mentee may have different mentors for clinical surgery, research, and career advice. Academic medicine and outcomes research tend to be team sports. To this end, the best mentee-mentor relationships are partnerships (Table 20.1). The relationship between mentor and mentee is dynamic and should be continuously renewed and improved. The mentor and mentee should determine the parameters of the relationship, as well as discuss, and routinely re-examine, its structure and terms (e.g., how often they will meet, what the expectations are, and what projects the mentee can "take with them"). The mentor should provide the core competency, experience, and expertise to facilitate the success of the mentee. The mentor should provide a space for training, impart knowledge, and act as an advocate for the mentee in his or her career aspirations. In turn, the mentee needs to provide enthusiasm, motivation, and the drive to be creative, innovative, and "pave their own way." It is not the role of the mentor to motivate the mentee. Rather, the mentee needs to build a partnership with the mentor that is grounded in responsibility, dedication, and motivation to mature into an independent investigator.

Table 20.1 Mentor’s and mentee’s relative roles and responsibilities The mentor’s roles and responsibilities Traditional view Paternalistic Boss/authority Stern/strict In charge Protective “Raise” the mentee The mentee’s roles and responsibilities Traditional view Subservient/obedient Favorite son Passive “Made” by mentor Think alike Responds to power/orders

Newer view Empowering Friend/partner Inspiring Let’s go Protective “Develop” the mentee Newer view Responsible equal Mentoree Active Help “make” yourself Think “outside the box” Responds to motivation, self-motivated

Adapted from Souba [2] and Cothren et al. [1]. Used with permission

Most productive mentee-mentor relationships are defined by excellent communication, a unified drive to achieve excellence, and a shared enthusiasm for the specific scientific question. In this relationship, while the mentor provides guidance, experience, and training to assist the mentee in achieving an independent career, the mentee provides the mentor with an enthusiastic, intelligent, and hard-working apprentice. In this model of teamwork, the hybrid mentor-preceptor relationship allows both the mentor and mentee to flourish.

20.2 What Makes a Good Outcomes Research Mentor?

The ideal outcomes research mentor is someone who has a well-established research program, "technical" expertise in research methodology, an excellent track record of publications, independent research funding, and wide recognition in the field. Obviously, finding such a "perfect" mentor is challenging, if not impossible. As noted, it is therefore important to identify several different mentors who can fulfill each of these roles. Each aspect of this "ideal" mentor can substantially contribute to the mentee's experience and their ability to become a future successful, independent outcomes researcher. One aspect of an ideal mentor is "excellence" and "competency." The mentor is the expert, experienced, and trusted individual who can impart knowledge and wisdom to the mentee. As such, mentees should ideally look for mentors who have a well-established research program and a proven track record in mentoring.


[Fig. 20.1 shows a two-by-two diagram plotting the extent to which the mentee is supported (vertical axis, low to high) against the extent to which the mentee is “stretched” (horizontal axis): high support with little stretch yields lack of independence and self-confidence; low support with little stretch yields loss of bearings and developmental failure; low support with high stretch yields apprehension and withdrawal; high support with high stretch yields substantial growth and self-actualization.]

Fig. 20.1 The mentor-mentee relationship requires a balance between the mentor’s roles of supporting and “stretching” the mentee relative to their impact on the mentee’s self-development (Used with permission: Souba [3])

Mentoring is a skill that is honed over time. Mentors who have successfully mentored other students or junior faculty in the past are more likely to be effective with new mentees. In addition, mentors who have an established research program often work with a wide group of other researchers and collaborators. By aligning themselves with an established researcher, the mentee is more likely to have access to these other individuals, who in turn can participate in peer-mentoring or collaborative mentorship. These other types of mentorship can be very important. Some of the best advice and insight can often come from peer-to-peer mentorship or mentorship from a more junior faculty member whose perspective and experience are more similar to the trainee’s.

While the mentee should seek out an experienced and senior mentor, a senior researcher may have more constraints on their time and may not always be accessible. As such, the mentee may be well served to seek out more “junior” or “up and coming” mentors who have more available time – not to mention enthusiasm and energy. Early career faculty members – assistant and associate professors – may have more time to spend with the mentee discussing career plans and may have better availability to help navigate problems faced by the mentee at the start of their outcomes research work. On the other hand, senior faculty members may have more experience in mentoring students and more connections within and outside the institution that can be leveraged to the mentee’s benefit. By identifying both senior and junior mentors, a mosaic of mentors can be used to optimally support the mentee’s burgeoning career.

By embedding themselves in a research group with an experienced mentor, the mentee can transition to independence over time and gradually decrease reliance on the mentor’s input. In certain circumstances, the approach can be tailored to the mentee’s previous training and abilities (Fig. 20.1). For example, someone with previous training in economics may be able to start exploring the cost-effectiveness of a certain intervention much sooner than a different mentee who lacks that prior training. For most surgical residents without previous training in research methods, a gradual and gentle introduction to the process of outcomes research is appropriate. A clear direction and focus can also be an important consideration when seeking a mentor.


As a future independent outcomes researcher, it is critical to have an area of focus and develop a core knowledge base around a topic or specific body of work. Although it is desirable to use multiple methodological approaches to work around a central theme, there is a tendency for some outcomes researchers to indulge in a wide variety of topics and methodologies – “chasing” various ideas or projects. The mentee would be better served to avoid research mentors who lack a consistent arc or trajectory to their work. A “scattered” or “shot-gun” approach to outcomes research that involves indiscriminate analyses of large databases should be avoided. In most instances, such research is not answering an important question, lacks an underlying hypothesis, and does not lead to a systematic body of work that the mentee can build on for a future independent research career.

Similar to how the resident in the operating room seeks mentorship from the “master” surgeon, the individual interested in outcomes research should seek a mentor who has methodological or “technical” expertise. Areas of expertise in outcomes research methodology may include the formulation of research questions, biostatistical or epidemiologic proficiency, or grantsmanship. In order to become an independent investigator, the mentee needs to master a large number of concepts, techniques, and skills. Having a mentor who has skill in these different domains of outcomes research can be important. No resident would ever go into a basic science lab that is investigating the molecular genetics of cancer if the head of the lab had no or minimal knowledge of genetics! Similarly, when seeking a mentor in outcomes research, the mentee should seek a mentor who has a solid foundation in research methodology. Again, the mentee may need to find several mentors who have methodological expertise in different areas of outcomes research that may be of interest (e.g. comparative effectiveness, cost-analysis, survey instruments, decision-making, etc.). Whatever the area, the mentee will want to identify a mentor who has technical expertise in a certain methodological area so that the mentee can learn the appropriate application of this skill set to their research area. When the mentee has a problem or issue, having an experienced mentor can help avoid spending large amounts of time on “road blocks” or methodological “conundrums.” While mentees can sometimes identify mentors with methodological expertise by the mentor’s past participation in formal training programs or degrees from a School of Public Health, the mentee should also look at the potential mentor’s publication history. Prior published work from the mentor and his/her group can help the mentee identify areas of expertise on the part of the mentor, as well as help gauge whether there might be overlapping interests to facilitate a future mentor-mentee relationship.

Another important aspect of a good outcomes research mentor is someone who can teach and guide the mentee in the skill of writing. Publication is the currency of academics. While the “publish or perish” maxim may not seem fair, publication of your research has important implications. Publication of your research represents the final step of the research process and is the culmination of your work. It can lead to personal and professional satisfaction, not to mention that publication of your work disseminates knowledge and hopefully answers a pertinent and useful question. A good mentor will help you develop a plan to publish. This typically involves helping to decide when the work is ready for publication, tailoring the main finding/message for publication, as well as helping to target the appropriate audience and journal.


Mentees should seek mentors who formulate excellent and important research questions, utilize rigorous methodology, and consistently publish their work. Mentors need to be active participants with mentees in helping to organize, draft, and critically revise the manuscript. There is nothing more frustrating to a mentee than to work extremely hard on a project and then have the manuscript “stall” with the mentor. Like other skills, writing and publishing need to be learned from an expert mentor. The mentor needs to value the mentee’s time and honor agreed-upon goals and timelines for publication with timely review of the mentee’s work.

Another important issue with publication concerns authorship. An ideal mentor can openly discuss authorship with his/her mentees in a transparent and comfortable manner. Authorship should reflect effort. Specifically, the first author is typically the individual who does the majority of the work on the project (e.g. formulation of the research idea, data collection, analysis, drafting and critical revision of the paper). Beware of the mentor who routinely must identify him/herself as the first author on publications. Such a pattern reflects either the inability of the mentor to cultivate independent work on the part of the mentee or a need to self-aggrandize – neither of which is good! Good mentorship involves open communication and discussion around authorship that is fair and transparent. It is important that the mentor recognizes contributions by the mentee and gives fair credit on publications. A consistent track record of independent publication by the mentees, and the independent academic success of mentees subsequent to their work with the mentor, speaks volumes about both the mentor’s choice of resourceful individuals as mentees and the mentor’s own efforts in helping his or her team succeed.

While publication of manuscripts is obviously important, grantsmanship is a similarly important skill. An ideal mentor in outcomes research will be someone who has a track record of peer-reviewed funding for his/her research. In an era of funding constraints, it is increasingly difficult to obtain funding for research. Funding of outcomes research is important, however. Peer-reviewed funding can provide the necessary funds to allow you to obtain important resources necessary to do your work (e.g. data, software/hardware, personnel, consultants, etc.). In addition, funding can provide salary support so that the mentee can have the time and space to focus on research in addition to clinical demands. Peer-reviewed funding also is a demonstrable sign that your work is important and is addressing a relevant topic that warrants financial resources. An ideal mentor in outcomes research is an individual who has experience identifying funding mechanisms, submitting grants, and obtaining extramural funding. As a mentee, you are trying to establish the foundation for a long career in outcomes research, and funding can help sustain you as a researcher. Try to identify a mentor who can work with you to apply for training grants (e.g. T32, etc.). Working with mentors who have a successful track record of extramural funding for outcomes research will allow the mentee to learn skills relevant to grant writing and applying for mentored development awards (e.g. K08, etc.) early in their career.


An ideal mentor for surgical outcomes research may or may not have a busy clinical practice. Seeking an outcomes research mentor who is also an active clinician has its pros and cons. Active clinicians who dedicate a significant proportion of their time to patient care may have a greater opportunity to remain in touch with issues arising in clinical practice that can be explored using outcomes data. A mentor who is in touch with current clinical practices and the most relevant/“hot” topics in the field can be valuable in helping to generate new, exciting questions for outcomes research. Mentors who are too busy clinically, however, can end up having little time to do sophisticated outcomes research. A mentor who is always in the operating room or on the wards may have no time to discuss research ideas, review the data, help with methodological difficulties, or assist with the writing of grants and publications. As such, an ideal surgical outcomes research mentor is probably someone who is an active clinician, yet someone who has dedicated time to make research a priority.

20.3 How to Locate a Good Outcomes Research Mentor and Determine Mutual Compatibility?

Before setting off on the search for an outcomes research mentor, the trainee has to determine where his or her interest lies. While flexibility on the part of the mentor is an important consideration, it is as important for the mentee to have some flexibility with regard to their research plans. Not infrequently, potential mentees may think they have the “best” idea for their research focus. Outcomes research entails understanding the available data, formulating hypotheses, and asking questions that can be validly answered using the available resources. While the mentee should strive to identify questions that are relevant and valid for outcomes research on their own, the mentee should also be open to new possibilities and input from a potential mentor about a different direction their idea may take.

When setting out to identify a mentor, cast a wide net. In general, the more people you contact, ask for advice, and meet with, the more likely it is that you will find someone who will be a good match for your interests and personality. A good starting point for prospective mentees may be to look for individuals within their own department of surgery or for individuals working on a topic of interest at an affiliated School of Public Health. Conferences and meetings, especially larger ones such as the Academic Surgical Congress and the American College of Surgeons’ annual Clinical Congress, have specialty sessions on topics relating to outcomes research. The trainee might identify a mentor through interaction with individuals at these meetings. While face-to-face meetings were traditionally necessary for a mentor-mentee relationship, in the digital age this relationship can also be maintained using email, telephone, or video conferencing. It is probably best, however, if the mentee can find a mix of both local and “remote” mentors.


Doing your homework before approaching a mentor is extremely important. A researcher’s work is readily accessible on the internet – a search of the PubMed database will provide a good starting point for you to build a profile on a potential mentor. Using online resources, you can determine whether the potential mentor works mostly with his/her own local team or has multiple collaborators from other institutions. Support for collaborative research may be an important long-term consideration for mentees who are looking to spend their research time at an outside institution (i.e. research residents) and later come back to their home institution to complete residency. The breadth, variety, and focus of the mentor’s past and current work can be judged by reviewing the publications that include him/her as the first or the senior author. The importance of doing this homework cannot be emphasized enough. It allows the potential mentee to get a fair idea of whether his/her ambitions fit with the mentor’s academic focus. It also allows both parties to avoid spending unnecessary time explaining what the mentor has done in the past.

A key factor to discern when meeting with a potential mentor is the level of investment the mentor is willing to make in the mentee’s professional development. During any meeting with a potential mentor, the mentee should evaluate the mentor’s level of interest. Multi-tasking, lack of attention to the prospective mentee’s opinions, and general disengagement from the discussion are ominous signs that may spell trouble down the road. When asked how they ended up choosing one mentor over another equally talented mentor, many mentees cite their “gut feeling” as the deciding factor. At the end of the day, the decision to work with a particular academic mentor on outcomes research is a personal one. It is always a balance of academic and personal factors.

Another notable issue that is often overlooked by potential mentees is their fit with other trainees working with the same mentor. A fair and honest mentor will allow you access to all of his/her trainees so you can get a feel for whether you will fit in with the group. Outcomes research is a team effort, and the fit within the team is very important. The prospective trainee should make every effort to meet with as many of the current “laboratory” members as possible. Every group has its own culture, which is usually a good reflection of the mentor him/herself. It is also important to review where each member of the team is in terms of his/her career development. While career development is clearly dependent on a number of factors, the mentee can get some sense of the mentor’s commitment to mentorship by examining the track record of previous mentees.
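Returning to the “homework” step described above, the short sketch below illustrates one way to pull a prospective mentor’s publication count and a few recent titles from PubMed using Biopython’s Entrez interface. The author name, e-mail address, and result limit are placeholders rather than details from this chapter, and any scripted search should be checked against a manual PubMed query and the mentor’s CV.

```python
# Minimal sketch, assuming Biopython is installed; the author name and
# e-mail address below are hypothetical placeholders.
from Bio import Entrez

Entrez.email = "trainee@example.edu"   # NCBI asks for a contact address
author_query = "Smith JA[Author]"      # hypothetical mentor

# Count the mentor's PubMed-indexed papers and grab a handful of record IDs.
handle = Entrez.esearch(db="pubmed", term=author_query, retmax=5)
search = Entrez.read(handle)
handle.close()
print("Total PubMed records:", search["Count"])

# Retrieve title, journal, and date for those records.
if search["IdList"]:
    handle = Entrez.esummary(db="pubmed", id=",".join(search["IdList"]))
    for rec in Entrez.read(handle):
        print(rec["PubDate"], "-", rec["Title"], f"({rec['Source']})")
    handle.close()
```

A quick scan of such output gives a rough sense of a mentor’s productivity and focus, but the careful manual review of first- and senior-authored publications described above remains the more informative step.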

20.4 Summary

Mentoring for trainees interested in outcomes research is key to their development as researchers. Success in outcomes research often depends on a robust apprenticeship with an experienced clinician-outcomes researcher who can guide a mentee’s career. It is important for the prospective mentee to find a mentor who is focused, fair, and invested in the mentee’s professional development both during his/her training and over his/her entire academic career.


References
1. Cothren C, Heimbach J, Robinson TN, Calkins C, Harken AH. Academic surgical mentoring. In: Souba WW, Wilmore DW, editors. Surgical research. London: Academic; 2001.
2. Souba WW. Mentoring young academic surgeons, our most precious asset. J Surg Res. 1999;82:113–20.
3. Souba WW. The essence of mentoring in academic surgery. J Surg Oncol. 2000;75:75–9.

Selected Readings
• Healy NA, Cantillon P, Malone C, Kerin MJ. Role models and mentors in surgery. Am J Surg. 2012;204:256–61.
• Sambunjak D, Straus SE, Marusic A. Mentoring in academic medicine: a systematic review. JAMA. 2006;296:1103–15.
• Selwa LM. Lessons in mentoring. Exp Neurol. 2003;184:S42–7.

Chapter 21

What Every Outcomes Research Fellow Should Learn

Yue-Yung Hu

Abstract Although entitled “What Every Outcomes Research Fellow Should Learn,” this chapter actually encapsulates what this outcomes research fellow learned during her research experience. I found my laboratory years to be incredibly rewarding – a result of pure dumb luck as well as design. Nevertheless, I hope this experience will be instructive to others – in both what to do and what not to do.

Keywords Surgical outcomes • Surgical research fellowship • Advice

21.1 Finding a Mentor (& a Lab)

This was the single most critical decision that I made – more aptly, lucked into – during my three lab years. Many residents pick their labs using the number of publications as the sole criterion. Certainly, productivity is important; we all need something to show for the time we’ve taken out of our clinical training. However, your mentor represents a great deal more than a block of text on your CV. S/he is your introduction to a new world! S/he will educate you – not only in the conduct of science, but also in the (often political) navigation of academics, both at your own institution and on a regional/national/international platform. S/he will be key to your ability to pursue and achieve success in the future, whether or not (but especially if) you intend for that to involve research. A true mentor will be your greatest advocate. It’s a reciprocal relationship, really – in exchange for knowledge, advice, and moral support, you’re going to work towards advancing his/her scientific vision.



Ideally, that vision is one that resonates with you. I’ve realized that I work much harder when engaged. Throughout high school, undergraduate, and my post-collegiate/premedical year, I worked in basic science labs, and, in all that time, I never published. The problem was not my mentors – I had quite a few who were wonderful people and accomplished scientists – but, rather, my own lack of interest. I was forcing it, and poorly so; because I found it utterly painful to read the relevant literature, to think about the problems in the abstract, and therefore to troubleshoot the experiments or rationalize about the next steps, I was unable to drive my own projects. When it came time to pick a lab in residency, I was adamant about studying subjects that appealed to me. In fact, my projects fascinated me, and during those 3 years, I’d find my thoughts wandering to them at all times – while watching TV, reading trashy women’s magazines, riding public transportation, etc. I simply enjoyed turning the problems and their potential solutions around in my head, so the entire research process became more intuitive, less effortful, and subsequently much more productive.

Don’t be afraid to leave your home institution to find a lab you love! There are not many programs with established surgical outcomes research fellowships. The Surgical Outcomes Club’s website has a short list of institutions: http://www.surgicaloutcomesclub.com/links. I applied to many of them, as personally, I get anxious if I don’t have a Plan B (through Z). I knocked on a lot of doors, but I didn’t regret that investment of time. It was a sort of informal review of all the options in surgical outcomes research – until I did this, I hadn’t realized that there were so many different pathways.

Finally, you are going to work with your mentor more closely and over a much longer period of time than any other attending in residency, so it’s important to choose wisely! A mismatch in style or personality can result in a miserable few years, with each of you thinking the other is always wrong, lazy/overbearing, etc. Think about whether you prefer complete independence (at the risk of lack of guidance and occasionally flailing) or to be constantly pushed (and often micromanaged), and figure out whether your potential mentor is likely to give you the former or the latter. It’s a question of personality (which, if you can’t tell by meeting them, you can probably figure out from other residents), as well as time commitment. If you’re going to need a substantial amount of guidance (and many of us – myself included – do, especially at the beginning), it may not be best to pick someone whose protected time consists of evenings and weekends. Going into the lab, I was conscious to gravitate towards people who were senior enough that they didn’t have to actively build their practices and who already had some funding (and therefore viable projects, as judged by standards more rigorous than mine). Later, in trading stories with residents from other labs, I realized I had also serendipitously found someone who was not too senior, by which I mean: her administrative duties and national leadership roles did not prevent me from accessing her on a regular basis.


21.2 Getting Chosen

This advice about finding a dream lab and mentor, of course, raises the question, “How do I make said lab and mentor want me?” – particularly if you, like I was, are looking at institutions outside of your own. For me, funding was the first hurdle. At the time I went into the lab, the default pathway in my program was one of basic science research; our institution had several wet labs with training grants, and there was a tradition of residents rotating in and out of the available slots. I felt so strongly about the type of lab I wanted that I offered (threatened) to go into it without any salary or benefits. Ultimately (and very fortunately for me), my mentor and my program decided not to let that happen, and they split the financial responsibility. I do still feel that if I had not been so stubborn, this support might not have materialized, and I would have been nudged into a research experience that I would have only halfheartedly pursued.

Visibility (or lack thereof) was my other issue. Being an unknown compounded the being-a-mouth-to-feed issue. Why would another institution invest in me, when its own familiar residents were lined up at the door? My mentor says that she knew to take me because, “You kept coming back,” which may just be a euphemistic way of saying, “I could not get rid of you.” Initially, we met because my clinical advisor, who had been in residency with her, had sent me to her for advice (a lucky break that taught me the value of being vocal about your interests and putting out feelers everywhere, even with people who don’t seem directly positioned to help you). At the end of our meeting, she said, “Follow up with me in a few weeks,” and I interpreted that literally. I emailed her updates about my applications to the programs she had recommended, and I attended as many Center for Surgery and Public Health (her home base, at the time) events as I could. I suppose that between constantly showing my face (including electronically) and offering to essentially volunteer full-time, I demonstrated my level of commitment. Even though I felt like a stalker with my constant “following up,” surgeons are incredibly busy people, and I might otherwise have been forgotten.

21.3 Setting Goals

Corny, I know – I’ve always hated this exercise, too. Admittedly, though, it’s a useful one. I wrote up my short- and long-range goals as part of a training grant application, rolling my eyes the entire time. When I re-read the page, I was pretty surprised; there were all my aspirations, in print, and many I had never previously acknowledged to myself! I want to enter the rat race? Be constantly evaluated and judged? Compete for shrinking amounts of money against ever more qualified competitors? Evidently. And I wanted to do it in a relatively untested way – qualitative research in surgery. Who knew? Not me – until then.


Perhaps you are more perceptive or reflective than I am. Nevertheless, it’s still useful to have these thoughts organized and packaged for mass consumption for: (1) grant applications, as I mentioned, and (2) your mentor. As I emailed my essay to my mentor (along with the other grant materials), I realized that I had never previously said any of these things aloud to her. How was she supposed to help me get what I wanted if I hadn’t told her what that was? And, cheesy though it was, it was likely a lot more intelligible than whatever I would have stammered if she had asked me in person. I don’t mean that you need to send your mentor an expository essay, but you should have some thoughts about the following questions: What do I want out of this research experience? What are my priorities (learning techniques, learning to read more critically, publishing, networking, having time off)? What questions would I like to answer? What methodologies do I want to learn to use? How does this fit into my overall career trajectory – am I just doing this so I can be competitive for fellowships, or do I want an academic career? If the latter, what do I envision my (research) field of expertise will be?

21.4 Getting an Education

I did a Master of Public Health (MPH) and thought the experience fabulous. My parents commented that it was the first time they had ever heard me say that I loved school – and I think it’s because it is entirely a degree you get for yourself. I didn’t need it to function as a surgeon, but I wanted the knowledge. I also think it would have been incredibly difficult to complete an outcomes research fellowship without some formal education; the overview we all get in medical school and ABSITE review is woefully inadequate for actual research. Some basic statistics (to get you through regressions, at a minimum) and epidemiology (to learn about research design) classes will save you from a lot of flailing. A computer laboratory component is also helpful, as each statistical software program has its own nonintuitive quirks. Many schools (Harvard, for example) offer these intro-level classes in an intensive summer program; those 2–3 months will give you enough knowledge to be a functional research fellow.

I completed the degree, rather than stopping after the summer, because I suspected the additional knowledge gained would eventually come in handy, since I planned to continue research beyond those 3 years. I took more statistics (e.g. survival analysis, decision analysis) and epidemiology classes. Aside from their obvious practicality to a future outcomes researcher, these courses were incredibly helpful in that the instructors encouraged us to submit our own projects as schoolwork and gave me quite a bit of pre-submission feedback. I also took several classes that I initially considered self-indulgent; they were interesting, but I never planned to use them for anything (e.g. global health delivery, public health law). I now think of these courses as being integral to my education. They opened my eyes to the many possibilities in public health research that remain untapped by surgeons.


21.5 Choosing Projects/Managing Time

Generally, when you enter a lab, you are taking over an exiting resident’s projects and just getting started on a few new ones. Some of these starter projects may be predetermined – ones for which you have been hired – but others may just be floating around, awaiting an owner. When I first started, I worried that I wasn’t producing, so I said yes to everything. (Having spoken to other residents, I think not-doing-enough is a common concern at the beginning, perhaps because we’ve all just left residency and are unused to the pace in the lab.) Most of these projects were small. While they allowed me to get comfortable with SAS (statistical software) and publish a little, I later wished I had invested my time in more important projects. Perhaps, however, it was doing those baby projects that gave me precisely that perspective; after performing one chart review study, for example, I realized how labor-intensive and subjective such projects may be, and I vowed to do them again only if they would be broadly applicable (e.g. not single institution), and if they answered previously unasked questions that were not investigable in any other way. I subsequently sought out projects only if I thought I’d learn something new (how to use an unfamiliar – but nationally recognized – database, how to use new research methodologies or techniques) and/or contribute something new to the surgical literature (new perspectives/questions). If my mentor didn’t think it could be submitted to and actually reviewed by a high-impact journal, I (we) passed. Learn your own criteria quickly (and, if possible, not through extensive trial-and-error) so you can be efficient with your time – 2–3 years will turn out to be a surprisingly short amount of time!

Because the time passed so quickly, I learned the importance of establishing timelines. In my former life as a basic science researcher, I had worked in one lab that didn’t publish anything for the 3 years before I got there, or the year after I left (at which time the PI left the institution). I experienced firsthand that in research, you can do endless troubleshooting and further investigating – everything can always be more perfect – until you get scooped. Our lab uses conference abstract deadlines as targets for preliminary data analysis, and we aim to finalize the analyses, and therefore a manuscript draft, before the presentation. I still use these guidelines. The promise/threat of giving a presentation is highly motivating!

21.6 Having a Life

It’s a taboo topic – we don’t like to let on that we aren’t hard-core robots whose only mission is to work, work, work. But having a life should be a priority. This is a break from residency – and you deserve to take it! I know many residents who moonlit upwards of 25 nights a month. They made a lot of money, but they were burned out before they re-entered clinical training. Money wasn’t primary for me – I liked the extra cash, but more than that, I appreciated the time to decompress, make up for lost time with friends and family, date (!), live healthfully (eat non-hospital/non-fast food, work out), and explore this city I’d been living in for 2 years without really seeing.


Additionally, I felt that I wouldn’t be able to dedicate myself as fully to research with that level of clinical activity. Because so many of my ideas came when I was not actually in the lab, and because clinical work is so all-consuming, I think they would never have come at all if I had been moonlighting extensively. I needed the space to think.

21.7 Re-entering Residency

Because I was concerned that I had built a lot of research momentum that would be lost if I returned to residency, I took a third year in the lab. To some extent, this was justified – many of my papers were submitted and (multiply) revised in the first half of that third year, and it might have been difficult to accomplish this on top of re-starting clinical training. However, because I also worried that I needed more data and more primary publications to account for that third year, I took on several new projects. Most didn’t begin data collection until the last half or quarter of that year, so I re-entered residency with much of the analyses incomplete. I don’t think it is necessarily a bad idea to keep producing new data; however, I should have identified people (incoming residents) to help me finish those projects. For some, I did, and those are now in the manuscript-drafting phase. The others are making progress at a glacial speed – they are still in the data clean-up phase. As surgical residents, we are trained to always follow through, but I should have considered dropping some of these projects when it became apparent that the data collection would be occurring very late, and instead invested the time in finishing the others.

Currently, I am still struggling with balancing residency and research. I do have some anxiety over the fact that by the time I am ready to re-enter research (for most of us, it’ll be 2–3 years of residency, then 1–3 years of fellowship), I will have become rusty and my CV outdated. (Plus, I am loath to give up something I enjoy so much!) I have been advised by more senior researchers not to worry about keeping active projects, as we do need to be fully dedicated to our clinical training. Thus, I am trying to limit myself to finishing my last projects and keeping in touch with my lab (e.g. I try to give feedback on the projects in which I was involved, now that they are becoming abstracts and manuscripts). I do still use abstract deadlines and meeting presentations as timeframes, although I submit less frequently than before. I am thankful that my mentor(s) and lab(s) have been so supportive and understanding.

I hope my story is helpful. Enjoy, and good luck! I wish you all similarly happy experiences!

Chapter 22

Funding Opportunities for Outcomes Research

Dorry Segev

Abstract “It’s All About the Benjamins” (Sean Combs a.k.a. Puff Daddy, June 30, 1997)

It is critical for any discussion of research to address funding; four major reasons come to mind. First, high-quality outcomes research is not free. Contrary to popular belief, outcomes research is not something you do on your laptop while watching television; it is a complex endeavor that requires time, expertise, collaborators, data, computing, and often patient engagement, and all of these things cost money. Second, few of us work in an environment where our clinical margin can finance our research, so money from outside of our clinical practice is necessary. Third, funded research is highly valuable to institutions, mostly because of prestige and indirects (the 25–75 cents or so in facilities and administrative fees that are paid to the institution for every dollar of research funds awarded to the PI). Fourth (likely as a result of the third), funded researchers are highly respected in academic institutions; in fact, research funding is often a criterion for promotion, bonuses, etc. This chapter will address what line items are commonly found in the budgets of outcomes research grants (What Costs Money), various sources of research funding including government, societies, foundations, and other less-traditional sources (Who Has the Money), the types of grants that are funded, such as career development versus research grants, and their target audiences (Who Gets the Money), an overview of the NIH grant review and funding process (The Road to Riches), and some grantwriting advice (Selling the Drama). Clearly, a handful of pages cannot begin to cover all of the details and advice that an investigator needs to be the Puff Daddy of research funding; but hopefully this will serve as a starting point, beyond which the reader is advised to identify one or more well-funded investigators with a track record of facilitating this process for their mentees.


Keywords Benjamins • Grantwriting • NIH • AHRQ • PCORI

22.1 What Costs Money

Young outcomes researchers often ask me if “the kind of work we do is even fundable.” In other words, it seems intuitive to anyone that laboratory science would need funding: you need reagents, lab equipment, technicians, mice, cells, etc. It might seem that “just crunching data” would not follow this model and, as such, would not lend itself to traditional funding mechanisms. However, an overview of the typical expenses found in outcomes research (and reassurance that, yes, the NIH considers these to be “viable expenses”) reminds us that the funding model for outcomes research is not so different from that of the laboratory.

Even if the research does not involve patient interactions, “crunching data” is expensive. Even the data themselves are expensive, and funding agencies do not expect that you have already paid for data. Some datasets (such as claims or pharmacy data) can cost hundreds of thousands of dollars. Additionally, dataset linkages are also time-consuming and fundable. Demonstrating feasibility and potential effects in either a small subset of the main data, or in data from a different source with similar structure, can suffice for preliminary data; there is certainly no requirement to have conducted the entire study before applying for the grant. No matter the source, data arrive dirty and require cleaning and extensive exploration to ensure high-quality data prior to the primary analysis; again, pilot data can be derived from a subset or a different source, and in either case the main data still require this work, which requires the time and effort of a research assistant or an analyst (or both). The analysis itself is often complex, requires computers and statistical software (you might have the software, but the licenses might require updating) and, most importantly, personnel. For an analysis to be reliable, redundancy is likely required. To ensure the best methods are used, analysts are required but not sufficient; faculty (typically from departments of biostatistics and/or epidemiology) with extensive, published experience in methods relevant to the science are critical, and must contribute enough effort (at least 1.2 calendar months per year, i.e. 10 % effort) to demonstrate full engagement with the research team. Since a clinical understanding is required to inform exploration of the data and the analytical approach, substantial effort from the PI and/or clinical experts is also required.

If patients are involved, expenses add up quite quickly. These can include patient incentives to participate in the research (gift cards, other tokens, meals) as well as expense reimbursement (travel, parking). Research assistants are often required to collect the data directly from the patients, abstract data from medical records, and enter data into whatever computing system has been established for capturing the data.


The data collection system is also an expense of the research; often, the pilot study uses a more rudimentary data collection system, which is expanded once funding is secured. As with data analysis, data collection does not just involve those collecting the data, but requires supervision, redundancy, and faculty collaborators with extensive experience in conducting human subjects research who contribute enough effort to demonstrate engagement in the process of data collection design, subject recruitment and retention, and protection of human subjects.

22.2 Who Has the Money

The holy grail of medical research support is the NIH. Readers to whom this statement is a surprise are encouraged to rethink their interest in research. Not only does the NIH have the largest budget for medical research (just under $30 billion in 2013), it also has a grant review process widely considered to be the most robust; as such, achieving NIH funding, serving on an NIH study section, or even having the letters “NIH” on your license plate are considered prestigious in the academic community and highly valuable to promotions committees. The NIH publishes a Weekly NIH Funding Opportunities and Notices newsletter that is well worth receiving through their free e-mail subscription. The general types of grants offered by the NIH (Who Gets the Money) and the NIH grant review process (The Road to Riches) are discussed in more detail below.

However, it is also not breaking news that the NIH budget has recently stagnated (which effectively means a decrease in funding), and that matters are likely to become far worse. As such, it is critical to be aware of (and seek) alternative sources of research funding. The following list illustrates a broad range of non-NIH funding opportunities but is undoubtedly incomplete, and the reader is encouraged to explore funding opportunities independently.

Three major government-based funding sources specific to medical research are the Agency for Healthcare Research and Quality (AHRQ), the Patient-Centered Outcomes Research Institute (PCORI), and the Health Resources and Services Administration (HRSA). AHRQ functions basically like a “mini-NIH” in terms of funding opportunities, grant review, and grant structure; the other agencies use their own mechanisms. The Centers for Disease Control and Prevention (CDC) also offer occasional disease-specific grant or contract opportunities, and the National Science Foundation (NSF) is a staple for science and engineering funding that might align well with engineering/medicine collaborations. Recently, the Centers for Medicare and Medicaid Services (CMS) has introduced some interesting outcomes research and innovation funding opportunities as well. Finally, the Department of Defense funds a Congressionally Directed Medical Research Program for areas of medicine directly relevant to service members, their families, and other military beneficiaries.


Investigators who practice in the VA system (US Department of Veterans Affairs) have access to a rich source of intramural VA grants ranging from career development awards to large research awards. Those in the VA system are strongly encouraged to pursue these funding opportunities, but those outside of the VA system unfortunately need to look elsewhere.

A number of foundations offer research funding, mostly in the form of career development awards, but some also offer larger so-called “R01-equivalent” grants. These include, but are certainly not limited to, the Doris Duke Charitable Foundation, Burroughs Wellcome Fund, Robert Wood Johnson Foundation, and the American Federation for Aging Research. Disease-specific foundations also include the American Heart Association, American Stroke Association, Cancer Research Institute, American Gastroenterological Association, Juvenile Diabetes Research Foundation, National Pancreas Foundation, American Diabetes Association, and National Kidney Foundation. Remember to think “outside of the surgical box” about the diseases that require surgical intervention, and consider pursuing funding from groups that seek to better understand and treat these diseases. While some of these foundations might seem at first glance to be more “medical” (as opposed to surgical) or “basic science” (as opposed to patient-oriented research), there are many examples of surgical investigators funded by these foundations and associations.

Many professional societies also offer funding opportunities to their members; as above, some societies that might at first glance seem “medical” have funded surgical investigators and are well worth pursuing. Surgical societies include the American College of Surgeons, Association for Academic Surgery, and Society of University Surgeons; specialty surgical societies include the Society for Vascular Surgery, American Society of Transplant Surgeons, Society of Surgical Oncology, American Pediatric Surgical Association, and many others. Societies that might overlap with surgical research also include the American Cancer Society, American Geriatrics Society, American Society of Nephrology, American Association for the Study of Liver Diseases, and others.

Other funding sources certainly exist but might be less formalized; the lack of formality should not discourage the enthusiastic and creative applicant, as inspiring tales of riches and glory can be found by those who seek them. Most institutions have internal career development and seed grants which are, strangely, often well-kept secrets (hopefully less so after the writing of this chapter). Some investigators have forged successful collaborations with insurance companies, state Medicaid, or the “hospital side” of institutions to conduct health services research that benefits those who pay for (or endure the cost of) medical care; the overlap between safety, quality improvement, and outcomes research can often be a strong point of leverage. Finally, grateful patients and other philanthropic efforts can support the most exciting, fast-paced, high-risk projects that are usually also the most high-impact but (unfortunately but not surprisingly) least fundable through traditional methods.


22.3 Who Gets the Money

Medical research funding begins with the larval medical/graduate student and extends through the furthest reaches of full professorship. The PI of a research group should strive for extramural funding across this spectrum; not only does a broad portfolio reflect well on a research group and develop a culture of grantwriting and funding, but it also provides a track record and foundation of success for those junior investigators able to secure such funds. Most grants fall into two major categories: career development awards, where the unit of funding is the individual (who, of course, proposes to conduct some type of research), and research awards, where the unit of funding is the proposed research (which, of course, will be conducted by a group of individuals). More technically, a proposal for a research award is typically based around a set of specific research aims, while a proposal for a career development award adds a layer of training and/or career development to these research aims. This section will explore these categories in the context of NIH awards, but similar patterns will be seen in other government, foundation, and society mechanisms.

22.3.1 F Awards

The “F” stands for fellowship, but this mechanism includes both predoctoral awards (such as F30 student awards) and postdoctoral awards (such as F32 postdoctoral fellowships). As a family, the F awards are also known as National Research Service Awards, or NRSAs. For medical students pursuing MD/PhDs, the F30 is an amazing mechanism that will basically pay for all of medical and graduate school tuition, provide a stipend, and even provide some research resources. For surgical residents spending “time in the lab” between clinical years, the F32 is an ideal mechanism, particularly for those seeking graduate degrees; similar to the F30, it provides a stipend (although most likely less than a clinical PGY-based salary), research resources, and tuition support for a graduate degree. For those who are unsuccessful in obtaining an F32, some institutions hold T32 grants, which are research area-specific and provide similar support while administered through the institution (with or without a formal application process). Experienced researchers who frequently mentor F-level trainees are encouraged to collaborate with the goal of establishing T32 mechanisms if these are not already available at the institution.

22.3.2 K Awards

The “K” stands for career development (apparently nobody wants “C awards”?) and in general supports mentored research training at the instructor or assistant professor level (such as the K01 for PhD-trained investigators, the K08 for MD-trained investigators conducting non-human subjects research including laboratory science or secondary data analysis, and the K23 for MD-trained investigators conducting patient-oriented research) and mentoring at the associate professor level (such as the K24 mid-career investigator award).


Those with established, R01-funded research groups who actively mentor might consider the K24 to support this endeavor. However, “K awards” generally refer to the first step on the faculty pathway to independent research: a junior faculty member starts with some departmental or institutional funding, obtains a K08 or K23, and hopefully eventually bridges the “K to R” transition (see R Awards below) to independent research funding.

22.3.3 R Awards

The “R” stands for research, and these independent research awards include the holy grail of all research funding in the universe, the R01, as well as smaller research grants (R03 and R21). The R01 is the holy grail not only because of its resources (in general, up to $500,000 per year for up to 5 years, with the possibility of competitive renewal for many years thereafter) and scope (any topic acceptable to an institute and interesting to a study section), but also because of the consistency with which funding is correlated to study section score and the advantages offered to early-stage and new investigators (see The Road to Riches below).

22.3.4 P and U Awards

The “P” stands for program project, and these include the P01, which is basically several integrated R01-scope projects “involving a number of independent investigators who share knowledge and common resources.” The “U” has an unclear etymology (at least unclear to this author); the family consists mostly of the U01 Cooperative Agreements, which are NIH-administered multi-center studies where each center applies for the funding required for its contribution. In general, the only readers of this chapter for whom the P and U awards are appropriate as PI are those readers already familiar with them.

22.4 The Road to Riches

The NIH pathway to funding is complex but well worth understanding for the sake of strategy and sanity. The NIH is divided into a number of disease-specific institutes, such as NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases), NCI (National Cancer Institute), NIA (National Institute on Aging), and 19 others.


The institutes fund the grant applications (which hereafter we will refer to as “grants” for convenience and consistency with colloquial precedent), but do not necessarily review the grants. There are hundreds of study sections that review grants, some of which belong to the individual institutes, some of which belong to the Center for Scientific Review (CSR, another branch of the NIH), and some of which are ad-hoc special emphasis panels. In general, a grant is submitted through a funding opportunity announcement (FOA, more below), “accepted” by an institute (acknowledging that the topic is within their mission and that they participate in the particular FOA), and reviewed and given a score (with or without a study section-specific percentile) by a study section. The institute evaluates the score and/or percentile in the context of (1) the rules of the FOA (including the institute’s established payline for R01s), (2) the relative scores of other applications, (3) the financial situation of the FOA (i.e. if money was set aside for this particular opportunity), and (4) the financial situation of the institute (how much money they were appropriated and how much they have spent).

The most important distinction is between who reviews the grants and who funds the grants. The study section reviews the grant, while the institute funds the grant. In the case of most R-grant FOAs (R01s, R03s, R21s), the study section is drawn from the CSR. The investigator can request a study section, and in general, if this is an appropriate request, it will be granted. As such, investigators applying to most R grants are strongly encouraged to research the CSR study sections both online (the topic areas and rosters are public information) and through word of mouth. Some R-grants, and most F/K/P/U grants, are reviewed by a study section that belongs to the institute rather than the CSR (either institute-specific standing study sections or ad-hoc special emphasis panels); in these cases, investigator request is not applicable.

In general, the R-grants labeled “PA-xx” have no money assigned to them, are evaluated by CSR, and draw from the institute’s general budget. Most institutes have “parent” FOAs for the R-grants that are not disease-specific (other than the mission of the institute) and can be used for all investigator-initiated ideas. Some PAs are specific to a disease or a type of research; these still do not have money assigned to them, but they indicate a priority for the institute; at the end of the day, this just means that if the score is borderline (i.e. not quite under the payline), they might be able to use discretionary money to fund it. R-grants labeled “PAR-xx” are similar to PAs except that they are usually more targeted and more often than not use a special emphasis panel for review. Finally, R-grants labeled “RFA-xx” have actual money assigned to them (separate from the institute’s general budget) and are usually reviewed by a study section that belongs to (or was created ad hoc for) the institute and/or the FOA.

The R01 has the most comprehensible and predictable (if such is possible with the NIH) funding pathway. Other than in certain RFAs, PARs, or other unusual circumstances, an R01 receives a score from the study section, is assigned a percentile based on the recent distribution of scores (usually specific to that study section), and is selected for funding if the percentile is below the payline for that fiscal year. Each institute establishes and publishes a payline, so interpreting the percentile is relatively straightforward.


In fact, most institutes establish at least two paylines, one for established investigators and one for new investigators, i.e. those who have never received an R01 or R01-equivalent from the NIH. Some institutes establish a third payline for early-stage investigators, who are not only new investigators but are within 10 years of their final degree (rumor has it that clinicians can make a compelling case that residency or fellowship was part of their training, and as such the clock would start after all clinical training). Sometimes an institute will also establish a formal payline (either by percentile or score) for other mechanisms (such as R21s, K awards, etc.), but these are not nearly as consistent or predictable as the R01 paylines, so conversations with NIH staff are required to make sense of the scores and fundability.

22.5 “Selling the Drama” (Ed Kowalczyk of Live, April 26, 1994)

It is well beyond the scope of this chapter to describe the ninja skills required to identify an important, innovative scientific endeavor and present it in a compelling way to a room full of critical, likely somewhat cynical individuals who review about ten times as many grants as are funded. The skills are also specific to the research approach: for example, review of a secondary analysis might focus on the analytical approach and data quality, while that of patient-oriented research might focus on recruitment feasibility and measurement error. A good place to start might be an institutional or society grantwriting course.

In general terms, it is important to remember that most grant applications come with very detailed instructions, and it is actually important to read these instructions carefully. Violations of page limits, margins, font sizes, organization of the grant, letters of support or recommendation, and other seemingly trivial issues are easy ways to get a grant rejected before it even has the chance to take up a study section’s time. Similarly, in light of the number of applications that each study section member has to read, it is important to make it easy for them to like the application, both in terms of formatting (organize thoughts into clearly labeled subsections, leave space, use figures and tables to break the monotony of text) and content (explain the importance of the research, support the feasibility and likely success of the proposed science with preliminary data, clearly describe the approach so that the reviewers can actually imagine how the research will be conducted, and identify potential problems that might arise and how they might be addressed). Finally, the overview (the specific aims page of an NIH grant, for example) is critical and must “sell” the grant; many reviewers will not read past a poor overview (usually because they have 50 other grants to read), and those who do will already have a bad impression of the grant that is likely not recoverable.

A research grant (R-grant in NIH terms) involves the identification of a significant knowledge gap in the field, an approach likely to address this knowledge gap, and a prediction of how the findings might affect patient care (or policy).

22 Funding Opportunities for Outcomes Research

263

and a prediction of how the findings might affect patient care (or policy). A career development grant (K-grant in NIH terms) involves these elements but with a (very important) parallel layer for the investigator in training: the significant knowledge gap in the field must parallel a knowledge (training) gap in the investigator, the approach likely to address the knowledge gap must also teach the investigator a new set of skills, and the prediction of how the findings might affect patient care must parallel short-term and long-term career goals for the investigator. The career development and training layer of a career development grant should not be taken lightly, and is more often than not the Achilles’ heel of the grant. Examples of successful grants abound, and investigators are encouraged to learn from these examples. Almost every funding source publishes a list of previous recipients, and many grant recipients (particularly those in one’s own institution, field, or collaborative network) are willing to share parts of their applications. The NIH goes one step further with NIH RePORTER (Research Portfolio Online Reporting Tools), a database of all funded grants, searchable by name, institution, department, topic, mechanism, and other characteristics; most importantly, the database also includes the narrative for each application which usually lists the specific aims and other critical elements. A read through NIH Reporter is a worthwhile lesson in what topics and approaches are “fundable”, how research can be framed in a compelling manner, and how to align a workscope with a funding mechanism. Furthermore, a creative use for this resource is the identification of mentors and collaborators; for example, those writing K-grants can search their own institution for R01-funded investigators with expertise in the areas where they are hoping to train.
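For those who prefer to browse RePORTER systematically, the database can also be queried programmatically. The sketch below is illustrative only: the endpoint, payload fields, and result keys are assumptions based on the publicly documented RePORTER web API and should be checked against the current documentation before use.

```python
# Illustrative sketch: list funded R01s at one institution via the NIH RePORTER API.
# The endpoint, payload fields, and result keys below are assumptions to verify
# against the current RePORTER API documentation (https://api.reporter.nih.gov).
import requests

payload = {
    "criteria": {
        "org_names": ["UNIVERSITY OF EXAMPLE"],   # placeholder institution name
        "activity_codes": ["R01"],
    },
    "limit": 25,
}

resp = requests.post("https://api.reporter.nih.gov/v2/projects/search", json=payload)
resp.raise_for_status()

for project in resp.json().get("results", []):
    # Keys are assumed; use .get() so missing fields do not raise errors.
    print(project.get("contact_pi_name"), "-", project.get("project_title"))
```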

22.6 “You See What You Look For” (Stephen Sondheim, April 26, 1970)

The research funding environment is increasingly competitive and frustrating. That said, there are still billions of dollars out there for research funding (that’s a lot of Benjamins), and the emphasis on research directly applicable to patient care and policy (such as outcomes and other health services research) is growing. The classic NIH pathway of F32 to K23 to R01 is still very feasible for those with persistence, mentorship, institutional support, and a research trajectory that lends itself to NIH funding. Others will find success through foundations, philanthropy, and other creative funding pursuits. Most everyone will have more failures than successes, most grants will require resubmission, and most successfully funded researchers submit multiple grants every year. In the context of these somewhat painful but still hopeful realities, let us never forget that the only grant with no chance of success is the grant not submitted.

Chapter 23

Choosing Your First Job as a Surgeon and Health Services Researcher

Scott E. Regenbogen

Abstract An academic surgeon’s first faculty job is a key determinant of future success. The process for finding that job, however, is poorly defined. The three key stages are looking for jobs, screening opportunities, and crafting a position. At each stage, careful attention to the candidate’s goals, skills, and career mission is essential. In this chapter, I will discuss some guiding principles to help find a job with the greatest chance of academic success and personal and professional fulfillment.

Keywords Academic surgery • Job search • Surgical health services research

S.E. Regenbogen, M.D., M.P.H.
Department of Surgery, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
e-mail: [email protected]

23.1 Introduction

When aspiring academic surgeons approach the end of training, the process of finding a faculty job is often daunting. After 5–10 years of postgraduate surgical training, it has been a long time since these individuals have undertaken a job search that involved anything other than a pro forma application and a computer matching algorithm. Tenure-track academic surgery positions are rarely found in the back of journals or in online job databases, so the search is often idiosyncratic, driven more by networking and the influences of mentors and local department leaders than by the forces of the traditional job market [1]. The choices made in this process are extremely important, however, as the first faculty position is often cited by successful academics as an absolutely critical factor in the development of a productive career in academic medicine [2]. A measured, careful, and structured approach to the job search process is therefore highly advised, yet often bypassed in favor of familiar settings and known entities.

The process of obtaining a first faculty position in academic surgery comes in three main stages, which I will consider separately: looking for jobs, screening opportunities, and crafting a position. Before embarking on any of this, though, a period of self-evaluation is required.

23.2 Preparing for the Job Search

Success at each stage will require an honest and detailed cataloguing of needs, interests, motivations, and goals. It is important to identify the elements of a job that are keys to development and success. Some of these elements will be negotiable; some will not. Thus, before beginning the process, some real soul-searching must be done. It may be helpful to compose a personal “mission statement” that summarizes career goals across the various components of a job in academic surgery – scholarly, administrative, didactic, and clinical. This document will communicate your priorities to potential mentors, employers, and collaborators. And even more importantly, if properly constructed, it will be the touchstone by which you will adjudicate job choices and other critical career decisions along the way [3]. This is a task that deserves dedicated time and careful attention before the job search even begins.

With a mission in hand, the next major framing decision will involve the emphasis and weighting of activities. How much of the professional effort will be patient care, and how much will be scholarly? Assuming that health services research is a meaningful part of the mission, there are generally two models for the academically influential surgical health services researcher. Some will be surgeons who publish – clinically busy surgeons with a reputation as skilled practitioners, a regional, national, or international referral base, access to clinical outcomes data, and collaboration with researchers who provide analytic expertise and continuity to the flow of academic work. These surgeons will generally spend 60–80 % of their time on clinical work, external grant funding will not be a major contributor to their salary, and their incentive compensation will be determined primarily by clinical revenue. Others will be the health services researchers’ version of the surgical scientist – a researcher who operates. These surgical scientists will generally be known for their publications and presentations. Their clinical practice will be designed to consume no more than 20–40 % of their time. They will support their salaries as principal investigators with external grant funding to offset the opportunity costs of lost clinical revenue, and their incentive compensation will ideally be tied to scholarly productivity and revenue from grant funding [4].

For some young surgeons nearing completion of clinical training, it may be hard to distinguish which of these roles is a better fit. After all, this degree of self-determination has been largely absent from the trainee’s recent years’ experience. But there are some clues you can use for self-assessment. The non-negotiable elements of the mission statement will often come from the side of the equation that should be receiving more emphasis. Also, “stretch goals” will often define the area deserving of most attention. Some may imagine achieving recognition through clinical volume and reputation, others through administrative promotion and leadership, and others from a major grant or paper published in a prestigious journal. Understanding which of these achievements will be the primary motivator, and which job model is a better career fit, will shape the kinds of faculty positions sought and the ways that potential employers will evaluate your candidacy.

The last step in preparation is an honest self-examination to evaluate skills, expertise, and gaps in your track record. Do you have the knowledge, experience, and tools to carry out a career plan to reach your goals? Scrutinize your publication record, previous mentor relationships, and curriculum vitae, and try to imagine how your experience will be rated by interviewers, potential mentors, and department chairs. How many first-author publications do you have to demonstrate your ability to conduct and complete a research project? Have you ever obtained research funding, such as a training grant or local project support? Do you have the classroom didactic training to support your research plan as an independent scholar? The health services research toolkit typically involves knowledge of biostatistics, clinical epidemiology, health policy, and management. If any of these are lacking, but required for future work, consider how you might obtain formal training either before or, if necessary, during your appointment [5]. Some departments may be able to support enrollment in relevant coursework, but this will need to be considered early on.

23.3 Looking for Opportunities

Once you have carefully compared your future goals and current skills, it is time to begin putting out some feelers for faculty job opportunities. Depending on your clinical specialty, these jobs may be posted on institutions’ human resources websites, or advertised in the classified sections of journals or at meetings. However, it is most likely that the right job won’t be found this way. Most jobs in academic surgery come from networking and referrals from colleagues or mentors [6]. So, a first step is simply to talk with those around you – residency and fellowship directors, division and department chairs, and other acquaintances in your clinical and/or research specialty. In health services research, these people can be found at the Quality, Outcomes, and Cost sessions of the American College of Surgeons Clinical Congress Surgical Forum, at the Outcomes Sessions of the Academic Surgical Congress, and through the Surgical Outcomes Club meetings and website. Seek out people whose papers you read and whose talks you admire. In particular, think about young faculty you might emulate – their mentors are likely to be people who can provide wise and essential guidance toward job opportunities. The more you talk to people, and the more visible you make yourself, the wider the range of opportunities you will have to evaluate and consider.


An early decision to be made about the setting for the job is what type of research environment to seek out. Though surgical health services research has grown substantially in the past decade, from a sparsely populated cottage industry to a solid establishment distributed widely around the country, there are still a limited number of surgical departments that can boast a well-apportioned infrastructure in the field. One option is to seek out one of these institutions – a number of them can be found on the Surgical Outcomes Club website (http://www.surgicaloutcomesclub.com/links). These are departments in which an established senior investigator has ongoing projects and opportunities for junior faculty, residents, and fellows. There may be up-and-coming protégés who have obtained career development awards with this senior mentor, or even progressed to their own major project grants under this tutelage. These institutions may have an established data management infrastructure, experienced analysts, and a core of other investigators with whom a junior faculty member might collaborate.

Beyond these few surgical health services research hubs, however, there is a far wider variety of academic centers with a mature infrastructure of non-surgeons, and even non-physicians, doing this kind of work. Many of these groups would be well served by the addition of a young surgical investigator among their ranks. Often, their work will extend to areas of clinical expertise for which a surgeon can offer important practical insights. Senior investigators in these settings can often provide very good mentorship and development opportunities for junior faculty. They may be somewhat unfamiliar, however, with the clinical demands and expectations placed on junior surgical faculty. The academic medical doctor will often do just one half-day outpatient clinic per week, or attend an inpatient service several weeks per year. They have large blocks of purely non-clinical time, whereas academic surgeons will typically have weekly clinic, operations, and a steady stream of patients in the hospital, requiring academic and clinical work to proceed in parallel. So, if you plan to be the surgeon in a medical health services research group, you will need clear, mutual understanding between your mentor and your clinical chief to establish a productive mentoring and organizational structure and career development plan that works in this setting.

The third option for research setting is to “go it alone” as the pioneer surgical health services researcher in a department seeking to expand its research domain. Many surgery departments’ traditional focus on basic science research has broadened, as the advent of outcomes benchmarking, public reporting, pay-for-performance, and reimbursement reform has increased recognition of health services research as an essential component of surgical care improvement. These institutions may be interested in recruiting young faculty with training and experience in epidemiology, biostatistics, program evaluation, and health economics to found a home-grown surgical health services research group and provide training opportunities for residents and scholarly support to other faculty. These settings provide a great opportunity for early independence, especially if a highly motivated department chair commits meaningful resources and influence to the effort.

On the other hand, there are real challenges to the young surgeon trying to establish both a clinical practice and a research infrastructure at the same time. And there is a very real risk of intellectual isolation without other like-minded faculty around. A young investigator going solo will be well served, therefore, by close ties and collaborations with former mentors or other allied colleagues in other institutions.

23.4 Screening Jobs

Understanding the advantages and limitations associated with each of these settings, the next step will be to field recruiting offers and reach out to institutions that may be of interest. As these communications proceed, you will need to decide which ones are potentially viable opportunities worth investigating with a visit and interview. This is a good time to revisit that mission statement and refresh and update your priorities and goals. They will be used as a checklist by which to evaluate each institution and the jobs they propose.

A useful first screen will be to evaluate the success of the academic junior faculty they have hired in the past 5 years or so. The experience of recent hires is a very good predictor for how things might go if you join the department. Read their faculty profiles on the department’s website, search PubMed (http://www.ncbi.nlm.nih.gov/pubmed) for their publication records and the NIH RePORTER (http://projectreporter.nih.gov/reporter.cfm) for their federal funding results. How many of them have obtained career development awards or society grant funding to support their startup? Are they publishing successfully? When an interview visit is planned, be sure that some of these young faculty members are included in your itinerary, as they will be a very important inside information source about the viability of the clinical/academic balance in the institution. If they have been well supported and positioned to succeed, there is a good chance that you can follow their lead.
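This first screen lends itself to a quick script. The sketch below uses NCBI’s E-utilities (the esearch endpoint) to tally recent PubMed-indexed papers for a few hypothetical recent hires; the names and the date window are placeholders, and author-name searches alone are imperfect, so an affiliation term is usually worth adding.

```python
# Rough sketch: count recent PubMed-indexed papers for a list of recent hires.
# Names and the date range are placeholders; author searches by name alone can
# miss or over-count papers, so consider adding an affiliation term.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
recent_hires = ["Smith JA", "Garcia ML", "Chen R"]   # hypothetical junior faculty

for name in recent_hires:
    params = {
        "db": "pubmed",
        "term": f"{name}[Author] AND 2019:2024[dp]",  # five-year publication window
        "retmode": "json",
    }
    result = requests.get(EUTILS, params=params).json()
    count = result["esearchresult"]["count"]
    print(f"{name}: {count} PubMed-indexed papers in the window")
```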

During the first interview visit, there is a great amount of detailed information to be obtained. But it is also essential to get the “30,000-foot view” of what is valued and rewarded in the department. What is the leadership’s vision for the department and its faculty? Do they want to be the dominant clinical referral center? How do they view their role in the surgical community? What do they want to be known for? And what is the salary and incentive structure for faculty? Most departments will award financial bonuses for clinical productivity, but there are also some that offer equivalent or alternate rewards for academic success [3, 7]. These allow faculty to forgo additional clinical referrals in favor of scholarly work, without sacrificing their income. Even more importantly, however, the presence of these incentives demonstrates a commitment to academic activity and anchors the goals of the institution by putting money on the table. At the most basic level, try to figure out whether the priorities and vision of the department align with your personal ambitions, and whether your most important goals will be supported [8].

If the department’s mission and goals seem a reasonable match, try to define and imagine yourself in one or two very specific jobs within the department, both the clinical part and the research part. On the clinical side, assess the demand – is there a need for someone with your training? Is there an established source of referrals to the institution, or will you need to generate a personal referral stream? Who will be your competition, either within the institution or in the region? If you can bring a needed set of skills to an environment in which referrals are directed to the health system, rather than to particular surgeons, it will take far less effort to get a clinical practice started. Young faculty who need to generate personal referrals will be more likely to sacrifice research time in order to respond to unplanned referrals and generate good will from their referring doctors. At the other extreme, be sure that there is some opportunity to share clinical responsibilities with others, so that the burdens of urgent referrals will not too often interfere with planned academic time. Overall, there should be general agreement between how busy you want to be and how busy you will need to be in the particular setting being considered.

Probably the most important element to investigate, however, is the availability of adequate mentorship [6]. An effective mentor will guide career development training, ensure effective navigation of local politics, shape and optimize the quality and viability of manuscripts and grant applications, and guide involvement in key committees and specialty organizations [9]. An independently successful mentor can also provide research resources, as a mentee can often make secondary use of the data, equipment, and collaborators already present in the mentor’s organization. A mentor with a well-established and productive research infrastructure may provide even more resources in kind than could be obtained with departmental start-up funds. And the academic mentor can be an essential line of defense against competing demands from clinical superiors. As discussed above, some will find appropriate mentorship within a department of surgery, while others will seek out this support elsewhere in the institution, or even beyond.

Finally, think about the physical space. Is the hospital physically separate from the university and academic center? As a surgeon, rounding on postoperative patients, attending departmental meetings, and other clinical responsibilities may leave relatively few days with no clinical commitments at all, so the time spent in transit can add up unless travel between the clinical and research sites is relatively easy. Some distance, however, may be helpful, as physical departure from the clinical environment allows more complete separation of time commitments.

23.5 Crafting the Job

The true negotiation begins with an offer letter from the department. The offer letter should detail salary and other compensation, clinical responsibilities, teaching roles, administrative support, academic expectations, research start-up funds, and other resources. Some institutions will even include explicit academic and clinical mentoring relationships in the letter as well.

On the clinical side, you should aim to define what proportion of your effort is supposed to be spent on patient care, and how that effort might be adjusted if research funding is obtained. If a career development award, such as a federal “K” grant, is part of your academic plan, be aware that these typically require 75 % effort dedicated to research, and ask whether this sort of funding mechanism could be accommodated. Some will make protected time for academic work explicit, whereas others may define effort allocations more conceptually. Simple measures of clinical expectation might be how many days per week you will operate and how often you will be on call. Some departments will have minimum productivity expectations, measured in dollars or work RVUs. Others may treat the individual surgeon as a cost center and expect faculty to generate revenue to cover their costs and salary. At the most basic level, will your regular paycheck be determined by your clinical productivity? Regardless of the accounting method, it will be important to understand how clinical volume will be measured and at what point a new faculty member will be held accountable. Some will offer 2 or 3 years of allowance to grow clinical practice volume, but this is not universal. The letter should also define what resources will be available to support clinical work. Will operating room block time be allocated to you directly? Will clinic space, medical assistants, and clinic support staff be available? Are there midlevel providers or clinical care support staff to answer patients’ phone calls, obtain records, and help with the logistics of clinical care?

On the research side, the offer letter should state general expectations and metrics for success in scholarly work. Physical space for conducting research should be explicit. If you are joining a well-established group, make sure that you have an office, cubicle, or at least a dedicated seat alongside your collaborators and mentors. If the group already has data managers, programmers, analysts, and/or statisticians, your start-up financial contribution to the group could involve department funding for a time-share of one or more of these people – perhaps half a data analyst for 2 or 3 years. Remember that this should include support for both salary and benefits for this individual. Otherwise, consider requesting salary and benefit support to hire a research assistant who can perform some of these tasks. Some start-up discretionary cash should be included as well, to allow the purchase of computer equipment, software, and databases – the typical health services research needs. Although the cash support (“hard money”) needed for health services research may be less than is needed to run a basic science lab, it will still be invaluable to have money available, especially if not joining a group with established data sources and analytic support.

In the end, however, the offer letter and subsequent negotiations are just a starting point. This conversation will define the essential details of the job, but the realities of the work will continue to evolve even after the final signed version of the contract [8]. Upon arrival in the job, start immediately setting consistent patterns for your involvement in academic work. Put boundaries on clinical time. Establish your limits for unplanned, urgent consultations. Make sure time is blocked on your calendar for research meetings, reading, thinking, and writing, and treat these time blocks as mandatory meetings. And share your research progress with your clinical team, to help them understand the value and importance of both sides of your professional life. The precedents set in the first few months of the job will be hard to alter later, and will be more rigid determinants of the realities of the job than the paper contract signed in advance.
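As a rough illustration of the start-up arithmetic discussed above (a shared analyst plus discretionary funds), the toy calculation below uses entirely made-up figures; actual salaries, fringe rates, and needs vary widely by institution.

```python
# Back-of-the-envelope start-up ask for a new surgical HSR faculty member.
# Every figure is a hypothetical placeholder, not a benchmark.
analyst_salary = 80_000      # annual salary for a master's-level data analyst
fringe_rate = 0.30           # benefits as a fraction of salary
analyst_fte = 0.5            # "half a data analyst"
years = 3

analyst_cost = analyst_salary * (1 + fringe_rate) * analyst_fte * years
discretionary = 30_000       # computers, software licenses, database purchases

print(f"Analyst support (salary + benefits): ${analyst_cost:,.0f}")
print(f"Discretionary funds:                 ${discretionary:,.0f}")
print(f"Total start-up request:              ${analyst_cost + discretionary:,.0f}")
```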


23.6 Summary

The task of finding a first faculty job in academic surgery can be daunting, but a few guiding principles can keep the process on track. First, go back to your mission statement often. Remind yourself why you chose to pursue this path in life, and why it is important to you and others. There is no one job that would be right for everyone, so the requirement is to find the particular job that is right for you. Second, keep a long view. A career in surgery is likely to last three decades or more, so think about long research arcs and their impact. Choose a line of inquiry that will motivate you to get out of bed in the morning, and that feels important enough to counterbalance the very compelling demands of patient care. And finally, take a great opportunity over great money. Mentorship, collaborators, protected time, institutional support, and successful work/life balance will be far greater determinants of success and satisfaction than salary level, start-up funds, and other compensation.

The process is poorly defined and the stakes are high – the first job will likely set the course for the rest of your academic career. Often the process is undertaken during a time of high mental and physical workload in clinical training, with less time for careful consideration and negotiation than is deserved. The needs and interests of the academic surgeon – challenging clinical work, competition for research funding, and the drive to scholarly discovery – are complex, and often competing. But the opportunity and the privilege to make important contributions to patient care and public health as a surgical health services researcher are very real.

References

1. Skitzki J, Reynolds HL, Delaney CP. Academic university practice: program selection and the interview process. Clin Colon Rectal Surg. 2006;19(3):139–42.
2. Nelson PR. Timeline for promotion/overview of an academic career. In: Chen H, Kao LS, editors. Success in academic surgery. London: Springer; 2011. p. 11–30.
3. Souba WW, Gamelli RL, Lorber MI, Thompson JS, Kron IL, Tompkins RG, et al. Strategies for success in academic surgery. Surgery. 1995;117(1):90–5.
4. Staveley-O’Carroll K, Pan M, Meier A, Han D, McFadden D, Souba W. Developing the young academic surgeon. J Surg Res. 2005;128(2):238–42.
5. Kuy S, Greenberg CC, Gusani NJ, Dimick JB, Kao LS, Brasel KJ. Health services research resources for surgeons. J Surg Res. 2011;171(1):e69–73.
6. Ghobrial IM, Laubach JP, Soiffer RJ. Finding the right academic job. Hematol Am Soc Hematol Educ Prog. 2009;2009(1):729–33. doi:10.1182/asheducation-2009.1.729.
7. Poritz LS. Research in academic colon and rectal surgery: keys to success. Clin Colon Rectal Surg. 2006;19(3):148–55.
8. Schulick RD. Young academic surgeons participating in laboratory and translational research. Arch Surg. 2007;142(4):319–20.
9. Sosa JA. Choosing, and being, a good mentor. In: Chen H, Kao LS, editors. Success in academic surgery. London: Springer; 2011. p. 169–80.

Chapter 24

Building a Health Services Research Program

Samuel R.G. Finlayson

Abstract Successful research related to surgical health services and outcomes requires organized effort and infrastructure. A combination of committed and well-trained investigators, well-managed resources, and a clear mission and vision forms the basis for a strong surgical health services research program.

Keywords Outcomes research • Health services • Research administration

S.R.G. Finlayson, M.D., M.P.H.
Department of Surgery, University of Utah School of Medicine, 30 N 1900 E, 3B110 SOM, Salt Lake City, UT 84132, USA
e-mail: [email protected]

24.1 Introduction

A growing number of surgery departments in academic medical centers are striving to develop capacity in surgical outcomes and health services research (HSR). While some surgeons still harbor the misconception that surgical HSR is something that can be done on nights and weekends with data on a personal computer, there is increasing recognition that performing genuinely meaningful surgical research related to policy and clinical practice requires substantial commitment and infrastructure. This chapter will outline the important components of surgical HSR programs, and provide suggestions for program building based on the author’s observations and experience leading the Center for Surgery and Public Health at Brigham and Women’s Hospital in Boston, MA.



24.2 Mission, Vision, and Goals

Surgical HSR programs function more effectively when mission-driven. A mission represents the shared purpose of the members of the research group, and guides decision-making. HSR is very broad in scope, and spreading effort and resources across a range of disparate purposes can weaken a research program, just as spreading burning coals can extinguish a fire. Successful surgical HSR programs have typically started with a focus on specific areas – such as quality of care, system innovation, or regional collaboratives – with specific goals in mind.

Thoughtfully articulated mission and vision statements can have great value when developing a surgical HSR program. A mission statement typically outlines the program’s aims, identifies the constituencies that the program serves, and describes how the program is uniquely suited to making its intended contribution. In essence, the mission statement describes why the program exists and what it can do. The value of a mission statement is in its ability to guide resource and effort allocation, and align them with specific program goals. The mission statement also articulates the framework within which the program functions. A vision statement describes the ideal to which the program aspires, and is intended to inspire effort toward the program’s objectives.

The mission and vision of the program are ideally translated into actionable strategic and tactical goals. Strategic goals describe broadly the successes that the program would like to attain, such as changing practice across a clinical collaborative, achieving high levels of external funding, or creating a strong analytic core. Tactical goals describe more specific, easily measured tasks that lead to the achievement of strategic goals, such as successfully competing for a program grant or hiring a talented data analyst.

An HSR program’s mission, vision, and goals should be periodically revisited and revised based on the successes, failures, and evolving strengths of the program. Successful programs adapt to changing circumstances, both inside the organization (e.g. faculty turnover) and outside of the organization (e.g. new NIH funding opportunities).

24.3 Organization and Governance

The organizational structure of a surgical HSR program ideally allows direct control over critical program assets, such as databases, servers, and the work priorities of key personnel. Program leaders who are just starting out and have limited funding are often forced to rely on data sources belonging to other groups, analyzed by programmers who report primarily to investigators outside the program. This is a situation that should be escaped as soon as possible. When funding permits (often internal or departmental at first), program leaders would be well advised to find low-cost sources of data (see research platforms below), and to hire part- or full-time analytic and/or project support. Program leaders without their own data and personnel reporting directly to them will often find their projects at the end of someone else’s queue.

In addition to a core of faculty investigators, ideally a surgical HSR program will include administrators, project managers, masters-level data analysts, and doctoral-level biostatistical support. The number of individuals within these categories depends on the size of the program’s research portfolio. Specialized talent – such as systems engineers, decision analysts, or clinical coordinators – may also be needed depending on the type of research pursued within the program. Where there are adequate resources and faculty mentorship, research trainees add significantly to a surgical HSR program, bringing energy as well as some level of programmatic and analytic support.

Program leaders should strategically direct the allocation of assets and resources, the most valuable of which is the time of the program personnel. Leaders should carefully oversee how much time analysts, project managers, and other personnel spend on each research project, and direct these personnel to give greatest priority to the work that is most in line with the mission and goals of the program. Because the Center for Surgery and Public Health was not only engaged in externally funded independent research but also served as a faculty resource for the Department of Surgery, projects were placed into one of three categories: (1) externally funded, (2) center-sponsored, and (3) department-sponsored. Externally funded projects were always given high priority, both as an obligation to the sponsor and to provide the best possible service to investigators who bring funding to the center. Center-sponsored projects represented research that the center supported as an investment. Typically, this was work in which committed junior faculty were engaged under the mentorship of program leadership, and was supported in the context of a well thought-out research plan that we believed would lead to external funding at the center. Department-sponsored research in the Center for Surgery and Public Health represented research support obligations that the Department of Surgery had made to specific faculty (e.g. as part of a hiring package), and was therefore internally funded. To manage our analytic assets, we reviewed the project portfolio of each analyst and project manager weekly, including percent time spent on each project for the preceding weeks, and provided guidance when necessary to ensure that the highest-priority projects got the attention they needed (see Table 24.1). In practical terms, available resources (including time) need to be seen as investments, and must be directed toward maximizing “returns” most in line with the strategic and tactical goals of the program.

Table 24.1 Sample project management worksheet

Project name          Principal      Support category       Funded   Past week           1 week    2 weeks   Past 4 week
                      investigator                          effort   actual effort (%)   ago (%)   ago (%)   average (%)
PE prevention         Jones          Externally funded      20 %     18                  22        20        21
OR simulation         Jones          Externally funded      20 %     22                  12        22        19
Elderly vascular dz   Wong           Externally funded      15 %     10                  19        17        16
DoD free flap         Sargeant       Externally funded      30 %     27                  30        18        28
Appropriate consults  Kirby          Center-sponsored       n/a      5                   8         4         4
OR safety             Tanner         Center-sponsored       n/a      3                   0         10        3
Lung resection        Hernandez      Department-sponsored   n/a      5                   6         4         4
Thyroid cancer        Abdul          Department-sponsored   n/a      10                  3         5         5
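As a toy illustration of the weekly effort review described above (and of the kind of worksheet shown in Table 24.1), the sketch below uses invented numbers to flag projects whose recent effort lags well behind their funded commitment.

```python
# Toy version of the weekly effort review; figures are invented and simply mirror
# the structure of the sample worksheet in Table 24.1.
projects = [
    # (project, support category, funded effort %, last four weekly effort %)
    ("PE prevention",  "externally funded", 20,   [18, 22, 20, 24]),
    ("DoD free flap",  "externally funded", 30,   [27, 30, 18, 37]),
    ("OR safety",      "center-sponsored",  None, [3, 0, 10, 1]),
]

for name, category, funded, weekly in projects:
    avg = sum(weekly) / len(weekly)
    flag = ""
    # Flag funded projects whose recent effort trails the committed effort.
    if funded is not None and avg < 0.8 * funded:
        flag = "  <-- below funded commitment; reprioritize"
    print(f"{name:15s} {category:20s} 4-week avg {avg:4.1f}%{flag}")
```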

24.4 Challenges in Faculty Development

Faculty development is a critical part of any surgical HSR program, but is often difficult. Surgeons seeking to become HSR experts face several challenges, including time constraints, monetary disincentives, limited fellowship training opportunities, and a paucity of senior faculty mentors.


Compared to other specialties in medicine, surgery has historically demanded greater time commitment to clinical activity, making it difficult for surgeons to find time for research. This challenge is compounded by the common misperception among many academic surgeons that HSR requires little time to perform. Surgical HSR program leaders must convince surgeons who want to pursue HSR (and their clinical leaders) to make the significant time investment required to build a successful surgical HSR portfolio.

Financial disincentives to focus on HSR are also particularly difficult for surgeons to overcome. The gap between research funding for salary support and what a surgeon typically earns with clinical activity is larger than for less generously remunerated specialties: the NIH salary cap at the time of this writing is US$178,700, compared to a typical academic surgeon salary of approximately US$300,000. When 0.20 FTE salary support is awarded in an NIH grant, this would typically cover only about 12 % of a typical surgeon’s salary. In the current tight market for research funding, doing clinical work is by far the easiest way for a surgeon to achieve targets for income generation, whether determined institutionally or personally.

Compared to other specialties in medicine, surgery has fewer research fellowship opportunities in HSR, and the ones that do exist are highly competitive. Furthermore, the training paradigm in surgery is challenging for those pursuing a research career. Trainees in the medical specialties can enter research fellowships directly following residency training, which provides a smooth transition to an academic career. In contrast, surgeons typically complete HSR fellowships between years of residency, followed by 2–6 years of further surgical training, after which they remain on a steep clinical learning curve while building a surgical practice. This situation makes a strong start in surgical research difficult for young surgeons.

Surgical HSR is a relatively young field, but is growing rapidly. Many academic surgery departments aim to recruit surgeon researchers who can contribute to the increasingly active policy dialogue related to healthcare delivery. Because of this, surgical HSR experts are in high demand. While this environment provides many opportunities for faculty to assume positions of leadership and responsibility early in their careers, it also points to a paucity of senior mentors for the increasing number of trainees and junior faculty who want to pursue surgical HSR.
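To make the salary-support gap concrete, the arithmetic behind the “about 12 %” figure is shown below, using the numbers quoted in the text (the NIH cap changes annually):

```python
# Why 20 % grant effort covers only about 12 % of a surgeon's salary: NIH support
# is limited by the salary cap, not by the investigator's actual salary.
nih_salary_cap = 178_700     # cap quoted in the text; it is updated periodically
surgeon_salary = 300_000     # typical academic surgeon salary quoted in the text
effort = 0.20                # FTE effort awarded on the grant

support = effort * nih_salary_cap            # dollars the grant can contribute
fraction_covered = support / surgeon_salary  # share of actual salary covered

print(f"Grant support at 20% effort: ${support:,.0f}")
print(f"Fraction of salary covered:  {fraction_covered:.1%}")   # roughly 12%
```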

24.5 Creating a “Faculty Pipeline”

Given the above challenges, creating an effective “pipeline” for faculty is of utmost importance to a surgical HSR program. The essential components of a faculty development pipeline include initial protected research time, training programs and mentorship, access to active intellectual forums, and grants administration support.

As with any field of surgical research, a young surgeon entering an academic faculty position needs protected research time. Since very few new faculty have mentored training grants at the start, this protection typically comes from academic departments in the form of salary guarantees and reduced clinical volume targets. Leaders of surgical HSR programs must work closely with their departmental leaders to ensure that newly recruited research faculty obtain adequate support.

While not absolutely mandatory, formal training in research methods and clinical investigation is extraordinarily helpful to young surgeon investigators. Presently, many junior faculty with an interest in health services research will have completed a master’s degree or research fellowship during residency training. If not, there are an increasingly large number of university settings that offer participation in part-time or short-term programs that teach the fundamentals of health services research. Such programs can help fast-track the development of junior faculty.

Mentoring of junior faculty is perhaps the most critical part of growing and developing a successful surgical HSR program. While junior faculty may have strong analytic skills and bring valuable perspectives and ideas to their work, they often do not know how to leverage these assets to advance their research careers. Faculty mentors not only help junior faculty develop hypotheses and design research, but also help them set realistic career goals, such as funding milestones and academic rank advancement, and help them develop important networks for academic engagement and collaboration. Mentoring along the typical research trajectory for a junior faculty member includes helping them find seed funding for pilot studies early on, and mentored career development funding when possible (e.g. NIH K-level funding). These early mentored efforts ideally provide the basis for applications for higher levels of external grant funding sufficient to support time devoted to independent research (e.g. NIH R-level funding or grants from major foundations).

Faculty development is also facilitated by infrastructure to support grant writing. Competitive grant funders such as the NIH typically have complicated and daunting grant application processes that can be very time-consuming. The complexity of grant applications is a nearly prohibitive technical and psychological barrier to a surgeon if unaided. To the extent that surgeon investigators can focus exclusively on developing and writing the “science” of a grant application, they will be more eager and able to pursue applications. From the perspective of research program building, support of grant writing to make the process as easy as possible for faculty should be seen as an investment in the program, because if more grant applications are submitted, there will likely be more funding to support the overall research program.

In addition to individual mentoring and grant-writing support, effective faculty development also requires access to intellectual forums where research ideas, methods, and interpretation of results are exposed to colleagues’ constructive critique, and fertilized with new ideas, study designs, and analytic approaches. These forums can take a variety of forms, such as regular research meetings, “work-in-progress” seminars, and interdisciplinary conferences. To build strong intellectual forums for a surgical HSR program, one cannot overestimate the importance of creating effective working space, preferably with offices clustered together (and actually used by the program members), with formal and informal gathering areas. Not only does dedicated space create important opportunities for interaction between researchers, it also provides an escape from competing obligations (e.g. clinical work). While constructing a “virtual” research center is attractive conceptually in a space-constrained academic center, this model is typically unsuccessful in achieving all the goals of a surgical HSR program.

Finally, research fellowship training programs should also be viewed as an important part of the faculty pipeline. Research fellows not only bring energy to the research environment and extend the capacity of faculty researchers, but also become the “farm team” for future faculty recruitment efforts.

24.6 Creating a “Funding Pipeline”

Early on, surgical HSR programs typically depend on infrastructure investment from their hosting institutions (e.g. hospital, department, or occasionally other sources of public or private grant funding). Eventually, however, programs are expected to stand on their own financially, or at least require only minimal ongoing local funding to support the services the program provides its host institution. Gaining financial independence requires considerable focus on the grant production pipeline.

A funding pipeline starts, of course, with motivated investigators who have well-articulated and meaningful research plans and teams well-suited to carrying them out. However, to move from this to a complete, competitive grant application requires a lot more. At a minimum, administrative support for the “pre-award” process is critically important. This function requires fastidious attention to detail, good communication with internal regulatory bodies (e.g. human subjects review committees, human resources departments), and familiarity with the unique requirements set by a diverse group of funding organizations.

As competition for limited federal funding increases, investigators are increasingly looking toward alternative sources of funding. These sources include private foundations and philanthropy, as well as industry partnerships. Healthcare payers and large employers have also emerged as a source of research funding when they partner with health services researchers to better understand and improve quality of care [1].

24.7 Research Platforms

Successful surgical health services research requires an appropriate research platform. A useful research platform can take any of a number of forms, including electronic datasets, clinical settings, care networks or collaboratives, or specialized analytic tools. Building a surgical health services program requires identifying a set of research platforms that are appropriate to the kinds of questions investigators want to answer, and that are within reach given the resources available.

24.7.1 Electronic Datasets

Electronic datasets are a common, often easily accessible research platform, and have traditionally been the backbone of surgical health services research. Because of electronic data’s accessibility and ease of use, many successful health services researchers have started their careers using large datasets to establish a research niche, and have then graduated to other research platforms as their work has gained momentum and funding has been garnered. The least expensive electronic data is administrative data, which is typically drawn from sources not originally intended for research purposes, such as hospital discharge abstracts or data created for billing. Many such electronic datasets are now organized and augmented to facilitate their use as research tools, such as the Healthcare Cost and Utilization Project (HCUP) datasets made available through the federal Agency for Healthcare Research and Quality [2]. In addition to administrative data, clinical data sources have become more numerous with the growth and success of clinical registries such as the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) [3] and the Society of Thoracic Surgeons (STS) National Database [4]. Other specialized sources of electronic data that are now frequently used in health services research include the US Census, physician workforce data from the American Medical Association, geographical data, and publicly reported hospital quality measures.
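To give a flavor of how such datasets are used, the sketch below tabulates hospital volume and in-hospital mortality from a hypothetical administrative extract. The file name and column names are placeholders rather than actual HCUP or Medicare layouts, and real files require data use agreements and careful reading of their documentation.

```python
# Illustrative only: summarize hospital volume and mortality from a hypothetical
# discharge-level administrative extract. File and column names are placeholders.
import pandas as pd

df = pd.read_csv("discharge_extract.csv")       # hypothetical discharge-level file

# Keep one operation of interest, identified here by a placeholder procedure flag.
cases = df[df["procedure"] == "pancreatectomy"]

summary = (
    cases.groupby("hospital_id")
         .agg(volume=("procedure", "size"),
              mortality=("died_in_hospital", "mean"))
         .sort_values("volume", ascending=False)
)
print(summary.head())
```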

24.7.2 “Local Labs”

Surgical health services researchers also find research platforms within their own clinical settings, using local communities, hospital clinics, and operating rooms to examine, measure, and intervene in surgical care delivery. Local clinical settings as a research platform have particular utility in patient-centered outcomes research, wherein measures and outcomes often require new information derived from direct patient contact. The same can be said for clinical decision science, preference assessment, surveys, implementation science, and qualitative research. Local clinical settings are also useful to measure the effect of system design innovations on surgical outcomes and quality. Using a hospital as a local lab for research has the potential advantage of providing direct benefit to the hospital, which may be willing to provide funds to support it. Simulation centers are also a kind of local lab that can be used as a controlled setting to examine the provider behavior component of surgical care delivery.


24.7.3 Networks and Collaboratives

Provider networks and collaboratives created to measure and improve quality and value of care are increasingly used as platforms for health services research. Examples include the Northern New England Cardiovascular Disease Study Group [5], the Michigan Surgical Quality Collaborative [1], and the Surgical Clinical Outcomes Assessment Program [6] in Washington State, all of which have resulted in important discoveries related to surgical care delivery.

24.7.4 Analytic Tools

A large body of health services research has been built on the use of analytic tools that synthesize information drawn largely from the medical literature, including most notably meta-analysis, decision analysis, and cost-effectiveness analysis. Expertise in these methods can serve as a platform for health services research to refine research questions and motivate further work using other research platforms.
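The workhorse quantity in cost-effectiveness analysis is the incremental cost-effectiveness ratio (ICER): the difference in cost between two strategies divided by the difference in effectiveness. A toy calculation with invented numbers:

```python
# Toy incremental cost-effectiveness ratio (ICER); all inputs are invented.
cost_new, qaly_new = 48_000, 6.4   # new strategy: cost and quality-adjusted life-years
cost_old, qaly_old = 35_000, 6.1   # comparator strategy

icer = (cost_new - cost_old) / (qaly_new - qaly_old)
print(f"ICER: ${icer:,.0f} per QALY gained")

# Compare against a willingness-to-pay threshold; commonly cited values range
# roughly from $50,000 to $150,000 per QALY, and the choice is itself debated.
threshold = 100_000
print("cost-effective at this threshold" if icer <= threshold
      else "not cost-effective at this threshold")
```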

24.8 Collaborations

Building a health services research program is greatly facilitated by the ability to create collaborations across research disciplines. While a single investigator with a little statistical knowledge, an electronic dataset, and a personal computer can write and publish plenty of papers, the most meaningful health services research typically draws from a range of collaborations with other talented investigators who bring a variety of skills and knowledge to bear on the targeted research questions. While collaboration in surgical health services research once meant finding other surgeon investigators to join in a project, many successful health services research programs have discovered significant benefit in working side-by-side with non-surgeon health services researchers who provide different perspectives, suggest alternate study designs and analytic methods, and provide important opportunities for junior investigators to find research mentorship. As surgical health services research has become more sophisticated, surgical investigators have benefited from finding collaborators across a very broad range of expertise, including economists, psychologists, anthropologists, sociologists, systems engineers, and experts in informatics, biostatistics, qualitative methods, management, and health policy.

In summary, building a surgical health services program requires carefully surveying the local academic landscape to identify what resources are available in terms of potential funding, research platforms, and expertise, then creating a clear vision with goals that are well aligned with the inventory of assets available. Building a successful program will then require identifying and cultivating faculty talent, building bridges to collaborators and mentors with valuable expertise, investing available funding and resources judiciously to maximize their return, and identifying and pursuing potential internal and external funding sources through a carefully constructed and meaningful research agenda. Ultimately, success in the endeavor comes to those who work hard, collaborate well, genuinely care about the questions their research aims to answer, and are prepared to benefit from good providence when it comes their way.

References

1. Birkmeyer NO, Share D, Campbell DA, et al. Partnering with payers to improve surgical quality: the Michigan plan. Surgery. 2005;138:815–20.
2. “Healthcare Cost and Utilization Project” AHRQ website. Retrieved September 26, 2013. http://www.ahrq.gov/research/data/hcup/index.html
3. “ACS-NSQIP” American College of Surgeons website. Retrieved September 26, 2013. http://site.acsnsqip.org/
4. “STS National Database” Society of Thoracic Surgeons website. Retrieved September 26, 2013. http://www.sts.org/national-database
5. O’Connor GT, Plume SK, Olmstead EM, et al. A regional prospective study of in-hospital mortality associated with coronary artery bypass grafting. JAMA. 1991;266:803–9.
6. Flum DR, Fisher N, Thompson J, et al. Washington State’s approach to variability in surgical processes/outcomes: Surgical Clinical Outcomes Assessment Program. Surgery. 2005;138:821–8.

Index

A ACA. See Affordable Care Act (ACA) ACS-NSQIP. See American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) Administrative data AMA Masterfile and AHA annual survey, 162 vs. clinical registries, 159–160 collection, 66, 67 comorbidity risk, 158–159 complications, 159 description, 160 disparities research, 57 electronic, 280 HCUP, 161–162 hospital billing, 135 limitations, 77 Marketscan, 162 Medicare, 161 P4P and QI programs, 134 SEER-Medicare, 161 self-identification bias, 58 sources, 66, 67 strengths and weaknesses, 157–158 surveys, 97 veterans affairs hospitals, 161 Affordable Care Act (ACA) CER, 10 Medicare, 39 PCORI, 14 physician and hospital Payment Reform, 39 The Agency for Healthcare Research and Quality (AHRQ) CER, 13 HCUP, 161, 280 HSR, 4

MMA, 12 research synthesis, 18 AHA. See American Hospital Association (AHA) AHRQ. See The Agency for Healthcare Research and Quality (AHRQ) AMA. See American Medical Association (AMA) American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP), 66, 68–70, 72, 163, 164 American Hospital Association (AHA) administrative data, 160 and AMA, 162 American Medical Association (AMA) administrative databases, 160 and AHA annual survey, 162 American Recovery and Reinvestment Act (ARRA), 12 Awareness, 55, 56, 133

C Cancer Intervention and Surveillance Modeling Network (CISNET), 203 Causal inference with observational data cancer treatment, 170, 171 comparative effectiveness research, 181 confounding, 170 health services and comparative effectiveness research, 168 investigators, 181 IV, 178–180 limitations, 180–181 measurement bias, 170, 172 multivariate regression, 175 patients and providers information, 169

J.B. Dimick and C.C. Greenberg (eds.), Success in Academic Surgery: Health Services Research, Success in Academic Surgery, DOI 10.1007/978-1-4471-4718-3, © Springer-Verlag London 2014

283

284 Causal inference with observational data (cont.) population-based cohorts, 181 propensity score analysis, 176–178 selection bias, 169–170 statistical analysis, 172 stratification/restriction, 175–176 study design, 172–174 treatment decisions, 169 unmeasured confounding, 172 CEA. See Cost-effectiveness analysis (CEA) Center for Medicare and Medicaid Services (CMS) bariatric surgery, 40 COEs, 40–41 costs, 120 impacts, 40–41 measures, 66 organizations, 73 outcomes research and innovation funding opportunities, 257 policies, 40 surgical quality measurement, 73 Centers of excellence (COEs) health care systems, 120 Medicare’s policy, 134 CER. See Comparative effectiveness research (CER) CISNET. See Cancer Intervention and Surveillance Modeling Network (CISNET) Climate vs. culture, 102–104 safety, 104–106 Clinical registries ACS NSQIP, 164 vs. administrative data, 157–159 description, 162–163 NCDB, 164–165 NTDB, 164 STS national database, 163 CMS. See Center for Medicare and Medicaid Services (CMS) COEs. See Centers of excellence (COEs) Collaborative quality improvement (CQI) advantages, 136 communication, 145–146 decision making, 145 definition, 135 description, 133–134 efforts, payers and policy makers, 134 implementation interventions, 86, 88–91, 146 initiatives, 134–135 leadership, 146

Index MHA Keystone Center’s Intensive Care Unit (ICU) Project., 136–137 NNECDSG, 138–139 partnering with payers, 141–144 P4P programs, 134 regional, 145 relationship, 147 SCOAP, 139–140 selection, data, 146 sharing knowledge and collective learning, 147 strategies, 147 Comparative effectiveness research (CER) ACA, 10, 13 ARRA, 12 bloodletting, 11 CEA, 200 complication, 11 costs, 12 data sources, 11 decision analysis, 19 evidence-based information, 10 FCC, 10, 13 health outcomes, 9–10 IOM, 11 MMA, 12 observational studies, 16–18 patient perspectives, 6 PCOR, 14 public and private sector, 13 quality of care, 98 randomized trials, 15–16 RCT, 168 research synthesis, 18–19 stakeholder, 14 timeline, 12, 13 Cost-effectiveness analysis (CEA), 200, 202, 203 CQI. See Collaborative quality improvement (CQI)

D
Databases, outcomes research: administrative data, 157–162; clinical registries (see Clinical registries); description, 153–154; efficacy vs. effectiveness, 156; exploring volume-outcome relationship, 156–157; health care policy, 157; studying diagnoses, procedures/complications, 154–155; temporal trends, 155–156
Decision aids: IPDAS criteria, 199; patient, 198; RCTs, 199; video and paper, 198
Disease risk scores (DRS), 17–18

E
Electronic datasets, 280
Electronic health records (EHR), 73
Enthusiasm: and energy, 242; faculty mentor, 232, 240; shared, 241; and student, 232

F
Faculty pipeline: components, 277; infrastructure, 278; intellectual forums, 278–279; mentoring, junior faculty, 278; protection, 277–278; research fellowship, 278, 279
FCC. See Federal Coordinating Council (FCC)
Federal Coordinating Council (FCC), 10, 13
FOA. See Funding opportunity announcement (FOA)
Funding opportunities: competition and frustration, 263; computers and statistical software, 256; “crunching data”, 256; data collection system, 256–257; dissemination and implementation research, 99; FOA, 261; foundations, 258; government-based funding sources, 257–258; institutions, 258; NIH (see National Institutes of Health (NIH)); surgical societies, 258; VA system, 258
Funding opportunity announcement (FOA), 261
Funding pipeline, 279

G
Gain momentum, surgery: carotid endarterectomy, 25; hip fracture, 26; HRRs, 24–26; Medicare claims, 25; operations types, 25
Gender: confounding, 172; control, 78; disparities, 49; patient demographics, 156; self-identification bias, 58; and SES, 52; study design, 185; surgical care and outcomes, 47

H
HCUP. See Healthcare Cost and Utilization Project (HCUP)
Healthcare Cost and Utilization Project (HCUP): AHRQ, 161; electronic datasets, 280; SID, 162
Health care quality measurements: accessibility and timeliness, 66; accurate, 73; assessment, 65, 66, 73; attribution, 69; CMS, 73; controversial initiatives, 73; cost, 63; data source and collection process, 66–67; development programs, 70–72; EHR, 73; framework, 64; internal and external, 69; late 1980s, 64; mechanisms, 70; outcomes, 65; outlier status, 69; patient safety, 63; processes, 65; reliability, 68–69; reporting, 69, 70; risk-adjustment, 67–68; strategies, 65–66; structure, 64–65
Health policy, surgery: amputation rate, 32, 33; Dartmouth Atlas, 32, 34–35; econometrics, 39; evidence-based policymaking, 38; hospital payment reform, 39–41; mortality rates, 32, 33; natural experiments, 32, 38; NSQIP, 32; outcomes, 38; policy implementation, 32, 34–35; quasi-experimental study design, 38; revascularizations, 32, 33; spillover effects, 39; surgical training and workforce, 42–43; vascular care, 32
Health services research (HSR) program: AcademyHealth, 4; AHRQ, 4; analytic tools, 281; career development, 7–8; center- and department-sponsored projects, 275; clinical research, 3; collaborations, 281–282; description, 273; electronic datasets, 280; experts, 275; externally funded projects, 275; faculty development, 275, 277; faculty pipeline, 277–279; funding pipeline, 279; health care policy, 5; local labs, 280; meta-analysis, 7; mission and vision, 274; networks and collaboratives, 281; NSQIP, 6; outcomes research, 4–5; program leaders, 274–275; project management worksheet, 275, 276; quality measurement, 6; randomized clinical trials, 5
Hospital payment reform: ACA, 39; bariatric surgery, 40; CMS policies, 40; COEs, 41; coronary artery bypass grafting, 41; cost curve, 39; “differences-in-differences” design, 38–39; econometrics, 44; Federal Register, 44; HQID, 41; Medicare's Hospital Value Based Purchasing Program, 41; morbidity and mortality, 41; regression models, 44; risk, 39–40
Hospital Quality Incentive Demonstration (HQID), 41
Hospital referral regions (HRRs): carotid endarterectomy, 25; hospital service area, 24–25; lumbar fusion, 29
HQID. See Hospital Quality Incentive Demonstration (HQID)
HRRs. See Hospital referral regions (HRRs)
HSR. See Health services research (HSR)
HSR program. See Health services research (HSR) program
Human factors engineering: analysis, cognitive psychology, 122; education, 127; and medical care providers, 123; optimize system performance, 122; principles, 122; surgical outcomes, 128

I
Implementation science and QI: clinical practice, 86; CQI and TQM, 88–91; and dissemination research, 86–88; Donabedian's model, 86; efficacy vs. effectiveness, 91, 93; health care, 85; internal validity, 96; interventions, 88; measurements, 96–97; outcomes, 86; patient safety practices, 96; PDCA cycles, 88–90; PRECIS, 91, 92; publishable and fundable, 97–98; quality of care, 98; quantitative vs. qualitative research, 95–96; RCTs, 93–95; resources, 98–99; strong vs. weak recommendations, 91; structure and process, 86; theories, 86; tools and methods, 86
Institute of Medicine (IOM): complication, 11; grand opportunity, 12; healthcare organizations, 104; implementation outcomes, 96; quality care, 85
Instrumental variables (IV): analysis, 154, 174, 178; cancer outcomes research studies, 179; elimination, confounding, 178; geographic variation, 179; mortality rates, 180; outcomes, cancer therapy, 179; and RCT, 178; risk-adjustment methods, 179–180; thrombolytic medication, 178; treatment assignment, 178; unmeasured confounders, 18; values, 18
Insurance: companies, 258; employees and covered dependents, 162; HIPAA, 164; Medicare, 165; status, 51, 52
IOM. See Institute of Medicine (IOM)

J
Job search: academic centers, 268; career goals, 266; challenges, 268–269; clinical practice, 266; crafting, 270–271; early decision, 268; faculty position, 265–266; grant funding, 266; knowledge, experience and tools, 267; networking and referrals, 267; postgraduate surgical training, 265; research, 268; screening, 269–270; “stretch goals”, 267; surgical departments, 268; training, 267

L
Local quality improvement: CABG outcomes, 77; collaboratives, 76; interventions, 78; issues, 76; low performing hospitals, 77; mortality data, 76; multidisciplinary site visits, 76; national public reporting, 77; pay for performance and participation, 77–78; perioperative patient care, 77

M
Mail and internet surveys, 209
Marketscan Commercial Claims and Encounter (CCAE) database, 162
Masters in Public Health (MPH), 252
Medical decision-making research: advantages, 201; assumptions, 200; CEA, 200, 202, 203; CISNET, 203; clinical vs. preference sensitive decisions, 196; computer, 200; fundamentals, 202; measures, 196; opportunities, surgeon, 199–200; outcomes, 196–197; patient decision aids, 198–199; procedures, 198; RCTs, 199; recursive Markov process, 203; statistical analysis, 202; strength, 200; structure, decision-tree, 202; surgical literature, 201; trade-offs, 200
Medical student mentorship: administrative and technical support, 234; approachable, flexible and adaptable, 233; career development, 233, 234; clinical medicine, 232; database studies, 234; domains, 235; enthusiasm, 232; fellowships, 236; intelligence and dedication, 235; potential mentors, 233; productivity, 236–238; risks and benefits, 236; skill development, 236; undergraduate students, 233–234
Medicare Modernization Act (MMA), 12
Mentorship, outcomes research: active clinicians, 245; authorship, 244; career development, 246; cartographer, 240; conferences and meetings, 245; cost-effectiveness, 242; definitions, 240; excellence and competency, 241; flexibility, 245; function and responsibilities, 240–241; funding, 244; grantsmanship, 244; guidance and nurturing, 240; multi-tasking, 246; online resources, 246; peer-to-peer, 242; publication, 243–244; scattered/shot-gun approach, 243; senior and junior, 242; technical expertise, 241, 243
Meta-analysis, clinical exercise: abstraction, 186; analytic tools, 281; bias, 189–190; changes, 184; Cochrane risk, 190; confidence intervals (CI), 186, 188; criticism, 191; data and outcomes, 186; description, 184–185; fixed- and random effects model, 188; forest plot, pooled random effects, 186–188; hernia recurrence, 189, 190; inclusion and exclusion criteria, 185; individual patient data, 188; limitations, 189, 192; literature search, 186, 192; observational/RCT data, 185, 188–189; OR/RR calculation, 186; radiation, 185; research question, 185; risk difference, 188; stent treatments, 188; strengths and weakness, 192, 193; subgroup, 191–192; surgical disparities, 49; synthesis, 18, 19, 185
Michigan Health and Hospital Association (MHA) Keystone Center's Intensive Care Unit (ICU) Project, 136–137
MMA. See Medicare Modernization Act (MMA)
MPH. See Masters in Public Health (MPH)

N
National Cancer Data Base (NCDB), 163–165
National Institutes of Health (NIH): budget, 257; career development, 263; database, 263; disease-specific institutes, 260–261; F awards, 259; and FOA, 261; fund faculty research, 236; granting, 277; investments and activities, 12, 13; K awards, 259–260; P and U awards, 260; R awards, 260; R-grants labeled “PA-xx”, 261; R01 paylines, 261–262; and TIDIRH, 99; trans-disciplinary funding opportunity, 98; violations, 262
National Surgical Quality Improvement Program (NSQIP), 142: postoperative complications, 166; quality measurement, 6, 32; SCR, 164; VA system, 72, 120
National Trauma Data Bank (NTDB), 163, 164
NCDB. See National Cancer Data Base (NCDB)
NIH. See National Institutes of Health (NIH)
NNECDSG. See Northern New England Cardiovascular Disease Study Group (NNECDSG)
Nonresponse bias, surveys, 211–212
Northern New England Cardiovascular Disease Study Group (NNECDSG), 138–139
NSQIP. See National Surgical Quality Improvement Program (NSQIP)
NTDB. See National Trauma Data Bank (NTDB)

O
Observational studies, CER: DRS, 17; exclusion restriction, 18; instrumental variable analysis, 18; internal validity, 16; medical students, 231–238; mentorship, 239–246; propensity score analysis, 17; regression analysis, 17
Operating room (OR). See Patient safety, OR
Organizational culture in surgery: changes, 105–106; climate vs. culture, 102–104; health care, 101; high reliability, 102; measuring safety culture, 105; resilience, 107; risk, 106; safety climate, 104–105; surgeons, 106
Outcomes research: AHRQ, 4; databases (see Databases, outcomes research); funding opportunities, 255–263; PCOR, 4; quality of care, 4; surgical fellowship (see Surgical research fellowship); traditional clinical research, 5

P
Patient-centered care: access to care, 112; centeredness, 111; communication and decision making, 115–116; consultation, 112–114; coordination and integration, 111; description, 110–111; family and friends involvement, 112; fear, anxiety and emotional support, 112; health status, 114; implementation, 115; individuals, 114–115; information, communication and education, 112; paradigm shift, 109, 110; patient's values and preferences, 111; physical comfort, 112; principles, 111; quality, 109–110; RCTs, 112; relationship, 115; researchers and policymakers, 115; satisfaction, 114; stakeholders, 115; structure, process and outcomes, 109; transition and continuity, 112
Patient Centered Outcomes Research (PCOR): ACA, 14; FCC, 14; stakeholders, 14
Patient safety, OR: ACS NSQIP, 120; adverse events, 120; AHRQ, 120; cost, 120; databases, 120; meta-analysis, 186; methodologies, 120; point-of-care research (see Point-of-care research); policy and quality development, 120; quality and safety, 119; retrospective analysis, 121; and SEIPS model, 121; voluminous literature, 120
Pay for Performance (P4P) programs, 77, 134, 135
PCOR. See Patient Centered Outcomes Research (PCOR)
PDCA cycles. See Plan-Do-Check-Act (PDCA) cycles
Phone surveys, 209
Plan-Do-Check-Act (PDCA) cycles, 88–90
Point-of-care research: collaborators identification, 122; conceptual framework, 122–123; data analysis, 125–126; data types, 123; description, 121; design, 130; disseminate outcomes and implementation, 126; human factors, cardiac surgery, 128–129; optimal presentation, outcomes, 126; organizational differences, learning, 127–128; patients protection, unsafe system, 127; protocol-driven communication, cardiac bypass, 129–130; sampling strategies, 125; sources, 124–125
P4P programs. See Pay for Performance (P4P) programs
Pragmatic-Explanatory Continuum Indicator Summary (PRECIS), 91, 92
Productivity: abstracts, presentations and manuscripts, 236–237; active participation, 237; clinical, 269, 271; curriculum vitae (CV), 238; enthusiasm, 232; expectations, 271; and funding, 266; level of, 234; local and regional research meetings, 237; and mentee's work, 240; presentation skills, 237–238; stakeholders, 79; student effort, 236
Propensity score analysis: CER, 17; datasets and tools, 7; description, 176; distribution, 17; DRS, 17; generation, 176; lung cancer-specific survival and Cox models, 177–178; observational studies, 174; SEER-Medicare, 180; strategies, balancing patient characteristics, 176–177
Prostate-specific antigen (PSA), 21–22, 27
PSA. See Prostate-specific antigen (PSA)

Q
QI. See Quality improvement (QI)
Qualitative research methods: confirmability, 225; credibility, 224; dependability, 225; description, 217–218, 226; face to face surveys, 208; formulation, 218–219; interaction, people, 218; vs. quantitative, 94–96; sampling strategy, 218–220; social construction, 218; structured collection (see Structured vs. unstructured data collection); tools, 5; transferability, 225–226
Quality improvement (QI): ACS-NSQIP, 6, 32, 54, 56, 66, 154, 164, 280; administrative data, 67; clinical registries, 159, 162–163; collaborative (see Collaborative quality improvement (CQI)); efforts, 64, 65, 70, 80; health care (see Health care quality measurements); hospital, 69; implementation and dissemination, 79; and implementation science (see Implementation science and QI); local (see Local quality improvement); measures, 80; patient demographics, 55; regional, 6; selective referral, 75–76; stakeholders, 79–80; variations, 75

R
Race: control, 78; disparities, 48–49; patient demographics, 156; quantitative investigation, 219; SEER registry data, 177; treatment decisions, 169
Randomized controlled trials (RCTs): and CER, 168; cluster, 16; decision aids, 199; efficacy, intervention, 15; implementation science, 93–95; instrumental variables, 178; medical decision-making research, 199; meta-analysis, 185, 188–189; novel therapy development, 15; patient-centered care, 112; pragmatic and adaptive trials, 16; treatment heterogeneity, 16

S
Safety climate: changes, 105–106; literature, manufacturing industries, 104; patients, 104; vs. safety culture, 102–104; surveys, 104–105
Safety culture: changes, 105–106; vs. climate, 102–104; measures, 105
SCOAP. See Surgical Care and Outcomes Assessment Program (SCOAP)
SCR. See Surgical Clinical Reviewer (SCR)
Screening jobs: advantages and limitations, 269; department's mission and goals, 269–270; detailed information, 269; faculty profiles, 269; mentorship, 270; physical space, 270
SEER-Medicare database. See Surveillance, Epidemiology and End Results (SEER)-Medicare database
SID. See State Inpatient Database (SID)
Society of Thoracic Surgeons (STS), 163
Socioeconomic status (SES), 51
Spine surgery variations: discrepant regional outcomes, 50; lumbar fusion, 28; Medicare reimbursements, 28, 29; National Institutes of Health, 29; orthopedic procedures, 29, 30; randomized trials, 28; real-time, 28
State Inpatient Database (SID), 157, 160, 162
Structured vs. unstructured data collection: coding, 223–224; ethnography, 222; feedback, 222–223; focus groups, 220–221; open ended interviews, 221; qualitative investigation, 224
STS. See Society of Thoracic Surgeons (STS)
Surgery variations: analytic methods, 27; complex systematic processes, 22; end of life care, 27; gain momentum, 23–27; health policy, 32–35; hospital services areas, 23, 24; nationalized health care, 22; pancreatic pseudocyst, 21; patterns of care, 23; PSA test, 21–22; spine surgery variation, 28–29; tonsillectomy rates, 23; vascular surgery, 29–31
Surgical Care and Outcomes Assessment Program (SCOAP), 139–140
Surgical Clinical Reviewer (SCR), 164
Surgical disparities: age, 49; appropriate care provision, 52; awareness, 55; causes of, 48; description, 47–48; disabilities, 50; disadvantages, 57–58; gender, 49; geographic location, 50; hospital volume, 53–54; identification, 55; implementation, 56; insurance, 51; monitoring, 56; patient case-mix, 54–55; patient-centered clinical research, 55; policy work, 48; prior health condition and comorbidities, 52; race, 48–49; SES, 51; solution and interventions, 56; surgeon choice, 53; timely access, 51–53
Surgical research fellowship: advice, 251; education, 252; experiments, 250; institution, 250; personality, 250; productivity, 249; projects/time management, 253; re-entering residency, 254; setting goals, 251–252; taboo topic, 253–254; visibility, 251
Surgical training, health policy: evidence based policymaking, 43; mortality, 42; perforation, 43; population-based studies, 42; resident and faculty, 42; supply sensitive care, 43
Surveillance, Epidemiology and End Results (SEER)-Medicare database, 161
Surveys: acquiescence, 207; analytic methods, 214; bias, 206–207; data analysis, 213; description, 205–206; design, 214; determination, 206; face to face, 208; information, 206; Likert scales, 212–213; mail and internet, 209; missing data, 207; mixed mode, 210; nonresponse bias, 211–212; phone, 209; pilot testing, 207; population, 207–208; quantitative and qualitative techniques, 206; reporting, 213–214; response burden and rates, 210–211; sensitive subjects, 206; software packages, 213; validation, 206
Systematic reviews, clinical exercise: changes, 184; data collection, 183; limitations, 183; literature, 183; and non-systematic, 183; rigorous and quantitative summaries, 183
Systems engineering: and human factors, 122; SEIPS model, 121, 123

V
Vascular surgery variations: endovascular revolution, 29–31; Medicare patients, 30, 31; orthopedic procedures, 30; revascularization, 29, 30
Veterans Affairs (VA) system, 72, 120