This textbook guides the reader on how to undertake high-quality literature reviews, from traditional narrative to protocol-driven reviews.
Table of contents:
Preface
Synopsis
What Are Literature Reviews About?
Narrative Overviews
Narrative Reviews
Systematic Literature Reviews
Systematic Reviews
Literature Reviews for Empirical Studies
Contents
About the Authors
About the Contributors
List of Figures
List of Tables
List of Boxes
1 Introduction
1.1 What Are [Systematic] Literature Reviews About?
1.2 Brief History of Systematic Reviews
1.3 Addressing Variety in Literature Reviews
1.3.1 Variety in Types of Literature Reviews
1.3.2 Specific Approaches for Disciplines
1.4 Scope and Outline of Book
1.4.1 What Does the Book Cover
1.4.2 What Does the Book Not Cover
1.4.3 Part I: Basic Concepts for Effective Literature Reviews
1.4.4 Part II: Quantitative Analysis and Synthesis
1.4.5 Part III: Qualitative Analysis and Synthesis
1.4.6 Part IV: Reporting Literature Reviews
1.4.7 Epilogue
1.5 How to Use This Book?
1.5.1 Type of Study and Use
1.5.2 Structure of Chapters
1.5.3 Informed Choices for Undertaking and Assessing Literature Reviews
1.6 Key Points for Book
References
Part I Basic Concepts for Effective Literature Reviews
2 Objectives and Positioning of [Systematic] Literature Reviews
2.1 Literature Sensitivity and Professional Knowledge
2.2 Research Processes and Literature Reviews
2.2.1 Basic Research Processes
2.2.2 Where Literature is Used and How in Empirical Studies
2.2.3 Solving Practical Problems and Use of Literature
2.3 Evaluating Literature
2.3.1 Difference Between Critical Evaluation and Critiquing
2.3.2 Elements of Appraisals
2.4 Synthesising Literature
2.5 Archetypes of Literature Reviews
2.5.1 Narrative Overviews
2.5.2 Narrative Reviews
2.5.3 Systematic Literature Reviews
2.5.4 Systematic Reviews
2.5.5 Umbrella Reviews
2.6 Propositional Logic and Literature Reviews
2.6.1 Brief Introduction to Propositional Logic
2.6.2 Overview of Literature Reviews Related to Propositional Logic
2.7 Research Paradigms and Literature Reviews
2.8 Avoiding Plagiarism
2.9 Key Points
2.10 How to …?
2.10.1 …Select an Appropriate Approach to a Literature Review?
2.10.2 … Evaluate Which Studies Are of Interest for a Review
2.10.3 … Avoid Plagiarism
2.10.4 …Write a Literature Review
References
3 Quality of Literature Reviews (co-authored by Harm-Jan Steenhuis)
3.1 Quality Based on Fitness for Purpose as Frame of Reference for Literature Reviews
3.2 Quality Based on Archetypes of Literature Reviews
3.2.1 Narrative Overview
3.2.2 Narrative Review
3.2.3 Systematic Literature Review
3.2.4 Systematic Review
3.2.5 Relating the Archetypes to Quality of the Review
3.2.6 Criteria for Systematic Literature Reviews and Systematic Reviews
3.3 Associating Research Paradigms with Literature Reviews
3.3.1 Distinguishing Between Idiographic and Nomothetic Research
3.3.2 Background to Research Paradigms
3.3.3 (Post)Positivist Perspectives and Archetypes of Literature Reviews
3.3.4 Interpretivist Perspectives and Archetypes of Literature Reviews
3.3.5 Extending the Interpretivist Approach to Academic Mastery
3.3.6 Hermeneutic Perspectives on Literature Reviews
3.4 Quality by Effectively Linking Literature Reviews to Empirical Studies
3.5 Quality by Evidencing Engagement with Consulted Studies
3.5.1 Close Reading
3.5.2 Achieving Rigour for Citations-In-Text
3.6 Key Points
3.7 How to …
3.7.1 … Evaluate the Quality of a Literature Review
3.7.2 … Achieve a Higher Degree of Accuracy in Literature Reviews
3.7.3 … Write Literature Reviews
References
4 Developing Review Questions
4.1 Differentiating Research Objectives and Review Questions
4.2 What Are Good Questions for a Literature Review?
4.2.1 Guiding Collection and Analysis of Literature
4.2.2 Single Guiding Question as Point of Reference
4.2.3 Narrowly Focused
4.2.4 Clarity of Good Review Questions
4.2.5 Assuming Possibility of Different Outcomes or Opinions
4.2.6 Building on Sound Assumptions
4.3 Starting Points for Review Questions
4.3.1 From Generic to Specific and Vice Versa
4.3.2 Establishing Causation
4.3.3 Testing and Falsification of Theories
4.3.4 Considering the Spatial Dimension
4.3.5 Considering the Temporal Dimension
4.3.6 Artefacts, Methods and Tools
4.3.7 Setting and Evaluating Policy
4.3.8 Investigating Assumptions
4.3.9 Rigour and Reliability
4.4 Population-Intervention-[Comparison]-Outcome
4.4.1 Root Format Population-Intervention-Outcome
4.4.2 Format Population-Intervention-Comparison-Outcome and Other Variants
4.4.3 Enhancing Population-Intervention-Outcome by Using Models
4.4.4 Using Theories and Laws of Observed Regularities
4.5 Scoping Study for Review Questions
4.5.1 Scoping Review as Protocol-Driven Literature Review
4.5.2 Scoping Study for Topical Mapping
4.6 Key Points
4.7 How to …?
4.7.1 … Develop Review Questions That Are Worthwhile
4.7.2 … Conduct and Time a Scoping Review or Scoping Study
4.7.3 … Write a Literature Review
References
5 Search Strategies for [Systematic] Literature Reviews (co-authored by Lynn Irvine)
5.1 Criteria for Retrieving Publications
5.2 Types of Sources
5.2.1 Primary Sources
5.2.2 Secondary Sources
5.2.3 Tertiary Sources
5.2.4 Propositional Writings
5.2.5 Professional Publications
5.2.6 Other Types of Publications and Sources
5.3 Iterative Search Strategy
5.4 Keywords, Controlled Vocabulary and Database Search Strategies
5.4.1 Defining Keywords as Search Terms
5.4.2 Search Operators
5.4.3 Field Searching
5.4.4 Controlled Vocabulary
5.4.5 Selecting Databases
5.4.6 Using Databases and Search Engines
5.4.7 On Using Publishers’ Databases
5.5 Other Search Strategies
5.5.1 Hand Searching
5.5.2 Snowballing
5.5.3 Backward and Forward Searching
5.5.4 Root and Branch Searches
5.5.5 Citation Pearl Growing
5.5.6 Expert Panels
5.6 Enhancing Effectiveness of Search Strategies
5.6.1 Determining Saturation When Searching
5.6.2 Trading Off Specificity and Sensitivity
5.6.3 Complementary Search Strategies
5.6.4 Expert Panels
5.7 Grey Literature
5.8 Undertaking the Search and Recording Results
5.9 Scoping Reviews and Scoping Studies for Search Strategy
5.10 Key Points
5.11 How to …?
5.11.1 … Set an Appropriate Search Strategy
5.11.2 ... Determine Which Type of Sources to Consider
5.11.3 ... Write a Literature Review
References
6 Setting Inclusion and Exclusion Criteria
6.1 Filtering for Relevant Sources
6.2 Inclusion Criteria
6.2.1 Content
6.2.2 Date of Publication
6.2.3 Language
6.2.4 Types of Source
6.2.5 Research Design and Method
6.2.6 Sampling
6.2.7 Data Analysis
6.3 Exclusion Criteria
6.4 Quality of Evidence
6.4.1 Hierarchy of Evidence Pyramid—Systematic Reviews
6.4.2 GRADE: Grading of Recommendations, Assessment, Development and Evaluations
6.4.3 Quality of Evidence for Qualitative Analysis and Synthesis
6.5 Determining Level of Evidence for Other Archetypes
6.6 Scoping Reviews and Scoping Studies for Setting Inclusion and Exclusion Criteria
6.7 Key Points
6.8 How to …?
6.8.1 … Set Effective Inclusion and Exclusion Criteria
6.8.2 … Evaluate the Quality of Evidence in Studies
6.8.3 … Write a Literature Review
References
Part II Quantitative Analysis and Synthesis
7 Principles of Meta-Analysis
7.1 Introduction to Meta-Analysis
7.2 Basics of Meta-Analysis
7.2.1 Conditions for Applicability
7.2.2 Use of Data in Meta-Analysis
7.2.3 Process for Meta-Analysis
7.3 Identifying and Coding Variables and Attributes for Inclusion in Meta-Analysis
7.4 Models for Calculating Effect Sizes
7.4.1 Common-Effect Models
7.4.2 Fixed-Effects Models
7.4.3 Random-Effects Models
7.4.4 Mixed-Effects Models
7.4.5 Meta-Regression
7.5 Common Measures for Effect Size Used in Meta-Analysis
7.5.1 Standardised Mean Difference
7.5.2 Weighted Mean Difference
7.5.3 Odds Ratio and Risk Ratio
7.5.4 Correlation Coefficients, Proportions and Standardised Gain Scores (Change Scores)
7.6 Methods for Meta-Analysis
7.7 Determining Between-Study Heterogeneity
7.7.1 Distinguishing Types of Between-Study Heterogeneity
7.7.2 Determining Statistical Heterogeneity
7.7.3 Forest Plot
7.7.4 Funnel Plot
7.7.5 L’Abbé Plot
7.7.6 (Galbraith) Radial Plot
7.8 Publication Bias and Sensitivity Analysis
7.8.1 Assessing Publication Bias
7.8.2 Sensitivity Analysis
7.9 Assessing Quality of Meta-Analysis
7.10 Key Points
7.11 How to …?
7.11.1 … Choose the Most Appropriate Statistical Method for Meta-Analysis
7.11.2 … Write A Literature Review
References
8 Meta-Analysis in Action: The Cochrane Collaboration
8.1 Background
8.2 Cochrane Collaboration
8.3 Cochrane Reviews
8.3.1 Question Choice
8.3.2 Identifying Relevant Studies
8.3.3 Analysing Results
8.3.4 Potential Sources of Bias
8.3.5 Biases in the Systematic Review Process
8.3.6 Analysing Data (Including Meta-Analysis)
8.3.7 Alternatives to Meta-Analysis
8.3.8 Summary of Findings
8.3.9 Drawing Conclusions
8.4 Example of a Cochrane Review
8.4.1 Organised Inpatient (Stroke Unit) Care
8.5 Advantages of Cochrane Reviews
8.5.1 What Features Indicate that Cochrane Reviews Are Reliable?
8.6 Challenges for the Cochrane Collaboration
8.7 Key Points
References
9 Other Quantitative Methods
9.1 Network Meta-Analysis: Making More Than One Comparison
9.1.1 Key Considerations in Network Meta-Analysis
9.1.2 Example of Network Meta-Analysis
9.2 Best-Evidence Synthesis
9.2.1 Conducting Best-Evidence Synthesis
9.2.2 Example: Exercise Prescription for Treating Shoulder Pathology
9.2.3 Example: Students’ Learning with Effective Learning Techniques
9.2.4 Strengths and Weaknesses of Best-Evidence Synthesis
9.3 Qualitative Modelling of Causal Relationships
9.4 Bibliometric Analysis
9.4.1 Purpose of Bibliometric Analysis
9.4.2 Conducting Bibliometric Analysis
9.4.3 Caveats of Bibliometric Analysis
9.4.4 Complementing Bibliometric Analysis
9.5 Systematic Quantitative Literature Reviews
9.5.1 What is It?
9.5.2 It is Systematic
9.5.3 Quantifying Literature
9.5.4 Methods for Analysis and Displaying Results
9.5.5 Conclusions
9.6 Key Points
References
Part III Qualitative Analysis and Synthesis
10 Principles of Qualitative Synthesis
10.1 Differentiating Qualitative Synthesis from Quantitative Synthesis
10.2 Purpose of Qualitative Synthesis
10.2.1 Aggregative Synthesis
10.2.2 Interpretive Synthesis
10.2.3 Supplementing Quantitative Synthesis
10.3 Selecting a Qualitative Review Approach
10.3.1 Overview of Methods for Qualitative Synthesis
10.3.2 Selecting a Method for Qualitative Synthesis
10.4 Qualitative Synthesis of Findings
10.4.1 Extraction of Data and Findings
10.4.2 Synthesising Findings
10.4.3 Software and Online Tools for Qualitative Synthesis
10.5 Quality Assessment for Qualitative Synthesis
10.5.1 Assessment of Quality of Studies in Qualitative Synthesis
10.5.2 Appraising Qualitative Synthesis
10.6 Key Points
10.7 How to …?
10.7.1 … Undertake a Qualitative Synthesis
10.7.2 … Select the Most Appropriate Method for Qualitative Synthesis
10.7.3 … Write a Literature Review
References
11 Methods for Qualitative Analysis and Synthesis
11.1 Meta-Summary: Aggregating Findings and Preparing for Meta-Synthesis
11.1.1 Some Caution Before Starting
11.1.2 Qualitative Meta-Summary
11.1.3 Some Dos and Don’ts
11.1.3.1 Do Not Forget to Translate
11.1.3.2 Do Aim at Synthesis
11.1.3.3 Do Not Exclude Too Much
11.1.3.4 Look at What Is NOT There
11.1.3.5 Look out for ‘No-Findings’ Reports
11.1.3.6 Be Cognisant of ‘Topical surveys’
11.1.3.7 Do Not Only Look at the Results Section
11.1.4 Overview of Meta-Synthesis Versus Meta-aggregation
11.2 Thematic Analysis and Thematic Synthesis
11.2.1 Thematic Analysis in Overview
11.2.2 Thematic Analysis Step-by-Step
11.2.3 Advantages and Disadvantages of Thematic Analysis as Qualitative Synthesis Approach
11.3 Meta-Ethnography
11.3.1 Seven Step Approach
11.3.2 Pros and Cons of Meta-Ethnography
11.4 Grounded Theory
11.4.1 Basics of Grounded Theory Methodology
11.4.2 Grounded Theory Meta-Synthesis
11.4.3 Limitations
11.5 Discourse Analysis
11.5.1 Application of Discourse Analysis to Systematic Literature Reviews
11.5.2 Trustworthiness
11.6 Key Points
References
12 Combining Quantitative and Qualitative Syntheses
12.1 Purpose of Mixed-Methods Synthesis
12.2 Approaches to Designs of Mixed-Methods Synthesis
12.3 Methods for Integrated Mixed-Methods Syntheses
12.3.1 Sequential Exploratory Method
12.3.2 Sequential Explanatory Method
12.3.3 Convergent Qualitative Method
12.3.4 Convergent Quantitative Method
12.4 Quality Criteria for Mixed-Methods Syntheses
12.5 Other Methods for Diversity in Mixed-Methods Synthesis
12.6 Key Points
12.7 How to …?
12.7.1 ... Select the Most Appropriate Method for Mixed-Methods Synthesis
12.7.2 ... Write a Literature Review
References
Part IV Presentation and Writing
13 Reporting Standards for Literature Reviews
13.1 Relevance of Adequate Reporting
13.2 Reporting Methods and Tools for Extraction of Data, Analysis and Synthesis
13.2.1 Methods
13.2.2 Protocols
13.2.3 Spreadsheets and Other Tools
13.2.4 Online Tools
13.3 Making Results and Findings Accessible
13.3.1 Annotated Bibliography
13.3.2 Chronological Reporting
13.3.3 Thematic Reporting
13.3.4 Tabulation and Visualisation
13.4 Formats for Reporting
13.4.1 PRISMA Reporting
13.4.2 Other Formats for Reporting
13.4.3 Domain-Specific Reporting Formats
13.5 What Not to be Reported
13.6 Key Points
13.7 How to …?
13.7.1 ... Report What is Needed
13.7.2 ... Identify What Needs No Reporting
13.7.3 ... Write a Literature Review
References
14 Data Management and Repositories for Literature Reviews
14.1 Data Management for Literature Reviews
14.2 Processes for Data Management
14.3 Repositories for Systematic Reviews
14.3.1 The Campbell Collaboration
14.3.2 Cochrane Library
14.3.3 Environmental Evidence Library of Evidence Syntheses
14.3.4 EPPI Centre
14.3.5 JBI
14.3.6 PROSPERO
14.3.7 Systematic Review Data Repository
14.4 Other Repositories
14.4.1 Academic Libraries
14.4.2 Scholarly Journals
14.4.3 Funding Councils
14.5 Preparing Data Sets for Repositories
14.6 Literature Reviews and Data Repositories Enabling Open Science
14.7 Key Points
References
15 Writing Literature Reviews
15.1 What Makes Writing Literature Reviews Different
15.2 Timing the Start of Writing
15.3 Getting to the Writing of Literature Reviews
15.4 Process of Writing
15.5 Gaining Proficiency in Writing
15.6 Improving Text Structure: Revisiting the Basics
15.6.1 Paragraphs
15.6.2 Sentences
15.6.3 Punctuation
15.6.4 Use of Aspirates
15.6.5 Vocabulary: Some Common Pitfalls
15.7 Avoiding Some Common Errors and Inaccuracies When Writing
15.7.1 How Can Accuracy in Writing Be Improved?
15.7.2 How Can Efficiency in Writing Be Improved?
15.7.3 Avoiding Common Mistakes
15.7.4 Added Value of Peer Review
15.8 Key Points
References
16 Publishing Literature Reviews
16.1 When is a Literature Review Worth Publishing
16.1.1 ‘So What?’ Test
16.1.2 Articulation of Contribution to Scholarly Knowledge
16.2 Selecting Journals
16.3 Final Preparation of Manuscript
16.3.1 Taking Advantage from Feedback by Peers
16.3.2 Making It Happen
16.3.3 Letter to the Editor
16.4 Processes of Submission and Review
16.4.1 Submission and Initial Check
16.4.2 Review by Journal
16.4.3 Decision by Editor
16.4.4 Processes After Acceptance
16.4.5 Special Issues
16.4.6 Publication Fees and Open Access
16.5 Review Processes
16.5.1 Double-Blind Peer Review Process
16.5.2 Single-Blind Peer Review Process
16.5.3 Open Peer Review Processes
16.5.4 Other Types of Review Processes
16.6 Revising the Manuscript
16.7 Key Points
References
Epilogue
17 The Dissenting Voices
17.1 Literature Reviews as Activation
17.2 And Caution Nurtures Craftsmanship
17.3 Balancing the Approach to Literature Reviews for Those Starting Out
References
Correction to: Developing Review Questions
Correction to: Chapter 4 in: R. Dekkers et al., Making Literature Reviews Work: A Multidisciplinary Guide to Systematic Approaches, https://doi.org/10.1007/978-3-030-90025-0_4
Appendix A Generic, Specialist Databases and Search Engines
A.1. Generic Databases and Search Engines
A.2. Business and Management Studies
A.3. Economics
A.4. Education
A.5. Engineering
A.6. Geography and Environmental Science
A.7. Information Systems
A.8. Law
A.9. Life Sciences
A.10. Medicine and Nursing
A.11. Psychology
A.12. Social Sciences
Appendix B List of Journals Publishing Only Literature Reviews
B.1. Generic
B.2. Business and Management Studies
B.3. Chemistry
B.4. Computer Science
B.5. Earth and Planetary Sciences
B.6. Economics
B.7. Education
B.8. Engineering
B.9. Environmental Science
B.10. Healthcare and Life Sciences
B.11. Psychology
B.12. Medicine and Nursing
B.13. Pharmacology
B.14. Physics
B.15. Social Sciences
Index
Rob Dekkers · Lindsey Carey · Peter Langhorne
Making Literature Reviews Work: A Multidisciplinary Guide to Systematic Approaches
Rob Dekkers University of Glasgow Glasgow, UK
Lindsey Carey Glasgow Caledonian University Glasgow, UK
Peter Langhorne University of Glasgow Glasgow, UK
ISBN 978-3-030-90024-3    ISBN 978-3-030-90025-0 (eBook)
https://doi.org/10.1007/978-3-030-90025-0

© Springer Nature Switzerland AG 2022, corrected publication 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
What is Your Love
by Gordana Durović

What is your love, storm or sunny day
let me do systematic review of all our days
research interest of all our past and future
objectives I will set clear to solve my dilemma,
is all this pain worth just the crumbles from your plate?
where is the gap and how could we fill it
allow me to ask you this question
I want an answer that you did not give me before
where is the purpose of our life you speak about
searching the engine that is sufficient for our case
yet the methods are confusing, but you must divide
qualitative and quantitative ways
you want to know what are the key words
when you search for happiness of our days
yet my criteria are not exclusive
for all your crimes I will be judging carefully,
my heart is trembling bird
betrayal that you’ve done I will exclude
inclusion of sadness that you caused to me must be there
yet the question is clear or not, I need to confirm
am I searching for gold in moon light?
am I just lost and blind from love?
where is the original source of all our passion
when did you turn and seek it somewhere else
where is the time when happiness rides away
you want to comment future before we conclude what is behind
maybe writing with endnote in LATEX could help you
there is not a single word to describe my love
and yet you are trying to outsource our case

Poem created and presented during the course ‘Making [Systematic] Literature Reviews Work for You’, Radboud Summer School, Nijmegen, 6–10 August 2018.
To Nil for the idea, support and patience during the journey longer than expected.
To Jean, Kate and Tom because writing always eats into family time.
Preface
The saying goes ‘in every end, there is also a beginning’, which was certainly true for this book. In our case, it was the abrupt ending of delivering the ‘Workshop [Systematic] Literature Reviews’, co-organised by the Adam Smith Business School and the Graduate School of Glasgow Caledonian University since May 2014. However, this led to the delivery team coming together to write a book so that we could still reach out to those who were seeking advice on how to do literature reviews better. Often, those who are looking for guidance, whether it be for a traditional literature review or a protocol-driven literature review, are in the early stages of defining their research project, and suggestions for literature reviews are often also conversations about the appropriateness of topics for study. Therefore, it is of paramount importance to support students, doctoral students and early career researchers in their ventures to engage with literature reviews, and we hope this book will provide this inspiration; for those who were on waiting lists, planned to participate later or could not attend the workshops, summer schools and seminars for other reasons, we hope that this book provides encouragement and guidance for using systematic approaches to literature reviews of any kind and for any discipline.

The writing of the book builds on the seminars and sessions organised by the delivery team, but shortly after agreeing to take on writing this guide, we also realised that the manuscript would be a journey for us and others who worked with us. The questions and issues that were raised by participants during the workshops became the starting point for our journey. We also identified complementary material pertaining to literature reviews so as to provide the guidance needed for all kinds of studies. This turned the book into a quest to search for comprehensiveness and to ideate further concepts that would support scholars of any kind in their engagement with literature reviews. We are thankful to the participants of the workshops for their frank and unrestrained manner when asking questions and raising issues, which inspired this book; we hope that this book provides more permanent and extended guidance for using systematic approaches to literature reviews.

Further inspiration was drawn from the enthusiastic engagement and support of administrative staff during the provision of the workshops on systematic literature reviews. At the Adam Smith Business School, these were: Tanja Bozic, Christine Haley, Shannon Keany, Victoria Livett-Frater, Donna McGrady, Kelly Park (now: Kelly Connelly), Sheena Phillips, Jackie Williamson and Lorna Wilson. Furthermore, Dickon Copsey from the College of Social Sciences at the University of Glasgow helped the workshops reach out to a wider audience. At Glasgow Caledonian University, we were supported by Ivana Covic, Karen Coyle, Alexandra Ingvarsson, Louise Lowe, Grace Poulter and Hilary Tennant. Without their support for all kinds of arrangements, we could not have concentrated as much on the content of the workshops, which has been transformed into the basis for the current book. In addition, the staff at Radboud Summer School, particularly Paula Haarhuis, Lisette te Hennepe, Mariken Jacobs, Sana Koulij, Wessel Meijer and Alice Nieboer, and at Windesheim University of Applied Sciences, Ronald de Boer and Michiel Steenman, provided additional opportunities to deliver similar workshops; this helped us to understand the needs of a wider global audience.

We are also grateful to those who delivered worked examples of their work, often in the context of their doctoral studies. Among them are: Alexis Barlow, Graeme Donald, Johannes Hinckeldeyn, Erkan Kabak, Marianna Koukou, Eduardo Gomes Salgado, Elizabeth Williamson and Qijun Zhou. As they will notice, some of their efforts and writings have found their way into the book. Furthermore, the workshops and the book benefited indirectly, and a few times directly, from the special sessions that were delivered. These have resulted in specific methods and topics being integrated into the book, albeit sometimes contributed by others. We want to thank Niels Cadée, Jon Godwin, Lisa Kidd, Maggie Lawrence, Grace Poulter and Margaret Sutherland for delivering these sessions.

Over and above, we are grateful to peers and contributors who helped us to form this book: Pat Baxter, Margaret Bearman, Elisabeth Bergdahl, Dianna R. Dekelaita-Mullet, Parastou Donyai, Clare Morrison, Judi Petticrew, Katie Robinson, Lynn Irvine, Lisa Kidd, Catherine Pickering and Harm-Jan Steenhuis. For some contributors, it has taken a little longer for their efforts to be published, and some had to rush getting involved in later stages. They graciously engaged with us when editing their contributions. We are delighted to be the first to publish a poem by Gordana Durović; we sincerely hope this will mark her poems reaching a broader audience beyond scholars. And, we should not forget to acknowledge Chris Hicks for giving us a worthwhile hint about self-plagiarism. We are also grateful to Eli Moody and Optimal Workshop for permission to use illustrations we found on their websites.

We should not forget the representatives of Springer Nature, particularly Anthony Doyle, for their patience. It took a little longer to overcome all the hurdles that we unexpectedly encountered on the way, but we hope the longer than anticipated manuscript is worth the wait. The comments by Ron Alexander from the Royal Alexandra Hospital on the abracadabra of systematic reviews were ringing in our ears when overcoming our final hurdle.
And, last but not least, our families and friends have been supportive and patient when, at times, we seemed to be caught up in the world of literature reviews. We are indebted to Rob’s wife, who suggested the idea for the book after the workshops ended. Should we have forgotten to acknowledge somebody who contributed to this book, directly or indirectly, please accept this as our apology.

Glasgow, UK
August 2021
Lindsey Carey
Rob Dekkers
Peter Langhorne
Synopsis
Literature reviews are essential to scholarly work; therefore, they should be conducted in an appropriate way to ensure relevant content, quality and accessibility through adequate presentation.
What Are Literature Reviews About?

In generic terms, literature reviews aim at making scholarly knowledge accessible, identifying gaps in this knowledge, determining research questions for empirical research and informing evidence-based interventions, policies, practices and treatments. This requires the appraisal of appropriate sources and studies. However, appraisal is not necessarily critiquing other works, but evaluating them from the perspective of review questions that have been clarified and set. In addition, the analysis of thoughts and evidence in sources leads to a synthesis of findings, often based on themes. This means that literature reviews go beyond summarising extant literature, and for this reason, they offer a different perspective on scholarly knowledge and relevance to practice depending on the more specific aims of each review.

Since the emphasis on aims may vary across literature reviews, there are many methods and frameworks available to conduct a literature review. The application of methods and frameworks depends on the specific review, the discipline and whether the contribution is more scholarly or more practically oriented. Reviews can be written as part of an empirical study, providing insight and direction for the research method and data collection, or as a stand-alone investigation, making a scholarly contribution and guiding evidence-based interventions, policies, practices and treatments. The current book presents methods and frameworks across disciplines and aims of literature reviews; however, authors of literature reviews need to consider which approach to a literature review is most appropriate, for which the book also provides direction and tips.
Keeping the aims of literature reviews, and the broad range of methods and practices for reviews within disciplines in mind, there are four archetypes of literature reviews that are briefly described in the next sections:
• Narrative overviews.
• Narrative reviews.
• Systematic literature reviews.
• Systematic reviews.
Each description of the archetypes that follows in the synopsis refers to relevant chapters and sections in this book; note that these pointers to more information only highlight the most relevant chapters and sections.
Fig. S.1 Symbolic overview of processes for narrative overviews. Typically, a narrative is divided into topics for which literature is appraised. The discussion of these topics leads to synthesised findings. The dotted line around ‘defining research objectives’ indicates that these reviews are not necessarily written with a specific empirical study in mind. The dotted line around ‘defining topics’ indicates that the topics are not always defined explicitly.
Narrative Overviews

The first archetype, narrative overviews, aims at providing direction for an empirical study that follows or at communicating a specific perspective; see Figure S.1 for a symbolic representation. In general, they do not need to contain an extensive, critical discussion of arguments, which could also cover counterarguments, and they tend to focus on a point being made. Therefore, the selection of sources for the arguments presented may be biased. However, it is not necessarily the point of a narrative overview that all literature is covered and that it presents an unbiased view on a topic. In addition to introducing empirical research, narrative overviews are found as commentaries, editorials, propositional writings and research notes. Introductory sections for the other three archetypes (narrative reviews, systematic literature reviews and systematic reviews) also often resemble narrative overviews.

More Information
• Section 2.2 describes a method for appraisal of sources that can be used for a narrative overview.
• Section 2.5 provides more information about this type of review and compares it with the three other archetypes.
• Section 3.2 contains a more detailed description of the process for this archetype and sets out how its quality is determined.
• Section 3.5 gives advice on how to improve the appraisal of studies and the quality of citing references.
• Section 4.2 addresses how questions for a review can be developed using five recommendations.
• Section 5.3 provides more detail on search strategies that are appropriate for this type of review; to some extent, the search strategy outlined in Section 5.4 can also be of use.
• Chapters 15 and 16 contain tips for how to write and publish these studies.

NOTE For narrative overviews, there are limited conventions for how to write them. What is expected from this type of literature review also depends strongly on the domain and the type of publication. Therefore, the processes depicted in Figure S.1 are symbolic of how the steps of this review can look.
Fig. S.2 Overview of processes for narrative reviews. Differing from a narrative overview as shown in Figure S.1, in a narrative review more attention is paid to setting appropriate review questions and the retrieval of sources. For a narrative review, a topic is divided into themes for which literature is appraised. The literature review can be enhanced by defining themes derived from conceptualisations, theories, laws of observed regularities and methods. The discussion of these themes leads to synthesised findings. The dotted line around ‘defining research objectives’ indicates that these reviews are not necessarily written with a specific empirical study in mind.
Narrative Reviews

A narrative review could be a stand-alone study or preparation for an empirical study (or multiple studies); see Figure S.2 for an outline of its processes. When paving the way for an empirical study, this type of literature review considers a (broad) range of literature, resulting in a rationale for the research method and data collection. In the case of a narrative review being a stand-alone study, the outcome is an agenda for further research, including gaps in knowledge and deficiencies of current studies that need to be overcome. The contribution to scholarly knowledge in both cases should be explicitly stated. A narrative review differs from a narrative overview in the extent of literature consulted and appraised. In a narrative review, all sources for key constructs and, in the case of the development of scholarly thought, all key sources for a critical appraisal are included. This also leads to counterarguments being discussed in the literature review. To this purpose, deviant studies and other points of view are also taken into account and evaluated for their contribution to the topic at hand. By being extensive in covering key constructs and key works, and allowing for counterarguments and adverse perspectives, narrative reviews are less biased than narrative overviews.

More Information
• Section 2.2 describes a method for appraisal of sources that could be used for a narrative review.
• Sections 2.5 and 3.2 provide more information about the narrative review and compare it with the three other archetypes.
• Section 3.3 contains guidelines for the quality of literature reviews.
• Section 4.2 addresses how questions for a review can be developed using five criteria.
• Part of Section 4.4 focuses on the use of conceptualisations, laws of observed regularities, models, theories, etc., that could lead to more direction in the review in the form of themes that need to be addressed; these are the dashed boxes and arrows in Figure S.2.
• Sections 5.3 and 5.4 provide more detail on search strategies that are appropriate for this type of review.
• Chapters 15 and 16 contain tips for how to write and publish these studies.

NOTE There are limited canonical approaches for how to write narrative reviews. Depending on the domain and the purpose of the publication, what is expected from this type of literature review varies. Sections 3.3 and 6.5 may inspire further improvement of the analysis of retrieved studies and the synthesis of findings. The processes depicted in Figure S.2 indicate how this review can be structured when undertaken.
Fig. S.3 Overview of processes for systematic literature reviews. In this archetype of literature reviews (systematic literature reviews), setting appropriate review questions is the starting point. The review questions guide the retrieval of relevant sources, based on keywords and inclusion and exclusion criteria. The literature review can be enhanced by defining themes derived from conceptualisations, theories, laws of observed regularities, models, etc. The discussion of findings from the analysis leads to synthesised findings; for the analysis, it could be essential to consider quantitative and qualitative studies separately. The dotted line around ‘defining research objectives’ indicates that these reviews are not necessarily written with a specific empirical study in mind. Furthermore, a scoping study might precede the systematic literature review.
Systematic Literature Reviews

Systematic literature reviews are normally stand-alone studies, but could also be part of empirical studies; see Figure S.3 for an overview. As a stand-alone study, they are qualitative in nature, though models, formulae and laws of observed regularities may be part of the study. Their purpose is to appraise literature, assess rigour and reliability across studies and identify gaps and deficiencies of existing studies. Processes for systematic literature reviews are similar to those of systematic reviews, but differ due to their qualitative nature and focus on scholarly knowledge. The qualitative nature is often caused by the heterogeneity of the studies that are retrieved. And, different from systematic reviews, they are rarely written with practitioners in mind. Systematic literature reviews can be preceded by scoping studies; see Figure S.3. Scoping studies aim at identifying which literature is available on a topic, how to identify studies and how to narrow down a topic for a review.

More Information
• Section 2.2 describes a method for appraisal of sources that could be used for systematic literature reviews.
• Sections 2.5 and 3.2 provide more information about the archetype systematic literature reviews and compare it with the three other archetypes.
• Section 3.3 contains guidelines for the quality of literature reviews, building on research paradigms.
• Section 3.5 gives advice on how to improve the appraisal of studies and the quality of citing references.
• Section 4.2 addresses how questions for a review can be developed using five recommendations.
• Part of Section 4.4 focuses on the use of conceptualisations, laws of observed regularities, models and theories that could direct themes for the review; these are the dashed boxes and arrows in Figure S.3.
• Sections 4.5, 5.9 and 6.6 detail the purpose of scoping studies, which could precede undertaking a systematic literature review.
• Sections 5.4, 5.5 and 5.6 provide more detail on search strategies that are appropriate for this type of review.
• Sections 6.2 and 6.3 give guidance for setting inclusion and exclusion criteria.
• Section 6.5 sets out how the quality of evidence can be evaluated.
• Sections 9.2, 9.3, 9.4 and 9.5 and Chapters 10 and 11 present methods for analysis and synthesis, some of them quantitatively oriented.
• Chapter 13 sets forth points for reporting these reviews.
• Chapter 14 informs about research data management for reviews.
• Chapters 15 and 16 contain tips for writing and publishing these studies.
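As a rough, hedged illustration of the protocol steps just listed (keywords, retrieval, inclusion and exclusion criteria), the minimal sketch below shows how the screening of retrieved records might look when written out. It is not a method prescribed by this book: the Record fields, the include() function, the keywords, the year range and the example records are all invented for illustration, and Chapters 5 and 6 treat search strategies and criteria properly.

```python
# A minimal sketch, not a prescribed method: hypothetical screening of retrieved
# records against inclusion and exclusion criteria (cf. Figure S.3). All field
# names, keywords, the year range and the example records are invented.

from dataclasses import dataclass

@dataclass
class Record:
    title: str
    year: int
    language: str
    source_type: str  # e.g. 'journal article', 'conference paper', 'report'

# Keywords for two elements of a review question, combined with the Boolean
# operators AND/OR (exact syntax differs per database; see Section 5.4).
population = ['"SME"', '"small firm"']
intervention = ['"crowdfunding"']
search_string = f"({' OR '.join(population)}) AND ({' OR '.join(intervention)})"

def include(record: Record) -> bool:
    """Apply hypothetical inclusion criteria (date, language, type of source)."""
    return (2000 <= record.year <= 2021
            and record.language == "English"
            and record.source_type == "journal article")

retrieved = [
    Record("Crowdfunding adoption by small firms", 2018, "English", "journal article"),
    Record("Crowdfunding: a practitioner report", 2019, "English", "report"),
]
included = [r for r in retrieved if include(r)]
print(search_string)                # ("SME" OR "small firm") AND ("crowdfunding")
print([r.title for r in included])  # only the journal article passes the criteria
```

In an actual review, the search would be run in bibliographic databases and the screening decisions recorded for reporting (Chapter 13); the sketch only mirrors the logic of Figure S.3.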
Fig. S.4 Overview of processes for systematic reviews. For this archetype (systematic reviews), setting appropriate review questions is the starting point, often following specific formats such as population-intervention-outcome and its variants. A scoping review or scoping study might inform the topic and protocol for the systematic review. The review questions guide the retrieval of relevant sources, based on keywords and inclusion and exclusion criteria. The literature review can be enhanced by defining themes derived from conceptualisations, theories, laws of observed regularities and models. The analysis can be quantitative, such as meta-analysis, or qualitative, when heterogeneity and the nature of studies do not allow numerical analysis; it is also possible to conduct a mixed-methods synthesis. The discussion of findings from the analysis leads to synthesised findings.
Systematic Reviews

Systematic reviews are stand-alone studies, mostly associated with evidence-based interventions, policies, practices and treatments; see Figure S.4 for an overview. They aim at reviewing all evidence on whether an intervention, policy or practice is effective towards defined outcomes. Sometimes, scoping studies or scoping reviews are undertaken before systematic reviews; see the dashed line in Figure S.4. Unlike systematic reviews, scoping studies do not typically result in recommendations, even though they follow protocols akin to those of systematic reviews. Systematic reviews can be used by practitioners and policymakers to determine which intervention, policy, practice or treatment should be adopted and under which conditions. In addition, systematic reviews also point to further research to be undertaken. Not only do they consider the effectiveness of interventions and practices, they also examine variations across studies and underlying causes or factors, and they can be theory-driven. Inconclusive points raised or weak reliability found for specific points lead to research agendas. Analysing variations is also found in umbrella reviews, which are systematic reviews of systematic reviews following the same processes and guidelines as systematic reviews.

More Information
• Sections 2.5 and 3.2 provide more information about the systematic review and compare it with the three other archetypes.
• Section 3.3 contains guidelines for the quality of systematic reviews, building on how research paradigms and literature reviews relate.
• Section 4.2 addresses how questions for a review can be developed using five recommendations.
• Section 4.4 focuses on review questions using the format population-intervention-[comparison]-outcome and its variants. This section also describes the use of conceptualisations, laws of observed regularities, models and theories that could give direction to the review; these are the dashed boxes and arrows in Figure S.4.
• Sections 4.5, 5.9 and 6.6 detail the purpose of scoping studies and scoping reviews, which could precede undertaking a systematic review.
• Sections 5.4 and 5.6 provide more detail on search strategies that are appropriate for this type of review.
• Sections 6.2 and 6.3 give guidance for setting inclusion and exclusion criteria.
• Sections 6.4 and 6.5 present how the quality of evidence can be evaluated, with a particular focus on GRADE.
• Chapters 7 and 9 detail how to undertake meta-analysis and other quantitatively oriented methods, Chapters 10 and 11 qualitative synthesis and Chapter 12 mixed-methods synthesis.
• Chapters 13 and 14 indicate how to report systematic reviews, how to manage data and which repositories can be used.
• Chapters 15 and 16 describe the writing and publishing of these reviews.
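Since meta-analysis is central to many systematic reviews, the following minimal sketch, assuming made-up inputs, shows the standard fixed-effect, inverse-variance pooling that underlies several of the models in Chapter 7. The effect sizes and standard errors are invented; the choice of model in a real review depends on the considerations in Sections 7.4 and 7.7.

```python
# A minimal sketch, assuming made-up inputs: the standard fixed-effect,
# inverse-variance pooled estimate. It only illustrates the arithmetic;
# choosing between common-effect, fixed-effects, random-effects and
# mixed-effects models is discussed in Chapter 7.

import math

effect_sizes = [0.30, 0.10, 0.25]      # per-study effects, e.g. standardised mean differences
standard_errors = [0.10, 0.15, 0.12]   # per-study standard errors (made up)

weights = [1 / se ** 2 for se in standard_errors]   # w_i = 1 / SE_i^2
pooled = sum(w * y for w, y in zip(weights, effect_sizes)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval around the pooled estimate
lower, upper = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A random-effects model would, in essence, add an estimate of between-study variance to each study's variance before weighting; when and why to prefer it is discussed in Sections 7.4.3 and 7.7.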
Fig. S.5 Connecting and positioning literature reviews to empirical studies. For empirical studies, a literature review precedes the design, the empirical study and the data collection. The literature review in the context of empirical studies has five main outcomes. The first one is that gaps in scholarly knowledge can be identified; this indicates how the empirical study will contribute to filling this gap and advancing scholarly knowledge. The second outcome is the synthesis of current scholarly knowledge into theoretical frameworks, methods and the effectiveness of interventions, policies and practices, etc., that can be used during the empirical study. This is related to the third outcome, which is the conceptualisations, constructs, perspectives, etc., that will be drawn on during the primary data collection and analysis. A fourth outcome is how specific data, results and findings are related to the research methods used in extant literature. These four considerations lead to a possible fifth outcome: which limitations an empirical study may have and what the countermeasures are to improve the quality of the study. Literature reviews aiming at informing empirical studies can be of the archetypes narrative overview, narrative review and systematic literature review.
Literature Reviews for Empirical Studies

Literature reviews inform how empirical studies will be undertaken and are conducted before the design of the research method and the primary data collection; see Figure S.5 for the position of literature reviews in an empirical study. They serve two purposes. The first one is to identify which scholarly knowledge exists and to acknowledge how the empirical study will build on it. The second purpose is to inform the design of the empirical study and its data collection. For these two purposes, a literature review normally has five outcomes:
• Gaps in scholarly knowledge related to the topic of the empirical study and the suitability of this knowledge for the data collection and analysis.
• Amalgamation of scholarly knowledge into theoretical frameworks, methods, interventions, policies, practices, etc., again to be used for the empirical study (but providing more detail than the previous point).
• Conceptualisations, constructs, perspectives, variables, etc., that can be operationalised during the data collection.
• Overviews of how findings from previous studies are related to research methods and the analysis of results.
• Tentative limitations and related countermeasures for the design of the empirical study.
These five outcomes ensure that an empirical study can make a tentative, meaningful contribution to scholarly knowledge. The purpose and the related outcomes leave open how the literature review takes place; it can take the form of any of these archetypes:
• Narrative overview, when the emphasis is on justification of the empirical study and the discovery of relevant constructs, perspectives and variables for the data collection.
• Narrative review, when the focus is the evaluation of literature with regard to sources that contain key conceptualisations, constructs, perspectives, theories, etc., and sources relevant to a review question in a domain.
• Systematic literature review, when the research objective of the empirical study has a relatively narrow focus.

More Information
• Section 2.2 describes the generic research process for empirical studies and how literature is part of it.
• Sections 2.6 and 2.7 relate the specific content of a literature review to propositional logic and research paradigms.
• Section 3.4 details how to effectively link literature reviews to the design and conduct of empirical studies.
• The relevant sections and chapters for the three suitable archetypes of literature reviews are found in other sections of the synopsis.
• Chapter 13 indicates how to report protocol-driven literature reviews.
• Chapter 15 sets out the writing of literature reviews.
. . . . . . . . . . . .
. . . . . . . . . . . .
201 202 203 203 204 205 206 206 207 207 208 210
. . . . . . . . . . . . . . . . . . .
. . 211 . . 213 . . 219 . . 220 . . 223 . . 226
xxx
Contents
How to …? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8.1 … Set Effective Inclusion and Exclusion Criteria . 6.8.2 … Evaluate the Quality of Evidence in Studies . . 6.8.3 … Write a Literature Review . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.8
. . . . .
. . . . .
. . . . .
228 228 228 228 229
Part II Quantitative Analysis and Synthesis
7 Principles of Meta-Analysis
7.1 Introduction to Meta-Analysis
7.2 Basics of Meta-Analysis
7.2.1 Conditions for Applicability
7.2.2 Use of Data in Meta-Analysis
7.2.3 Process for Meta-Analysis
7.3 Identifying and Coding Variables and Attributes for Inclusion in Meta-Analysis
7.4 Models for Calculating Effect Sizes
7.4.1 Common-Effect Models
7.4.2 Fixed-Effects Models
7.4.3 Random-Effects Models
7.4.4 Mixed-Effects Models
7.4.5 Meta-Regression
7.5 Common Measures for Effect Size Used in Meta-Analysis
7.5.1 Standardised Mean Difference
7.5.2 Weighted Mean Difference
7.5.3 Odds Ratio and Risk Ratio
7.5.4 Correlation Coefficients, Proportions and Standardised Gain Scores (Change Scores)
7.6 Methods for Meta-Analysis
7.7 Determining Between-Study Heterogeneity
7.7.1 Distinguishing Types of Between-Study Heterogeneity
7.7.2 Determining Statistical Heterogeneity
7.7.3 Forest Plot
7.7.4 Funnel Plot
7.7.5 L’Abbé Plot
7.7.6 (Galbraith) Radial Plot
7.8 Publication Bias and Sensitivity Analysis
7.8.1 Assessing Publication Bias
7.8.2 Sensitivity Analysis
7.9 Assessing Quality of Meta-Analysis
7.10 Key Points
7.11 How to …?
7.11.1 … Choose the Most Appropriate Statistical Method for Meta-Analysis
7.11.2 … Write a Literature Review
References
8 Meta-Analysis in Action: The Cochrane Collaboration
8.1 Background
8.2 Cochrane Collaboration
8.3 Cochrane Reviews
8.3.1 Question Choice
8.3.2 Identifying Relevant Studies
8.3.3 Analysing Results
8.3.4 Potential Sources of Bias
8.3.5 Biases in the Systematic Review Process
8.3.6 Analysing Data (Including Meta-Analysis)
8.3.7 Alternatives to Meta-Analysis
8.3.8 Summary of Findings
8.3.9 Drawing Conclusions
8.4 Example of a Cochrane Review
8.4.1 Organised Inpatient (Stroke Unit) Care
8.5 Advantages of Cochrane Reviews
8.5.1 What Features Indicate that Cochrane Reviews Are Reliable?
8.6 Challenges for the Cochrane Collaboration
8.7 Key Points
References
9 Other Quantitative Methods
9.1 Network Meta-Analysis: Making More Than One Comparison
9.1.1 Key Considerations in Network Meta-Analysis
9.1.2 Example of Network Meta-Analysis
9.2 Best-Evidence Synthesis
9.2.1 Conducting Best-Evidence Synthesis
9.2.2 Example: Exercise Prescription for Treating Shoulder Pathology
9.2.3 Example: Students’ Learning with Effective Learning Techniques
9.2.4 Strengths and Weaknesses of Best-Evidence Synthesis
9.3 Qualitative Modelling of Causal Relationships
9.4 Bibliometric Analysis
9.4.1 Purpose of Bibliometric Analysis
9.4.2 Conducting Bibliometric Analysis
9.4.3 Caveats of Bibliometric Analysis
9.4.4 Complementing Bibliometric Analysis
9.5 Systematic Quantitative Literature Reviews
Authored by Catherine Pickering and Clare Morrison
9.5.1 What is It?
9.5.2 It is Systematic
9.5.3 Quantifying Literature
9.5.4 Methods for Analysis and Displaying Results
9.5.5 Conclusions
9.6 Key Points
References
Part III Qualitative Analysis and Synthesis
10 Principles of Qualitative Synthesis
co-authored by Lisa Kidd
10.1 Differentiating Qualitative Synthesis from Quantitative Synthesis
10.2 Purpose of Qualitative Synthesis
10.2.1 Aggregative Synthesis
10.2.2 Interpretive Synthesis
10.2.3 Supplementing Quantitative Synthesis
10.3 Selecting a Qualitative Review Approach
10.3.1 Overview of Methods for Qualitative Synthesis
10.3.2 Selecting a Method for Qualitative Synthesis
10.4 Qualitative Synthesis of Findings
10.4.1 Extraction of Data and Findings
10.4.2 Synthesising Findings
10.4.3 Software and Online Tools for Qualitative Synthesis
10.5 Quality Assessment for Qualitative Synthesis
10.5.1 Assessment of Quality of Studies in Qualitative Synthesis
10.5.2 Appraising Qualitative Synthesis
10.6 Key Points
10.7 How to …?
10.7.1 … Undertake a Qualitative Synthesis
10.7.2 … Select the Most Appropriate Method for Qualitative Synthesis
10.7.3 … Write a Literature Review
References
11 Methods for Qualitative Analysis and Synthesis
11.1 Meta-Summary: Aggregating Findings and Preparing for Meta-Synthesis
Authored by Elisabeth Bergdahl
11.1.1 Some Caution Before Starting
11.1.2 Qualitative Meta-Summary
11.1.3 Some Dos and Don’ts
11.1.4 Overview of Meta-Synthesis Versus Meta-aggregation
11.2 Thematic Analysis and Thematic Synthesis
Authored by Margaret Bearman
11.2.1 Thematic Analysis in Overview
11.2.2 Thematic Analysis Step-by-Step
11.2.3 Advantages and Disadvantages of Thematic Analysis as Qualitative Synthesis Approach
11.3 Meta-Ethnography
Authored by Katie Robinson and Judi Pettigrew
11.3.1 Seven Step Approach
11.3.2 Pros and Cons of Meta-Ethnography
11.4 Grounded Theory
Authored by Parastou Donyai
11.4.1 Basics of Grounded Theory Methodology
11.4.2 Grounded Theory Meta-Synthesis
11.4.3 Limitations
11.5 Discourse Analysis
Authored by Dianna Dekelaita-Mullet
11.5.1 Application of Discourse Analysis to Systematic Literature Reviews
11.5.2 Trustworthiness
11.6 Key Points
References
12 Combining Quantitative and Qualitative Syntheses
12.1 Purpose of Mixed-Methods Synthesis
12.2 Approaches to Designs of Mixed-Methods Synthesis
12.3 Methods for Integrated Mixed-Methods Syntheses
12.3.1 Sequential Exploratory Method
12.3.2 Sequential Explanatory Method
12.3.3 Convergent Qualitative Method
12.3.4 Convergent Quantitative Method
12.4 Quality Criteria for Mixed-Methods Syntheses
12.5 Other Methods for Diversity in Mixed-Methods Synthesis
12.6 Key Points
12.7 How to …?
12.7.1 … Select the Most Appropriate Method for Mixed-Methods Synthesis
12.7.2 … Write a Literature Review
References
Part IV Presentation and Writing
13 Reporting Standards for Literature Reviews
13.1 Relevance of Adequate Reporting
13.2 Reporting Methods and Tools for Extraction of Data, Analysis and Synthesis
13.2.1 Methods
13.2.2 Protocols
13.2.3 Spreadsheets and Other Tools
13.2.4 Online Tools
13.3 Making Results and Findings Accessible
13.3.1 Annotated Bibliography
13.3.2 Chronological Reporting
13.3.3 Thematic Reporting
13.3.4 Tabulation and Visualisation
13.4 Formats for Reporting
13.4.1 PRISMA Reporting
13.4.2 Other Formats for Reporting
13.4.3 Domain-Specific Reporting Formats
13.5 What Not to be Reported
13.6 Key Points
13.7 How to …?
13.7.1 … Report What is Needed
13.7.2 … Identify What Needs No Reporting
13.7.3 … Write a Literature Review
References
14 Data Management and Repositories for Literature Reviews
14.1 Data Management for Literature Reviews
14.2 Processes for Data Management
14.3 Repositories for Systematic Reviews
14.3.1 The Campbell Collaboration
14.3.2 Cochrane Library
14.3.3 Environmental Evidence Library of Evidence Syntheses
14.3.4 EPPI Centre
14.3.5 JBI
14.3.6 PROSPERO
14.3.7 Systematic Review Data Repository
14.4 Other Repositories
14.4.1 Academic Libraries
14.4.2 Scholarly Journals
14.4.3 Funding Councils
14.5 Preparing Data Sets for Repositories
14.6 Literature Reviews and Data Repositories Enabling Open Science
14.7 Key Points
References
15 Writing Literature Reviews
15.1 What Makes Writing Literature Reviews Different
15.2 Timing the Start of Writing
15.3 Getting to the Writing of Literature Reviews
15.4 Process of Writing
15.5 Gaining Proficiency in Writing
15.6 Improving Text Structure: Revisiting the Basics
Authored by Pat Baxter
15.6.1 Paragraphs
15.6.2 Sentences
15.6.3 Punctuation
15.6.4 Use of Aspirates
15.6.5 Vocabulary: Some Common Pitfalls
15.7 Avoiding Some Common Errors and Inaccuracies When Writing
Authored by Pat Baxter
15.7.1 How Can Accuracy in Writing Be Improved?
15.7.2 How Can Efficiency in Writing Be Improved?
15.7.3 Avoiding Common Mistakes
15.7.4 Added Value of Peer Review
15.8 Key Points
References
16 Publishing Literature Reviews
16.1 When is a Literature Review Worth Publishing
16.1.1 ‘So What?’ Test
16.1.2 Articulation of Contribution to Scholarly Knowledge
16.2 Selecting Journals
16.3 Final Preparation of Manuscript
16.3.1 Taking Advantage from Feedback by Peers
16.3.2 Making It Happen
16.3.3 Letter to the Editor
16.4 Processes of Submission and Review
16.4.1 Submission and Initial Check
16.4.2 Review by Journal
16.4.3 Decision by Editor
16.4.4 Processes After Acceptance
16.4.5 Special Issues
16.4.6 Publication Fees and Open Access
16.5 Review Processes
16.5.1 Double-Blind Peer Review Process
16.5.2 Single-Blind Peer Review Process
16.5.3 Open Peer Review Processes
16.5.4 Other Types of Review Processes
16.6 Revising the Manuscript
16.7 Key Points
References
Epilogue
17 The Dissenting Voices
17.1 Literature Reviews as Activation
17.2 And Caution Nurtures Craftsmanship
17.3 Balancing the Approach to Literature Reviews for Those Starting Out
References
Correction to: Developing Review Questions
Appendix A: Generic, Specialist Databases and Search Engines
Appendix B: List of Journals Publishing Only Literature Reviews
Index
About the Authors
Rob Dekkers is Reader in Industrial Management at the Adam Smith Business School (University of Glasgow). He has convened workshops and delivered seminars on literature reviews for all disciplines and provides research seminars on other research methods, too. In addition to publishing literature reviews, he has (co)authored books, edited books, chapters in edited books, journal publications, contributions to conferences and reports, totalling more than 170. He is chairing the Early Career Researchers and Doctoral Training Programme for the International Foundation for Production Research; in addition, he is actively involved in the mentoring of early career researchers. His activities are driven by interdisciplinary research interests in manufacturing systems, manufacturing strategy, innovation and technology management, new product development and industrial networks (incl. supply chains); furthermore, activities are often inspired by system theories and evolutionary (biological) models. He holds a master’s degree in Mechanical Engineering and a doctoral degree, both from Delft University of Technology; he held positions in industry before embarking on an academic career.

Lindsey Carey is Senior Lecturer in the Glasgow School for Business and Society at Glasgow Caledonian University. She is actively involved in research in the area of consumer behaviour and sustainability, particularly within the context of ethical and organic products. She has a specialist interest in research methods and teaches the subject at master’s and doctoral level. An emerging area of research focus is well-being and social media, particularly food blogging. She has presented on these topics at conferences and published in peer-reviewed academic journals. She is also an external examiner, a reviewer for academic journals and a member of the scientific committee of various conferences, and she is called upon to comment on consumer and retail issues within the UK national press.
Peter Langhorne is Professor of Stroke Care at the University of Glasgow, UK. His research work has focussed on the effectiveness of different treatment strategies for stroke patients, including service delivery (e.g. stroke units and early supported discharge) and stroke rehabilitation (e.g. early rehabilitation). This has involved the use of systematic review and meta-analysis methodologies for complex interventions. He is a previous Co-chair of the Cochrane Collaboration and Coordinating Editor of the Cochrane Stroke Group.
About the Contributors
Pat Baxter
Pat Baxter’s background is in the public sector, latterly in curriculum development. In retirement she trained as an editor and proofreader, and is a professional member of the Chartered Institute of Editing and Proofreading (CIEP) and a qualified CELTA teacher. Having recently completed an MLitt in History at the University of Dundee, she now edits academic books in a variety of disciplines within the humanities, including history, art history, classical studies, English literature, social and political issues; she has edited chapters of several Oxford Handbooks as diverse as History of Greek and Roman Slavery, BRICS and Emerging Economies, and Dynamic Capabilities. Specialising in editing non-native English for a variety of students and private clients, she is also copy-editor for the online journal Valuation Studies, an open access journal published under the aegis of one of Sweden’s top universities. She is an inveterate champion of lifelong learning.

Margaret Bearman
Margaret Bearman is a Research Professor within the Centre for Research in Assessment and Digital Learning (CRADLE), Deakin University. She holds a first class honours degree in computer science and a PhD in medical education. Over the course of her career researching higher and clinical education, Margaret has written over 100 publications and regularly publishes in the highest ranked journals in her fields. Recognition for her work includes Program Innovation awards from the Australian Office of Learning and Teaching and Simulation Australasia. Margaret’s research foci include assessment/feedback, digital education, and sociomateriality. She has a particular interest in qualitative synthesis methodologies.

Elisabeth Bergdahl
Elisabeth Bergdahl is a certified nurse, Med. PhD, and senior lecturer in caring science; since 2019 she has worked at the Department of Health Sciences, Örebro University (Sweden). She defended her doctoral thesis in 2012 at Karolinska Institute (Sweden). Her research area ranges from palliative care, with a special focus on home care, where she and other researchers have developed a theory on co-creation in palliative care to create opportunities for dying patients to achieve important goals at the end of life. She works with the development and review of qualitative methods, qualitative meta-methods, and scientific theory. She also examines deductive approaches in qualitative research methods. She worked as a nurse for 17 years before she began her research, during which time her deep interest in palliative home care grew. In her spare time, she works with visual art.

Dianna Dekelaita-Mullet
Dianna R. Dekelaita-Mullet, Ph.D., is an Assistant Professor of Psychology at Navajo Technical University, where she also serves as Department Head of the School of Arts and Humanities. Her research focuses on talent development and academic success of underserved groups of students in higher educational contexts, particularly students underrepresented in the sciences and technology.

Parastou Donyai
Parastou Donyai is Professor of Social and Cognitive Pharmacy at the University of Reading. Her research explores the psychology and language of decisions about medication usage and non-usage. She has examined decisions about drug holidays in attention deficit hyperactivity disorder, challenges to medication adherence in hormonal therapy for breast cancer, the dangers of antipsychotic usage in dementia, and public attitudes to medicines reuse. Parastou studied pharmacy at King’s College London, graduating in 1993, and after her pre-registration training in hospital pharmacy returned to King’s to complete a PhD in 1998. Inspired by her experience as a pharmacist, Parastou turned her focus to studying people within healthcare settings on her return to academia in 2003. She completed a postgraduate diploma in psychological research methods in 2007 and a further degree in psychology in 2014. As well as grounded theory, Parastou uses discourse analysis and hypothesis-testing methods to advance knowledge in her field.

Gordana Durovic
Gordana Durović holds a bachelor’s degree in Biotechnology from the University of Montenegro and an MSc in Agricultural Entomology from Ankara University. She is a doctoral student within an innovative training network grant from Marie Skłodowska-Curie Actions (European Commission). Currently, she is working at KU Leuven (Belgium). She is also writing a collection of poems about the life of early-stage researchers, science, love, travelling, nature and stars. For leisure, she practises yoga and contemplates the beauty of life.

Lynn Irvine
Lynn Irvine is the College Librarian for the College of Social Sciences at the University of Glasgow. She is a qualified librarian with an MA joint honours degree from the University of Glasgow and a postgraduate qualification in Information Science and Librarianship from the University of Strathclyde. She is an Associate Fellow of the Higher Education Academy. Lynn has been an academic librarian in several roles in Higher Education since 1996 and in this time was able to squeeze in a period managing an art bookshop for a contemporary art centre in Glasgow. She has recently co-authored a book on legal information skills (2017) and contributed to a book on research skills for legal practitioners (2019). Her research interests include learning spaces, information literacy and information seeking behaviour. She regularly does peer review for the journal Global Knowledge, Memory and Communication.

Lisa Kidd
Dr Lisa Kidd is a Reader in the School of Medicine, Dentistry and Nursing at the University of Glasgow, UK. Lisa is a nurse by background and leads a programme of research that focusses on the implementation of supported self-management in stroke care. Lisa’s other research interests include person-centred care and patient and public involvement, and she is experienced in qualitative and mixed method research, including qualitative syntheses and systematic reviews, and realist synthesis/evaluation.

Clare Morrison
Dr Clare Morrison is a lecturer at Griffith University, Australia and an academic editor. She has over 50 peer-reviewed publications in the fields of conservation, ecology, sustainable development, environmental management and climate change amongst others. She has co-authored articles using the systematic quantitative literature review method, delivered online workshops on the method and helped researchers from a range of disciplines, including international law, business, finance, marketing, and science, collect and analyse their data following the method.

Judi Pettigrew
Prof Judith Pettigrew is Associate Professor of Occupational Therapy at the University of Limerick. With academic backgrounds in social anthropology, occupational therapy and history, her current research explores changing patterns of therapeutic occupation and professionalisation in mid-20th century mental health practice and service users’ experiences of contemporary forensic mental health services. She is principal investigator on an interdisciplinary research project on the experience of multiple stakeholders in an Irish COVID-19 field hospital. She is author of more than 90 qualitative peer reviewed publications.

Catherine Pickering
Professor Catherine Pickering is an academic at Griffith University, Australia, with more than 250 publications, including over 130 peer-reviewed papers. In addition to her own and her laboratory group’s research, she continues to develop and present live workshops and online videos and to publish articles on research skills for PhD students and academics. There are now hundreds of publications using the systematic quantitative literature review method she has helped develop and promote, with tens of thousands of people using the support resource for this and other strategies, including how to structure any literature review. Based on this work, she has received University and national awards for her contributions to research. More detail of the methods and links are available by searching the internet for Pickering and ‘systematic quantitative literature reviews’.1
1 Currently, the information is available at https://www2.griffith.edu.au/griffith-sciences/school-environment-science/research/systematic-quantitative-literature-review.
Katie Robinson
Dr Katie Robinson is Senior Lecturer in Occupational Therapy and principal investigator in the Ageing Research Centre at the University of Limerick. Katie holds an undergraduate degree in Occupational Therapy, an MSc in Disability Management and a PhD in Occupational Therapy. Katie’s research interests include ageing research, rehabilitation and qualitative methods. She is author of 50 peer reviewed publications, most of which draw on qualitative research methods.

Harm-Jan Steenhuis
Dr. Harm-Jan Steenhuis is Associate Dean and Professor of Management, International Business at the College of Business, Hawaii Pacific University. He has published three books and over 150 refereed articles, book chapters and conference proceedings on international operations, (international) technology transfer and related topics; strategic operations and global supply chains; methodology; and the interface of instructor and student learning. He has a special interest in the aviation industry and additive manufacturing. He is Editor-in-Chief of the Journal of Manufacturing Technology Management and the International Journal of Information and Operations Management Education, and reviews for several other journals. He has served on more than 25 conference scientific committees. He is a board member of academic organisations, such as IAMOT and PICMET, and previously was on the Board of Directors of the Spokane Intercollegiate Research and Technology Institute. He participates in the Micro-economics of Competitiveness Network, run by the Harvard Business School’s Institute for Strategy and Competitiveness.
List of Figures

Fig. S.1 Symbolic overview of processes for narrative overviews
Fig. S.2 Overview of processes for narrative reviews
Fig. S.3 Overview of processes for systematic literature reviews
Fig. S.4 Overview of processes for systematic reviews
Fig. S.5 Connecting and positioning literature reviews to empirical studies
Fig. 1.1 Typology of literature reviews
Fig. 2.1 Overview of basic steps for empirical research
Fig. 2.2 Use of literature for empirical studies
Fig. 2.3 Generic process for literature reviews
Fig. 2.4 Overview of research paradigms and appraisal for literature reviews
Fig. 3.1 Overview of archetypes
Fig. 3.2 Indicative process for archetype narrative overviews
Fig. 3.3 Indicative process for archetype narrative reviews
Fig. 3.4 Process for archetype systematic literature reviews
Fig. 3.5 Process for archetype systematic reviews
Fig. 3.6 Symbolic representation of archetypes for selection and coverage of literature
Fig. 3.7 Empirical cycle for research
Fig. 3.8 Revised empirical cycle for research with position of literature reviews
Fig. 3.9 Symbolic representation of hermeneutic cycle for literature reviews
Fig. 3.10 Connecting literature reviews to design of research method based on research paradigms
Fig. 4.1 Example of population-intervention-outcome
Fig. 4.2 Example of logic model for population-intervention-comparison-outcome
Fig. 4.3 Distinction between subsystems and aspects
Fig. 4.4 Overview of models based on systems theories
Fig. 4.5 Positioning scoping reviews for archetype systematic review
Fig. 4.6 Positioning scoping studies for archetype systematic literature review
Fig. 5.1 Iterative search strategy for literature reviews
Fig. 5.2 Stylised concept map for writing about case study methodology
Fig. 5.3 Search strategy based on keywords and databases
Fig. 5.4 Example of keywords based on population-intervention-outcome
Fig. 5.5 Example of multiple field (title, abstract, keyword) phrase search on EBSCOhost using the guided style
Fig. 5.6 Example of search with controlled vocabulary using EBSCO Business Thesaurus
Fig. 5.7 Scoping study for search strategy of archetype systematic literature reviews
Fig. 5.8 Scoping review for search strategy of archetype systematic reviews
Fig. 6.1 Representation of traditional hierarchy of evidence for interventions as pyramid
Fig. 6.2 Representation of modified hierarchy of evidence for interventions
Fig. 6.3 Evidence-based healthcare pyramid for finding pre-appraised evidence and guidance
Fig. 6.4 Overview of methods for assessing quality of evidence
Fig. 6.5 Scoping study for setting inclusion and exclusion criteria for archetype systematic literature review
Fig. 6.6 Scoping review for setting inclusion and exclusion criteria for archetype systematic reviews
Fig. 7.1 Replication continuum and appropriateness type of synthesis
Fig. 7.2 Pooled data versus meta-analysis
Fig. 7.3 Positioning pooled data on replication continuum
Fig. 7.4 Process for a systematic review using meta-analysis
Fig. 7.5 Symbolic representation for identifying attributes and variables for meta-analysis
Fig. 7.6 Symbolic representation of common-effect, fixed-effects and random-effects models
Fig. 7.7 Illustrative examples of data structures for mixed-effects models
Fig. 7.8 Meta-regression for duration of short sleep in mortality risk
Fig. 7.9 Positioning of cluster and subgroup analysis within replication continuum
Fig. 7.10 Symbolic representation of forest plot
Fig. 7.11 Forest plot for Frost Multidimensional Perfectionism Scale: Concern over Mistakes subscale with standardised effect sizes for change between pre- and post-intervention
Fig. 7.12 Symbolic representation of funnel plot
Fig. 7.13 Funnel plot of effect sizes on aging and vocabulary scores
Fig. 7.14 Symbolic representation of L’Abbé plot
Fig. 7.15 L’Abbé plot for the effect of echinacea on incidence of common cold
Fig. 7.16 Symbolic representation of radial plot
Fig. 7.17 Radial plot for the effect of material incentives on responses to web surveys
Fig. 7.18 Expanded process for systematic reviews with meta-analysis
Fig. 8.1 Clinical research and evidence-based practice
Fig. 8.2 Process of a Cochrane systematic review
Fig. 8.3 Sources of bias in clinical trials
Fig. 8.4 Effect of antenatal corticosteroids given to women at risk of preterm birth on the risk of perinatal death
Fig. 8.5 Example of a funnel plot showing evidence of publication bias
Fig. 9.1 Simple network of comparisons
Fig. 9.2 Network plot of all studies
Fig. 9.3 Network meta-analysis plot for different types of organised (stroke unit) care
Fig. 9.4 Causal loop diagram for dynamic interaction of factors related to rural water services
Fig. 9.5 Impact of factors on decision for outsourcing of information technology
Fig. 9.6 Layered configuration of the five clusters for green supply chain management
Fig. 9.7 Segmented growth of annual number of cited references 1650–2012
Fig. 9.8 Process for systematic quantitative literature review
Fig. 9.9 Overview of where studies were conducted and which area was studied
Fig. 10.1 Symbolic representation of aggregative synthesis
Fig. 10.2 Symbolic representation of interpretive synthesis
Fig. 10.3 Overview of commonly used methods for qualitative synthesis
Fig. 10.4 Selecting methods for qualitative synthesis based on hypothetico-deductive approach
Fig. 10.5 Generic process for systematic reviews based on qualitative synthesis
Fig. 10.6 Generic process for synthesising findings in a literature review
Fig. 11.1 Overview of meta-synthesis versus meta-aggregation
Fig. 11.2 Steps taken in grounded theory meta-synthesis of qualitative research
Fig. 12.1 Segregated design for mixed-methods synthesis
Fig. 12.2 Integrated design for mixed-methods synthesis
Fig. 12.3 Overview of mixed-methods synthesis
Fig. 12.4 Sequential exploratory method for integrated design of mixed-methods syntheses
Fig. 12.5 Sequential explanatory method for integrated design of mixed-methods syntheses
Fig. 12.6 Convergent qualitative method for integrated design for mixed-methods syntheses
Fig. 12.7 Convergent quantitative method for integrated design for mixed study syntheses
Fig. 13.1 Screenshot of spreadsheet for extraction of data and analysis
Fig. 13.2 Overview of PRISMA reporting for selection of studies
Fig. 14.1 Decreasing availability of data
Fig. 14.2 Processes of data management for literature reviews
Fig. 15.1 Flow chart used for early stages of development for the book ‘Making Literature Reviews Work’
Fig. 16.1 Expanded ‘so what?’ test for manuscripts
Fig. 16.2 Submission process for peer-reviewed journals
Fig. 17.1 Continuum of approaches for literature reviews
List of Tables

Table 2.1 Archetypes of literature reviews
Table 2.2 Propositional logic and literature reviews
Table 2.3 Research paradigms and literature reviews
Table 3.1 Archetypes of literature reviews and quality
Table 3.2 Overview of research paradigms for literature reviews
Table 3.3 Criteria for literature reviews following the (post)positivist paradigm
Table 3.4 Criteria for literature reviews following the interpretivist paradigm
Table 3.5 Criteria for literature reviews as academic mastery
Table 3.6 Criteria for literature reviews following the hermeneutic approach
Table 4.1 Examples of questions guiding literature reviews
Table 4.2 Common extensions to population-intervention-outcome
Table 4.3 Comparison of scoping review with three archetypes
Table 5.1 Overview of type of publications and their suitability for literature reviews
Table 5.2 Search operators for five generic databases and platforms
Table 5.3 Overview of methods for four closely related alternative search strategies
Table 5.4 Examples of studies on search filters related to databases
Table 5.5 Rationales for inclusion of ‘file drawer’ and ‘practitioner-generated’ literature
Table 6.1 Factors that may lead to upgrading or downgrading in the GRADE framework
Table 6.2 Methods for assessing quality of evidence in qualitative literature reviews
Table 7.1 Weights and variances for the fixed-effects, random-effects model and unrestricted weighted least squares
Table 7.2 Effect sizes and their approximate variance for common measures
Table 8.1 Example of GRADE for quality of evidence
Table 9.1 Inconsistency table for a network meta-analysis
Table 9.2 Best-evidence synthesis of exercise prescription for the overhead athlete
Table 9.3 Utility assessment and ratings of generalisability for learning techniques after ranked analysis
Table 10.1 Typical research designs for quantitative and qualitative studies (healthcare)
Table 10.2 Essential differences between quantitative and qualitative synthesis
Table 10.3 Overview of methods associated with qualitative synthesis with notable works
Table 10.4 Purpose of methods for qualitative synthesis
Table 10.5 Overview of methods for qualitative synthesis and purpose of review
Table 11.1 Example of paradigm model created for study examining medication experiences
Table 11.2 Analytical framework for application of discourse analysis to systematic literature reviews
Table 12.1 Indicative overview of methods for analysis and synthesis in mixed-methods synthesis with an integrated design
Table 12.2 Quality criteria for mixed-methods synthesis
Table 13.1 Generic concept matrix for literature reviews
Table 13.2 Example of tabulation for presenting results
Table 16.1 Categories of ranked deficiencies for rejecting manuscripts
Table 16.2 Deficiencies in literature reviews for rejecting manuscripts
List of Boxes

Box 2.A Method for Appraisal of Literature
Box 2.B Example of Narrative Review
Box 3.A Two Examples for Incorrect Paraphrasing
Box 3.B Example of Citation-In-Text Being Questionable
Box 4.A Example of Review Question Using the Five Recommendations
Box 4.B Guidelines for Scoping Reviews
Box 5.A Seminal or Not? The Case of ‘Open Innovation’
Box 5.B Worked Example for Search Strategy of Protocol-Driven Literature Review
Box 5.C Example of Saturation for Retrieval of Studies
Box 5.D Example of Staged Systematic Literature Review Combined with Delphi Study
Box 6.A Example I of Inclusion and Exclusion Criteria (Systematic Review)
Box 6.B Example II of Inclusion and Exclusion Criteria (Systematic Literature Review)
Box 6.C Categories for Quality of Evidence (GRADE)
Box 6.D Quality of Evidence for End-User Involvement During New Product Development
Box 7.A Degree of Freedom for Statistics and Meta-Analysis
Box 9.A Good Practices for Qualitative Modelling
Box 10.A RETREAT Criteria For Selecting Method of Qualitative Synthesis
Box 12.A Example of Review into Fuzzy Front End for New Product Development
Box 13.A Example for Reporting Periods
Box 14.A URLs for Repositories of Systematic Reviews
Box 15.A Distinguishing Between Multidisciplinary, Interdisciplinary and Transdisciplinary Reviews
Box 15.B Structuring of Paragraphs: An Example
Box 15.C Print and Online Dictionary Resources
Box 15.D Checklist for Accuracy
Box 15.E Checklist for Efficiency
Box 15.F Countable and Uncountable Nouns
Box 15.G Some Common Misconceptions When Writing Literature Reviews
Box 16.A Sample of Journals Only Publishing Literature Reviews
Box 16.B Key Tests for Manuscripts Before Submission
Box 16.C Checking Manuscript to Journal Guidelines
Box 16.D Generic Template of ‘Letter to the Editor’
Chapter 1
Introduction
Literature reviews can be challenging and sometimes cumbersome, but they can also be inspiring and provide insight beyond single studies. Notwithstanding the challenges and effort involved, knowing how to draw on existing literature to inform evidence-based actions and to design empirical studies is key to any academic study, to the forming of policy and to interventions by practitioners. Examples are the Bolton Enquiry (1971), which informed policies for small firms, and the initiatives for evidence-based policy change by Oxfam (Mayne et al. 2018), a charity aiming at reducing global poverty and injustice by providing urgent humanitarian support, supporting long-term development projects and ‘influencing’ to address the root causes of poverty. Perhaps even Darwin’s (1859) work could be seen largely as a literature review rather than just empirical research. There are many more approaches to literature reviews than these examples, both as part of empirical studies and as stand-alone studies. The practices that have been developed for literature reviews of all kinds are at the core of this book.

This chapter introduces what the book covers with regard to all types of literature reviews and how to use it for conducting a literature review. It starts by briefly introducing what literature reviews are about in Section 1.1. This is followed by a brief history of systematic reviews, a particular kind of review, in Section 1.2; note that literature reviews are more than just critiquing existing published knowledge, a topic that appears in more detail in Chapters 2 and 3. Section 1.3 addresses the variety in literature reviews and the differing practices across scientific disciplines. Then, Section 1.4 gives an overview of the content of the book, and Section 1.5 explains how it can be used by readers. Thus, this introductory chapter tells how the book is structured, whereas the detail on how to undertake a literature review, as part of an empirical study or as stand-alone work, can be found in the chapters that follow.
© Springer Nature Switzerland AG 2022 R. Dekkers et al., Making Literature Reviews Work: A Multidisciplinary Guide to Systematic Approaches, https://doi.org/10.1007/978-3-030-90025-0_1
1.1 What Are [Systematic] Literature Reviews About?
Literature reviews in general are about finding out which (scholarly) knowledge exists with regard to a specific topic or research objective. Thus, they provide access to scholarly knowledge. In addition, they identify areas of prior scholarship to prevent duplication and to give credit to other researchers, and thus play a pivotal role in the dissemination of scholarly knowledge and in keeping research efforts productive. Moreover, they ascertain inconsistencies in previous studies, pinpoint gaps in knowledge and locate questions left open by other research. This means that literature reviews serve multiple purposes from the perspective of advancing scholarly knowledge.

A second aim of a literature review is to consider whether further study is necessary from a defined perspective. The vast majority of literature reviews in scholarly publications precede an empirical study. In this context, they direct the design of the research methodology and its data collection. When literature reviews are undertaken to inform an empirical study, they consider scholarly knowledge in a more specific manner, but they will still identify further avenues for research, the empirical study being one of them. As stand-alone studies, literature reviews evaluate existing works and studies from a specific perspective, resulting in an assessment of the extent to which scholarly knowledge is adequate. This will also result in setting out what further research needs to be undertaken to complete and advance insight. Thus, whether a literature review is part of an empirical study or a stand-alone work, it will always indicate which further research is necessary and set out pathways for doing so.

Both aims of literature reviews mean that they are more than summaries of existing scholarly works. This implies that they identify the relationship of each work to the topic at hand and to other relevant works in the context of its contribution. These wide-ranging aims and their implications are probably the reason why students, doctoral students and early career researchers seek guidance on how to conduct literature reviews in an effective and efficient manner. In this respect, both Boote and Beile (2005) and Maxwell (2006) contend that (doctoral) students must be scholars before they can become researchers, though their views and approaches differ in detail. This centrality of literature reviews for any type of academic work is the starting point of this book, in particular how to engage with scholarly publications in an effective manner.

With conducting literature reviews understood as acknowledging other works appropriately and evaluating their content to advance arguments, literature reviews have gradually transitioned from acknowledging a few studies to incorporating many; this has affected practices, too. Well into the twentieth century, it was common for literature reviews to contain references to only a few works. Nowadays, the broad body of scholarly knowledge and the easier electronic access to it have shifted the approaches to literature reviews. Glass (1976, p. 4) already noted: ‘The armchair literature review in which one cites a couple of dozen studies from the obvious journals can’t do justice to the
voluminous literature of educational research that we now confront.’ Consequently, the methods for literature reviews have moved from giving credit to the thoughts and works of others to including protocol-driven approaches. In this regard, the protocol-driven approaches for so-called systematic reviews are seen as the ‘golden standard’, for instance, by Borgnakke (2017, p. 200) and Furunes (2019, p. 228), particularly those of the Cochrane Collaboration, even though a variety of other approaches exist.
1.2 Brief History of Systematic Reviews
The origins of systematic reviews, which have inspired many to consider them a point of reference for how best to conduct literature reviews, go back to astronomy. Building on mathematical methods for dealing with games of chance used in gambling, astronomers in the eighteenth and nineteenth centuries started to compare and combine their observations (O’Rourke 2006). A first textbook on this matter was written by George Biddell Airy (1861), Astronomer Royal at the time of its publication. However, according to Chalmers et al. (2002, p. 14), it was the French statistician Adrien-Marie Legendre, who developed the least squares method for combining observations, who led the way to approaches for what are now called research synthesis and meta-synthesis. In addition to referring to Adrien-Marie Legendre, Chalmers et al. (ibid., pp. 14–5) indicate that others had raised the issue of combining studies even before George Biddell Airy and Adrien-Marie Legendre: James Lind, a Scottish naval surgeon, looked at a plethora of reports about the prevention and treatment of scurvy in the eighteenth century, and Arthur Young, a gentleman farmer, noted that conducting single experiments does not lead to proof about the benefits of any method. Thus, the development of research synthesis as the combining of outcomes of different studies was preceded by thoughts and methods that arose in astronomy, medicine and some other disciplines.

The next noteworthy step was the paper by Pearson (1904) about typhoid, mortality and the inoculation status of soldiers serving in various parts of the British Empire. According to Shannon (2016, p. 310), it raised several methodological issues arising from the correlations discussed. First, the paper by Pearson (1904) noted the significance of correlations; for this, Pearson used the magnitude of the correlations in relation to their ‘probable errors.’ Second, he pointed out the ‘extreme irregularity’ of the correlation values, akin to heterogeneity, and sought to explain why they differed. Third, he commented on the ‘lowness’ of the values, arguing that they were too low to convince him that the inoculation had been proven worthwhile. There were also concerns about self-selection into the inoculated group by volunteers who were ‘more cautious and careful’, and thus the studies could have produced spurious estimates of effectiveness. This led to a recommendation that a better vaccine was needed. This study by Pearson addressed essential issues for systematic reviews that are still relevant today.
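To illustrate the yardstick behind the term ‘probable error’ (the expression below is the classical large-sample formula found in statistics texts of that era, not a quotation from Pearson’s paper): for a correlation coefficient $r$ estimated from $n$ observations, the probable error is

\[
\mathrm{PE}(r) \approx 0.6745\,\frac{1 - r^{2}}{\sqrt{n}},
\]

where 0.6745 corresponds to the quartile of the standard normal distribution. A correlation was typically regarded as noteworthy only if its magnitude exceeded several times its probable error.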
A further step in the development of methods for systematic reviews was the textbook by Fisher (1935). It contains an example of the appropriate analysis of multiple studies in agriculture, identifying the likely and real concern that fertiliser effects will vary by year and location. The textbook also calls for randomisation to produce more trustworthy results. According to O’Rourke (2006), the influence of Fisher on methods for systematic reviews has been profound. Among the works building on it is Yates and Cochran’s (1938) study describing approaches akin to the fixed-effects and random-effects models that came into use later (Gurevitch et al. 2018, p. 176); a schematic sketch of these two models is given below. Cochran (1954) advanced these methodologies through further formalisation and generalisation. In this stage of development, the methodologies for research synthesis became more refined and more broadly applicable.

A noteworthy development is the book by Pratt et al. (1940) about extrasensory perception. It contains a review of 145 reports on experiments published from 1882 to 1939. Furthermore, it includes an estimate of the influence of unpublished papers on the overall effect; see Section 5.7 for how to incorporate unpublished papers, which are part of what is called grey literature. By some, this is considered the first meta-analysis of conceptually identical experiments concerning a particular research issue and conducted by independent researchers.

The term meta-analysis for quantitative systematic reviews was introduced by Glass in 1976 during a presidential address, when stressing the need for better synthesis of research results (O’Rourke 2006); it was also published as a paper in which Glass (1976) laid out the rationale for this type of analysis. Since then, the term meta-analysis has become a common denotation for mathematical approaches to research synthesis. The founding of the Cochrane Collaboration in 1993, for evidence-based choices about health interventions, and of the Campbell Collaboration in 2000, for evidence-based decisions and policy, exemplifies not only methodologies anchored in science but also the collaborative efforts needed to advance and undertake literature reviews in a systematic manner; for more information on Cochrane Reviews see Chapter 8. These initiatives, which also drove improvement of the methodologies and the issuing of guidelines for quantitative synthesis, led to the adoption of methods for research synthesis by a wider range of disciplines.

The qualitative approach to systematic reviews was strengthened by two developments. One prominent development towards qualitative synthesis was the work of Yin and Heald (1975). Their paper not only stimulated the development of the case study methodology, but the case survey method they described is also a method for qualitative synthesis, albeit based on quantitative studies; see Section 10.3. Furthermore, in the domain of qualitative research synthesis, work was triggered by Noblit and Hare’s (1988) proposition for meta-ethnography, followed by Estabrooks et al.’s (1994) introduction of meta-aggregation. After this, a plethora of methods followed (see Section 10.3), not least encouraged by works advocating the adoption of methods from qualitative empirical studies (e.g., Dixon-Woods et al. 2001, p. 128 ff.). These writings have put systematic reviews based on qualitative synthesis firmly alongside meta-analysis and quantitative approaches.
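As a schematic illustration of the fixed-effects and random-effects models mentioned above (a textbook-style sketch, not drawn from Yates and Cochran or from this book): if $y_i$ denotes the effect estimate of study $i$ with within-study variance $v_i$, the two models can be written as

\[
\text{fixed effect:}\quad y_i = \theta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, v_i),
\]
\[
\text{random effects:}\quad y_i = \theta + u_i + \varepsilon_i, \qquad u_i \sim N(0, \tau^2),
\]

where $\theta$ is the common (fixed effect) or average (random effects) true effect and $\tau^2$ captures between-study heterogeneity. The fixed-effect model assumes all studies estimate the same underlying effect, whereas the random-effects model allows the true effects to vary across studies.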
NOTE: POSITIONING ORIGINS OF SYSTEMATIC REVIEWS CORRECTLY
Some works attribute the emergence of systematic reviews or systematic literature reviews to a surge in attention during the 1990s; however, to some extent this is arbitrary. An example is the statement by Boell and Cecez-Kecmanovic (2015, p. 162) that the development of systematic literature reviews can be traced back to an evidence-based movement in medicine during the 1990s. As demonstrated by Chalmers et al. (2002, p. 14), Gurevitch et al. (2018, p. 176), O’Rourke (2006) and others, it was Karl Pearson who undertook a meta-analysis in medicine as early as 1904, although it was not called that at the time.
1.3 Addressing Variety in Literature Reviews
The aim of this book is to provide guidance not only for systematic reviews but also for other types of literature reviews, whether narrative or systematic, and whether stand-alone studies or part of empirical studies. This covers how the content of a literature review can be composed from the consulted studies and sources, and which steps to undertake for a literature review.
1.3.1 Variety in Types of Literature Reviews
The variety in literature reviews has already been highlighted, which implies that they may have been written with different purposes in mind. The writing by Cooper (1988) already draws attention to how reviews differ with regard to purpose, perspective, comprehensiveness and presentation. Grant and Booth (2009) also point out differences in types of literature reviews, as do Green et al. (2006); the latter’s writing has been the point of departure for setting out the archetypes that are detailed in Section 2.5.

Even though there is a large variety of literature reviews, they can be divided into narrative types of literature reviews and protocol-driven literature reviews; see Figure 1.1. Protocol-driven literature reviews take as their starting point that a review is a research method for which the methodology should be described a priori, as would be the case for an empirical study. This type of literature review can be divided into systematic literature reviews and systematic reviews; the difference between the two is mostly that systematic reviews are typically about effects or outcomes of interventions, policies, practices and treatments, but they can also address theoretical perspectives. They can be based on qualitative synthesis, quantitative synthesis (such as meta-analysis, see Chapter 7) or mixed-methods synthesis. In this sense, the methods for systematic reviews also reflect methodologies used for empirical studies. Systematic literature reviews, the other type of protocol-driven literature reviews, are broadly divided into qualitative systematic literature reviews and quantified systematic literature reviews; the latter are literature reviews in which part of the content is quantified to support the analysis of the studies discussed in the review.
Fig. 1.1 Typology of literature reviews. This classification displays the types of literature reviews that are covered by the contents of the book. The first type of literature reviews is narrative literature reviews; they normally offer a comprehensive and critical analysis of current scholarly knowledge on a topic. The second type is protocol-driven literature reviews; their protocols set out search strategies, which studies to include, the method for analysis and the approach to synthesis. The group of systematic literature reviews comprises reviews in which an emergent analysis and synthesis of extant studies takes place; this can be supported by quantification of contents (i.e., quantified systematic literature reviews). The group of systematic reviews has methods for analysis and synthesis in place; they can take the form of qualitative, quantitative or mixed-methods synthesis.
This broad diversity of literature reviews also shows that literature reviews have different purposes and that they can be conducted with differing approaches. Furthermore, literature reviews can be written as stand-alone studies or as part of empirical studies. Not all types of literature reviews are suitable for empirical studies; see Figure 1.1. Systematic reviews aiming at the effectiveness of interventions, including meta-analysis, are almost exclusively conducted as stand-alone studies. Narrative literature reviews are hardly appropriate for this purpose. However, a narrative style for a literature review could be helpful to make the case in conceptual papers. As indicated by Callahan (2010, p. 302), conceptual literature reviews, including those addressing theoretical frameworks, generally do not have a methodological section; this is unlike systematic reviews and systematic literature
reviews. This means that how the literature review will be used, and whether this is connected to an empirical study, also influences how it will be conducted.
1.3.2 Specific Approaches for Disciplines
In addition to the variety in types and purposes of literature reviews, there are specific approaches for disciplines; this applies both to writing literature reviews for empirical studies and to systematic approaches to literature reviews. For example, Stanley (2001) describes the application of meta-analysis to economics, but does so by discussing meta-regression; regression analysis is a common method for analysing numerical data in economics. Another case in point is the advice offered by da Silva et al. (2011) and Kitchenham et al. (2009) for undertaking systematic literature reviews in software engineering. Also, Hallinger (2013) and Tranfield et al. (2003) provide guidance, albeit for literature reviews in education and in business and management studies, respectively. And Berger-Tal et al. (2019) do so from the perspective of conducting systematic reviews in behavioural ecology. Such guidelines are often related to or derived from practices for systematic reviews in healthcare and medicine. This is reflected in the sources used throughout this book. Therefore, this book not only contains concepts and methods from healthcare and medicine, but also draws on other disciplines to inform the conduct of appropriate literature reviews.

Because literature reviews of all kinds are used across disciplines and domains, there are different viewpoints on what constitutes the range of methods and approaches. For example, Paré et al. (2015) look at literature reviews that synthesise knowledge for the domain of information systems. In their overview of examples (ibid., p. 187), they blend processes for conducting literature reviews with the specific methods used for analysis and synthesis. However, the methods and approaches in this overview differ from those provided by Littell (2018, pp. 17–8) for social and health care. These two examples imply that views differ on which methods may be of use to specific domains. The stance of this book is that it aims at providing direction for all aspects of literature reviews, irrespective of domain, while acknowledging that different approaches might fit better with the specific purpose of a literature review and with how scholarly knowledge is built in specific domains.
1.4 Scope and Outline of Book
Thus, the book aims at providing guidance for the broad variety of literature reviews, with a particular emphasis on systematic approaches. This is based on the premise that for a narrative style some aspects of protocol-driven literature reviews can be used; examples are search strategies in Chapter 5 and some methods for qualitative synthesis that are presented in Chapter 10. The broad orientation
includes the points mentioned before, such as the purpose of literature reviews, the different types of literature reviews and the research paradigms that are relevant.
1.4.1 What Does the Book Cover
With this book covering a broad range of approaches and methods for literature reviews, students and researchers can develop their own process and method adequate for the type of review they want to undertake. This broad range covers literature reviews as part of empirical studies, as elements of conceptual works and as stand-alone studies. Also, it includes different types of literature reviews; see Section 2.5 for more detail on the specific archetypes distinguished here, building on Figure 1.1. And it embraces the different philosophical stances on what constitutes an adequate literature review; see Section 3.3 for an elaboration of this point. In addition, the text takes a multi-disciplinary perspective on how literature reviews should and could be organised. This means that students, scholars and researchers across disciplines can assess and apply the approaches and methods presented in this book in the context of specific topics.

Even though the book looks at a broad range of approaches to literature reviews across disciplines, most attention goes to systematic approaches and methods. This stems partly from the nature of the development of these methods and approaches: considerable attention has been given to improving the process of literature reviews in healthcare, particularly nursing and medicine. Notwithstanding the contributions of these domains to the methods and processes for literature reviews, other disciplines have been active in developing approaches to literature reviews, too; a case in point is the domain of education. The book builds on the developments in all disciplines for methods and approaches, which is reflected in the lists of references that can be found at the end of chapters and in the examples provided throughout the book.

Note that the book has a two-fold aim related to the broad range of approaches to literature reviews across disciplines and how they could be reported. First, it aims at providing insight into how to conduct a literature review and how to present it. This is covered by overviews of archetypes and specific methods. The second aim is to increase the confidence of novices to research, doctoral students and early career researchers about conducting a literature review. Chen et al. (2016) distinguish four types of challenges: linguistic, methodological, conceptual and ontological difficulties. The book directly covers three of these challenges, namely the methodological, conceptual and ontological obstacles that those less familiar with literature reviews may experience. The fourth one, linguistic challenges, is related to confidence across the other three, and not limited to proficiency in specific languages.

By providing insight into which approaches and methods are suitable for the specific purpose of a literature review, and how they can be enhanced and put into practice, this book covers the literature review from the moment initial thoughts start
shaping how to undertake it to writing up the review for reports and publications in academic journals.
1.4.2 What Does the Book Not Cover
That said, while providing guidance about how to conduct and write an appropriate literature review and the related methods, the book does not cover all methods in detail. A case in point is specific protocols, such as those for the international prospective register of systematic reviews (PROSPERO), even though these are acclaimed by some studies (e.g., Moher et al. 2014; Sideri et al. 2018). Where appropriate, reference has been made to such protocols; for instance, PROSPERO is mentioned in Section 14.3. This applies to other methods, too. An example is GRADE (Grading of Recommendations, Assessment, Development and Evaluations), which is covered in detail in Section 6.4; however, users might have to consult additional materials to implement it fully in their methodology for the literature review. Again, the references provided at the end of chapters may provide an entry to more detail on specific methods, if necessary.

Similarly, not all detail for specific approaches in specific disciplines is provided, even though the descriptions of methods and approaches are extensive. Where appropriate, more detail is given. An example is Chapter 4, in which examples are found for a range of disciplines, including economics, engineering, medicine, nursing and psychology. While the book builds on approaches from all disciplines, scholars writing literature reviews should seek the discipline-specific guidance mentioned in Section 1.3 where it exists. Nevertheless, this book offers detailed starting points, insights and methods for all disciplines, so that any scholar can undertake and create a thorough literature review.
1.4.3 Part I: Basic Concepts for Effective Literature Reviews
In the spirit of the aims of this book, the first part contains information on generic processes and methods for all different types of literature reviews. This ranges from the purpose of literature reviews to determining whether studies should be included in the stage of detailed analysis. In this manner, it sets out guidance for literature reviews from narrative to systematic approaches.

Chapter 2 provides a broad introduction to literature reviews, how to conduct them and what the four archetypes are. Furthermore, it discusses how literature reviews are positioned in empirical research and as independent publications. The chapter also contains a method for appraising individual sources as a foundation for literature reviews. Moreover, it makes the connection between the focus of a literature review and research paradigms. Finally, it pays attention to how writing
appropriate literature reviews avoids plagiarism and self-plagiarism. Thus, this chapter covers the basics for writing a literature review.

Chapter 3 goes into detail about what the quality of a literature review is and how this can be achieved, whether the review supports an empirical study or is a stand-alone work. To this purpose, the processes for the archetypes introduced in Chapter 2 are presented. Since conducting literature reviews is also a ‘research method’ in its own right, the connection between research paradigms and conducting reviews is made. This leads to different criteria to be considered for different purposes and archetypes. Furthermore, this chapter also discusses how to link literature reviews effectively to empirical studies. Finally, the crucial point of how to evidence engagement with literature when writing reviews is presented, along with some commonly found caveats. Accordingly, this chapter highlights essential and practical points to consider for writing an effective literature review, including deliberations on the research paradigms that influence the criteria for assessing a literature review.

In Chapter 4, information can be found about how to develop review questions for a literature review. It pays attention to what constitutes an appropriate review question by setting out five criteria. Particularly for systematic literature reviews and systematic reviews, a commonly used framework for formulating review questions, called population-intervention-outcome, and its variants are discussed. Also, the use of modelling, theories and laws of observed regularities for developing review questions is presented. Moreover, the position of scoping reviews and scoping studies as precursors to more extensive and focused literature reviews is discussed. Hence, this chapter provides essential information about the starting point of a literature review: its review question.

Effective ways and methods for finding relevant sources and studies are the focus of Chapter 5. With a literature review aiming to be a discourse of relevant studies, two main search strategies are presented: the iterative search strategy, and the keywords, controlled vocabulary and databases search strategy; augmentations of both are also highlighted. Complementary search strategies to these two main strategies are found in this chapter, too. Attention is paid to enhancing search strategies and assessing their effectiveness; various techniques are discussed to this purpose. The position of so-called grey literature, and how it can be identified and found, is presented. Also, the role of scoping reviews and scoping studies in setting out search strategies is elaborated. Consequently, this chapter contains an extensive overview of search strategies, their methods and ways to make them more effective.

Following on from the previous chapter, Chapter 6 goes into more detail about which studies to include and exclude, which is also related to how the quality of evidence in studies can be assessed. It provides more information on how to evaluate studies from the title, keywords, abstract and content. Common criteria used for inclusion and exclusion after retrieval of studies are discussed. This is followed by how diverse studies can be included in a literature review based on the concept of quality of evidence.
Common methods for systematic reviews and systematic literature reviews are explained in detail, with particular attention to the methods of GRADE (Grading of Recommendations, Assessment, Development and
Evaluations). Therefore, this chapter paves the way for the more detailed analysis that is found in Parts II and III.
1.4.4 Part II: Quantitative Analysis and Synthesis
The second part of the book covers the quantitative analysis and synthesis of retrieved studies. It is about bringing together data from a set of included studies with the aim of drawing conclusions about a body of evidence, supported by numerical analysis, most commonly statistical synthesis of different types and methods. There are different approaches, ranging from traditional meta-analysis to quantifying literature; for these approaches, descriptions of the methods are provided together with applications and examples.

Systematic reviews with meta-analysis are the topic of Chapter 7. This chapter covers how to use statistical models to determine the direction and size of an effect estimate; a minimal formula sketch illustrating this idea is given at the end of this subsection. The different models depend on the nature of the data and the type of effect being studied in the review. These mathematical approaches also include determining the uncertainty of the effect estimate. Since the designs of studies vary, meta-analysis includes examining this variation, for which graphical representations and mathematical methods are presented. Hence, this chapter provides the principles and main methods of meta-analysis for systematic reviews.

Building on the preceding chapters, Chapter 8 provides an example of meta-analysis drawn from the Cochrane Collaboration. It demonstrates how the principles of systematic reviews are applied in this iconic platform for collaboration. Particular attention goes to how meta-analysis is used to draw conclusions about the effect size estimate, confidence intervals and bias. The example also provides insight into how GRADE (Grading of Recommendations, Assessment, Development and Evaluations), presented in Section 6.4, can be applied to support the development of recommendations.

In Chapter 9, five more methods are found for undertaking quantitative analysis and synthesis. The first one is network meta-analysis, which can be undertaken when specific comparisons between interventions and treatments are lacking or sparse. The second method is best-evidence synthesis, which aims at providing policymakers and practitioners with insight into which interventions and policies are most effective and which are less so. The third method presented in this chapter is qualitative modelling, which normally precedes further analysis of quantified relationships between factors and variables. The fourth method is bibliometric analysis, which uses data about studies to conduct a quantitative analysis. The fifth method is the systematic quantitative literature review, which analyses retrieved studies with regard to the production of scholarly knowledge and its applications. Thus, these five methods have purposes and applications different from meta-analysis.
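To give a flavour of the statistical models referred to for Chapter 7, the following is a minimal textbook-style sketch of inverse-variance pooling (an illustration under common assumptions, not the specific notation used later in the book). With study effect estimates $y_i$ and weights $w_i = 1/v_i$ in a fixed-effect analysis, or $w_i = 1/(v_i + \hat{\tau}^2)$ in a random-effects analysis, the pooled estimate and its uncertainty are

\[
\hat{\theta} = \frac{\sum_i w_i y_i}{\sum_i w_i}, \qquad \mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{\sum_i w_i}}, \qquad 95\%\ \text{CI: } \hat{\theta} \pm 1.96\,\mathrm{SE}(\hat{\theta}).
\]

The direction and size of the pooled effect come from $\hat{\theta}$, while the standard error and confidence interval express its uncertainty.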
1.4.5 Part III: Qualitative Analysis and Synthesis
The third part of the book goes into more detail about approaches and methods for the qualitative analysis and synthesis of retrieved studies. This includes the combination of qualitative and quantitative synthesis, known as mixed-methods synthesis. Since there are quite a number of approaches and methods, the selection of appropriate methods is also discussed. Thus, this part of the book provides guidance for a wide range of methods associated with qualitative analysis and synthesis.

In Chapter 10, the basic principles of qualitative analysis and synthesis are presented. The chapter looks at two principal approaches to qualitative synthesis: one being the aggregation of conjectures and findings from extant studies, and the other the re-interpretation of studies leading to new insights. This distinction also forms the basis for how to select an appropriate method; additional criteria for selection are brought to the fore as well. Characteristic of qualitative synthesis is a three-step approach to formulating the findings of reviews, often in the form of recommendations; the use of recursive abstraction for this purpose is explained in detail. Attention is also paid to how to assess the quality of retrieved studies and how to appraise the quality of the synthesis. Consequently, the contents of this chapter apply to any method for qualitative synthesis and direct the selection of the most appropriate method.

Building on the previous chapter, five popular methods for qualitative analysis and synthesis are presented in Chapter 11. The first one, meta-summary, is an example of an aggregative method for qualitative synthesis; in its discussion, a sixth method, meta-aggregation, is referred to. The second method found in this chapter is thematic synthesis, which is also an aggregative method. The subsequent three methods are interpretative approaches. The third method discussed is meta-ethnography, which relates retrieved studies to one another. Grounded theory is the fourth method described for the purpose of qualitative analysis and synthesis. For the final method, discourse analysis, it is set out how to apply it to qualitative analysis and synthesis. Thus, this chapter presents guidance on how to use these five popular methods and also refers to meta-aggregation.

Whereas the preceding chapters address quantitative and qualitative analysis separately, Chapter 12 sets out how to combine quantitative and qualitative synthesis. It explains the different approaches to this so-called mixed-methods synthesis. Also, there are different methods that can be followed for the actual analysis; these are described in more detail, along with when to apply them, which includes considering alternative approaches to the canonical divide between quantitative and qualitative analyses. So, this chapter provides guidance for reviews that combine studies with distinct methods and approaches to address complex review questions.
1.4.6 Part IV: Reporting Literature Reviews
The fourth part of the book addresses aspects of how to report and publish literature reviews. Although throughout the chapters there are indications of how to present, write and publish literature reviews, additional information on these processes is provided here. Some of the processes and methods are identical to those for empirical studies, but there are also points specific to literature reviews with regard to methods and tools for reporting, data management and writing.

Chapter 13 goes into more detail about the reporting of literature reviews and provides guidance on this point. In this respect, there is information about what should be reported about the steps of the review and the methods and tools used. This is of particular interest for reporting protocol-driven literature reviews. Also, styles of reporting, particularly how to make results and findings from literature reviews accessible, are found in this chapter. Furthermore, common reporting formats, related to the different methods for synthesis, and domain-specific issues are highlighted. There is also attention to what should not be reported. Accordingly, this chapter lays the basis for writing and publishing literature reviews, which is of particular interest for protocol-driven literature reviews.

In Chapter 14, information can be found about how to deposit generated datasets into repositories; this is specifically of interest for protocol-driven literature reviews. Research data management in general has attracted more attention, specifically because some funding agencies have made it a requirement; this extends to systematic reviews, too. However, making systematic reviews available to a broad range of users has a long-standing tradition, as indicated in Section 1.2 about the history of systematic reviews. Although guidance for specific data repositories varies, in this chapter information is found on preparing datasets and making them accessible, which includes using options provided by some journals. Also, how access to the data of literature reviews relates to the calls for open science is briefly discussed. Thus, the chapter provides those undertaking protocol-driven literature reviews with guidelines for preparing datasets and making them available to others.

How to actually write a literature review is explained in Chapter 15. It starts by looking into what makes writing literature reviews different. The chapter considers the best point at which to start writing a literature review, and it reflects on creating the spatial and temporal dimensions of the writing process. Schemes for the actual writing process are elaborated, while advice is also provided on how to craft text. The importance of seeking feedback and making revisions is addressed. By covering these topics, this chapter provides a comprehensive guide on how to start writing a literature review and what to do during the actual writing.

Building on how to write a literature review in the previous chapter, guidance for the submission of literature reviews to journals is provided in Chapter 16. It includes the assessment of whether it is worthwhile to consider a specific study for submission. This is also related to which journals fit the topic of a literature review. Attention is also given to how to meet the requirements for manuscripts set by
journals. There is a brief description of review processes and outcomes, with some more detail on how to deal with a revision. This means that the emphasis of this chapter is on assessing whether it is worth the effort to submit a manuscript to a journal and, once doing so, how to be more successful in getting it published.
1.4.7 Epilogue
The final chapter of this book provides a note about dissenting voices with regard to the emphasis on process rather than content in literature reviews, owing to systematic and protocol-driven approaches becoming more popular. This point of view reflects that conducting a literature review is a choice. This choice also has implications for the supervision of doctoral students and the mentoring of early career researchers, as advocated in Chapter 17.
1.5 How to Use This Book?
The content of the chapters of this book provides guidance both for those conducting literature reviews and for readers of reviews. The book is written to support undergraduate students, postgraduate students, doctoral students and early career researchers in undertaking literature reviews, while at the same time providing guidance for doctoral supervisors and practitioners. Also, citizens who are interested in the methods for systematic reviews into evidence-based interventions, practices, policies and treatments can benefit from the book. For each type of reader, a brief description follows of how to make best use of the book for reading, undertaking, writing and assessing literature reviews.
1.5.1 Type of Study and Use
This book can be used by undergraduate students for coursework, essays and dissertations. (Note that the terminology follows British English conventions: the final report of a project for an undergraduate or postgraduate degree is called here a dissertation, whereas in American English this is commonly called a thesis. It is the other way around for doctoral studies, where the final report is labelled a doctoral thesis; the latter convention is followed in this book.) Particularly, Chapter 2 provides foundations for how literature reviews are connected to the design of research methodologies and research paradigms. This chapter also contains a method for appraising sources and indicates how to use these individual appraisals to create a literature review. Sections 5.3 and 5.4 provide information on how to effectively conduct searches for relevant
literature. Other chapters in this book are helpful to those who want to create a systematic literature review. Undergraduate students should be aware that this requires a review question for examining literature with a relatively narrow scope. Therefore, Chapter 2 and Sections 5.3 and 5.4 describe the foundations and guidelines for the narrative type of literature review, commonly found in undergraduate coursework, essays and dissertations, while also giving information about more systematic approaches to literature reviews.

Also, for postgraduate taught students this book offers guidelines on how to undertake a literature review in the context of coursework, essays and dissertations. To this purpose, Chapter 2 offers an overview of how research is conducted and of the role literature reviews play in its processes. This chapter also contains a method for appraising sources and indicates how to use these individual appraisals to create a literature review. Sections 5.3 and 5.4 provide information on how to effectively conduct searches for relevant literature in the case of writing a narrative type of literature review. For postgraduate students who opt for systematic approaches, either as part of empirical research or as an independent publication, further chapters contain information about how to retrieve sources and how to analyse them. This includes information on how to develop a protocol for systematic reviews or systematic literature reviews. Therefore, the book supports both systematic approaches to the narrative type of literature review and protocol-driven literature reviews carried out by postgraduate taught students.

For doctoral students and postgraduate research students this book informs about the different archetypes of literature reviews and about writing and publishing them; this covers narrative overviews to systematic reviews in terms of the archetypes introduced in Section 2.5. In this respect, Holbrook et al. (2004, p. 103), and Golding et al. (2014, p. 570) following them, even denote the literature review as a ‘litmus test’ for the quality of a thesis as a whole; the work of Mullins and Kiley (2002, p. 377) also confirms the relevance of an appropriate literature review. This implies that doctoral students and postgraduate research students should critically engage with literature. Given the crucial role of a literature review in the evaluation of a doctoral thesis, Chapter 2 presents an overview of what literature reviews constitute and Chapter 3 a guide to quality criteria for literature reviews from different perspectives. Chapters 4, 5 and 6 address the basic steps of literature reviews and contain tips for being more effective and efficient in the search for relevant literature. The contents of these chapters cover the literature review as part of an empirical study and as a stand-alone study. For systematic reviews, Chapters 7, 8 and 9 are directed at the quantitative methods for synthesis, including meta-analysis, and Chapters 10 and 11 at the diversity of methods for qualitative synthesis; Chapter 12 covers methods for combining quantitative and qualitative synthesis, which is called mixed-methods synthesis here. When systematic literature reviews are conducted, Sections 9.3, 9.4 and 9.5 and Chapters 10, 11 and 12 provide suitable methods for analysis and synthesis. In addition to the methods, Chapters 13, 14, 15 and 16 deal with reporting and publishing literature reviews. Hence, this book provides a comprehensive overview of how different archetypes of literature
reviews can be conducted by doctoral students and postgraduate students, how to utilise more systematic approaches, and how to write and report them.

Similarly, for early career researchers this book offers an expansion of the approaches and methods covered during doctoral study. To this purpose, Chapter 2 presents an overview of literature reviews, introduces the four archetypes, and sets out how literature reviews are related to propositional logic and to research paradigms for empirical studies. Chapters 3, 4, 5 and 6 address the basic steps of literature reviews, contain tips for being more effective and efficient in the search for relevant literature, and introduce methodological issues related to conducting literature reviews. The contents of these chapters also cover the literature review as part of an empirical study and as a stand-alone study. When choosing to carry out a systematic review, Chapters 7, 8 and 9 describe quantitative methods for synthesis, including meta-analysis, and Chapters 10 and 11 methods for qualitative synthesis; Chapter 12 covers methods for combining quantitative and qualitative synthesis, which has been named mixed-methods synthesis in this book. When undertaking a systematic literature review, Sections 9.3, 9.4 and 9.5 and Chapters 10, 11 and 12 provide the methods for analysis and synthesis. In the case of a narrative literature review, mostly Chapters 3, 4 and 5 cover methods for a more focused approach. In addition to the methods, Chapters 13, 14, 15 and 16 deal with reporting, research data management and publishing literature reviews. Consequently, for early career researchers this book offers support to further advance their knowledge about the different archetypes of literature reviews, how these are related to research paradigms, which systematic approaches are available, and how to improve the reporting, writing and publishing of literature reviews.

For doctoral supervisors this book may provide support in guiding doctoral students when they write their literature reviews, including developing protocols for systematic reviews and systematic literature reviews. This follows Bruce’s (1994, p. 225) stance that supervisors need to explore with doctoral students how to undertake and write literature reviews. In this regard, guidance for developing review questions, a key step in a literature review, is found in Chapter 4. Furthermore, the steps for setting out a protocol for searching and selecting sources are found in Chapters 3, 4, 5 and 6. How to evaluate the quality of evidence is set out in Sections 6.4 and 6.5. The detailed methods for quantitative synthesis are presented in Chapters 7, 8 and 9 and for qualitative synthesis in Chapters 10 and 11; methods for combining quantitative and qualitative synthesis, called mixed-methods synthesis in this book, are shown in Chapter 12. Guidance for reporting literature reviews, specifically systematic reviews and systematic literature reviews, is provided in Chapter 13. If doctoral students want to publish their reviews, Chapters 15 and 16 are helpful. Thus, doctoral supervisors can point doctoral students to starting points, processes and methods for a broad range of literature reviews, including how to write them.

For practitioners this book provides details about how scholarly studies into evidence-based interventions, policies, practices and treatments are conducted. This makes systematic reviews into these topics more accessible and contributes to understanding how recommendations are underpinned by evidence drawn from a
range of studies. In this respect, of particular interest are Sections 6.4 and 6.5 for rating the quality of evidence, Chapters 7, 8 and 9 for quantitative synthesis, Chapters 10 and 11 for qualitative synthesis and Chapter 12 for combining both types of evidence. These chapters, together with Chapters 3, 4, 5 and 6, are also suitable for practitioners who want to undertake their own systematic review for evidence-based interventions, practices, policies and treatments.

For citizens this book offers insight into the principles of literature reviews. Particularly, those who are interested in scholarly literature for evidence-based interventions, practices, policies and treatments can find guidance for reading and interpreting the findings of reviews. Especially, Chapters 7, 8, 9, 10, 11 and 12 provide insight into how systematic reviews, quantitative or qualitative, are conducted. Furthermore, Chapter 13 tells how these reviews should be reported. Thus, the information about the methods and reporting of systematic reviews supports citizens when reading studies and interpreting recommendations made by publications on evidence-based interventions, practices, policies and treatments.
1.5.2 Structure of Chapters
To provide access to the guidance in the book for all these different readers, key points can be found at the end of Chapters 2 to 16, summarising the text of each chapter. These key points include references to sections, figures and tables in the respective chapter for further reading. However, as the term key points suggests, not everything is covered; thus, reading the relevant sections of chapters in addition to the key points may be helpful to get a full grasp of the topics relevant to undertaking, reporting and writing a literature review.

The book also presents notes and tips to consider when undertaking a literature review. Notes draw attention to additional issues that should be considered for literature reviews and to points for reporting them. The tips aim at improving the quality of reviews. Highlighting these notes and tips helps readers to consider specific points in addition to the methods and suggestions in the main body of the text.

Furthermore, at the end of Chapters 2, 3, 4, 5, 6, 7, 10, 12 and 13, ‘how to’ guidance is provided. These sections aim at addressing questions that may arise from the text of a chapter. One of the recurring points is how to write a literature review. In addition to these points about writing across the chapters, Chapter 15 provides a practical approach to writing literature reviews, and Chapter 16 provides information on when a literature review is suitable for publication and about the process of submitting manuscripts to academic journals.

In addition to the key points, notes, tips and guidance, the chapters contain citations to relevant sources. These sources can be found at the end of each chapter. Some of these references can be used for consultation about how to undertake literature reviews in more detail, and others are the examples provided for points
made in the book. Particularly, examples are drawn from a broad variety of domains to show the application of approaches, methods and tools mentioned in this book.
1.5.3 Informed Choices for Undertaking and Assessing Literature Reviews
Summing up, this book aims at providing a comprehensive overview of approaches, methods, formats and tools for literature reviews, so that undergraduate students, postgraduate students, doctoral students and early career researchers can make informed choices about how to undertake literature reviews in a broad range of disciplines and for different purposes. To this purpose, the book contains descriptions ranging from how to set questions for a literature review to presenting and writing the review; the guidance for specific groups and for specific approaches can be found in the preceding subsections and in the synopsis of the book. Also, the benefits and disadvantages of specific approaches and methods are placed in the context of the purpose of literature reviews. The contents of the chapters, supported by figures, tables, key points, notes, tips and ‘how to’ sections, make it possible to develop a thorough approach to a literature review. Also, instances of literature reviews and worked examples, drawn from different disciplines, are provided in the book. All this demonstrates that each literature review serves a specific purpose and differs from others, but that it can benefit from guidance and tips on how to conduct it effectively, with different approaches for different purposes, while keeping in mind how a literature review is set within a specific discipline.
1.6 Key Points for Book
• The book has a multi-disciplinary focus for conducting and writing literature reviews of any kind. To this purpose, it is informed by methods and practices found in multiple disciplines, while noting that some domains have been more active in publishing about methods than others, with education, healthcare, information systems, medicine and nursing being cases in point.
• Furthermore, the emphasis of the book is on systematic approaches, but this is not limited to protocol-driven literature reviews, such as systematic reviews. Many approaches, methods, frameworks and formats that can be used for any type of literature review have found their way into the text, with illustrations of how to apply them. Also, this means that this book is a suitable guide for a broad range of those who have to undertake a literature review or are interested in how
they should be carried out: undergraduate students, postgraduate taught students, postgraduate research students, doctoral students, early career researchers, practitioners and citizens.
• The synopsis provides a quick guide to specific chapters and sections related to the four archetypes that are introduced in Chapter 2 (based on Figure 1.1) and literature reviews for empirical studies.
• In the chapters, information about criteria, guidelines, methods, practices, processes and tools is supported by:
  • Figures and tables for more detailed guidance and illustration.
  • Boxes that provide worked examples or more detail on specific methods and tools.
  • Notes that draw attention to additional issues that should be considered for literature reviews and points for reporting them.
  • Tips that aim at improving the quality of literature reviews.
  • Key points that summarise points addressed in a chapter.
  • ‘How to …’ sections that reflect questions often asked by those new to literature reviews or seeking to expand their knowledge.
References
Airy GB (1861) On the algebraic and numerical theory of errors of observations and the combination of observations. MacMillan and Co, Cambridge
Berger-Tal O, Greggor AL, Macura B, Adams CA, Blumenthal A, Bouskila A, Candolin U, Doran C, Fernández-Juricic E, Gotanda KM, Price C, Putman BJ, Segoli M, Snijders L, Wong BBM, Blumstein DT (2019) Systematic reviews and maps as tools for applying behavioral ecology to management and policy. Behav Ecol 30(1):1–8. https://doi.org/10.1093/beheco/ary130
Boell SK, Cecez-Kecmanovic D (2015) On being ‘systematic’ in literature reviews in IS. J Inf Technol 30(2):161–173. https://doi.org/10.1057/jit.2014.26
Bolton JE (1971) Small firms—report of the committee of inquiry on small firms (4811). Her Majesty's Stationery Office, London
Boote DN, Beile P (2005) Scholars before researchers: on the centrality of the dissertation literature review in research preparation. Educ Res 34(6):3–15. https://doi.org/10.3102/0013189x034006003
Borgnakke K (2017) Meta-ethnography and systematic reviews—linked to the evidence movement and caught in a dilemma. Ethnogr Educ 12(2):194–210. https://doi.org/10.1080/17457823.2016.1253027
Bruce CS (1994) Research students’ early experiences of the dissertation literature review. Stud High Educ 19(2):217–229. https://doi.org/10.1080/03075079412331382057
Callahan JL (2010) Constructing a manuscript: distinguishing integrative literature reviews and conceptual and theory articles. Hum Resour Dev Rev 9(3):300–304. https://doi.org/10.1177/1534484310371492
Chalmers I, Hedges LV, Cooper H (2002) A brief history of research synthesis. Eval Health Prof 25(1):12–37. https://doi.org/10.1177/0163278702025001003
Chen D-TV, Wang Y-M, Lee WC (2016) Challenges confronting beginning researchers in conducting literature reviews. Stud Contin Educ 38(1):47–60. https://doi.org/10.1080/0158037X.2015.1030335
Cochran WG (1954) The combination of estimates from different experiments. Biometrics 10(1):101–129. https://doi.org/10.2307/3001666
Cooper HM (1988) Organizing knowledge syntheses: a taxonomy of literature reviews. Knowl Soc 1(1):104–126. https://doi.org/10.1007/BF03177550
da Silva FQB, Santos ALM, Soaris S, França ACC, Monteiro CVF, Maciel FF (2011) Six years of systematic literature reviews in software engineering: an updated tertiary study. Inf Softw Technol 53(9):899–913. https://doi.org/10.1016/j.infsof.2011.04.004
Darwin C (1859) On the origin of species by means of natural selection or, the preservation of favoured races in the struggle for life. John Murray, London
Dixon-Woods M, Fitzpatrick R, Roberts K (2001) Including qualitative research in systematic reviews: opportunities and problems. J Eval Clin Pract 7(2):125–133. https://doi.org/10.1046/j.1365-2753.2001.00257.x
Estabrooks CA, Field PA, Morse JM (1994) Aggregating qualitative findings: an approach to theory development. Qual Health Res 4(4):503–511. https://doi.org/10.1177/104973239400400410
Fisher RA (1935) The design of experiments. Oliver and Boyd, Edinburgh
Furunes T (2019) Reflections on systematic reviews: moving golden standards? Scand J Hospit Tour 19(3):227–231. https://doi.org/10.1080/15022250.2019.1584965
Glass GV (1976) Primary, secondary, and meta-analysis of research. Educ Res 5(10):3–8. https://doi.org/10.3102/0013189X005010003
Golding C, Sharmini S, Lazarovitch A (2014) What examiners do: what thesis students should know. Assess Eval High Educ 39(5):563–576. https://doi.org/10.1080/02602938.2013.859230
Grant MJ, Booth A (2009) A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info Libr J 26(2):91–108
Green BN, Johnson CD, Adams A (2006) Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. J Chiropr Med 5(3):101–117. https://doi.org/10.1016/S0899-3467(07)60142-6
Gurevitch J, Koricheva J, Nakagawa S, Stewart G (2018) Meta-analysis and the science of research synthesis. Nature 555(7695):175–182. https://doi.org/10.1038/nature25753
Hallinger P (2013) A conceptual framework for systematic reviews of research in educational leadership and management. J Educ Admin 51(2):126–149. https://doi.org/10.1108/09578231311304670
Holbrook A, Bourke S, Lovat T, Dally K (2004) PhD theses at the margin: examiner comment on re-examined theses. Melb Stud Educ 45(1):89–115. https://doi.org/10.1080/17508487.2004.9558608
Kitchenham B, Brereton OP, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering—a systematic literature review. Inform Softw Technol 51(1):7–15. https://doi.org/10.1016/j.infsof.2008.09.009
Littell JH (2018) Conceptual and practical classification of research reviews and other evidence synthesis products. Campbell Syst Rev 14(1):1–21. https://doi.org/10.4073/cmdp.2018.1
Maxwell JA (2006) Literature reviews of, and for, educational research: a commentary on boote and beile’s “scholars before researchers.” Educ Res 35(9):28–31. https://doi.org/10.3102/0013189x035009028
Mayne R, Green D, Guijt I, Walsh M, English R, Cairney P (2018) Using evidence to influence policy: Oxfam’s experience. Palgrave Commun 4(1):122. https://doi.org/10.1057/s41599-018-0176-7
Moher D, Booth A, Stewart L (2014) How to reduce unnecessary duplication: use PROSPERO. BJOG Int J Obstetr Gynaecol 121(7):784–786. https://doi.org/10.1111/1471-0528.12657
Mullins G, Kiley M (2002) ‘It’s a PhD, not a Nobel Prize’: how experienced examiners assess research theses. Stud High Educ 27(4):369–386. https://doi.org/10.1080/0307507022000011507
Noblit GW, Hare RD (1988) Meta-ethnography: synthesizing qualitative studies. Sage, Newbury Park, CA
O’Rourke K (2006) A historical perspective on meta-analysis: dealing quantitatively with varying study results. JLL Bull. http://www.jameslindlibrary.org/articles/a-historical-perspective-on-meta-analysis-dealing-quantitatively-with-varying-study-results/
Paré G, Trudel M-C, Jaana M, Kitsiou S (2015) Synthesizing information systems knowledge: a typology of literature reviews. Inform Manag 52(2):183–199. https://doi.org/10.1016/j.im.2014.08.008
Pearson K (1904) Report on certain enteric fever inoculation statistics. BMJ 2(2268):1243–1246
Pratt JG, Rhine JB, Smith BM, Stuart CE, Greenwood JA (1940) Extra-sensory perception after sixty years: a critical appraisal of the research in extra-sensory perception. Henry Holt and Company, New York
Shannon H (2016) A statistical note on Karl Pearson’s 1904 meta-analysis. J R Soc Med 109(8):310–311. https://doi.org/10.1177/0141076816659003
Sideri S, Papageorgiou SN, Eliades T (2018) Registration in the international prospective register of systematic reviews (PROSPERO) of systematic review protocols was associated with increased review quality. J Clin Epidemiol 100:103–110. https://doi.org/10.1016/j.jclinepi.2018.01.003
Stanley TD (2001) Wheat from Chaff: meta-analysis as quantitative literature review. J Econ Perspect 15(3):131–150. https://doi.org/10.1257/jep.15.3.131
Tranfield D, Denyer D, Smart P (2003) Towards a methodology for developing evidence-informed management knowledge by means of systematic review. Br J Manag 14(3):207–222. https://doi.org/10.1111/1467-8551.00375
Yates F, Cochran WG (1938) The analysis of groups of experiments. J Agric Sci 28(4):556–580. https://doi.org/10.1017/S0021859600050978
Yin RK, Heald KA (1975) Using the case survey method to analyze policy studies. Adm Sci Q 20(3):371–381. https://doi.org/10.2307/2391997
Part I
Basic Concepts for Effective Literature Reviews
Chapter 2
Objectives and Positioning of [Systematic] Literature Reviews
When undertaking literature reviews, it is important to know why you have to do them and how. In 2012 a doctoral student recounted that his supervisor's advice for the literature review had been: ‘Just do it!’ Whereas this advice holds true to a certain extent, it does not provide a real starting point for appropriate and thorough literature reviews. Better starting points are knowing how literature reviews are embedded in research processes and how they relate to different approaches to the design of research. To this purpose, this chapter provides starting points for literature reviews; more detail on how to undertake a literature review follows in the remaining chapters of this book. The chapter begins with Section 2.1, which outlines how literature reviews are a necessity for academics, scholars and practitioners alike. How literature is connected to the different steps of the research process is the topic of Section 2.2, which also introduces a basic process for the design of research projects. This is followed by Section 2.3, which looks into the difference between a critical evaluation of literature and critiquing. The critical evaluation is the basis for synthesising the information held by sources in Section 2.4. Then, in Section 2.5, archetypes of literature reviews are presented. Literature reviews can also be related to propositional logic, which is addressed in Section 2.6. This is followed in Section 2.7 by how research paradigms adopted by researchers for the empirical study also influence the nature of the preceding literature review. Finally, Section 2.8 sets out how appropriate literature reviews avoid plagiarism. By looking at these topics, this chapter lays the foundation for literature reviews of whatever kind, so that they will be appropriate and thorough for their purpose.
2.1 Literature Sensitivity and Professional Knowledge
For academics, scholars, practitioners and policymakers alike, knowing about state-of-the-art knowledge informs their activities, albeit in different ways. Such knowledge covers tools, methods, conceptualisations and theories, and what their applications and limitations are. For academics this knowledge ranges from the suitability of theory to the effectiveness of tools and methods, and to viewpoints of actors and agents (subjects of study) in social interactions. This knowledge is often found in literature reviews that precede empirical studies or in overviews that collect, compare and synthesise evidence and concepts from other empirical studies, and sometimes propositional papers. Practitioners may use state-of-the-art knowledge for evidence-based interventions, methods, policies, practices and processes (such as decisions on which treatment is most effective). Using literature improves the chances that these are more effective or that possible outcomes are known beforehand. Thus, literature plays an important role in developing tools, methods, conceptualisations and theories, and in putting these into practice.

The extent to which somebody is familiar with existing knowledge in scholarly literature is called ‘literature sensitivity’ (Strauss and Corbin 1998, p. 49). This term expresses whether an academic, practitioner or policymaker has sufficient knowledge of key works and other studies that inform activities or empirical research. It should be noted that this also extends to the depth of that knowledge and how it is being used. It may also include the origins and history of key concepts and theories. A case in point is the term set-based concurrent engineering, which refers to the stage-wise development of solutions in new product development; according to Salgado and Dekkers (2018, p. 913), this term was initially coined by Ward et al. (1995). But they also mention that it is preceded by the controlled convergence method, which dates back to Pugh (1981). Curiously enough, Ward et al. (1995, pp. 48–9) refer to this source, too, but ‘hidden’ in a footnote, commensurate with the referencing style of Sloan Management Review in which their article was published. However, many contemporary authors see Toyota's set-based concurrent engineering described by Ward et al. as a new concept rather than one that builds on the controlled convergence method and other methods to this purpose. Similar to the historical context, the delineation between concepts is one of the subtleties that could be expected from an appropriate literature review. This indicates that appropriate literature reviews should take these finesses and the historical development of key concepts into account to demonstrate literature sensitivity.

TIP: FINDING OUT ABOUT KEY CONCEPTS
To develop this finesse and to get a better grip on the breadth and the historical development of key concepts, there are a number of ways, particularly for novices to a topic:
• Reading sources carefully and finding references to key concepts or preceding works is the first approach. This could lead to further searches for these works or to requests to libraries for retrieving preceding works.
• Consulting experts or more experienced academics is the second approach. Attending conferences and research seminars can also be helpful to this purpose.
• Looking beyond a specific domain or topic is the third approach. This is not only beneficial for multidisciplinary topics, but can also develop an appreciation of similar thoughts in other domains and disciplines.
All three of these resolutions for better understanding the context of topics and key concepts take time and effort; however, they can be very rewarding, because the effort will be reflected in the clarity and added value of the written literature review.
2.2 Research Processes and Literature Reviews
Appraising and using literature is not restricted to the literature review of a study alone; to this purpose, this section will look at a generic model for research and how literature is related to it.
2.2.1 Basic Research Processes
The basic process for conducting empirical research is depicted in Figure 2.1. This process applies to all kinds of studies, such as undergraduate and postgraduate dissertations, doctoral studies, research projects and practitioners' interventions. A research process starts with setting research objectives and it ends with analysing results, with the aim of drawing inferences, conclusions and recommendations related to the research questions and objectives. The steps in this basic research process can also be partly iterative, which is not discussed here. In the following paragraphs the steps of the basic research process will be elaborated.

A first step in research is positioning the topic in its domain and its relationships to disciplines, which should result in setting research objectives. In this step, a gap in, or the suitability of, extant knowledge for a topic of investigation should be identified, together with how this is of interest to theory, applications and practice. It should lead to discerning the potential contribution to knowledge that may be made. This is reflected in the research objectives that are set; these can also take the form of hypotheses or propositions. Research objectives are also sometimes called research aims, research questions, etc.; the exact nomenclature depends on the discipline and sometimes on where the work is published. In this book, the research aim is defined as the contribution to scholarly knowledge, whether theoretical or practical, that a study is going to make. The research objectives are the initially specified questions that break down the research aim into smaller, logically connected parts
that systematically address the various aspects of the problem. They also incorporate the theories, constructs, perspectives, methods, tools, etc. that are going to be used; note that all of this refers to the topic of the study and not to the methods and tools for the empirical study, which are defined later. Defining the research objectives guides the further research process.

Fig. 2.1 Overview of basic steps for empirical research (flow: Topic Domain and Research Objectives → Literature Review → (Detailed/Refined) Research Questions → Research Method → Designed Data Collection → Actual Data Collection → Data Analysis; informed by disciplines and domains). The process for research starts by positioning the topic that will be investigated in its domain, or in the case of multi-disciplinary studies, in relation to the constituent disciplines. This leads to formulating research objectives. A literature review adds insight from extant knowledge to guide the research method. Normally, the review results in the research objectives being redefined as detailed research questions (or hypotheses). This is followed by designing the research method. After collecting data it may emerge that the actual data collection diverges from the initial intentions for it. Data analysis is the step before inferences, conclusions and recommendations related to the research questions and objectives can be drawn.

Based on the research objectives a literature review is undertaken. It aims at finding out which published writings about theories, conceptualisations, methods, perspectives and tools pertinent to the research objectives are available. One purpose is confirming the research gap or the potential contribution to knowledge. A second purpose is identifying more specific issues that should be addressed during the empirical research. A third purpose is to find aspects, constructs and variables that should be investigated. This means that the outcomes of this step should lead to refining the research objectives into more specific research questions, hypotheses or propositions. Again, conventions for what this is called vary across disciplines. No matter the exact wording, the literature review paves the way for the empirical data collection.

Before the data collection can happen, the research method should be designed. This assures not only that the refined research questions will be addressed, but also
that quality criteria for the empirical research can be met. The design of the research method should also consider to what extent inferences will be supported by the data collection. This includes possibilities for falsification and alternative explanations. Finally, the design of the research method takes into account specific limitations; if possible, countermeasures should be built into the design of the research method. Thus, the design of the research method results in a detailed plan outlining how a study will address its specific research questions, hypotheses or propositions.

Based on the design of the research method, data collection and analysis can take place. Whether the data collection is an experiment, secondary data collection, a survey, interviews, case study methodology or any other method, it is normally difficult to keep to the initial design, and things often deviate from the pattern that was set out. An example is that the demographics of participants in a survey may differ from the actual demographics of the population when relatively more young people respond. Such differences between actual and intended data collection may influence the results. Therefore, the deviations should be recorded and assessed for how they influence the results and outcomes of a study. The method used for the analysis of data should also be reported. The analysis of data leads to inferences and findings.

After analysis of the collected data, drawing further inferences, conclusions and recommendations follows; this has been omitted from Figure 2.1. During this step researchers reflect on whether the research objectives have been met, whether the study results in a contribution to knowledge and what the implications of the study are for further research, practice and society, if applicable.
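As a small, hedged illustration of recording such deviations between intended and actual data collection, the composition of a survey sample can be compared explicitly against the population it was meant to represent. The age bands and shares below are invented purely for illustration and are not taken from the book; this is a minimal sketch in Python, not a prescribed procedure.

# Hypothetical age bands: intended (population) shares versus achieved (respondent) shares.
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
respondent_share = {"18-34": 0.45, "35-54": 0.35, "55+": 0.20}

# Record the deviation per band so that its possible influence on the results
# can be assessed and reported later, as argued above.
for band in population_share:
    deviation = respondent_share[band] - population_share[band]
    print(f"{band}: intended {population_share[band]:.0%}, achieved {respondent_share[band]:.0%}, deviation {deviation:+.0%}")

Keeping such a record alongside the data makes it easier to discuss later how, for example, the over-representation of younger respondents may have influenced the findings.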
2.2.2 Where Literature is Used and How in Empirical Studies
The first of the four steps where literature is used to inform empirical research is the positioning of the topic in its domain and determining its relationships to informing disciplines; see Figure 2.2. In this step of the research the relevance of the topic is highlighted by pointing to gaps or by calls for specific research to take place. This includes referring to domains and disciplines that will inform the research. The purpose is to establish research objectives, which are derived using a rationale rooted in literature. The second of the four steps is the literature review once research objectives have been set; see Figure 2.3. The purpose of this appraisal of literature is to find conceptualisation, constructs, methods, perspectives, theories, tools and variables that are pertinent to the research objectives. To this purpose, review questions are set within the context of the research objectives, the disciplines of the study and how outcomes of the study are expected to be used. This state-of-the-art (scholarly) knowledge informs the research method and data collection.
Fig. 2.2 Use of literature for empirical studies. Literature is used in various ways during an empirical study. In the first place literature informs the relevance of a topic, both for the scholarly domain and for practice. This should lead to identifying the significance of the potential contribution to knowledge. The resulting research objectives are used for guiding the literature review. The appraisal of extant literature brings to the fore current insight and how this can be used for the empirical study. Also, for the design of a research method consultation of literature is necessary. Scholarly knowledge about research methods informs the choice of the most appropriate method and the limitations of the study. Finally, results are compared with other studies, as are findings, conclusions and recommendations. (The figure annotates the steps of Figure 2.1 with four uses of literature. Positioning Topic in Domain: relevance of topic; relationship to domain; contribution to knowledge; defining research objectives. Literature Review: confirmation of gap (contribution to knowledge); theories, conceptualisations, methods, perspectives, tools; (aspects, constructs, variables for investigation); prevalence of research methods and approaches. Design of Research Method: justification of research method; detailing of method; path for data collection and repository; methods for data analysis; assurance of quality and reliability; countermeasures for limitations (if possible). Data Analysis: comparison of results, conjectures and conclusions with other studies.)
Fig. 2.3 Generic process for literature reviews (steps: Defining Research Objectives → Defining Purpose of Literature Review → Setting Review Questions → (Designing) Method for Literature Review → Retrieval of Sources → Analysis of Sources → Synthesis of Findings). The starting point of a literature review as part of an empirical study is defining the purpose of the literature review based on the overall research objectives. This informs specific review questions, which guide the selection and analysis of studies. After the actual analysis, results and conjectures can be synthesised into findings. In the case of systematic approaches to literature reviews, the method for retrieving studies and the subsequent analysis is specified in a protocol.
Also, for designing the research method, literature is needed. Such consultation and citing of sources leads to developing a justification of why a particular research method is most appropriate. A justification may include comparisons with other specific research methods. In addition, the discussion of the research method could cover sample size and characteristics. Furthermore, it should lead to the constructs and variables that will be used for data collection, for instance in a survey. By considering the quality of the design of the research method, reliable outcomes should be assured. One of these considerations is the limitations and the impact they have on findings (in published academic papers these are normally found at the end of the writing); sometimes it is possible to integrate countermeasures or to complement the study with additional research methods to mitigate weaknesses. Thus, the use of literature for designing the research method covers quite a number of aspects that need to be considered in order to increase the rigour of data collection and analysis (sometimes called dependability, reliability or trustworthiness).

Based on the design of the research method and after data collection, the data analysis can take place; literature has two roles here. The first role is the comparison of results with what is known in the literature; this comparison is not always necessary, but can be helpful for explaining differences with preceding works later on in a report or publication. The second role is the comparison of conjectures and findings resulting from the analysis with existing studies; differently from the results, this
particular comparison could also include studies that have followed different research methods or had different research objectives. These two roles for using literature during the analysis help to establish the actual contribution to knowledge of a study, which is normally found at the end of a study.

Sometimes literature is also found in the concluding section of a work. In such sections there is reflection on the contribution to knowledge made by a study, its implications and further research. Typically, the contribution to knowledge is compared with other studies that have made previous or similar attempts, either in its entirety or focused on specific aspects of the study. The implications of a study can be directed at practitioners and society. A discussion about further research should indicate how other scholars can use the outcomes of the research for their studies, how possible imperfections can be addressed by further studies and what more needs to be done to increase reliability and generalisability. All these points use preceding literature to point out complementarity and possible contrasts, with the aim that the current study is anchored both in extant publications and in directions for future research.

NOTE: HOW CITING LITERATURE IS SPECIFIC TO DISCIPLINES
How to cite literature in the steps of the research process, as depicted in Figure 2.2, depends on the conventions of disciplines. For example, in some publications about education it is common to quote from articles in the literature review, whereas in other disciplines this may be considered not good practice. This means that the presentation of literature reviews is guided by what is considered normative in a specific discipline; consulting preceding publications may be informative on this matter.
2.2.3 Solving Practical Problems and Use of Literature
It is also possible to solve problems in practice. This, too, can be evidence-informed using models from literature, but such an approach follows different steps and guidelines, as described in Dekkers (2017, Chapters 3 and 4). A particular difference for such an approach is that modelling and elicited requirements for a practical situation guide the evaluation of tools, methods and conceptualisations found in literature. Some would call this evidence-based research or studies. However, key is finding the appropriate tools, methods and conceptualisations so that information from practice can be used in a structured manner. The drawing of knowledge from literature applies to the phase of analysis, and to the steps for generating, detailing and implementing solutions.
2.3 Evaluating Literature
Whereas a literature review for a study normally focuses on a specific empirical study, literature reviews of studies as stand-alone work also focus on evaluating scholarly knowledge, albeit in a broader context; in addition, literature reviews of
studies as stand-alone works can focus on the effectiveness of interventions, policies, practices and treatments, and also on the generating and testing of theory. However, the ones concentrating on the effectiveness of interventions, policies and practices also often include implications for further research. This means that the foci of literature reviews may vary. However, they all share in common that literature is evaluated on its usefulness; this will be discussed in the next two subsections by looking at the difference between a critical evaluation and critiquing, and at what an appraisal of literature constitutes.
2.3.1 Difference Between Critical Evaluation and Critiquing
The purposeful evaluation of literature is called critical evaluation; it is also known by the terms critical appraisal and critical review (some also call it a ‘critiquing review’, which is slightly confusing; a case in point is Xiao and Watson (2017, p. 10), although the source they refer to, i.e. Paré et al. (2015), does not use the word ‘critique’ in any manner). The intent of the critical evaluation is to find out whether existing knowledge is appropriate for a specified research objective. If this is the case, then further empirical research does not add more than reliability, for example by considering additional samples. However, if the existing knowledge is not appropriate, then further empirical research may add to existing knowledge. In the same vein, it is possible to identify theory, constructs, variables, methods and tools, and to what extent they are relevant to the research objective in mind. This means that research objectives lead the evaluation of literature on its usefulness for the further research that is planned.

A critical appraisal of literature does not necessarily mean critiquing. When critiquing, only the negative aspects of a specific topic are considered. This can be that the topic is not new, that it is incompatible with other lines of thought or that it is not practical, among other reasons. Such critiques do not necessarily point to how further research should be undertaken and how others can benefit from such research. In general, the intent of a literature review is either to collate conceptualisations, to evaluate the appropriateness of preceding works or to amalgamate evidence; a critique alone does not necessarily provide such a perspective.
2.3.2 Elements of Appraisals
A first point in appraising a source (see Box 2.A) is evaluating where it can be used in the basic steps of research; see Figure 2.2 and Section 2.2. The first step is the justification of the research objectives, the determining of the gap in knowledge and
the setting of research processes. The second step is the actual literature review. The third step is how the source can inform the design of the research method. And a fourth use is to compare information from sources with data, results, conjectures, findings and conclusions. Thus, it is important to determine for what purpose the source can be used in the context of a literature review or empirical study.

Box 2.A Method for Appraisal of Literature
A method for appraising literature consists of four points to be evaluated for each source (note that the maximum number of words for each point is merely indicative):
1. Which purpose does the source serve for the current study (dissertation, doctoral thesis, research project)? (max. 50 words)
   • Justification of topic.
   • Information about conceptualisations, constructs, methods, perspectives, theories, tools and variables.
   • Design of research methodology.
   • Comparison of data, results, findings and conclusions.
2. What is the background of the author(s) and study? (max. 100 words)
   • Level of expertise and (research) interest.
   • Independent publication or part of larger project.
   • Potential bias (including funders, advocacy).
3. Is the design of its research process and are the results trustworthy for the objectives of the current study? (max. 100 words)
   • Note that this is not a summary of the source itself, but aims at determining which parts of the source are useful.
   • Points to evaluate the quality of a study (see also Section 3.4 for more detail).
4. How can this source be used for addressing the research objectives or review questions, and, in the case of an empirical study, the design of the research methodology? (max. 250 words)
   • Arguments, perspectives and statements.
   • Conceptualisations, theories, constructs, variables, etc.
   • Specific aspects of research method.
   • Data and results.
   • Conjectures, findings and conclusions.
   • Topics for further research and gaps in knowledge.
A similar but different approach to literature reviews is described by Wallace and Wray (2006, pp. 31–7).
After determining the use of a source, the second point concerns the evaluation of the background of the research and the author(s). The purpose is to find out whether arguments brought forward are related to the specific research interests;
this could also cover whether bias may occur. For example, a consultant may want to advocate topics that are relevant to sustaining business rather than take a more objective view on the usefulness of concepts. Also, funding sources may influence the design of research and determine to a certain extent the outcomes. These considerations should be evaluated as to whether they are relevant to the current study and may have had an impact on the outcomes.

A third point in the evaluation is whether the results have validity for the current study for which the literature review is undertaken. Because the research objectives of a study are not exactly replicated in any extant study, most likely only parts of a source will be of interest. These parts of a source are sometimes related to the main arguments and core of the research, but at other times are peripheral. In the latter case, the design of the research process may not produce sufficiently reliable information for the current study. The appraisal for this third point will result in knowing whether a writing is holistically or partially of relevance to the literature review, and whether the evidence is sufficient for the relevant claims made.

The final point of the evaluation is establishing in detail what from the source can be used for the current study. This can concern arguments, statements, theories, conceptualisations, constructs, variables, aspects of research methods, data, results, conjectures, findings and conclusions. When noting the further use in a literature review, it is helpful to make detailed notes, but not just to copy and paste.

Note that an appraisal is not the same as writing a summary. Such recapitulations only list the key points of a work and noteworthy considerations from the perspective of the author(s). A summary may provide a first orientation to what a published work is about. Nowadays, these points are often captured by the abstracts that are provided when searching for sources, although not every abstract is informative, as highlighted, for example, by Hartley (2003). What these summaries do not convey is how (selected) information from an article can be used for the study at hand; thus, summaries have a different purpose than critical reviews and are of limited use for analysis and synthesis in the context of a purposeful literature review.
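For those who prefer to keep their appraisal notes in a structured form, the four points of Box 2.A can be recorded per source. The following Python sketch is only one possible way of keeping such notes and is not a format prescribed by this book; all field names, and the shortened example content about Ward et al. (1995), are illustrative choices.

from dataclasses import dataclass, field
from typing import List

@dataclass
class SourceAppraisal:
    """Appraisal notes for one consulted source, following the four points of Box 2.A."""
    citation: str            # full reference of the source
    purpose_for_study: str   # point 1: justification, concepts, method design or comparison (c. 50 words)
    background: str          # point 2: expertise, independence, potential bias (c. 100 words)
    trustworthiness: str     # point 3: which parts are useful and how sound they are (c. 100 words)
    use_for_objectives: str  # point 4: how the source addresses the research objectives or review questions (c. 250 words)
    themes: List[str] = field(default_factory=list)  # labels that can later support synthesis (Section 2.4)

# Example of recording an appraisal (content heavily shortened for illustration):
appraisal = SourceAppraisal(
    citation="Ward et al. (1995)",
    purpose_for_study="Information about the conceptualisation of set-based concurrent engineering.",
    background="Academic authors publishing in a practitioner-oriented journal.",
    trustworthiness="Conceptual account; useful for tracing origins rather than as empirical evidence.",
    use_for_objectives="Supports delineating set-based concurrent engineering from the controlled convergence method.",
    themes=["origins of key concepts"],
)
print(appraisal.citation, "->", appraisal.themes)

Keeping every appraisal in the same structure makes the later comparison of sources easier; a table in a word processor or spreadsheet serves the same purpose.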
2.4 Synthesising Literature
This also means that a literature review is not directed at summarising, but at combining results, conjectures and findings of extant studies for a specific objective of a literature review. Based on the appraisals, the commonalities and differences between retrieved works should be looked at. Commonalities could indicate a stronger base of evidence, whereas variation may lead to further insight. This is done from the perspective of the current literature review's objectives. Often this results in themes and aspects of research that are brought together. These themes for synthesis are linked to the research objectives, aspects of the research methods and outcomes of studies.
In general, there is no single, formal requirement for synthesising literature, but there are many guidelines, formats, methods and tools, which will follow in the next section and in other chapters of this book about systematic approaches to literature reviews; for an overview, see the Synopsis and Section 1.4. Results from the analysis, for example by using the method for critical appraisal in the previous section, can inform the themes for synthesis and the presentation of the literature review. One method for synthesising is comparing studies one by one using their appraisals. When the appraisal of each individual study is undertaken from the perspective of the research objectives, the comparison identifies commonalities and differences based on the responses and findings to the fourth question in Box 2.A. The commonalities and differences across retrieved studies usually shape the themes of the literature review (a small sketch of such a comparison follows after the tip below).

TIP: AVOID LITERATURE REVIEWS AKIN TO ANNOTATED BIBLIOGRAPHIES
In general, literature reviews should not be conducted as annotated bibliographies. An annotated bibliography is a list of resources gathered on a specific topic with a commentary following each reference. Akin to a list of references, annotated bibliographies gather all resources retrieved in the research process in one document. Each citation in the bibliography is followed by an annotation, usually a paragraph of five to seven sentences serving as summary, evaluation and reflection on the source. This also means that each source is listed separately and no synthesis across them takes place. An annotated bibliography is different from a literature review because it serves a different purpose. Annotated bibliographies focus on sources gathered for a specific research project, whereas a literature review attempts to undertake a comprehensive analysis and evaluation of all studies and works available for a specific research objective. However, annotated bibliographies are common in some disciplines. For example, the humanities, such as English, languages, film and cultural studies, may use this form of evaluating literature. Therefore, it is of paramount importance to look at best practices for literature reviews in a specific domain, and to consult supervisors and academics about specific perspectives on literature reviews.
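To make the one-by-one comparison of appraisals concrete, the sketch below groups point-4 notes (Box 2.A) by theme so that commonalities and differences can be inspected side by side. It is merely an illustration in Python of the comparison described above; the study names, themes and notes are invented, and the actual judgement of commonalities and differences remains the reviewer's.

from collections import defaultdict

def group_by_theme(appraisals):
    """Collect, per theme, which sources say what; items are dicts with 'citation', 'themes' and 'use' keys."""
    grouped = defaultdict(list)
    for appraisal in appraisals:
        for theme in appraisal["themes"]:
            grouped[theme].append((appraisal["citation"], appraisal["use"]))
    return grouped

# Hypothetical appraisal notes for three retrieved studies.
appraisals = [
    {"citation": "Study A", "themes": ["definition of construct"], "use": "Provides an operational definition."},
    {"citation": "Study B", "themes": ["definition of construct", "measurement"], "use": "Contests the definition and adds a scale."},
    {"citation": "Study C", "themes": ["measurement"], "use": "Reports mixed evidence for the scale."},
]

for theme, entries in sorted(group_by_theme(appraisals).items()):
    print(f"Theme: {theme} ({len(entries)} source(s))")
    for citation, note in entries:
        print(f"  - {citation}: {note}")

Reading the output per theme makes commonalities (several sources agreeing) and differences (conflicting notes under one theme) visible, which is the starting point for writing the synthesis.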
2.5 Archetypes of Literature Reviews
Evaluating and synthesising existing sources into a literature review can be done to differing extents. Sometimes it even seems that publications are written without considering a sufficient amount of extant knowledge. Examples are commentaries and critiques. Sometimes this can serve a specific purpose, for instance when expressing opinions related to research. The critical evaluation of meta-aggregation by Bergdahl (2019), also found in Sections 10.2 and 11.1, is a case in point. However, in most instances an appropriate literature review is needed to inform further research, to gain a better-founded overview of the effect of interventions or
to assess the suitability of theoretical foundations. From this perspective, five archetypes of literature reviews can be distinguished: narrative overviews, narrative reviews, systematic literature reviews, systematic reviews and umbrella reviews; this is an extension of the classification by Green et al. (2006, pp. 103–5). These five types, of which four are found in Table 2.1, represent very different approaches to how to write a literature review.
2.5.1 Narrative Overviews
The first of these archetypes, the narrative overview, generally has a narrow scope of sources cited. Often its aim is to identify works that are pertinent to the objectives of the empirical research that follows. The selection of information from articles, reports, etc. is usually subjective and may therefore lead to a biased review. It also means that the sources consulted for writing this type of literature review can be incomplete. This can be balanced by the authors being acknowledged experts in their domain. Nevertheless, when well written and based on a balanced view, these narrative overviews can be thought-provoking, introduce new lines of argument and bring controversies to the fore.

Also, commentaries, editorials and research notes could be classified as part of this category. These commentaries, editorials and research notes may contain useful information for further research or demonstrate the relevance of viewpoints. Commentaries can either advance the research agenda, critique existing approaches or provide contextual information. An example of advancing the research agenda is the guest editorial of a special issue in an academic journal. Sometimes these editorials provide perspectives beyond introducing the topics in a specific issue. An example is the guest editorial by Kühnle and Dekkers (2017), which offers some thoughts on interdisciplinary aspects and paradigms for research into industrial networks that are not commonly found in regular papers about this topic. Research notes often address a specific aspect of a topic or of research methods. A well-known paper about research methods is Eisenhardt's (1989) description of the case study methodology; it is used by others for the research process, the number of cases and the principle of saturation. However, it does not contain any counterarguments for the methodology, even though these were known at the time of writing; for example, Runyan (1982, pp. 441–3) raises some concerns about the case study methodology. This means that commentaries, editorials and research notes written as narrative overviews aim at bringing up a particular perspective on topics or research methods, without necessarily being complete.
Table 2.1 Archetypes of literature reviews. The four archetypes in this table represent very different approaches and also serve different purposes. A narrative overview normally identifies relevant works that justify a specific research project or point of view. A narrative review includes key works and more extensive arguments leading to a more detailed view on the topic at hand. A systematic literature review adds clarity about the retrieval of relevant works for the topic and how they were analysed using pre-defined methods, thus adding to the rigour (sometimes called reliability or trustworthiness) of the review. A systematic review compares and combines all relevant works related to a specific intervention, method, practice or theory based on a specific question.

Narrative overview – Purpose: rationale for a specific viewpoint or proposition; research project focused on a solution or method. Guided by: arguments for the purpose of the research; context of research. Review questions: not always present. Retrieval: not usually specified; potentially biased. Selection: not specified; potentially biased. Contribution: depends on other works and the knowledge of assessors, readers and reviewers.

Narrative review – Purpose: contains arguments pertinent to the purpose of the research or specific viewpoints; context of research. Guided by: inclusion of key constructs, or alternatively, key works. Outcomes: might be part of a thesis or research project; independent work for giving direction to research. Review questions: broad or narrow. Retrieval: not usually specified; potentially biased. Selection: not specified; potentially biased. Contribution: depends on how reviewers and readers are familiar with conceptualisation and key works.

Systematic literature review – Purpose: decisive directions. Guided by: inclusion of all relevant works. Outcomes: independent work for giving direction to research; secondary research: clarity of effects and contexts. Justification and rationale: embedded in the review itself; the method of review could be discussed but not how it was conducted. Review questions: specific. Retrieval: comprehensive set of relevant sources; explicit search strategy. Selection: criterion-based selection, uniformly applied.

Systematic review – Purpose: decisive directions; effects and context of interventions. Guided by: inclusion of all relevant works. Outcomes: independent work for giving direction to research, and informing practitioners and policymakers; secondary research: clarity of effects and contexts. Justification and rationale: embedded in the review itself; the method of review could be discussed but not how it was conducted. Review questions: specific. Retrieval: comprehensive set of relevant sources; explicit search strategy. Selection: criterion-based selection, uniformly applied.
Box 2.B Example of Narrative Review
A good example of a narrative review that is informative, presents an overview of a topic and traces developments for insight is the article by Newman (2003) on the structure and function of complex networks from a mathematical perspective. Without going into much detail here, it first discusses four types of networks (social networks, information networks, technological networks and biological networks). This is followed by exploring the properties of these networks. Coming after these introductory sections is a relatively exhaustive overview of the mathematical conceptualisations at the time, with their origins mentioned. The discourse also includes how studies into the mathematical modelling of networks have overcome disadvantages and limitations. At the end of the narrative review an agenda for further research can be found.
It should be noted that the positioning of the topic in Figures 2.1 and 2.2 often takes the form of a narrative overview. Since the main purpose of the positioning of the topic is to provide a rationale and justification for the research objectives, there is less emphasis on being complete. In addition, an actual literature review based on the research objectives will follow after this step. This means that most likely the introduction of a research project will be written in the form of a narrative overview.
2.5.2 Narrative Reviews
The second category in the classification is narrative reviews. Although the selection of information from articles is subjective, they contain at least all key works for the topic and ideally consider all arguments that are supportive and conflicting. This can be achieved by critically appraising sources as described in Section 2.3 and Box 2.A. According to Green et al. (2006, p. 103) some suggest that this may take the form of critiquing preceding works, but in general this should not be considered an adequate form. If done properly, narrative reviews can be informative, present an overview of a topic and trace the historical developments of a topic; an example of such a narrative review is given in Box 2.B. In addition, they can be thought-provoking and highlight controversies by offering a different perspective on a topic. This means that narrative reviews may serve different purposes, ranging from historical overviews to implications of studies on a specific topic.

Notwithstanding that narrative reviews aim at discussing the key works for a topic and considering arguments from different perspectives, they may have a number of drawbacks. The first one is that the methods used in creating the review are rarely disclosed to the reader. This can be counteracted by clearly defining the topic, the research objectives and related key terms, and how these are found in the papers used in
the narrative review. In addition, the sources employed to find the literature may be incomplete, possibly creating an insufficient knowledge base from which to draw a conclusion. An example would be when supervisors of undergraduate and postgraduate dissertations advise using specific papers, which are sometimes less relevant to the topic. Thus, the selection of information from articles is usually subjective, lacks explicit criteria for inclusion and could lead to a biased review.

TIP: HOW TO IMPROVE THE QUALITY OF NARRATIVE REVIEWS
There are a number of ways in which the quality of a narrative review can be improved, in addition to the points mentioned in Chapter 3:
• Define precisely what the literature review needs to find out and search for sources accordingly. This will lead to a more complete picture of arguments, evidence, context of studies, etc.
• Set the boundary of the topic. Setting this boundary about what to include and what not to include could provide clarity about arguments and evidence; this carries some similarities to the inclusion and exclusion criteria discussed in Sections 6.2 and 6.3. However, it should not lead to too narrow a range of studies for the literature review, such that evidence, perspectives or thoughts are conveniently excluded. An example of the latter would be focusing a topic on literature related to a very specific industrial sector, whereas for other sectors a theory has already been disproven.
• Consult experts; see also the tip in Section 2.1. However, it should be noted that experts might hold specific views on topics that might hinder creating a more complete picture of the literature for a specific topic.
• Look for counterarguments, discourse about points of contention and contradictory (or neutral) evidence in the papers retrieved. This will help to create a more in-depth and balanced view of the topic.
• Consult the list of references found in relevant papers, which may help to advance arguments. This technique is called snowballing; see Section 5.5 and the sketch after this tip.
• Try to find other related strands of research. This is especially helpful for creating insight that crosses disciplinary boundaries. An example would be using models from evolutionary biology to describe phenomena in economics (this is actually part of a strand of research called evolutionary economics).
These six ways can also be used in conjunction to be more effective, because they have different effects on the content and quality of a narrative review (note that the guidance can also be used for narrative overviews).
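As a small illustration of backward snowballing, the Python sketch below takes a set of seed papers, collects the works they cite and flags those not yet retrieved for screening. It is only a hedged sketch of the idea: the paper names are placeholders, and in practice the reference lists come from the retrieved papers themselves (Section 5.5 treats the technique properly).

# Hypothetical one-round backward snowballing: seed papers and the references they cite.
seed_papers = {
    "Paper A": ["Paper C", "Paper D"],
    "Paper B": ["Paper D", "Paper E"],
}
already_retrieved = {"Paper A", "Paper B", "Paper C"}

# Collect every cited work, then keep only those not yet in the pool of sources.
candidates = set()
for paper, references in seed_papers.items():
    candidates.update(references)

new_leads = sorted(candidates - already_retrieved)
print("New sources to screen:", new_leads)  # ['Paper D', 'Paper E']

The same round can be repeated with the newly retrieved papers until no further relevant leads appear.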
2.5.3 Systematic Literature Reviews
Systematic literature reviews are the third archetype of literature reviews. These reviews follow a pre-specified method or protocol by which a body of literature is aggregated, reviewed and assessed. In order to reduce bias, the rationale, the
hypothesis or review questions, and the methods for retrieving sources and for data collection are prepared before the review. Consequently, they guide the actual undertaking of the review. Just like any other literature review, the goal is to identify, critically appraise and synthesise the existing evidence concerning clearly defined review questions. This means that systematic literature reviews allow examining conflicting and coincident findings across studies as well as identifying themes that require further investigation. Furthermore, they include the possibility of evaluating the consistency and generalisation of the evidence regarding specific questions, and therefore are also of practical value for domains. The analysis and synthesis are normally qualitative, but could be supported by quantification or quantitative analysis, such as bibliometric analysis (Section 9.4) or systematic quantitative literature reviews (Section 9.5). In general, the methods for these reviews are particularly useful for integrating information from a group of diverse studies investigating the same phenomenon or similar phenomena, and they typically focus on a very specific empirical question.

Systematic literature reviews are conducted either as stand-alone research projects or as part of a research project; for the latter, see Figure 2.2. As stand-alone studies, they seek as an outcome how effective theories, conceptualisations, frameworks, methods and tools are. Such outcomes can be used by practitioners and by scholars for further research. As part of empirical research, systematic literature reviews aim at creating more convincing evidence than is possible with a narrative overview or narrative review. Thus, whether as a stand-alone study or as part of an empirical study, systematic literature reviews can provide rigorous insight into the current state of the art of the literature.
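A hedged illustration of what such a pre-specified protocol might record before retrieval starts is sketched below in Python. The field names and entries are placeholders chosen for this illustration and are not a format prescribed by the book; the elements simply follow what is named in this subsection (rationale, review questions, methods for retrieving sources and for selection), with search strategies and selection criteria treated in detail in Sections 5.4, 6.2 and 6.3.

# A minimal, illustrative record of a review protocol drafted before the review starts.
# All entries are placeholders; actual content depends on the review questions and discipline.
protocol = {
    "rationale": "Why the review is needed and which gap or question it addresses.",
    "review_questions": ["RQ1: ...", "RQ2: ..."],
    "retrieval": {
        "databases": ["Database A", "Database B"],   # placeholders for discipline-specific databases
        "keywords": ["keyword 1", "keyword 2"],
        "period_covered": "e.g. 2000-2021",
    },
    "selection": {
        "inclusion_criteria": ["empirical study", "peer-reviewed"],
        "exclusion_criteria": ["editorial or commentary", "no full text available"],
    },
    "analysis_and_synthesis": "Planned appraisal method and form of synthesis (qualitative, quantitative or mixed).",
}

# Fixing these choices in advance, and reporting any later deviations against them,
# is what reduces bias in the review.
for element, content in protocol.items():
    print(element, "->", content)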
2.5.4 Systematic Reviews
The fourth category is systematic reviews. They are similar to systematic literature reviews, but focus on comparing and aggregating studies for specific interventions, treatments, practices, phenomena and effects; they are also used for generating and testing theory. The main difference between a systematic literature review and a systematic review is that the latter considers only primary research (i.e. empirical studies), whereas the former may consider a wider range of sources. Furthermore, systematic reviews have a clearly defined question that guides the analysis. In general, these reviews also contain relevant information for practitioners by aiming for evidence-based interventions, policies, practices and treatments. Systematic reviews are common in medicine and nursing, but can also be found in other disciplines, such as economics, education and psychology. Furthermore, the methods can be based on quantitative, qualitative or mixed-methods synthesis (see resp. Chapters 7, 8, 9, 10, 11 and 12). Thus, whether qualitative or quantitative, systematic reviews have a narrow focus, generically speaking, and are protocol-driven.
A specific type of systematic review is called the rapid review. A rapid review is essentially a fast-tracked version of the systematic review. These reviews are typically done when policymakers are working within a specific, tight timeframe and need a quick turnaround. As a result, some critical systematic review steps are either modified or skipped entirely in a rapid review. For example, there might be less comprehensive search strategies, reduced use of specific literature (which can be challenging to find), more basic data extraction, and only simple quality appraisal. Results of a rapid review are often presented in a narrative and tabular format.
2.5.5 Umbrella Reviews
The final category comprises reviews of (systematic) reviews, also known as meta-reviews and umbrella reviews; essentially these are systematic reviews. The purpose of these reviews is to put together a broader evidence base building on existing reviews. According to Fusar-Poli and Radua (2018, p. 95), umbrella reviews are most suitable when previous studies are highly controversial or might have been subject to bias. In addition, they state that a strong rationale is needed before an umbrella review can be undertaken. Also, there should be sufficient systematic reviews available for such an umbrella review. Because the approach to an umbrella review is, in principle, no different from that for a systematic review, Table 2.1 does not contain a separate column for these.

NOTE: SIMILARITIES BETWEEN ARCHETYPES ALLOW TAKING ADVANTAGE OF INSIGHT
Whereas it may seem that the remainder of the book is written for conducting systematic literature reviews and systematic reviews, a considerable amount of practices, methods and tools can actually be shared across the four archetypes. In particular, more guidance has been written for specific aspects of systematic literature reviews and systematic reviews. Nevertheless, the advice can be applied to the two other archetypes, narrative overviews and narrative reviews. An example is the effectiveness of the keywords, controlled vocabulary and databases search strategy presented in Section 5.4, which is valid for all archetypes; only for systematic literature reviews and systematic reviews is this explicitly part of the protocols. Thus, understanding and working knowledge of all archetypes will make the undertaking of literature reviews more effective and suited to their purpose.
2.6 Propositional Logic and Literature Reviews
How to undertake literature reviews, particularly for empirical studies, also depends on propositional logic. Often, empirical research is classified as being deductive or inductive. This difference is based on concepts from propositional logic. This
section will not go into detail about propositional logic, but will briefly introduce four relevant concepts for empirical studies, before linking them to the purposes for which literature reviews can be conducted.
2.6.1 Brief Introduction to Propositional Logic
Propositional logic is a branch of discrete mathematics and the philosophy of research; for (empirical) research it plays an important role in establishing whether reasoning derived from the analysis of data is valid or invalid. In the case of literature reviews, the inference rules are of interest. A well-known example is:
• Premise A: If it is raining, then it is cloudy.
• Premise B: It is raining.
• Conclusion: It is cloudy.
The use of inference rules is common in research, albeit sometimes implicitly. An inference rule for implication is written in the form:

p → q   (2.1)
In this equation as conditional statement p stands for the antecedent and q for its consequent. Note that in terms of propositional logic, this is a simplification for the purpose of how to conduct literature reviews. To this end, four types of empirical research can be distinguished: the hypothetico-deductive, inductive, reductive and design research to the design of research methodology; in the following paragraphs these typical approaches to research will be explained in more detail. The hypothetico-deductive method aims at forming scientific theory, laws of observed regularities and principles through direct observation and experimentation based on prediction of results; see Table 2.2. The prediction of the results is derived from an established theory or a tentative theory (sometimes in the form of laws of observed regularities or principles). This informs hypotheses or propositions that are subsequently verified through observation and experimentation. The hypothesis or propositions should be formulated in a form that can be falsifiable, using a test on observable data where the outcome is not yet known. It also means that a generic theory, law or principle is applied to a specific sample and circumstances. Deviations from the prediction then lead to either refuting the theory, law or principle for the sample and circumstances, or to revising the theory, law or principle. Although observations and experiments are at the core of the hypothetico-deductive method, the a priori formation of theories, laws of observed regularities and principles is a particular feature of this method. The second type method, the inductive method, takes observations as starting point, with the aim of finding statements about a phenomenon in terms of theories, laws of observed regularities, principles and perspectives. To a certain extent this is the opposite of the hypothetico-deductive research method. In the inductive method
Table 2.2 Propositional logic and literature reviews. The four basic forms that propositional logic can take for research lead to different foci for the literature review. In the case of the hypothetico-deductive method a literature review is directed at relevant theories for a context, for which mediating and convoluting factors may determine the applicability of these theories. In the inductive method the observation of facts and meanings leads to new theory; therefore, the literature review mostly aims at identifying gaps in scholarly knowledge. The literature review for a reductive method considers similarities and differences between observations and inferences. The design method, commonly found in engineering but applicable to other domains, is teleological and so is its literature review; of particular interest is how a state of a system can be achieved based on existing knowledge or its extensions.

Hypothetico-deductive:
• Formal logic: p → q and p, hence q
• Reasoning: from generic to particular
• Principal domain: mathematics, formal reasoning, physics and life sciences
• Focus of literature review: existing theories and studies relevant to phenomenon and context; mediating and convoluting factors

Inductive:
• Formal logic: p and q, hence p → q
• Reasoning: from particular to generic
• Principal domain: sciences, social sciences
• Focus of literature review: incomplete or inadequate theories; however, part may be deductive

Reductive (presumption of fact):
• Formal logic: p → q and q, hence p (conjecture)
• Reasoning: from particular to particular
• Principal domain: law, historical sciences, medicine, geology, astronomy
• Focus of literature review: similar cases and situations, particularly (dis)similarities

Design:
• Formal logic: q (requisite), hence p and p → q are sought
• Reasoning: from generic to generic
• Principal domain: engineering, arts and creative industries, pedagogy
• Focus of literature review: all relevant theories and propositions that make q possible (teleological), or their synthesis
In the inductive method, no hypotheses or propositions are formulated in the first instance. Rather, the research questions cover identifying aspects and features that are relevant and how these are related to each other. When using the inductive method, it is important to establish both internal and external validity, or in the case of qualitative research credibility and transferability; internal validity refers to the logic of factors and aspects and how they are related, whereas external validity points to the extent to which the results of a study can be generalised to other samples and contexts. The reductive research methodology, the third type in this overview, aims at finding out in which cases or situations a similar consequent was linked to a conditional statement in the form of an antecedent. One possibility is that the methodology aims at verification: in this case, the antecedent is taken as starting point and then evidence is sought to corroborate the consequent (which is given). Alternatively, the consequent is taken as given and the argument proceeds to the antecedent. Both resemble the hypothetico-deductive research methodology, but differ because the conditional statement (i.e., theories, laws of observed regularities and principles) is not investigated when following the reductive research methodology. The second possibility is the generalisation of the consequent. In this
case, a specific consequent is considered valid for a larger class of similar phenomena; this resembles the inductive method, but differs because the conditional statement and the related antecedent are not investigated. This means that the reductive research methodology does not investigate the forming of theories, but seeks how similarly appearing phenomena are related. The fourth and final type is the design research methodology. In this approach, the target state or requirements for a system are set and the research aims to find whether a method or tool derived from theories, laws and principles achieves this target state or meets the requirements. For this reason, this research is also sometimes called practice-based or application-oriented research. Both the elicitation of requirements and the investigation of methods and tools are key topics of this research. However, to achieve this, prior research of the hypothetico-deductive and inductive types is necessary to establish the theories, laws of observed regularities and principles that can be used in the design research methodology; this also means that the methods for this research methodology are not limited to particular domains.
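As a compact recapitulation of the four methods, the sketch below restates the 'Formal logic' row of Table 2.2 in standard inference-rule notation, with the premises above the line and the inferred or sought statement below it. Strictly speaking, only the hypothetico-deductive schema is a deductively valid inference; the other three are ampliative patterns whose conclusion is a conjecture or a requisite to be satisfied.

```latex
% Inference patterns for the four research methods (premises above the line);
% requires amsmath for \text and \dfrac.
\[
\begin{array}{llll}
\text{Hypothetico-deductive:} & \dfrac{p \rightarrow q \qquad p}{q} &
\text{Inductive:} & \dfrac{p \qquad q}{p \rightarrow q} \\[12pt]
\text{Reductive:} & \dfrac{p \rightarrow q \qquad q}{p \;\;(\text{conjecture})} &
\text{Design:} & \dfrac{q \;\;(\text{requisite})}{p, \quad p \rightarrow q}
\end{array}
\]
```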
2.6.2 Overview of Literature Reviews Related to Propositional Logic
Because the hypothetico-deductive research methodology seeks the a priori formation of theories, laws of observed regularities and principles, the literature review mimics this search; see Table 2.2. Thus, the perusal of literature aims at finding out what has already been studied in terms of generic theories, laws of observed regularities and principles and whether this has accounted for the particular sample or circumstances. Furthermore, the literature review may include looking for mediating and convoluting factors related to the particular sample or circumstances. This means that the literature review covers the search and analysis of literature directed at a particular sample or specific circumstances to formulate testable and falsifiable hypotheses. The literature review for an inductive research methodology searches for a gap in knowledge; if there are existing theories, laws of observed regularities or principles, the analysis aims at finding out why these are inadequate and to what extent. Also, in these types of literature reviews, the specifics of the sample or the circumstances may be looked at. In practice, because of the extent of knowledge, it means that part of such research may be deductive, i.e., there are existing theories, laws of observed regularities and principles, whereas other parts of the research will be inductive because of inadequate or incomplete theories, laws and principles. Therefore, this type of literature review primarily aims at delineating the applicability of existing knowledge from the knowledge that is required to understand a specific phenomenon.
When conducting a study with the reductive research methodology, the literature review aims at finding similar cases or circumstances. Particularly, with regard to the generalisation of the consequent, the search focuses on similarity. Also, dissimilarities could be examined to find where the generalisation is delineated. Note that this research methodology does not directly consider the validity of the conditional statements. The reductive method merely seeks to establish whether there are more cases that are similar and which cases are too dissimilar for the conditional statement to be true. In the case of the design research methodology, the literature review focuses on how the consequent can be achieved. In general, there are several (principle) solutions, methods and tools derived from theories, laws of observed regularities and principles that can be used. The effectiveness of these artefacts provides the evaluation criteria. It can also be that there is a quest to improve the effectiveness of the (principle) solutions, methods and tools. This also implies that the literature review for the design method has a comparative character.
2.7 Research Paradigms and Literature Reviews
In addition to looking at literature reviews from the perspective of research methods, the research paradigms adopted by researchers for empirical studies may also influence how a literature review is undertaken. A research paradigm2 constitutes a belief about the way in which data about a phenomenon should be gathered, analysed and used. The term epistemology (what is known to be true), as opposed to doxology (what is believed to be true), encompasses the various philosophies of research paradigms. Writings on this topic sometimes take differing views on the research paradigm. Here a modification of Creswell's (2014, p. 6 ff.) classification will be taken as starting point: positivism, interpretivism, pragmatism and advocacy; in the next paragraphs these will be briefly explained and linked to the nature of the literature review. As the first of this classification of research paradigms, positivism is a perspective that assumes knowledge can only be based on natural phenomena, their properties and relationships; see Table 2.3. The view that phenomena only follow natural laws and patterns extends to the social sciences, too, according to this
2 The term 'research paradigm' instead of 'research philosophy' has been adopted in this book, following the thoughts of Kuhn (1962, p. 23), who describes a paradigm as an 'accepted model or pattern', a reflection of a particular discourse and a philosophical position relating to the nature of social phenomena and social structures. In this sense, a paradigm directs research efforts; it serves to reassert itself to the exclusion of other paradigms and to articulate the theories it has already established. Paradigms could be interpreted as prescriptive and as requiring particular research methods while excluding others. Feilzer (2010, p. 7) derives from Kuhn (1962, p. 24) and Mills (1959) that a paradigm can constrain intellectual curiosity and creativity, blind researchers to aspects of social phenomena, or even to new phenomena and theories, and limit sociological imagination.
Table 2.3 Research paradigms and literature reviews. The research paradigm also determines how a literature review for an empirical study will be conducted. Literature reviews for positivist studies aim at theory generation and verification. In interpretivist studies the attribution of meaning in works takes a central place in the literature review. For pragmatic studies the focus of the literature review is on how a problem can be resolved or how practitioners can use findings. When undertaking advocacy or participative studies the literature review centres on how change can be established.

Positivism:
• Key aspects: determination (factors); reductionism; empirical observation and measurement
• Focus of literature review: theory generation and verification

Interpretivism:
• Key aspects: understanding; multiple participant meanings; social and historical construction
• Focus of literature review: theory generation for meaning

Pragmatism:
• Key aspects: consequences of actions; problem-centered; pluralistic
• Focus of literature review: real-world and practice-oriented

Advocacy/participation:
• Key aspects: political; empowerment issue-oriented; collaborative
• Focus of literature review: oriented at change
research paradigm; the extension to the social sciences is sometimes called post-positivism. It means that in positivist studies the role of the researcher is seen as limited to data collection and interpretation in an objective way. In these types of studies research variables and constructs are usually observable and quantifiable, as are findings. This implies that reality is simplified to factors and variables in order to comprehend it; this is sometimes called reductionism when the explanation searches for the smallest unit or entity to explain the phenomenon. Thus, information derived from sensory experience, interpreted through reason and logic, forms the exclusive source of all certain knowledge in this perspective. The literature review for this type of study is focused on finding theories, laws of observed regularities and principles (akin to the hypothetico-deductive research methodology) or identifying constructs and variables (similar to the inductive research methodology). Based on these foci the foundation is laid for the empirical research. This means that positivism emphasises the determination and verification of observable aspects and variables to search for natural patterns and laws of observed regularities that are to a certain extent universally valid; the literature review reflects this approach. Interpretivism, closely related to the term constructivism,3 is the second research paradigm here and is based on the premise that people construct their own understanding and knowledge of the world through experiences and reflection; see Table 2.3. Research based on this perspective assumes that access to reality (given or socially constructed) is only possible through social constructions, such as
3 The terms constructivism and interpretivism are often confused. In this writing constructivism is taken as gravitating towards an ontological perspective and interpretivism as an epistemological perspective for this research paradigm.
language, shared meanings, beliefs and related artefacts. Studies usually focus on meaning and may employ multiple research methods in order to reflect different aspects of the phenomenon that is investigated. In these studies, constructs and factors related to phenomena are subject to how actors view them, as opposed to positivism, in which these constructs and factors lead to universally valid laws and patterns. This means that the literature review for interpretivist studies follows this point of view and searches for how experiences have been reflected on by actors, which shared meanings have occurred, how this is expressed in artefacts and how specific actors influence these. Thus, in interpretivist studies and related literature reviews knowledge is transmitted through ideas, discourses and experiences. Whereas positivism and interpretivism focus on generating knowledge, pragmatism as the third classification of research paradigms aims at action, intervention and constructive knowledge; see Table 2.3. This makes it appropriate as a basis for research approaches intervening in the real world and not merely observing the world. For example, this would be the case if the intervention is organisational change (as in action research) or the building of artefacts (as in design research). For the latter it is closely related to the design approach in the previous section. For these actions, interventions and applicable knowledge it can rely on the outcomes of both positivist and interpretivist studies. Thus, in this perspective an ideology or proposition is only true if it can be applied satisfactorily; the meaning of a proposition is to be found in the practical consequences of accepting it, and unpractical ideas are to be rejected. Literature reviews for this type of study concentrate on how specific actions, interventions and constructive knowledge are effective. Pragmatism and its related literature reviews focus on how knowledge can be applied rather than the generation of new knowledge; note that those adhering to this view also extract new knowledge from the application of existing knowledge. Studies adopting an advocacy or participatory approach, the fourth type of research paradigm in the classification, aim to bring about positive change in the lives of the research subjects; see Table 2.3. The approach is sometimes described as emancipatory, and therefore, does not take a neutral stance. The researchers are likely to have an agenda and may give the groups they are studying a voice in a discourse. As they want their research to directly or indirectly result in some intervention, it is important that they involve the group of subjects being studied in the research, preferably at all stages, so as to avoid further marginalising them. This could involve interacting informally or even living amongst the research participants (who are sometimes referred to as co-researchers in recognition that the study is not simply about them but also by them). Whilst this type of research could be criticised for not being objective, it should be noted that for some groups of people or for certain situations, it is necessary as otherwise the thoughts, feelings or behaviour of the various members of the group could not be accessed or fully understood. Literature reviews for this type of research, advocacy, are often narrative overviews or narrative reviews; the reviews may be biased in terms of which sources are considered.
2.8 Avoiding Plagiarism
No matter the research paradigm or the approach to the research in terms of propositional logic, how sources are consulted is reflected in the actual writing of literature reviews and is key to any type of academic work; not citing sources that informed a work is referred to as plagiarism. The latter means that information from sources is used, but the sources are not, or not appropriately, acknowledged. An example is the work of the well-known economist Joseph Alois Schumpeter. Doubts have been raised about the originality of his work, particularly for concepts central to his stance on innovation. In this regard, core concepts are creative destruction, which Reinert and Reinert (2006, p. 72) attribute to Werner Sombart, another German economist, and different types of innovation, which appear in the preceding work of Albert Schäffle, according to Balabkins (2010, pp. 120–1). According to Schneider (1970, cited in Balabkins 2010, p. 119), these inconsistencies should be seen as 'infelicitous attributions' rather than plagiarism. This indicates that how plagiarism occurs varies from simply not acknowledging any works relevant to a text or part of it to inaccuracies in citing. However, conducting an appropriate literature review and adequately acknowledging sources avoids plagiarism in the first place. By searching for all relevant sources related to a specific topic or phenomenon, no viewpoints are discarded. By keeping records of the relevant sources consulted and using these for writing the actual review, the chances of plagiarism are decreased. Thus, plagiarism can be avoided by diligence of the researcher during searching and analysing relevant works. A particular phenomenon that has drawn attention more recently is self-plagiarism. It is the reuse of significant, identical, or nearly identical portions of one's own work without acknowledging that one is doing so or citing the original work. A case in point of self-plagiarism is the set of publications Grant et al. (2013a, b), Mayo-Wilson et al. (2013a, b) and Montgomery et al. (2013). All five papers have the same title, address the same topic, have similar contents and use similar references. In current terms this would be seen as self-plagiarism and not an acceptable practice. This means that previously published work by authors should be cited in a new manuscript; however, to ensure a contribution to knowledge there should be sufficient differentiation between the new work and the cited sources. Another particular phenomenon associated with plagiarism is when a study or publication published in a particular language is translated into another language without crediting the original work or appropriately acknowledging the author. This type of literature theft by translation often involves copying a major part of the work in another language. An example is the publication by Boughzala (2020) about social innovation and entrepreneurship in Tunisia, written in French. An almost identical article using the same data was published by Fridhi (2021),4 but in English.

4 At the time of writing this book, the editors of the journal Innovations. Journal of Innovation Economics & Management have lodged a formal objection to the publishers of the second publication. This note is written with the sole purpose of indicating how different types of plagiarism may occur.
Differences between the papers may occur because of review processes and revisions that are required before publication; see Sections 16.5 and 16.6. However, in such cases the factual content is the same or has a high degree of similarity. This example shows how important avoiding plagiarism is, as well as authors being aware that their work may be used by others without consultation or acknowledgement.
TIP: CHECKING FOR PLAGIARISM
A good way to check for plagiarism is using plagiarism software once a literature review or a study is completed; this software is also used by publishers and academic institutions to detect any issues with regard to the originality of submitted work. It can avoid surprises and may point to sources overlooked during the search. Particularly, it can be helpful for finding sources that were outside the scope of a study, strand of research or domain. Note that researchers are expected to search for relevant sources anyhow, so using plagiarism software is most helpful when a literature review or study is complete. However, a high plagiarism score as the outcome of a check by plagiarism software does not necessarily indicate plagiarism. A high score can be caused by several reasons, with the following among them:
• The literature review correctly cites studies, but each citation-in-text is detected as a separate incident by the software. Principally, taking fragments, sentences and statements from studies is not incorrect, but it could be worthwhile to consider appropriate paraphrasing; Section 3.5 contains guidance for paraphrasing in the context of close reading.
• The literature review is a collation of statements from studies. Again, principally, this is not incorrect, but as stated in Section 2.4 the purpose of a literature review is to synthesise scholarly knowledge relevant to a topic rather than citing studies directly or summarising them; the latter leads to a superficial treatment of literature, often called a topical survey, which should be avoided. Suggestions on how to report literature are found in Chapter 13, particularly in Section 13.3.
• The literature review contains second-hand citations (see Section 3.5). Depending on the algorithms of the software, each such source may be seen as a case of plagiarism. However, second-hand citations should be treated cautiously because they could be inaccurate; Section 3.5 provides an example and guidelines.
• The same phrasing is repeated throughout the literature review. Each time the same phrasing is used, it is seen by the software as an incident. This is more difficult to change, because sometimes using the same phrasing, for example naming three theories in the same order, results in more accessible reading.
This means that the outcomes of checking literature reviews with plagiarism software should be assessed by authors, reviewers and assessors before judgement is conferred; better still, those writing a literature review should be mindful of the points raised so that plagiarism is avoided in the first place.
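To illustrate why such scores need interpretation, the following minimal sketch (a deliberately simplified toy, not how publishers' plagiarism software actually works; the function names and example texts are invented for illustration) counts how many word 5-grams of a draft reappear verbatim in a single source. Correctly cited fragments and repeated phrasing raise this kind of score just as uncredited copying does, which is exactly why a high score still needs human assessment.

```python
# Toy illustration of verbatim-overlap detection using word 5-grams.
# Commercial plagiarism detectors index millions of sources and use far more
# elaborate matching; this sketch only compares one draft against one source.

def word_ngrams(text: str, n: int = 5) -> set:
    """Return the set of lower-cased word n-grams occurring in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_share(draft: str, source: str, n: int = 5) -> float:
    """Share of the draft's n-grams that also occur verbatim in the source."""
    draft_grams = word_ngrams(draft, n)
    if not draft_grams:
        return 0.0
    return len(draft_grams & word_ngrams(source, n)) / len(draft_grams)

if __name__ == "__main__":
    draft = ("A literature review should synthesise scholarly knowledge "
             "relevant to a topic rather than merely summarise studies.")
    source = ("The purpose of a literature review is to synthesise scholarly "
              "knowledge relevant to a topic rather than merely summarise studies.")
    # A high share signals verbatim reuse that needs citation or paraphrasing.
    print(f"Verbatim 5-gram overlap: {overlap_share(draft, source):.0%}")
```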
2.9 Key Points
• Literature reviews critically evaluate sources from a specified research objective. As they look at other sources to find out what has been written about a specific topic, summaries or annotated bibliographies are in general insufficient for this purpose. What is expected is an in-depth appraisal of individual studies and how this can be synthesised into a coherent overview of the state-of-the-art for a specific topic or research question; the latter can take the form of a review question when a literature review is a stand-alone study (also called a review of studies).
• The generic process for conducting literature reviews is found in Figure 2.4. Derived from the research objectives, specific review questions are set for the literature review. The purpose of the research in terms of propositional logic also determines what the literature review aims to find out.
• A method for appraising literature consists of four points to be evaluated for each source (see Section 2.3 and Box 2.A for more detail):
  • Which purpose does the source serve for the current study (dissertation, doctoral thesis, research project)?
  • What is the background of the author(s) and study?
  • Is the design of its research process and are the results trustworthy for the objectives of the current study?
  • How can this source be used for the research objectives and the design of the research method?
• There are four archetypes of literature reviews, which can be found in Section 2.5 and Table 2.1:
  • Narrative overviews. These overviews are limited in scope, aiming primarily at supporting empirical studies, developing propositions or hypotheses, or presenting a specific point of view.
  • Narrative reviews. This archetype presents a detailed analysis of extant literature, discusses counterarguments and contradictions, and makes literature accessible to other readers. Such reviews may also address gaps in existing literature and advance arguments.
  • Systematic literature reviews. The aim of systematic literature reviews is considering all relevant literature for a specific topic. To this purpose, these reviews are protocol-based for the retrieval of sources and analysis.
  • Systematic reviews. Similar to systematic literature reviews, this type of review looks at all relevant literature for achieving a specific performance, effect or outcome of interventions. This also includes so-called umbrella reviews, which essentially bring together systematic reviews on a specific topic.
Fig. 2.4 Overview of research paradigms and appraisal for literature reviews. Building on Figure 2.3, it shows that the epistemological stance (research paradigm: positivism, interpretivism, pragmatism, advocacy) and the purpose of the research (research methodology: hypothetico-deductive, inductive, reductive, design [applied]) are reflected in the research objectives; these objectives set the stage for the literature review, which proceeds from defining the purpose of the literature review, setting review questions and (designing) the method for the literature review, through retrieval and analysis of sources, to synthesis of findings. The presented method for appraisal (use of source in the literature review, background of authors and study, credibility of the research design, what can be used) is applied in the stage of analysis.
• How literature reviews are undertaken can also be connected to research strategies and research paradigms. The propositional logic used in a study for the research strategy determines how to evaluate extant literature. For deductive studies it is about which theories apply to a specific phenomenon, whereas, for example, the design approach considers conditional statements (of the form: if …, then …). Moreover, the epistemological and ontological perspectives of a study determine what is considered to constitute knowledge. Therefore, what is extracted from extant literature and how this is appraised is related to the research paradigm; see Table 2.3 for an overview.
• Avoiding plagiarism is key to any scholarly work. Thus, it is expected that a study searches for and acknowledges all relevant works and sources that preceded it. A particular point of attention and contention is self-plagiarism.
2.10 How to …?
2.10.1 … Select an Appropriate Approach to a Literature Review?

The approach to the literature review is determined by the purpose of the study. Table 2.1 provides guidance for planning the literature review. A narrative overview will be appropriate (i) when readers of the study have insight into the current state-of-the-art with respect to scholarly knowledge, and (ii) when the context of the research allows a limited number of sources to be consulted. A narrative review aims at discussing all relevant insight pertaining to a topic and contains a discussion of contradicting arguments; other scholars will view such a review as being complete in its coverage and argumentation. A systematic literature review considers all relevant works on a topic, if possible, and is driven by a protocol for retrieval and detailed analysis. This also implies that review questions are set that are more narrowly defined than those used for narrative reviews. A systematic review also considers all relevant works, but its purpose is to collect and review all evidence with regard to a specified effect or intervention; the latter is the reason why these systematic reviews are consulted by practitioners in addition to scholars. Therefore, the four archetypes of literature reviews have differing purposes, methods for retrieving sources, approaches to evaluating arguments and findings found in sources, and intended outcomes; this means that the selection of an appropriate approach is determined by the context in which a literature review takes place, as part of an empirical study or as independent work.
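Purely as an illustrative aid, the selection logic described above can be caricatured as a small decision function; the flag names below are invented for this sketch, and the mapping is only a rough approximation of Table 2.1, not a substitute for the judgement this section calls for.

```python
# Rough, illustrative mapping from the purpose of a review to an archetype.
# The flags are invented for this sketch; real choices require Table 2.1
# and the considerations discussed in Sections 2.5 and 2.10.1.

def suggest_archetype(readers_know_state_of_art: bool,
                      few_sources_sufficient: bool,
                      full_coverage_with_counterarguments: bool,
                      narrowly_defined_review_question: bool,
                      evidence_on_intervention_or_effect: bool) -> str:
    if evidence_on_intervention_or_effect:
        return "systematic review"
    if narrowly_defined_review_question:
        return "systematic literature review"
    if full_coverage_with_counterarguments:
        return "narrative review"
    if readers_know_state_of_art and few_sources_sufficient:
        return "narrative overview"
    return "narrative review"  # default towards comprehensive coverage

# Example: a stand-alone study collecting evidence on an intervention.
print(suggest_archetype(False, False, True, True, True))  # -> systematic review
```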
2.10.2 … Evaluate Which Studies Are of Interest for a Review

The context in which a literature review takes place, as part of an empirical study or as independent work, also determines which sources should be consulted as part of the literature review. Studies that will inform answering the research question or review question should be considered; how to formulate review questions is detailed in Chapter 4. When searching for relevant sources a researcher may come across some that are less informative and others that contain more information with regard to the purpose of the study; this will be reflected in the way each relevant study is appraised. If particular relevant studies are discarded, then bias manifests itself. This is the reason that some fall back on systematic literature reviews and systematic reviews to decrease potential bias by researchers; however, this is only possible when the purpose of the literature review is relatively narrowly focused. Therefore, all relevant sources should be included in a literature review, determined by its purpose, and in such a manner that bias is reduced.
54
2 Objectives and Positioning of [Systematic] Literature Reviews
2.10.3 … Avoid Plagiarism

Essentially, conducting an appropriate literature review warrants originality as long as the study is unique and aims at contributing to scholarly knowledge; furthermore, plagiarism can be avoided in four ways. First, each source used in a literature review should be appraised in depth; Section 2.3 and Box 2.A present a method for appraisals. Second, a record along with notes should be kept for each source consulted, whether used later or not. Third, publications produced during a study should be cited and should be sufficiently different from the work being created; this avoids self-plagiarism. Fourth, plagiarism software can be used to flag any issues; this can be particularly helpful for finding sources that need acknowledgement beyond the scope of a study or domain. Assuring the originality of a study in these four ways avoids plagiarism.
2.10.4 … Write a Literature Review

At the core of writing a literature review is the synthesis of existing knowledge, either to inform an empirical study (which is called a literature review for a study), to advance scholarly knowledge and theoretical insight (which is called a literature review of studies), or to provide evidence for the effectiveness of interventions, policies and practices (which is also called a literature review of studies, and sometimes, a synthesis). This purpose should be kept in mind when appraising literature on a topic. To achieve this, all relevant works should be retrieved and analysed for how they contribute to scholarly knowledge and how they contribute to further research to be undertaken. In the case of a literature review for a study this should lead to a justification of the empirical study, the identification of gaps in scholarly knowledge and the determination of which extant literature can be built on; this could extend to literature pertaining to the research method. In the case of a literature review of studies the purpose is to evaluate existing evidence in studies and draw findings about the effectiveness of interventions, policies and practices or about scholarly insight. Although a literature review for a study and a literature review of studies share some similarities, they differ with regard to their purpose and how they are used by others.
References

Balabkins NW (2010) Joseph A. Schumpeter: not guilty of plagiarism but of "Infelicities of Attribution". In: Aronson JR, Parmet HL, Thornton RJ (eds) Variations in economic analysis—essays in Honor of Eli Schwartz. Springer, New York
Bergdahl E (2019) Is meta-synthesis turning rich descriptions into thin reductions? A criticism of meta-aggregation as a form of qualitative synthesis. Nurs Inq 26(1):e12273. https://doi.org/10.1111/nin.12273
Boughzala Y (2020) Vers une approche collective de l'innovation sociale: le rôle joué par l'entrepreneuriat social en Tunisie. [Towards a collective approach of social innovation: the case of the social entrepreneurship in Tunisia]. Innovations 62(2):161–189. https://doi.org/10.3917/inno.062.0161
Creswell JW (2014) Research design: qualitative, quantitative, and mixed methods approaches, 4th edn. Sage, Los Angeles
Dekkers R (2017) Applied systems theory, 2nd edn. Springer, Cham
Eisenhardt KM (1989) Building theories from case study research. Acad Manag Rev 14(4):532–550. https://doi.org/10.5465/amr.1989.4308385
Feilzer MY (2010) Doing mixed methods research pragmatically: implications for the rediscovery of pragmatism as a research paradigm. J Mixed Methods Res 4(1):6–16. https://doi.org/10.1177/1558689809349691
Fridhi B (2021) Social entrepreneurship and social enterprise phenomenon: toward a collective approach to social innovation in Tunisia. J Innov Entrepren 10(1):14. https://doi.org/10.1186/s13731-021-00148-6
Fusar-Poli P, Radua J (2018) Ten simple rules for conducting umbrella reviews. Evid Based Mental Health 21(3):95–100. https://doi.org/10.1136/ebmental-2018-300014
Grant S, Mayo-Wilson E, Hopewell S, Macdonald G, Moher D, Montgomery P (2013a) Developing a reporting guideline for social and psychological intervention trials. J Exp Criminol 9(3):355–367. https://doi.org/10.1007/s11292-013-9180-5
Grant S, Montgomery P, Hopewell S, Macdonald G, Moher D, Mayo-Wilson E (2013b) Developing a reporting guideline for social and psychological intervention trials. Res Soc Work Pract 23(6):595–602. https://doi.org/10.1177/1049731513498118
Green BN, Johnson CD, Adams A (2006) Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. J Chiropr Med 5(3):101–117. https://doi.org/10.1016/S0899-3467(07)60142-6
Hartley J (2003) Improving the clarity of journal abstracts in psychology: the case for structure. Sci Commun 24(3):366–379. https://doi.org/10.1177/1075547002250301
Kuhn TS (1962) The structure of scientific revolutions. University of Chicago Press, Chicago
Kühnle H, Dekkers R (2012) Some thoughts on interdisciplinarity in collaborative networks' research and manufacturing sciences. J Manuf Technol Manag 23(8):961–975. https://doi.org/10.1108/17410381211276826
Mayo-Wilson E, Grant S, Hopewell S, Macdonald G, Moher D, Montgomery P (2013a) Developing a reporting guideline for social and psychological intervention trials. Trials 14(1):242. https://doi.org/10.1186/1745-6215-14-242
Mayo-Wilson E, Montgomery P, Hopewell S, Macdonald G, Moher D, Grant S (2013b) Developing a reporting guideline for social and psychological intervention trials. Br J Psychiatry 203(4):250–254. https://doi.org/10.1192/bjp.bp.112.123745
Mills CW (1959) The sociological imagination. Oxford University Press, New York
Montgomery P, Mayo-Wilson E, Hopewell S, Macdonald G, Moher D, Grant S (2013) Developing a reporting guideline for social and psychological intervention trials. Am J Public Health 103(10):1741–1746. https://doi.org/10.2105/ajph.2013.301447
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Paré G, Trudel M-C, Jaana M, Kitsiou S (2015) Synthesizing information systems knowledge: a typology of literature reviews. Inform Manag 52(2):183–199. https://doi.org/10.1016/j.im.2014.08.008
Pugh S (1981) Concept selection: a method that works. Paper presented at the international conference on engineering design, Rome
Reinert H, Reinert ES (2006) Creative destruction in economics: Nietzsche, Sombart, Schumpeter. In: Backhaus JG, Drechsler W (eds) Friedrich Nietzsche (1844–1900): economy and society. Springer, New York, NY, pp 55–85
Runyan WM (1982) In defense of the case study method. Am J Orthopsychiatry 52(3):440–446. https://doi.org/10.1111/j.1939-0025.1982.tb01430.x
Salgado EG, Dekkers R (2018) Lean product development: nothing new under the sun? Int J Manag Rev 20(4):903–933. https://doi.org/10.1111/ijmr.12169
Strauss AC, Corbin J (1998) Basics of qualitative research: techniques and procedures for developing grounded theory. Sage, London
Wallace M, Wray A (2006) Critical reading and writing for postgraduates. Sage Publications, London
Ward A, Liker JK, Cristiano JJ, Sobek DK II (1995) The second Toyota paradox: how delaying decisions can make better cars faster. Sloan Manag Rev 36(3):43–61
Xiao Y, Watson M (2017) Guidance on conducting a systematic literature review. J Plan Educ Res 39(1):93–112. https://doi.org/10.1177/0739456X17723971
Chapter 3
Quality of Literature Reviews
co-authored by Harm-Jan Steenhuis
Whereas Chapter 2 presented the starting point of a literature review (finding out more about what has been written about a specific topic by evaluating it from a critical objective), it left open what constitutes a good quality literature review, whether as a review of scholarly knowledge before an empirical study or as a stand-alone study. Keeping in mind that there are different archetypes of literature reviews (see Section 2.5), the way of looking at quality will also vary across these types and with the objective of the literature review. Thus, how the quality of literature reviews can be assured deserves a closer look. This chapter considers the quality of literature reviews from different perspectives. It starts by looking at what quality means in relation to the different purposes of literature reviews in Section 3.1. It appears that there are differing viewpoints on what quality is, also connected to the skills of those reviewing literature. Section 3.2 goes into more detail on the processes of the four archetypes that were introduced in Section 2.5; at the end of this section, there is an overview of the archetypes with regard to their purpose and quality. Section 3.3 associates literature reviews with research paradigms. Looking at research paradigms brings a different perspective on how to assess the quality of a literature review. How the related criteria can be used for literature reviews is also found in that section; this includes overviews of criteria in the form of tabulations. Section 3.4 sets out how literature reviews connect to empirical studies in multiple ways. Finally, Section 3.5 provides guidance on how close reading of studies should be evidenced in a literature review; it also mentions a few points that should be avoided in literature reviews. By considering all these aspects, this chapter leads to a deeper understanding of how to undertake reviews, what criteria are related to research paradigms and specific archetypes, and what needs to be done to ensure the quality of literature reviews.
3.1 Quality Based on Fitness for Purpose as Frame of Reference for Literature Reviews
When considering what the quality of a literature review is, the first point that comes to the fore is that quality relates to the purpose of the literature review, i.e., is the intended purpose met? Thus, one criterion to evaluate the quality of a literature review is to determine its fitness for purpose. One such purpose of a literature review is to make a contribution to existing scholarly knowledge or to delineate the contribution to knowledge of the empirical study to follow. This can be insight for the development of theories, conceptualisations, methods and tools, among others. A literature review with this purpose takes stock of extant studies, analyses these from the objective of the review, synthesises the findings into the validity of scholarly knowledge and the gaps in this knowledge, and searches for generalisation and applicability. Another purpose of a literature review can be to make a contribution to existing practical knowledge. This concerns the extraction and synthesis of evidence from existing studies for determining the effectiveness of interventions, methods, policies, practices and tools. This type of contribution can extend to how to implement these. Even though the two types of contribution are distinct, they can also appear together in a literature review. Thus, the contributions to scholarly knowledge and practical knowledge serve different purposes and a literature review could advance insight for either or both. In addition to whether literature reviews make advances in scholarly and practical knowledge, another distinctive characteristic is how the purpose relates to further research. In this respect, a literature review can be undertaken in preparation for an empirical study, or a literature review of studies can act as a stand-alone study (see, for example, Bolderston 2008, p. 87, for this distinction). When a literature review is connected to an empirical study, it can have two different functions. The first function is to justify the focus and method of an empirical study. Another function is to identify which scholarly knowledge can be used (or to what extent) for the empirical data collection and analysis. This function is advocated by some, such as Maier (2013). He refers to the 'Swiss cheese' model, because it identifies gaps throughout the stages of the empirical studies. However, a literature review informs about scholarly knowledge throughout the entire research process by more than identifying gaps; see Figure 2.2 for an elaborate depiction of how the appraisal of literature is used for empirical studies. The purpose of a literature review can also be to act as a stand-alone study. These stand-alone studies can be narrative in nature; for example, when writing about propositions (theories, models, etc.) or advocacy of specific perspectives. Normally, this type of study results in a so-called research agenda, meaning it can inform multiple research projects. In this regard, Webster and Watson (2002)1 suggest that a literature review is limited to the past, while offering an outlook for the future of practice and indicating advances to be made in scholarly knowledge.
1 Interestingly, the publication by Webster and Watson (2002) does not dwell on the implication of the title, even though it captures the essence of a literature review.
If new scholarly knowledge or practice is the purpose of a study beyond existing literature, then it becomes propositional; the literature review in such studies then serves the purpose of demonstrating deficiencies and inconsistencies, gaps in knowledge and limitations in applications. This is also the case when aiming to determine the effectiveness and pathways for implementation of evidence-based interventions, methods, policies, practices and tools. The analysis and synthesis of studies leads to considering the quality of the evidence (see Sections 6.4 and 6.5); weaker underpinnings for recommendations can then trigger calls for strengthening evidence for particular interventions, methods, policies, practices and tools. Whereas literature reviews for empirical studies direct and support the design of the research methodology for a specific project, literature reviews of studies, as stand-alone studies, typically set out an agenda from which multiple empirical studies can draw; this means that a distinction can be made between literature reviews for studies and literature reviews of studies in terms of the purpose they serve, and, thus, how they achieve this purpose. Beyond the overarching criterion 'fitness for purpose', detailed views on what constitutes a good quality literature review vary. A statement by Bearman (2016, p. 383) exemplifies the voice of the reviewer: 'The most obvious element is excellent authorial judgements. At various points in the review process, authors must evaluate the literature for relevance, rigour and significance.' This should not be equated with using the first person when writing a literature review, but indicates that appraising extant studies pertaining to a review question or research objective takes centre-stage. It means that the quality of the review is related to the skills of the author with regard to finding relevant literature, the analysis of studies and the synthesis of findings. A case in point about skill development is the work of Granello (2001, p. 294 ff.), who views the learning of writing literature reviews through a taxonomy of cognitive complexity. In the taxonomy, synthesis and evaluation feature as the highest levels of complexity, and Granello adds what is required in a literature review. For instance, for the cognitive skill of evaluation (the highest level of cognitive complexity), it requires that both sides of an argument are presented. This also highlights that in terms of this taxonomy synthesis and evaluation are expected in a literature review. Consequently, readers, scholars and users also have expectations about literature reviews. In this vein, Oxman and Guyatt (1988, p. 699) state that 'the reader needs assurance all the pertinent and important literature has been included in the review.' To meet this expectation that pertinent literature and key works have been included, some have advocated protocol-driven approaches, declaring the literature review a research methodology; Bolderston (2008) and Snyder (2019) are among them. Furunes (2019, pp. 227–8) shows that protocol-driven literature reviews can take many forms. Another author skill is determining which studies to include in the literature review. Yin et al. (1976, p. 154)2 already pointed to the need to include studies of varied quality, but, in addition, to assess individual studies on their merits and their effects on the outcomes of the literature review.
2 Note that this proposition by Yin et al. (1976) is related to the formalisation of the case survey method in Yin and Heald (1975); the case survey method appears in this book as a method associated with qualitative synthesis in Section 10.3. Also note that the latter publication is a precursor to what is now known as the case study methodology.
Therefore, among the core skills for writing a literature review are identifying relevant literature, placing the appraisals in the context of the review question or empirical study, and evaluation in the sense of considering different perspectives and arguments found in publications. Based on the above discussion, one view of the quality of a literature review is that it is related to fitness for purpose and, connected to this, the author's skillset, in particular the demonstration of cognitive skills such as the synthesis and evaluation of published literature. This can be enhanced by protocol-driven approaches, if appropriate, but a review does not directly exceed the insight and evidence embedded in extant literature.
TIP: WHEN STAND-ALONE LITERATURE REVIEWS ARE NOT PUBLISHABLE
While advocating the literature review as a research methodology and emphasising protocol-driven literature reviews, Snyder (2019, p. 338) remarks: 'Too often, literature reviews are simply descriptive summaries of research conducted between certain years, describing such information as the number of articles published, topics covered, citations analysed, authors represented, and perhaps methods used, without conducting any deeper analysis'; this style of literature review is often called a topical survey. The remark by Snyder indicates strongly that even for protocol-driven literature reviews there is the expectation that a merely descriptive approach is insufficient. Rather, literature reviews are expected to be analytical, have a unique perspective and generate insight from extant studies that could not directly be found in the retrieved studies or by reading summaries of them.
3.2 Quality Based on Archetypes of Literature Reviews
To understand better how to achieve a good quality literature review, it is helpful to look at the specific processes of the archetypes of literature reviews introduced in Section 2.5; they can also be found in Fig. 3.1, which presents a further classification of the protocol-driven approaches to literature reviews. Even though different archetypes serve different purposes, there are similar steps in each archetype of literature review. The differences relate to the way they are conducted and how information is sought from extant literature. In order to link quality with the different archetypes, the archetypes first have to be better understood. Therefore, a description of the process for each of the four archetypes is presented in the next subsections. After this, a final subsection discusses quality in relation to the archetypes.
Fig. 3.1 Overview of archetypes. This figure presents the classification into the four archetypes found in Section 2.5; the overview is an expansion of Figure 1.1. Literature reviews of the narrative type can be divided into narrative overviews and narrative reviews. The distinction between them lies in narrative overviews aiming at justification and narrative reviews at being comprehensive (meaning covering all relevant conceptualisations, constructs and [counter]arguments). A narrative literature review should not be confused with a narrative synthesis, sometimes also called a narrative systematic review; see Section 10.3 for more information about narrative synthesis as a method for qualitative synthesis. Literature reviews that are based on protocols are either systematic literature reviews or systematic reviews. The former can be either qualitative or quantitative in nature; for quantitative types of systematic literature reviews see Sections 9.3, 9.4 and 9.5. There are three types of systematic reviews: qualitative synthesis (see Chapters 10 and 11), quantitative synthesis (see Chapters 7, 8 and Sections 9.1, 9.2) and mixed-methods synthesis (see Chapter 12).

3.2.1 Narrative Overview

The purpose of narrative overviews is to provide justification and direction for an empirical study that follows, to communicate a specific perspective, or to
propose a new conceptualisation (the latter case is often called a propositional study or publication). In general, they do not need to contain an extensive, critical discussion of arguments, which could also cover counterarguments, and they tend to focus on a point being made. When it is part of an empirical study, the narrative overview may look for theories, conceptualisations and constructs that will be used later. Consequently, topics are addressed in the narrative overview that contribute to the arguments being made, and if related to an empirical study, to a justification and search for applicable conceptualisations. Related to how arguments could be constructed in this archetype, the narrative overview may turn into a topical survey3; such narrative overviews tend to enumerate topics and themes found in literature, but are hardly analytical and evaluative. They tend to take concepts and findings from literature as given and do not assess other studies on their contribution beyond what is provided by the original authors. In terms of Granello (2001, p. 294 ff.), who relates the taxonomy for cognitive complexity to literature reviews (see the previous section), the third level out of six applies at best, which is called application (of concepts to a specific topic). Typically, topical surveys lack in-depth analysis of extant literature, but can be sufficient for providing a viewpoint or justification. Due to gravitating towards justification for a viewpoint or a rationale for an empirical study, the narrative overview is not necessarily led by an explicit review question. This is reflected in the typical processes for the narrative overview; see Figure 3.2. If the narrative overview is part of an empirical study then the first step of the process concerns the articulation of a research objective. Otherwise, the narrative overview as a stand-alone study has its own objectives. The justification of and rationale for these objectives determine the topics that need to be addressed. Searching for and appraising studies on these topics may lead to new insight. This either triggers a search for more studies in the context of the justification and the rationale, or redefines the topics that are covered in the narrative overview. After the iterative process of searching and appraising, the results and findings are synthesised. Consequently, the process for the narrative overview aims at balancing the further search for studies and sufficient support for the arguments made, whether as a stand-alone study or as part of an empirical study. A few examples of narrative overviews follow now. An instance of a narrative overview as commentary that provides a different viewpoint is Bergdahl (2019), who writes about meta-aggregation. In this paper, she uses few references to demonstrate that meta-aggregation does not perform an adequate synthesis compared to other methods; for a description of meta-aggregation see Section 10.3 and for an elaboration on the point made by Bergdahl see Section 11.1. Another example of a narrative overview is the editorial by Borras et al. (2011). This is about land grabbing and makes a convincing case why further studies are needed into this topic. Also, it introduces the land deal politics initiative, a loose research and action network, as they describe it, and relates it to forthcoming special issues.
3 The topical survey is also addressed by Elisabeth Bergdahl in Section 11.1.
Fig. 3.2 Indicative process for archetype narrative overviews. When conducting a narrative overview, the search is defined by the purpose of the literature review, either focusing on a specific viewpoint or supporting arguments and reasoning. To this purpose, typically, topics are identified for which sources are then sought. Each topic leads to appraisal of studies, and finally, synthesis takes place. In the case of a narrative overview preceding an empirical study, this could be in the form of hypotheses, conceptualisations and key constructs. Sometimes, the synthesis is also organised by topic rather than provided as an integrative conclusion of the entire literature review.
By the way, they (ibid., p. 211) also point out that meta-reviews are necessary to collect knowledge about the scope of land in use and land-property relations. Also, this example uses few references to support the points made in the editorial. A third example of a narrative overview is the paper by Hagedoorn and Duysters (2002). They looked at whether firms prefer strategic technological alliances or mergers and acquisitions (or a combination) to improve their innovative performance. In this paper, there is a brief literature review outlining the focus of the study before their attention turns to developing hypotheses that are empirically tested using secondary data. It follows Figure 3.2, in which each topic is followed by a hypothesis or set of hypotheses as the appraisal and synthesis of each subset of preceding studies. The use of literature is directed only at the two forms, that is,
strategic technological alliances, and mergers and acquisitions. It does not consider other forms of technological collaboration between firms. These three examples are typical representatives of the archetype narrative overview, which aims at supporting arguments that are advanced.
3.2.2 Narrative Review
The purpose of the archetype narrative review is different from that of the narrative overview; it aims at being comprehensive with regard to its topic or review question. In this sense, a narrative review differs from a narrative overview in the extent of literature that is consulted and appraised. Furthermore, in a narrative review all key sources, key theories, conceptualisations, etc. or key perspectives for a critical appraisal are included. This also means that counterarguments need to be discussed in this archetype of literature review. To this purpose, deviant studies and other points of view are also taken into account and evaluated on their contribution to scholarly knowledge for the topic at hand. By being extensive in coverage—key works, key theories, conceptualisations, etc. or key perspectives—and by embodying counterarguments and adverse perspectives, narrative reviews are less biased than narrative overviews. The approach to being more comprehensive is reflected in the typical process for the archetype narrative review; see Figure 3.3. The starting point for this type of literature review depends on whether the literature review is part of an empirical study or a stand-alone work. Both the purpose of the narrative for the review and the setting of more specific review questions could lead to the identification of themes (Section 4.4 discusses this). The themes also influence the retrieval of studies and the appraisal. During the narrative review there could be an iteration, but unlike the narrative overview, it is directed at additional searches for constructs, evidence, arguments, counterarguments and perspectives. The idea is to develop the analysis of studies into a comprehensive evaluation of the suitability and applicability of extant scholarly knowledge. Particularly, a narrative review includes all relevant works and evidence pertaining to the topic at hand, including counterarguments and contradictions contained in evidence, to determine the suitability and applicability of extant scholarly knowledge, for which some degree of iterative searching could be necessary. Three illustrations of narrative reviews are discussed now. The first one is the literature review by Petty and Guthrie (2000) on intellectual capital. The study touches on a broad range of points, such as how intellectual capital is described, what role it plays in the economy, how it evolved and how it is measured. There is a discussion of classification schemes for intellectual capital. In the context of discussing research into practices for intellectual capital, they (ibid., p. 164) refer to another extensive review. Their concluding section contains directions on how to conduct further research and lists a number of specific questions resulting from their narrative review.
[Figure 3.3 flowchart: Defining Research Objectives / Purpose of Narrative Review → Setting Review Questions → Retrieval of Studies → Theme 1, Theme 2, … Theme n → Appraisal of Studies → Additional Search (constructs, evidence, (counter)arguments, perspectives) → Synthesis of Findings]
Fig. 3.3 Indicative process for archetype narrative reviews. The process for the narrative review starts with defining its purpose and setting review questions; this could be informed by the research objective of a specific empirical study. This directs the search for studies, which are then grouped in themes. For each of the themes, appraisal of studies takes place. Note that themes could also emerge during the analysis, but in most cases they are implicitly or explicitly shaped in the initial stages of the literature review. Some iteration may occur, mostly triggered by the need for further information about conceptualisations, constructs, evidence, (counter)arguments and perspectives. After the analysis by themes, the findings are synthesised into implications for practice, foundations for scholarly knowledge and topics that warrant further research.
Another instance of a narrative review is the review by Galati and Moessner (2013) on regulatory practices, prompted by the global financial crisis of 2007–2008. It examines tools for macroprudential policy and their effectiveness, and relates them to monetary policy. The study provides policy makers with information so that more effective regulation can take place. In addition to this practical guidance, it sets out more detailed research questions, such as investigating the effectiveness of policy instruments in terms of specified measures and building theory. A final example of the narrative review is the paper by Karakas (2010), who looks at how spirituality improves employees' performance and organisational effectiveness. After setting the scene, the prevalent literature is discussed using three themes: employee well-being, sense of meaning and purpose, and sense of community and interconnectedness. This narrative review is oriented towards practice.
This is reflected in four cautions and four suggestions for dealing with the caveats. Although the presentation of their arguments is structured differently within the actual texts, the three examples reflect the thematic approach of the narrative review. Normally, the themes are set or emerge early on during the creation of a narrative review. Notwithstanding the differences in textual presentation, these examples of narrative reviews are extensive in the literature they cover, ensuring that all arguments, and where appropriate counterarguments, are introduced.
3.2.3 Systematic Literature Review
The first archetype of a protocol-driven approach is the systematic literature review. Its purpose is to appraise literature in order to assess rigour and reliability across studies, to identify gaps and deficiencies of existing studies, and possibly to evaluate evidence with regard to practice. In this regard, a systematic literature review integrates information from a group of studies investigating the same phenomenon or similar phenomena and typically focuses on a well-defined review question. Alternatively, the approach could also be used for the literature review of an empirical study as a prelude to the research methodology. For both uses (stand-alone work or part of an empirical study), the systematic literature review is suited for evaluating the validity and generalisation of theories, conceptualisations, perspectives, methods and tools. It can also be used to identify trends in the development of scholarly thought. If possible, it achieves this purpose by considering all relevant studies and appraising the retrieved studies in the context of the purpose; otherwise, it uses a representative sample of studies. The need to retrieve all relevant studies or a representative sample is reflected in the protocol-driven approach of the systematic literature review; see Figure 3.4. After setting out the review questions based on the purpose of the review (see Chapter 4 for guidance), databases and search engines are selected in parallel to formulating inclusion and exclusion criteria. These are used to perform a search using keywords, aiming at finding all relevant studies or a representative sample. Retrieved studies are subjected to analysis, which can be quantitative and qualitative. In both cases—stand-alone or empirical study—the use of quantification could serve to support arguments or to analyse the studies further. In principle, the resulting findings of a systematic literature review are valid for all studies relating to the topic, because of the exhaustive search and the comprehensive, detailed analysis. Three examples of systematic literature reviews are provided. They exemplify the protocol-driven approach of systematic literature reviews. The first example is the review by Jungherr (2016) on the use of Twitter in election campaigns. The rationale is that research into this topic is fragmented, lacks a common body of evidence and misses shared approaches to data collection and selection.
Fig. 3.4 Process for archetype systematic literature reviews. After setting the purpose of the systematic literature review and the review questions, a protocol defines the keywords and databases that will be searched for retrieving relevant studies; in addition, inclusion and exclusion criteria are specified for the studies found. The approach to the analysis can be qualitative or quantitative; in the latter case, this is done to support the analysis of papers. A final step is to synthesise the findings. A systematic literature review can be a stand-alone study, but also be part of an empirical study.
[Figure 3.4 flowchart: Defining Research Objectives / Purpose of Systematic Literature Review → Setting Review Questions → Keywords and Databases; Inclusion and Exclusion Criteria → Retrieval of Studies → Quantification of Retrieved Studies → Qualitative Analysis of Studies / Quantitative Analysis of Studies → Synthesis of Findings]
In the section discussing findings (ibid., p. 83 ff.), he raises doubts that Twitter provides a level playing field (a popular view), posits it as an increasingly integrated element of political communication, presents findings divided into those for parties and their candidates, the public and mediated campaign events, and urges moving on from weakly connected case studies towards more consolidation through alternative research designs. Another example of a systematic literature review is the work by van Laar et al. (2017) into the relation between 21st-century skills and 21st-century digital skills. They look at the differences between the two. Following the process depicted in Fig. 3.5, they performed content analysis on 75 studies that were retrieved using inclusion and exclusion criteria; see Sections 9.5 and 10.3 for content analysis. The skills found are displayed in tables. One of their conclusions (ibid., p. 582) is that the digital skills, though moving towards the knowledge-related skills, do not cover the broad spectrum of 21st-century skills.
Fig. 3.5 Process for archetype systematic reviews. For a systematic review, first review questions are developed, which also depends on the context for undertaking the review: whether its aim is the effectiveness of an intervention, policy or practice, or whether the emphasis is on capturing scientific knowledge. The next step is defining the search strategy, including which databases are searched and which types of studies are included. After retrieval of studies, data are extracted for analysis, followed by synthesis of findings. The systematic review can be a qualitative synthesis or a quantitative synthesis or both; the latter is known as mixed-methods synthesis, which is symbolically indicated by the double arrow in the figure between the two types of syntheses.
[Figure 3.5 flowchart: Context of Systematic Review → Developing Review Questions → Keywords and Databases; Inclusion and Exclusion Criteria → Retrieval of Studies → Extraction of Quantitative Data / Extraction of Qualitative Data → Quantitative Analysis of Studies ↔ Qualitative Analysis of Studies → Synthesis of Findings]
The final example of a systematic literature review is the work by Ribeiro et al. (2018), who look into the quality of life of students in healthcare studies. The analysis of thirteen studies provides an overview of all kinds of factors that impede the quality of life, such as burnout and stress. The methodological quality of studies was assessed using a checklist found in the Health Evidence Bulletin of Wales (ibid., p. 72). Most interesting is that the number of students experiencing these detrimental effects is relatively high, also affecting learning and the provision of healthcare (ibid., p. 75). These three exemplary studies share the protocol-driven approach of the archetype systematic literature review for searching and selecting studies; in some systematic literature reviews the analysis also follows a specific method, as illustrated by the second and third examples.
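To make the protocol-driven selection of studies more tangible, the sketch below shows how inclusion and exclusion criteria could be applied to records exported from bibliographic databases. It is only a minimal illustration under assumed conditions: the record fields, the criteria and the function include_record are hypothetical and would need to be adapted to the protocol of an actual review.

```python
# Minimal sketch of screening records against inclusion and exclusion criteria.
# The record fields (title, abstract, year, language, peer_reviewed) and the
# criteria used here are hypothetical; an actual protocol defines them in advance.

def include_record(record, keywords, year_from=2000, year_to=2022):
    """Return True if a record meets the (hypothetical) inclusion criteria."""
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    meets_topic = any(keyword.lower() in text for keyword in keywords)   # inclusion: topic keywords
    meets_period = year_from <= record.get("year", 0) <= year_to         # inclusion: publication window
    is_english = record.get("language", "").lower() == "english"         # exclusion: non-English studies
    is_peer_reviewed = record.get("peer_reviewed", False)                # exclusion: non-peer-reviewed items
    return meets_topic and meets_period and is_english and is_peer_reviewed

# Example records as they might be exported from a database search.
records = [
    {"title": "Twitter use in election campaigns", "abstract": "…", "year": 2015,
     "language": "English", "peer_reviewed": True},
    {"title": "An unrelated study", "abstract": "…", "year": 1998,
     "language": "German", "peer_reviewed": True},
]

selected = [r for r in records if include_record(r, keywords=["election", "social media"])]
print(f"{len(selected)} of {len(records)} records retained for appraisal")
```

Such a sketch only covers the mechanical part of screening; judgements about methodological quality and relevance still require appraisal by the reviewers, as the examples above illustrate.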
3.2.4 Systematic Review
The fourth archetype, the systematic review, is also protocol-driven. It is also referred to as (systematic) meta-review, meta-synthesis and research synthesis. The purpose of most systematic reviews is to review all evidence and determine whether an intervention, policy, practice or treatment is effective towards defined outcomes. Systematic reviews can be used by practitioners and policymakers to define which intervention or practice should be adopted, and if so, under which conditions. It is also possible to test theories, and to assess the effectiveness of methods and tools. There are three subcategories for systematic reviews (see Figure 3.1): quantitative synthesis (see Chapters 7, 8 and 9), qualitative synthesis (see Chapters 10 and 11) and mixed-methods synthesis (Chapter 12); the use of these subcategories depends on the nature of the review questions and the degree of variety in the studies that are analysed and synthesised. The typical process for a systematic review is found in Figure 3.5. The main difference from the systematic literature review is the explicit phase for extracting information from studies. The extraction could cover data, results, conjectures, findings and perspectives, depending on the specific method used in relation to the objectives of the systematic review. As stated by Gough and Elbourne (2002, p. 227), although there are many approaches to systematic reviews, they share the same basic principles of explicit methodology to allow accountability, replication and updating of reviews. This means that the structure of the protocols for quantitative, qualitative and mixed-methods synthesis is similar, but that they differ in which type of data are extracted, and in how the analysis and synthesis take place. The approach of the systematic review is demonstrated by three examples. The first example is the study by Bailey et al. (2012) into interventions for the educational achievement of gifted and talented students, aged 5–16 years, representing a qualitative synthesis. The initial search found 101 papers related to their topic. The adopted search strategy was modified by adding six additional exclusion criteria based on the in-depth review of the initial set of retrieved studies. As part of the mapping of the studies, there is an evaluation of the weight of the evidence (ibid., p. 39). The fifteen remaining studies were subjected to narrative synthesis with the purpose of identifying patterns or themes that became evident (ibid., p. 40). Three themes are discussed: (a) interventions based on school and class organisation, (b) interventions based on social interactions, and (c) interventions based on the development of new skills and strategies. The paper ends with implications for policy, practice and research. For example, they find support for the hypothesis that social interaction underlies effective strategies for these students (ibid., p. 43). The second example is the work by Lawrence et al. (2012) into secondary preventive lifestyle interventions following stroke. This is an instance of a quantitative synthesis. Although 56 papers were retrieved for detailed examination, the review considered only three eligible studies, because the excluded studies did not meet the criteria for inclusion and methodological quality. The pooled results significantly favour the interventions in terms of physiological outcomes, such
as reduced blood pressure, cholesterol levels and body mass index (ibid., p. 245). The authors note that the evidence base is small, resulting in insufficient statistical power and the inability to split the pooled patient data into subgroups. The third example is a case in point for mixed-methods synthesis. This systematic review is the investigation of 38 articles by Babakus and Thompson (2012). They study the amount of physical activity in relation to sedentary time in the context of increased risk for chronic conditions, such as cardiovascular diseases, type 2 diabetes and obesity. Of the retrieved studies, 26 were quantitative in nature and 12 qualitative. They tabulated the results for each of these two categories. The analysis for both types was qualitative due to the heterogeneity across the retrieved studies, as the authors note (ibid., p. 150/2). They (ibid., p. 150/14) concluded that indications found across the studies support the view that levels of physical activity among women of South Asian origin are lower than among the general or white population of their host countries. However, they also cautioned that the heterogeneity of this group, such as diversity in socioeconomic status, religious beliefs and cultural practices, makes it difficult to generalise. The three examples, representing qualitative, quantitative and mixed-methods syntheses, show the typical processes of the archetype systematic review and also point to the need to apply exclusion criteria to find relevant evidence for the review question.
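The pooling of results mentioned for the quantitative synthesis by Lawrence et al. (2012) is typically based on weighting each study's effect estimate by the inverse of its variance. The sketch below shows only the generic fixed-effect calculation, not the specific method used by those authors; the effect estimates and standard errors are invented purely for illustration, and the dedicated chapters on quantitative synthesis cover the methods in full.

```python
import math

# Hypothetical effect estimates (e.g., mean differences) and their standard errors
# from three primary studies; the numbers are invented for illustration only.
effects = [-4.2, -2.8, -5.1]
std_errors = [1.5, 1.1, 2.0]

# Fixed-effect (inverse-variance) pooling: each study is weighted by 1 / variance.
weights = [1.0 / se ** 2 for se in std_errors]
pooled_effect = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# 95% confidence interval for the pooled effect.
ci_low = pooled_effect - 1.96 * pooled_se
ci_high = pooled_effect + 1.96 * pooled_se
print(f"Pooled effect: {pooled_effect:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```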
3.2.5 Relating the Archetypes to Quality of the Review
The four archetypes are summarised with respect to their features of quality in Table 3.1; this tabulation builds on Table 2.1. For the archetype narrative overview, the quality is determined by how effective the arguments are for highlighting a perspective, putting forward a proposition and laying the foundations for an empirical study that follows. One way to enhance this approach is explicitly mentioning the specific perspective of the literature review. The second way of improving quality is mentioning other perspectives; however, the more other perspectives are detailed and compared with the perspective of the paper, the higher the likelihood that it turns into a narrative review. For the narrative review, the quality is determined by the extent to which it includes all relevant key works (though this should not be equated with citation rates of studies), conceptualisations, arguments, counterarguments and perspectives. The quality of this archetype could be enhanced by disclosing how studies were found and by explicitly mentioning what was and what was not considered. An example would be mentioning that a study was not included because of its methodological quality. This could also concern which specific domains were examined and which were excluded. Such mention does not have to take the form of a protocol, but it gives readers a better understanding of the boundaries of the literature considered. For both systematic literature reviews and systematic reviews, the explication of the protocol for searching and analysis forms the key to their quality. However, this is merely a point of methodological quality.
Table 3.1 Archetypes of literature reviews and quality. This table, an extension of Table 2.1, relates the purpose and outcomes of the four archetypes to the processes followed for selecting and including studies. Furthermore, it shows how the approach of the archetype supports critical analysis. Also, measures to enhance the outcome are shown. Note that the more extensive the literature considered for a narrative overview is, the more the chances it morphs into the approaches of the narrative review.
Narrative overview:
• Purpose: contains arguments pertinent to the purpose of research or specific viewpoints; context of research; rationale for a specific viewpoint or proposition; research project focused on a solution or method
• Guided by: justification and rationale
• Outcomes: might be part of a project for a degree or a research project; independent work for giving direction
• Protocol-driven: no
• Critical analysis: not necessarily
• Enhancement of quality: explicit statement of the perspective taken (with rationale); mention of more perspectives (but if so, gravitating towards the archetype narrative review)
Narrative review:
• Purpose: arguments for the purpose of research; context of research
• Guided by: inclusion of key constructs, or alternatively, key works
• Outcomes: independent work for giving direction; secondary research: clarity of effects and contexts
• Protocol-driven: no
• Critical analysis: arguments and counterarguments presented
• Enhancement of quality: explicit inclusion of key works, conceptualisations and constructs; disclosure of the search process (but not as a protocol); explicit mention of what was considered and what not
Systematic literature review:
• Purpose: more objective evaluation of current literature; goal is to find gaps and deficiencies in existing studies
• Guided by: inclusion of all relevant works
• Outcomes: decisive directions
• Protocol-driven: protocol specified
• Critical analysis: if applied, method for content analysis and quantification detailed
• Enhancement of quality: could be strengthened by using quantification (e.g., bibliometric analysis, content analysis or quantified systematic literature review); merits of included studies
Systematic review:
• Purpose: decisive directions; effects and context of interventions
• Guided by: inclusion of all relevant works
• Outcomes: independent work for giving direction; secondary research: clarity of effects and contexts
• Protocol-driven: protocol specified; methods detailed for qualitative and quantitative syntheses
• Critical analysis: inherent to formulation of the review question; depends on the methods used in the case of qualitative synthesis
• Enhancement of quality: methods and tools for quality assessment
The archetype systematic literature review could benefit from quantification, thus turning into a systematic quantitative literature review; see Section 9.5. This only makes sense when the quantification makes the analysis more convincing or allows the analysis to be undertaken more in-depth. The quality can also be enhanced by considering the merits of the individual studies that are included. This means considering how the inclusion of specific studies will strengthen the evidence for findings or lead to heterogeneity across studies (thereby broadening the scope of theories, conceptualisations, constructs and perspectives). The quality of the archetype systematic review can be improved by using methods and tools for quality assessment; specifically, these can be found in Sections 5.6, 6.4, 10.5 and 12.4. Thus, how to determine the quality of a literature review also depends on the archetype and the procedures it follows, and on how it may have been enhanced by specific methods. Furthermore, the archetypes can also be related to their potential subjectivity and coverage of literature; see Figure 3.6 for a symbolic depiction. The narrative overview is relatively selective and subjective in its choice of studies that are considered to bring relevant arguments to its purpose. Because of this selection, the number of relevant publications that are considered is limited. The narrative review fares better on this aspect, because it includes all relevant key works, conceptualisations, arguments, counterarguments and perspectives. However, the selection of the studies is still subjective, because it depends on the scholarly knowledge and the guidance sought by a researcher (or researchers).
[Figure 3.6: plot positioning the archetypes along two axes, 'Selection of literature' (subjective to objective) and 'Coverage of literature' (limited to extensive): narrative overviews, narrative reviews, systematic literature reviews and systematic reviews, with the latter divided into qualitative, mixed-methods and quantitative syntheses]
Fig. 3.6 Symbolic representation of archetypes for selection and coverage of literature. In practice, qualitative synthesis takes considerable effort; hence, the number of studies tends to be limited in comparison to quantitative synthesis, which is reflected in the figure.
It never includes all studies, due to the lack of an explicit search strategy as found in the archetypes of the systematic literature review and systematic review. Even then, there is a degree of subjectivity due to the criteria for inclusion in relation to the purpose of the study. Particularly, systematic literature reviews and qualitatively oriented systematic reviews are prone to this subjectivity because of the more intense efforts that may come along with the qualitative analysis and synthesis of retrieved studies (which sometimes limits the number of studies that can be considered). In addition, the methods for data extraction, analysis and synthesis for qualitative protocol-driven literature reviews inevitably lead to some degree of subjectivity. Thus, the determination of the quality of the four archetypes differs, as they represent distinctive approaches in terms of how the selection of studies takes place, and how the actual analysis and synthesis are undertaken.
3.2.6 Criteria for Systematic Literature Reviews and Systematic Reviews
In addition to criteria derived from research paradigms (see the next section), there are also criteria specifically for systematic literature reviews and systematic reviews. These criteria mostly apply to specific types of literature reviews. For example, Walsh and Downe (2006) present eight checklists for qualitative synthesis and relate them to interpretivist and positivist approaches; Section 10.5 covers some of the specific methods they mention for qualitative synthesis. For one of these methods, the critical appraisal skills programme (CASP), Long et al. (2020) provide detailed guidance on its use and also discuss extensions. Also, for mixed-methods synthesis an appraisal tool for quality has been put forward by Pluye et al. (2009); see Section 12.4 for this tool and other methods for assessing the quality of these studies. Thus, for qualitative synthesis and mixed-methods synthesis, tools are available to assess their quality. There are also methods and tools for assessing the quality of specific steps of literature reviews. A case in point is the presentation of the search strategy, which can be found in Section 13.4. In addition, there are the methods used for assessing the quality of recommendations presented in systematic reviews; Sections 6.4 and 6.5 detail these. This indicates that detailed methods and tools are available for specific steps, notably the search strategy and the assessment of the strength of evidence for recommendations.
NOTE: LESSONS LEARNT
Some have written about achieving quality in literature reviews as lessons learnt. For instance, Brereton et al. (2007) write about their experiences conducting a systematic literature review. In their case, some of the insights gained relate to managing a research project, such as 'team members must make sure they understand the protocol and the data extraction process' (ibid., p. 580), whereas others are related to undertaking a literature review, for example, '… search many
different electronic sources; no single source finds all of the primary studies’ (ibid., p. 578). Generally, publications documenting which challenges others have encountered could lead to avoiding common pitfalls for literature reviews, specifically protocol-driven ones where multiple reviewers collaborate.
3.3 Associating Research Paradigms with Literature Reviews
Another perspective, besides the processes for the archetypes, is viewing literature reviews through the lens of research paradigms, also known as research philosophies. In Section 2.7, this was already considered in relation to how literature reviews are connected to the nature of an empirical study. This section goes into more detail about research paradigms for literature reviews. It starts by looking at what constitutes a research paradigm, followed by four sections on positivist, interpretivist, constructivist and advocacy literature reviews.
3.3.1 Distinguishing Between Idiographic and Nomothetic Research
The scientific acceptability of research results (particularly, in the form of articles) is highly influenced by the background of the researcher and the reviewer; this applies to literature reviews, too. With regard to the acceptability of research results, a distinction is often made between knowledge construction that emphasises the general (nomothetic research) and that which focuses on the particular (idiographic research). The distinction was originally introduced by Wilhelm Windelband during an address in 1894 (Oakes 1980, p. 165),4 and Münsterberg (1899) already warned against viewing it as a dichotomy rather than as complementary views on how knowledge is formed. However, there are others who have viewed the two conceptualisations as a dichotomy, to which Robinson (2011) refers, although he attempts to reconcile them, too. One particular concern about the nomothetic stance is that aggregation does not necessarily lead to generalisation (Salvatore and Valsiner 2010, pp. 821–2), but rather that abductive reasoning is necessary (ibid., p. 10 ff.), in which alternative conceptualisations, explanations and perspectives are reviewed in terms of the evidence provided; see Dekkers (2017, pp. 61–3) for abductive reasoning. Such applies to literature reviews, too; they can be viewed as nomothetic or idiographic.
4 There is some discussion about who introduced or modified the concepts of nomothetic and idiographic forms of generating knowledge in their early stages. On this matter, Hurlburt and Knapp (2006, pp. 287–9) and Salvatore and Valsiner (2010, pp. 818–20) produce slightly different accounts.
Rather than declaring universal applicability of findings, an idiographic perspective on the literature review will establish in which contexts interventions, methods, practices, policies and theories will work or are valid. Healthcare interventions in the form of holistic treatments are an example of these. Furthermore, the adherence to particular forms of scientific acceptability is also influenced by the academic settings in which research takes place. To illustrate this point, Bengtsson et al. (1997) point out that for business studies there is a difference between Europe and North America. North-American researchers are mostly nomothetically oriented, i.e. towards general laws and the procedures of exact science. European researchers are mostly idiographically oriented, i.e. towards understanding particular cases.5 These stances towards what is acceptable as research—whether research should aim for being nomothetic or idiographic—have to do with beliefs about what constitutes research, which is reflected in literature reviews. Three instances provide insight into how this distinction between idiographic and nomothetic research is used for literature reviews. The first literature review, by Brantstätter et al. (2012), uses the distinction between idiographic and nomothetic research to examine whether assessments for psycho-oncology and end-of-life care include cultural differences towards 'the meaning of life.' They (ibid., p. 1048) state that most instruments were developed in North America and that only one of the 59 explicitly paid attention to cultural differences; from the list of references it can be found that the authors of this instrument work at an Israeli university. This seems to support the notion in the previous paragraph about differences between Europe and North America. The second example is a systematic review by Ellaway et al. (2016) on the relationships between medical education programmes and communities. The distinction is used to explain the variance across the 790 reviews they conducted, in addition to noting that there were different discourses, that studies made limited use of theory and that writing gravitated towards ideology rather than being critical and reflexive. The final representative case of a literature review is the narrative review by Tilden (2020) on user involvement as a means of improving practices for evidence-based therapy. There is a discussion about considering the level of evidence in the context of the 'idiographic level of knowledge' as opposed to the 'nomothetic level' (ibid., pp. 389–90), while noting that they are complementary. These three examples indicate that some studies refer explicitly to the distinction between idiographic and nomothetic research to build on beliefs about what constitutes research.
3.3.2 Background to Research Paradigms
Such beliefs are also captured by research ideologies or paradigms, sometimes also called research philosophy.
5 Such is found by Steenhuis & de Bruijn (2006), too, in the case of which journals gravitate towards nomothetic or idiographic research.
Strang (2015, pp. 18–9) gives the following definition for ideology: 'Research ideology refers to how the researcher thinks about knowledge claims, as being on a continuum based on explicit evidence structured from theories to the other extreme of authentic qualitative tacit meanings expressed by participants … it forces researchers to articulate their philosophical view or belief system that in turn strongly influences how they design, execute, and report a scholarly study.' In the same vein, Guba and Lincoln (1994, p. 107) earlier provided the following definition of paradigm: 'A paradigm may be viewed as a set of basic beliefs (or metaphysics) that deals with ultimates or first principles. It represents a world view that defines, for its holder, the nature of the "world", the individual's place in it, and the range of possible relationships to that world and its parts, as, for example, cosmologies and theologies do.' Furthermore, they (ibid., p. 105) state that a paradigm 'guides the investigator, not only in choices of method but in ontologically and epistemologically different ways.' Literature reviews, particularly when informed by systematic approaches, are also a research method, and therefore, may also be subject to the world views and paradigms of the reviewer(s). Thus, a research paradigm is the conceptual lens through which the researcher examines the methodological aspects of their research project to determine the research methods that will be used and how the data will be analysed, also in the case of literature reviews. Research paradigms as a conceptual lens for literature reviews can be divided into three aspects:
• Ontology, which raises the basic questions about the nature of reality.
• Epistemology, which asks how we know the world (what is the relationship between the inquirer and the known?).
• Methodology, which focuses on how we gain knowledge about the world.
Typically, ontology, epistemology and methodology are interrelated. Take, for example, a literature review on how patients view interventions to prevent stroke. Such questions require searching for interpretations and re-interpreting these for the particular question at hand, which means that what is observed, how it is interpreted and how it is synthesised are interrelated. A key consideration for a research paradigm is the relationship between the observer and what is observed. The positivist and post-positivist paradigms have their roots in the natural sciences, such as physics, chemistry and biology. In these paradigms the researcher views reality as objective. The belief is that there is a reality and that the researcher can describe this reality in an objective way. This reality is independent of the way it is observed. Thus, similar to the natural sciences, the researcher is viewed as independent from the object that is observed. Other paradigms have their roots in the social sciences. The main difference between the approach in the social sciences and the natural sciences concerns epistemology. Initially, interpretivism was conceived as a reaction to the effort to develop a natural science of the social sciences (Schwandt 1994, p. 125). The main issue interpretivists had with the positivist stance was the separation of researcher and research object (or subjects). Interpretivists find that researcher and research object cannot be separated, because in social contexts there is interaction with humans. Furthermore, in order to understand the world of meaning, one has to interpret it.
Subsequently, other paradigms arose that differed from the interpretivist stance. For example, where interpretivists still believe in a set of natural laws, although these have to be arrived at through the interpretation of data, constructivists do not believe in one reality but rather hold that reality is simply a construct (Guba and Lincoln 1994, p. 116). This indicates that in the social sciences there are differing views on how to conduct appropriate research. The four principal paradigms that have been mentioned in the preceding paragraphs are found in Table 3.2; the tabulation also includes notes on ontology, epistemology and methodology. In North America, research generally gravitates more towards the (post)positivist paradigm, whereas in Europe research follows a mix of paradigms. This explains the observation by Bengtsson et al. (1997, p. 473) that: 'While European studies run the risk of being regarded as weird and "unscientific" by North Americans, many Europeans may feel that North American research leans too much towards rigorous but rather uninteresting statistical exercises.' It is important to be aware of the paradigm, because different paradigms lead to different consequences for carrying out research, the conclusions that can be reached by following the related pathways for research and the criteria for acceptable research.
Table 3.2 Overview of research paradigms for literature reviews. The four research paradigms, also found in Table 2.3 where they are related to empirical studies, stand for different ways of engaging with scholarly knowledge. With regard to ontology, positivism assumes a singular reality that can be captured by studies, whereas post-positivism recognises that this reality can only be observed through interpretations. Interpretivism takes as its point of departure that reality only exists through interpretations, and constructivism that reality is only constructed through interactions and dialogue. Also for epistemology, differences exist between the research paradigms. Positivism takes an objective view on reality in the search for truth, while considering that objects and interaction between human beings require different approaches (dualism), whereas constructivism only sees findings as a result of subjective views. Post-positivism builds on dualism, but with a critical view. And interpretivism recognises that views are subjective. The approach to methodology ranges from experimentation for positivism to hermeneutics for constructivism.
Ontology:
• Positivism: naïve realism—'real' reality but apprehensible
• Post-positivism: critical realism—'real' reality but only imperfectly and probabilistically apprehensible
• Interpretivism: critical realism—'real' reality but only apprehendable through interpretation
• Constructivism: relativism—local and specific constructed realities
Epistemology:
• Positivism: dualist/objectivist; findings true
• Post-positivism: modified dualist/objectivist; critical tradition/community; findings probably true
• Interpretivism: transactional/subjectivist; value-mediated findings
• Constructivism: transactional/subjectivist; created findings
Methodology:
• Positivism: experimental/manipulative; verification of hypotheses; chiefly quantitative methods
• Post-positivism: modified experimental/manipulative; critical multiplism; falsification of hypotheses; may include qualitative methods
• Interpretivism: dialogic/dialectical
• Constructivism: hermeneutical/dialectical
NOTE: INCREASED PLURALITY OF METHODS FOR LITERATURE REVIEWS
Some of these remarks about differing perspectives should also be placed in the context of time. Particularly for literature reviews based on qualitative synthesis, a multitude of methods has become available. Most methods for this specific purpose have been described in the period 2000–2010; see Section 10.3. In general, these methods are seen as acceptable ways of undertaking synthesis, also because they are often linked to protocol-driven approaches. This may also mean that some of the remarks made here about research paradigms could apply to specific methods for qualitative synthesis but to a lesser extent to others.
3.3.3 (Post)Positivist Perspectives and Archetypes of Literature Reviews
What matters for the (post)positivist paradigm is that the researcher is independent and does not influence the observations, and thus, this should lead to results being objective. This also implies that the research follows specific methods that can be repeated by others and that should then lead to the same results. In general, this means that research that falls inside this paradigm meets scientific criteria, notably validity and reliability. Validity deals with issues such as whether the measurement reflects observed reality, and it can be separated into construct validity, internal validity and external validity. Reliability means that the study can be repeated with similar results. Particularly, the criterion of reliability implies that the (post)positivist approach seeks to find patterns within a reality perceived as objective and observable. A systematic review using quantitative synthesis represents a literature review that falls inside the positivist paradigm. In such a systematic review, there are detailed descriptions of the methods followed, such as what kind of search criteria were used, how publications were selected for the review, which years were included, which data and information were extracted from the retrieved studies, what results were found from the analysis, etc. This detailed information provides insight into what was exactly measured and with it an insight into the validity of the study. The description also allows others to repeat the study, and thus, contributes to reliability, a main criterion for research that falls within the positivist paradigm. Typically, the (post)positivist paradigm is reflected in systematic reviews, whether quantitative or qualitative, and in some systematic literature reviews, and uses five criteria drawn from objectivity, validity and reliability; see Table 3.3. Although often associated with quantitative approaches, some qualitative studies can also take a positivist approach. This could be the case when qualitative data are used for generating or testing propositions, or in any work directly related to causal relationships, as put forward by Berkovich (2018) and Lin (1998) among others. The first criterion in the (post)positivist paradigm is construct validity.
For quantitative synthesis, this is reflected in the number of studies in relation to their homogeneity. In the case of qualitative synthesis, where there is more heterogeneity in the purpose of studies and the methods used (see Chapter 10), the emphasis is on triangulation and chains of evidence. Another criterion for validity is internal validity. Typically, this refers to the effectiveness or impact on outcomes of interventions, practices, policies and treatments in the case of systematic reviews. It also includes a focus on the degree of certainty within a study. External validity, a third criterion, refers to whether interventions, practices and policies can be applied to populations other than those studied or to other contexts. This type of validity can be divided into generalisability, which means application to another setting; applicability, as the appropriateness or relevance to a specific context; and feasibility, whether it is possible to implement the intervention in a setting given resources and institutional constraints. The criteria can be interpreted and considered in the context of replication of studies. Avellar et al. (2017) point out that external validity is often poorly addressed in the systematic reviews they looked at. The reliability of a literature review in the (post)positivist paradigm is expressed in the protocol used for the search strategy—how the search was conducted and which studies were included—and the explicit use of methods for the analysis and synthesis. Also, the quality of evidence is an important feature of reliability; see Sections 6.4 and 6.5 for guidance. A final criterion is how objectivity is achieved. In the case of quantitative synthesis, this is done by looking at sources of bias, such as allocation bias, confounding, attrition bias and publication bias; see Sections 7.5, 7.6 and 8.2 for more detail. In the case of qualitative synthesis, this can be achieved by having in place an explicit process for a literature review, such as keeping a journal to log results, findings and choices, and by explicitly stating values and beliefs when reporting. Although differing on some points between qualitative synthesis, systematic literature reviews and quantitative synthesis, the criteria for a literature review in the (post)positivist paradigm—construct validity, internal validity, external validity, reliability and objectivity—are similar, as shown in Table 3.3. Examples of positivist literature reviews are the systematic reviews by Dwan et al. (2008) and Lawrence et al. (2012). Both reviews have a separate methods section that provides detailed information on the exact procedures followed, such as: what databases were included in the study, what keywords were used, and which inclusion and exclusion criteria were applied. They also contain flow charts of the study, which cover the major steps in the study and how many papers were part of each step. For instance, (1) the initial eligible articles and what may have been excluded at this early stage and why, (2) the number of abstracts that were retrieved and how many were subsequently excluded and for what reason, (3) the number of full articles that were subsequently reviewed and then excluded and for what reason, leading to (4) the final set of articles that were part of the systematic review. This disclosure of the methods and steps followed is typical for the positivist paradigm.
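Such a flow of records through the successive stages can be documented with simple bookkeeping, as the sketch below illustrates. The stage names and numbers are hypothetical and do not come from the reviews discussed here; they merely show the kind of tally that underlies such flow charts.

```python
# Hypothetical tally of how many records remain after each screening stage,
# mirroring the flow charts reported in protocol-driven reviews.
flow = [
    ("Records identified through database searches", 1243),
    ("After removal of duplicates", 980),
    ("After title and abstract screening", 144),
    ("After full-text assessment against the criteria", 31),
    ("Included in the synthesis", 31),
]

previous = None
for stage, count in flow:
    excluded = "" if previous is None else f" (excluded: {previous - count})"
    print(f"{stage}: {count}{excluded}")
    previous = count
```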
Table 3.3 Criteria for literature reviews following the (post)positivist paradigm. Protocol-driven literature reviews can be associated with criteria for the (post)positivist paradigm; for qualitative synthesis this depends on the purpose of the literature review and the method followed (see Section 10.2 for aggregative synthesis). The criteria comprise construct validity, internal validity, external validity, reliability and objectivity. Note that the interpretation of the criteria differs between the archetypes systematic literature reviews and systematic reviews.
Construct validity:
• Qualitative syntheses (systematic reviews) and systematic literature reviews: triangulation by different research methods; chain of evidence; review of appraisal of individual studies (at least two reviewers)
• Quantitative synthesis (systematic reviews): number of studies in relation to certainty of outcomes; pooling of data and results; review of aggregated data (at least two reviewers)
Internal validity:
• Qualitative syntheses and systematic literature reviews: favourability, direction and significance of effects (effectiveness of intervention); how expressed; variety across studies; [model or theory-based]
• Quantitative synthesis: favourability, size and significance of effects (effectiveness of intervention); how measured; outcome means and variations; specified methods for effect sizes; [model or theory-based]
External validity:
• Qualitative syntheses and systematic literature reviews: generalisability (different contexts); applicability; feasibility
• Quantitative synthesis: generalisability (beyond population of interest); applicability; feasibility
Reliability:
• Qualitative syntheses and systematic literature reviews: protocol for search strategy; methods for qualitative or mixed-methods synthesis; quality of evidence; more than one reviewer (except in some cases for studies in the context of postgraduate and doctoral degrees)
• Quantitative synthesis: protocol for search strategy; methods for quantitative synthesis; quality of evidence; more than one reviewer (except in some cases for degrees)
Objectivity:
• Qualitative syntheses and systematic literature reviews: consensus and reflexivity
• Quantitative synthesis: accounting for bias (allocation bias, publication bias, attrition bias, confounding)
NOTE: RELIABILITY OF SYSTEMATIC REVIEWS EVIDENCED BY REPORTING
Even for systematic reviews, particularly the quantitative ones that should follow the positivist paradigm closely, reporting does not always meet standards. For instance, Moja et al. (2005, p. 1053/2) found that nearly 50% of published systematic reviews did not specify how the methodological quality of primary studies was evaluated or how it was considered in the interpretation of results, with the quantitative Cochrane Reviews faring better in this respect. Following this publication, there has been more attention to reporting standards; for example, see Sections 6.4 and 13.4.
3.3.4 Interpretivist Perspectives and Archetypes of Literature Reviews
For studies positioned in the interpretivist paradigm, it is essential that the 'story' is told so that the correct interpretations can be made, which also applies to literature reviews. This leads to idiographic research, which concerns the understanding of particular cases. The aim of idiographic researchers is to provide rich descriptions, and if possible, to make theoretical generalisations. This type of research can also be considered in the context of discovery rather than the context of justification. Then, it is positioned in the inductive phases of the empirical cycle; see Figure 3.7, based on de Groot (1969, pp. 7, 29). Note that this cycle bears some resemblance to Popper's (1999, p. 14) inductive logic. During the inductive phase, observations lead to shaping empirical laws, which could result in tentative theories. Another way of viewing interpretivist studies is that this type of research 'builds' theories, whereas the (post)positivist paradigm aims at testing theories. Interpretivist studies can therefore be expected to end with hypotheses rather than to test these hypotheses. This position appears in the revised empirical cycle of de Groot by Wagenmakers et al. (2018, p. 423), while it should be noted that the testing of theory is not limited to statistical evaluations. Figure 3.8 reflects this and connects the empirical cycle to opportunities for literature reviews.
[Figure 3.7: cycle connecting Observations, Shaping of empirical laws (through induction) and Shaping of tentative theories (tested through deduction)]
Fig. 3.7 Empirical cycle for research. The empirical cycle (derived from de Groot, 1969, pp. 7, 29) provides a systematic overview of the development of scientific knowledge. It starts with observations, from which empirical patterns and laws are discovered through induction. These lead to tentative theories that can be tested using deduction and hypotheses for observations. The inductive evaluation of the outcome of testing tentative theories results in revised scholarly knowledge, after which the empirical cycle starts anew.
[Figure 3.8: revised empirical cycle with, on the side of discovery, tentative behaviour and extensions and propositions (literature reviews aiming at generation of tentative theory) and, on the side of the context for justification, testing on new data, evaluation and existing observations and data (literature reviews aiming at testing tentative theory; literature reviews aiming at aggregation)]
Fig. 3.8 Revised empirical cycle for research with position of literature reviews. The revised empirical cycle by Wagenmakers et al. (2018, p. 423) shows two distinct phases for discovery and testing theory. Three types of literature reviews are related to the empirical cycle. On the left of the figure, the literature review aims at providing a justification and rationale for testing tentative theories or testing theories in new contexts. Using existing observations and data leads to aggregation (synthesis) to study theories and aberrations. On the right-hand side of the figure are found literature reviews that generate tentative theories (extensions, propositions, postulations).
The interpretivist literature reviews are found on the right-hand side of the figure, serving the purpose of generating new conceptualisations, laws of observed regularities and theories; Section 10.2 provides more detail on qualitative synthesis to this purpose. This leads to two types of interpretivist literature reviews: those aiming at understanding particular cases and those aiming at the generation or discovery of conceptualisations and theory. The criteria for interpretivist literature reviews are not well established. It is fairly obvious that the criteria for (post)positivist literature reviews, such as validity and reliability, cannot be fully applied. For example, external validity is inappropriate since interpretivist literature reviews typically involve fewer studies than their (post)positivist counterparts. Reliability is also inappropriate since it is difficult, if not impossible, for another person to conduct, for example, the same case study again. Janesick (1994, p. 217) points this out for the case study methodology: 'the value of the case study is its uniqueness; consequently, reliability in the traditional sense of replicability is pointless here.' Criteria in the paradigm associated with the social sciences include trustworthiness and authenticity (Denzin and Lincoln 1994, p. 100), but this is perhaps best captured in the term credibility (Janesick 1994, p. 216). Janesick (1994, p. 214) also mentions that 'the researcher should describe his or her role thoroughly, so that the reader understands the relationship between the researcher and participants.'
This goes back to the issue of interpreting the data and, due to the interaction of researcher and participants, subjectivity. An additional technique to help in this regard is the use of triangulation, i.e. the use of different methods, researchers, data sources, etc. Thus, a key criterion for interpretivist literature reviews is the credibility of the interpretation, which comes from presenting multiple views and angles as a result of triangulation. The archetype narrative review is an example of a literature review that falls inside the interpretivist paradigm. In a narrative review, there is the notion of a reality and the intention to gain insight into this reality through interpretation. The insights are gained by comparing existing studies from several perspectives and interpreting them. Narrative reviews do not provide detailed descriptions of the methods followed but instead rely on the authors' knowledge of the field, which provides the direction in the writing. The key to evaluating this type of literature review is therefore the credibility of the interpretation through the comprehensive coverage of existing literature. Due to its nature, the interpretivist paradigm uses different criteria for the quality of a literature review, which centre on establishing trustworthiness. The criteria credibility, transferability, dependability and confirmability are typical for qualitative studies (Spencer et al. 2003, p. 40), and often related to the work of Guba and Lincoln (e.g., 1994). How they can be used for literature reviews is found in Table 3.4. The first criterion is credibility, which reflects the confidence that can be placed in the truth of findings across studies. Credibility ascertains whether the findings of studies represent plausible information drawn from the original data in retrieved studies and whether the review is a correct interpretation of the original studies. Since multiple studies are considered, this focuses on the consistency of findings related to context, but also on the variance that could be found. Of particular interest is the search for triangulation through the methods and sampling found across studies; this could also be related to theoretical perspectives, dependent on the topic of a study. Particularly for systematic reviews, the involvement of other experts in the same domain and of actors directly related to the scope of the topic could enhance the credibility. Transferability, as the second criterion for achieving trustworthiness as quality, is the extent to which results and findings can be applied in settings other than the ones considered in the literature review. This is normally achieved through using abstraction mechanisms and comparison of contexts. For the case of qualitative systematic reviews, Munthe-Kaas et al. (2019) note that checklists for transferability do not exist, although they (ibid., pp. 22/9–10) provide an overview of items to this purpose. To achieve transferability in systematic reviews, Munthe-Kaas et al. (2020) propose the method TRANSFER, which aims at involving stakeholders in the early and final stages of a review so that the implications for a wider range of contexts can be considered. A third criterion is dependability, which for literature reviews is determined by the chain of evidence across studies (see also Section 10.4 for qualitative systematic reviews) and by how invariance is linked to the methods and evidence used in retrieved studies.
Table 3.4 Criteria for literature reviews following the interpretivist paradigm. The interpretivist paradigm leads to criteria for literature reviews that are mostly associated with the archetype narrative reviews and qualitative protocol-driven literature reviews. The main criteria are credibility, transferability, dependability and confirmability. Note that for the protocol-driven literature reviews, triangulation, and engagement with experts or stakeholders, are additional points of consideration. Also, bias can be reduced by having at least two reviewers independently assessing retrieved studies.

Credibility
  Narrative reviews:
  • Depth and relevance of selected studies
  • Similarities and differences among selected studies
  • Comparison of contexts found in selected studies
  • In-depth descriptions for relevant themes
  Systematic literature reviews and qualitative syntheses (systematic reviews):
  • Depth and relevance of retrieved studies
  • Similarities and differences among retrieved studies
  • Comparison of contexts found in retrieved studies
  • In-depth descriptions for relevant themes
  • Triangulation (methods, sampling)
  • Engaging with experts and relevant actors, focusing on results

Transferability
  Narrative reviews:
  • Generalisation or instantiation through isomorphism, homomorphism and abstraction
  • Extended description of different contexts
  Systematic literature reviews and qualitative syntheses (systematic reviews):
  • Generalisation or instantiation through isomorphism, homomorphism and abstraction
  • Extended description of different contexts
  • Engaging with experts and relevant actors (outside domain of review)

Dependability
  Narrative reviews:
  • Chain of evidence across studies
  • How invariance is related to evidence and methods used
  Systematic literature reviews and qualitative syntheses (systematic reviews):
  • Chain of evidence across studies
  • How invariance is related to evidence and methods used
  • [Verification by authors of studies]

Confirmability
  Narrative reviews:
  • Relatively delimited question(s) for review
  • Reflexivity
  Systematic literature reviews and qualitative syntheses (systematic reviews):
  • Relatively delimited question(s) for review
  • Protocols for review
  • Coding (if appropriate)
  • Two or more researchers conducting evaluation of publications independently
  • Consensus and reflexivity
If the same line of thinking was applied as in qualitative research, it could also involve engaging with the original authors of studies to check whether interpretations and reformulations of studies deviate from their intentions.

Confirmability, the fourth criterion, is the degree to which the findings of the literature review could be confirmed by other researchers; this indicates to what degree the results and findings of a literature review are subjective. It is concerned with determining that data, results and interpretations of the findings in retrieved studies are clearly derived from the data, within studies and across studies. In addition to the chain of
evidence, this is found in the protocols followed (for the case of qualitative protocol-driven literature reviews that follow an interpretivist approach, see Section 10.2), explicit statements about coding, two or more reviewers appraising each individual study, seeking consensus across reviewers and reflexivity. The latter is often a forgotten aspect in literature reviews; it concerns reflecting on how the outcomes and findings of literature reviews are influenced by the personal viewpoints of the researchers. Thus, the trustworthiness of systematic literature reviews and systematic reviews based on qualitative synthesis is determined by addressing a broad variety of points related to credibility, transferability, dependability and confirmability.

An example of a narrative review based on an interpretivist paradigm is the study by Zhou et al. (2018). They start with a main concept (rockbursts in mining) followed by a narrative description of the existing literature that deals with this main concept from four different angles. The authors provide a narrative based on the following four perspectives: (a) empirical criteria, (b) simulation techniques, (c) mathematical algorithms and (d) rockburst charts. Each of these perspectives looks at rockbursts from a different viewpoint and together they can be considered a comprehensive review and interpretation of the current state-of-the-art. Thus, this example shows that interpretivist studies can take multiple perspectives into account when interpreting a phenomenon.
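For reviewers who keep their appraisal notes in a structured form, the criteria in Table 3.4 could be recorded as a simple checklist. The following Python sketch is purely illustrative; the class and function names are hypothetical and the criterion descriptions are abbreviated from the table.

```python
# Purely illustrative: recording appraisal notes against the trustworthiness
# criteria of Table 3.4. All names are hypothetical.
from dataclasses import dataclass, field

TRUSTWORTHINESS_CRITERIA = {
    "credibility": [
        "depth and relevance of retrieved studies",
        "similarities and differences among retrieved studies",
        "comparison of contexts found in retrieved studies",
        "triangulation of methods and sampling across studies",
    ],
    "transferability": [
        "generalisation or instantiation through abstraction",
        "extended description of different contexts",
    ],
    "dependability": [
        "chain of evidence across studies",
        "how invariance is related to evidence and methods used",
    ],
    "confirmability": [
        "relatively delimited review question(s)",
        "protocols, coding and independent appraisal by two or more reviewers",
        "consensus and reflexivity",
    ],
}

@dataclass
class TrustworthinessRecord:
    """Evidence notes collected per criterion while appraising a review."""
    notes: dict = field(default_factory=dict)

    def add_note(self, criterion: str, note: str) -> None:
        if criterion not in TRUSTWORTHINESS_CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        self.notes.setdefault(criterion, []).append(note)

    def unaddressed(self) -> list:
        """Criteria for which no evidence has been recorded yet."""
        return [c for c in TRUSTWORTHINESS_CRITERIA if c not in self.notes]

record = TrustworthinessRecord()
record.add_note("credibility", "findings compared across three contexts")
print(record.unaddressed())  # -> ['transferability', 'dependability', 'confirmability']
```

Such a checklist does not replace the qualitative judgement described above; it merely makes visible which criteria have not yet been evidenced.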
3.3.5 Extending the Interpretivist Approach to Academic Mastery
A very different take on the interpretivist view of literature reviews is the perspective of academic mastery,6 which bridges the gap between the interpretivist paradigm and the hermeneutic approach that follows in the next subsection. Following the thoughts set out by Kvale (1995, p. 30 ff.), academic mastery consists of checking, questioning and theorising. Although he focused on validity in the context of qualitative research using interviews, his approach has been transferred to the setting of literature reviews in Table 3.5. In this regard, checking means considering the evidence found in retrieved studies and searching for patterns of convergence or divergence across studies. Convergence should be evidenced by triangulation, i.e., multiple studies, methods and perspectives indicating the same results and findings. It also involves looking for deviant studies, extreme cases, contradicting evidence and rival explanations. The latter may include looking for critical reviews of theories, conceptualisations and methods. According to Kvale (ibid., p. 28), questioning means that the what and why need to be answered before the how.
6 The common term ‘craftsmanship’ has been replaced with ‘academic mastery’ to avoid any unintended connotations.
Table 3.5 Criteria for literature reviews as academic mastery. Typically, for the interpretivist and constructivist paradigms as academic mastery, the criteria focus on the processes of checking, questioning and theorising. In this sense, the literature review considers deviant cases, searches for differing viewpoints, assumptions, limitations, etc. in extant scholarly knowledge. Thus, it searches for convergence and divergence by looking at patterns across studies, but also at how key arguments have been constructed in literature.

Criteria, aspects and related points (for narrative reviews and qualitative systematic literature reviews):

Checking
  Triangulation:
  • Comparing findings of studies with others: methods, sampling and analysis
  • Patterns of convergence or divergence
  Interpretation:
  • Validity of review questions
  • Logic of interpretations for individual studies and across studies
  Considering deviant studies and searching for extreme cases:
  • Explicitly searching for studies that do not fit with existing conceptualisations, theories, methods and data
  Searching for contradicting evidence:
  • Explicitly searching for falsification in literature
  • Considering critical reviews of theories, conceptualisations and methods

Questioning
  Searching for different interpretations:
  • Different viewpoints to be considered
  • Scrutinising assumptions in studies
  • Different values and priorities
  Using different questions for evaluating data in studies:
  • Generating alternative questions for appraising studies

Theorising
  Evaluating theoretical views:
  • Comparing theories, conceptualisations and methods
  • Searching for limitations
  Falsification of knowledge claims:
  • Searching for tautologies
  • Detecting underlying assumptions
  • Deductive, reductive and abductive reasoning
This may require looking at different perspectives in studies, searching for assumptions, making values explicit and clarifying priorities embedded in works. Also, generating alternative questions for appraising individual studies contributes to this point. Theorising, as the final criterion for academic mastery, looks at methodological issues. These include comparing theoretical considerations in studies, evaluating the impact of limitations, looking for tautologies, searching for assumptions and evaluating reasoning (including considering self-evidence within studies). In addition to checking, questioning and theorising, some authors, for example Montuori (2005), see academic mastery as a creative process. Interpreting texts by checking, questioning and theorising will lead to issues that have not emerged before, which is in itself a creative process.
3.3.6 Hermeneutic Perspectives on Literature Reviews
Another approach relevant to literature reviews derives from hermeneutics; this particular approach is positioned within the constructivist paradigm (see Table 3.2). Hermeneutics refers to the theory and practice of interpretation, where interpretation involves an understanding that can be justified. It describes both a body of diverse methodologies for interpreting texts, objects and concepts, and a theory of understanding. Ultimately, hermeneutics is conceived as a theory of communication and information exchange that developed from theories of truth into more contemporary twentieth-century theories of ontology and understanding.

Hermeneutics harbours a wide range of criteria, which makes it difficult to discern how quality is judged. Most characteristic is that it is oriented towards finding an understanding that is justified. This leads to considering the holistic meaning of a text in addition to how specific elements of the text were written in terms of creating meaning. It is also associated with looking at historical explanations and viewpoints. And often, ambiguities, paradoxes and contradictions are seen as intrinsic to extracting meaning from texts. Characteristic is the ‘hermeneutic circle’ (Gadamer 1975, pp. 40–2), in which texts are revisited to appraise prior meaning (akin to the hypothetico-deductive research methodology). Thus, ultimately the outcome of a hermeneutic literature review is the re-interpretation of studies with the purpose of finding new meaning and insight not yet discovered.

The hermeneutic approach is suitable for undertaking literature reviews in a wide range of contexts. However, it does not seek to provide an overview of pre-articulated knowledge merely to show a gap in the literature, but to provoke thinking and reflection (Smythe and Spence 2012, p. 14). Protocol-driven literature reviews sit uneasily within the hermeneutic stance towards reviews. In this regard, Boell and Cecez-Kecmanovic (2010, p. 132 ff.7; 2015, p. 163 ff.) demonstrate the limitations of systematic approaches to literature reviews that have become more popular, particularly in the domain of information technology for which they write. Among these limitations are a review question that is answerable, protocols for searching replacing the discovery of studies, assumed relative homogeneity in the sense that studies address the same topic in a similar way and limited views on how to interpret retrieved studies. They see literature reviews instead as a creative process in which the state-of-the-art knowledge captured by literature is constantly assessed on its value. Such a process is better captured by hermeneutic approaches, according to them. Different from Boell and Cecez-Kecmanovic’s writing, Smythe and Spence (2012) see the hermeneutic approach as a process of questioning.
7 Note that Boell and Cecez-Kecmanovic (2010, p. 134 ff.) introduce a search strategy that is reminiscent of the iterative search strategy presented in Section 5.3, rather than representative of the hermeneutic approach as detailed in the current section. In their next writing (Boell and Cecez-Kecmanovic 2014, p. 264), the search strategy is expanded with a cycle of analysis and interpretation, closer to the analysis stages in the systematic quantitative literature review (Section 9.5) and content analysis (see Section 10.3) than to hermeneutics.
[Figure 3.9 depicts the hermeneutic cycle for literature reviews: a point of view and review question, possibly anchored in an initial review of a particular study or set of studies, lead to search terms and to locating and identifying studies; interpreting or revisiting key sources moves between the specific and the generic, within and across studies; coherence in emerging insight guides the expansion of the evidence base, the argumentation and the re-evaluation of the point of view.]
Fig. 3.9 Symbolic representation of the hermeneutic cycle for literature reviews. A hermeneutic approach to literature reviews often starts from a specific point of view, sometimes related to a particular study (or set of studies). Alternatively, an initial search within the hermeneutic cycle leads to initial studies that contain information related to the point of view that is being examined. Questioning leads to examining texts, moving from the specific to the generic and vice versa, which is the hermeneutic cycle. Every time a study is added or examined, this process also takes place for studies that were already looked at. This cycle could include looking across studies in the case of literature reviews. Depending on the coherence of the insight obtained through the continuous examination of studies, other publications may be sought based on search terms derived from the insight obtained. The comprehension of studies may also result in a new point of view, which might trigger searching again and a renewed perspective for the hermeneutic cycle.
The latter is directed at capturing the influence of contexts, the positing and use of conceptualisations of any kind, the building on postulations and assumptions, the forming of perspectives, and the role of values and beliefs, irrespective of whether these are mentioned explicitly or implicitly in the studied works. This process is captured in Figure 3.9. In this regard, particularly the archetypes narrative overview and narrative review fit well with the hermeneutic take on literature reviews. The approach for a literature review based on hermeneutics also depends on the explication of the reviewer’s perspective on the topic and on how the review is seen as part of a dialogue, within the review, but also across reviews. The hermeneutic approach to literature reviews, in some way complementary to academic mastery, adds a more reflective stance. However, it should be noted that hermeneutics can result in methodological problems because detailed descriptions of how it should be conducted are lacking; particularly, the hermeneutic cycle in modern-day thought is seen as problematic because of different presuppositions that may have to be dealt with in different ways and because of how cycling between the whole and its parts influences interpretations (Grondin 2015, p. 299).
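As a rough illustration only, the iterative process depicted in Figure 3.9 can be sketched as a loop in which studies are interpreted, earlier interpretations are revisited and the evidence base is expanded until the emerging insight is judged coherent. The Python sketch below is not a method from the literature; every function name is hypothetical, and the stubs only make the control flow explicit, since each step is in reality an interpretive act performed by the reviewer.

```python
# A rough sketch, for illustration only, of the control flow implied by
# Figure 3.9. Every name below is hypothetical and the stubs stand in for
# interpretive acts performed by the reviewer, not computations.

def interpret(study, point_of_view):
    # Placeholder for producing a 'thick' interpretation of a study.
    return f"interpretation of {study} from the perspective '{point_of_view}'"

def coherent(interpretations):
    # Placeholder for the reviewer's judgement of coherence in emerging insight.
    return len(interpretations) >= 4

def locate_studies(round_number):
    # Placeholder for searching with terms derived from the insight so far.
    return [f"study located in round {round_number}"]

def hermeneutic_review(point_of_view, initial_studies, max_rounds=5):
    studies = list(initial_studies)
    interpretations = {}
    for round_number in range(1, max_rounds + 1):
        # Interpret newly added studies and revisit those already appraised,
        # because renewed insight may change earlier interpretations.
        for study in studies:
            interpretations[study] = interpret(study, point_of_view)
        if coherent(interpretations):
            break
        # Otherwise expand the evidence base; the point of view may also shift.
        studies += locate_studies(round_number)
        point_of_view = f"revised ({round_number}): {point_of_view}"
    return point_of_view, interpretations

final_view, notes = hermeneutic_review("education for sustainability", ["study A", "study B"])
print(final_view, len(notes))
```

The sketch only shows why revisiting already-appraised studies and reformulating the point of view belong to the same loop; it does not, and cannot, capture the interpretive judgement itself.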
Table 3.6 Criteria for literature reviews following the hermeneutic approach. In literature reviews based on the hermeneutic paradigm, the key criteria are evidence of the hermeneutic cycle and reflexivity. The first criterion—the hermeneutic cycle—is evidenced by the provision of critical evaluations of how individual studies have been interpreted, how this influenced the outcomes of the evaluation of earlier studies and the lens (point of view) through which all studies are appraised. It also calls for documenting how the dialogue took place between reviewers in the team, authors of studies and other relevant scholars. The second criterion addresses reflexivity, expressed by personal interests, values and beliefs, and by how changes in viewpoint were triggered.

Criteria, aspects and related points (for narrative reviews and qualitative literature reviews):

Hermeneutic cycle
  Reinterpretation:
  • ‘Thick’ descriptions of how individual studies are interpreted
  • Revisiting earlier review studies based on gained insight from other studies
  • Reformulation of lens for revisiting studies
  Dialogue:
  • Process for dialogue between reviewers
  • Engagement with other, relevant scholars (or experts) in domain of review

Reflexivity
  • Personal interests in topic
  • Articulation of assumptions, values and beliefs
  • Explication of insight gained
  • Changes in viewpoint clarified
Notwithstanding such methodological controversy, the approach embodied by this circle can be used to appraise literature for each study that is retrieved and for the body of knowledge that is collected; see Table 3.6. The approach requires identifying a pre-understanding or articulated relevance of a study before the appraisal. The evaluation of the text leads to a ‘thick’ description of its content in relation to the interpretation sought. The renewed insight can be used to review the study again or for interpretations of other studies. Renewed insight leads to revisiting studies that were already appraised. This can be punctuated by dialogue among the reviewers and by seeking insight from scholars or experts in the domain. Related to the hermeneutic cycle, there is reflexivity. It involves stating personal interests and viewpoints on the topic in advance, and also how the insight gained has changed these perspectives, beliefs and values. Even though the methods and approach could lead to methodological problems, the hermeneutic approach to literature results in considering studies in depth and in developing knowledge based on both the individual studies and the holistic body of knowledge formed by the retrieved studies.

An instance of a hermeneutic literature review is the review by Shephard et al. (2019) into education for sustainable development. After the rationale, the authors (ibid., p. 534) convey their own stance towards the topic and their point of departure. They provide a narrative of how they looked at three papers (ibid., pp. 535–6), and describe the process of engaging with the authors of another work to reshape their view on the previous studies considered (ibid., pp. 537–8); in this manner, they evidence
the hermeneutic cycle. In the discussion section, they (ibid., pp. 542–3) refer to how their own thoughts evolved as a consequence of engaging with literature in-depth. In this sense, this specific literature review is built on interpreting and revisiting text to form thoughts and on being explicit about the insight gained.

NOTE: HERMENEUTIC APPROACHES DO NOT EXCLUDE PROTOCOL-DRIVEN REVIEWS
Although Boell and Cecez-Kecmanovic’s (2010, 2015) stance sheds a negative light on systematic approaches to literature reviews, the systematic and hermeneutic approaches are actually not mutually exclusive.

NOTE: CRITERIA APPROACHES CAN BE COMPLEMENTARY
Particularly for qualitative-oriented literature reviews, the criteria for the interpretivist paradigm, academic mastery and the hermeneutic approach can be used to complement each other.
3.4 Quality by Effectively Linking Literature Reviews to Empirical Studies
When literature reviews are used for justifying and informing empirical studies, the question arises how they can be effectively linked to the design of the research methodology. Such a link goes beyond the representation of the literature review as a ‘Swiss cheese’ model (Maier 2013), mentioned in Section 3.1. This model views the purpose of a literature review as identifying gaps that need to be filled. An inadequate literature review will then insufficiently support the empirical study by not properly identifying gaps, or by overlooking literature that either pinpoints gaps or has already filled them. As Dellinger (2005, p. 51) writes: ‘even controlled experimental studies may produce invalid inferences if selected literature is inadequately evaluated resulting in poor problem/theory development, theoretical structure, measurement procedures or methodological characteristics.’ Therefore, adequately relating a literature review to an empirical study is of paramount importance; for this reason, this section looks at the quality of the connection between the literature review and an empirical study.

There are two different starting points for connecting the literature review to an empirical study. One point of departure is that the research objectives, see Figure 2.1, are relatively well-specified. Then the literature review will aim at finding relevant literature to justify the research and inform the design of the research methodology. Typically, these literature reviews take the form of narrative overviews, and sometimes, narrative reviews. The second starting point is a research objective that is relatively ill-defined. In this case, the literature review is used to gain insight into what needs to be investigated and from which perspective. Normally, such literature reviews are of the archetype narrative review, but systematic literature reviews are also used for this purpose. However, both
starting points for the literature review will lead to more detail about what needs to be investigated during an empirical study.

The first connection between a literature review and a related empirical study is found in ontological considerations; see Figure 3.10. Normally, the research objectives state what will be investigated, whether those are objects, entities or subjects. The undertaking of a literature review leads to more detail on what needs to be considered during the empirical study, such as attitudes, concepts, perspectives or variables. Some (e.g., Rowley and Slack 2004, p. 36) call this exploring of literature concept mapping. It covers more than creating an overview of concepts and constructs, as mentioned by Nakano and Muniz Jr. (2018, p. e20170086/4); in addition to relationships between constructs, they stress that it should support the arguments that are made. These ontological considerations are used for the design of the research methodology, i.e., the rationale for which data are collected, which method is used for obtaining data and how analysis will take place; the research methodology covers the detailed research questions, the design of the research method and the design of the data collection in Figure 3.10. Thus, concepts, their relationships, which could be presented in a concept map, and related arguments are the first way in which a literature review connects the research objectives of an empirical study with the design of its research methodology; constructs can take various forms, from variables to perspectives to attitudes.

A second connection between a literature review and a related empirical study is prior knowledge on how constructs relate and in which contexts. The existing scholarly knowledge could be related to specific contexts, and therefore could or could not be applicable to the specific context of the empirical study. Furthermore, the extent of the knowledge in literature will also indicate whether a hypothetico-deductive or inductive research methodology is more appropriate for the empirical study; see also Section 2.6 for the implications of the research strategy for the purpose of a literature review. In practice, it is more likely that some, but not all, relevant knowledge is available; this leads to balancing both approaches in a research strategy rather than choosing only one. Also, it could be that alternative explanations exist in literature, and then an empirical study could aim at providing evidence to reduce the likelihood of weaker explanations and to support more likely ones. Therefore, the sufficiency of knowledge determines which approach the design of the research methodology should take, a second way in which a literature review connects to an empirical study.

The third connection between a literature review and a related empirical study concerns considerations about methodology. It is highly likely that specific findings are also related to the research methods found in retrieved studies. For example, through surveys statistical relationships may have been affirmed in a specific strand or topic of research. However, it could be that these have either little meaning for practice or are specific to the research methods and selection of constructs. In such cases, the outcome of a literature review is also which methods are necessary to complement extant findings.
Thus, the literature review informs which methods are related to which findings, which gaps in methods used in extant studies exist and which range of methods could be more suitable for addressing specific research questions.
[Figure 3.10 depicts how the research objectives, with their embedded ontology (what to study, in which context), feed into the literature review; the review in turn informs (i) the ontology from literature (what to study in more detail, adjusted ontological considerations), (ii) the sufficiency of extant knowledge (descriptive, explanatory, predictive; alternate or competing theoretical foundations; gaps in scholarly knowledge) and (iii) the relationship between findings, methods used and paradigms in literature; these shape the (detailed or refined) research questions, the rationale (hypothetico-deductive, inductive, abductive), the design of the research method and the designed data collection.]

Fig. 3.10 Connecting literature reviews to the design of the research method based on research paradigms. This figure complements how literature is used during the research process in Figure 2.2. The starting point for an empirical study is the research objectives; these contain what is going to be studied and in which context, thus providing clarification on the ontological concepts being used for the study. A literature review provides more detail on the ontology; particularly, more details about the objects, subjects and phenomena of study, and also in which contexts they have been studied. The account of these in the literature review leads to more detail about the ontology embedded in the research questions, hypotheses and propositions; for example, in the case of hypotheses, the constructs to be used. A second aspect is that the literature review should reveal to what extent scholarly knowledge is sufficient, i.e., whether the study will be more oriented towards exploratory research, explanatory research or predictive research. In practice, this will lead to balancing the hypothetico-deductive and inductive research methodologies for the empirical study. Finally, the examination of scholarly knowledge should also lead to insight into how specific findings in literature are related to the methods used and to research paradigms.
This means that a literature review provides information and insight on three aspects that influence the design of the research methodology. First, related to the research objective(s) for the overall empirical study, the literature review will provide more detailed ontological considerations (point ‘i’ in Figure 3.10). More particulars on the ontology for a specific study result in more defined research questions for the actual empirical study (the dashed arrow in the figure) and inform the design of the research methodology. Second, the appraisal of relevant literature will also inform to what extent scholarly knowledge is sufficient to explain or predict the phenomena (point ‘ii’ in the figure); this includes an appraisal of whether alternate or competing theoretical foundations exist in literature and what gaps are identified in scholarly knowledge from the viewpoint of the review question. If there is sufficient knowledge, then a hypothetico-deductive research methodology can be followed. However, if there is insufficient knowledge, then it is more likely that an inductive research methodology will be fruitful. In practice, as pointed out earlier in this section, some knowledge is usually available, whether in the form of studies into the topic or analogies drawn from related topics and disciplines. This implies that studies typically can draw on both research methodologies—hypothetico-deductive and inductive—and that the design of the research methodology may have to accommodate both (the dashed arrow in the figure). The third aspect that guides the design of the research methodology is which methods in extant studies can be related to specific outcomes. An example is the relationship between strategies and functional strategies (departmental or divisional strategies), which can be proven by statistical means, whereas the implementation of a functional strategy may depend more on how those involved view the formation of the strategy and actual decision making, which is normally covered by qualitative research. This example shows that specific outcomes can be related to the use of specific research methods (point ‘iii’ in the figure). It then depends on the research objective whether specific outcomes are pursued and the study can build on research methods in use for the topic, or whether a gap in knowledge needs to be addressed, in which case a less used or not yet used method could be more appropriate (the dashed arrow in the figure). Thus, the design of the research methodology is influenced by the ontological considerations, by the extent to which scholarly knowledge is adequate to the purpose of the empirical study and by which methods are more suitable for the contribution to knowledge the empirical study aims to make; all three aspects are normally addressed by the literature review.
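To make point ‘ii’ concrete, the reasoning about the sufficiency of extant knowledge can be caricatured as a simple decision rule. The Python sketch below is only an illustration of the logic described above; the numeric scale, thresholds and function name are hypothetical, since in practice this judgement is qualitative and made by the researcher.

```python
# Hedged illustration of point 'ii' in Figure 3.10: how a judgement about the
# sufficiency of extant knowledge might translate into the orientation of the
# research methodology. Scale, thresholds and names are hypothetical.

def methodology_orientation(knowledge_sufficiency: float) -> str:
    """knowledge_sufficiency: reviewer's judgement from 0 (none) to 1 (ample)."""
    if knowledge_sufficiency >= 0.8:
        return "predominantly hypothetico-deductive (testing existing theory)"
    if knowledge_sufficiency <= 0.2:
        return "predominantly inductive (exploratory, theory-building)"
    # In practice some, but not all, knowledge is available, so studies
    # typically balance both methodologies.
    return "balanced hypothetico-deductive and inductive"

for level in (0.9, 0.5, 0.1):
    print(level, "->", methodology_orientation(level))
```

The middle branch reflects the point made above that most empirical studies end up balancing both methodologies rather than choosing only one.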
3.5 Quality by Evidencing Engagement with Consulted Studies
Writing literature reviews also means evidencing that other studies have been consulted, read properly and related to the topic at hand. As can be seen from the writing so far in this book, just summarising and capturing contributions by others is insufficient. Rather, the main purpose of literature reviews is analysing and synthesising literature in light of review questions, which may be related to empirical
research that directly follows the literature review. In any case, this requires weaving together what studies have said or done in a way that does justice to what the original writings intended to convey; however, this does not mean that a literature review cannot be critical of earlier writings. To this purpose, this section pays attention to close reading and to achieving rigour when citing studies in-text; it also includes pointers to what should not be done.
3.5.1 Close Reading
Close reading, an essential skill for conducting literature reviews, requires evaluating the content of a study on its consistency and appropriateness for the review. This goes further than critical reading, which is captured by the method for appraisal presented in Section 2.3. Critical reading is directed at evaluating a publication or study on its relevance to a review question (or research objective for an empirical study). Close reading is the interpretation of the publication or study, ensuring that all details are fully understood and properly represented in a literature review. Thus, close reading can be seen as a condition for critical reading, but it goes further in terms of paying attention to how relevant details are explicitly interpreted and represented.

One aspect of close reading is considering whether a retrieved study achieved its objectives, and did so in a convincing and credible manner. Another aspect of close reading is relating the contents of the study to the questions posed in the literature review. This means that the writing about a source should reflect paying attention to differing research objectives, the context of a study, how it was conducted or any other aspect that is relevant to understanding parts of the study or the study in its entirety. Such is not just a literary ornament, but an expression that a retrieved study was understood in its meaning and relevance. Third, close reading also covers understanding the meaning of reasoning, arguments and statements, sometimes even beyond what the authors have written themselves. Sometimes, studies contain inaccurate expressions or use terminology that is not common in a domain; an instance is the use of ‘in-house outsourcing’, which is commonly and more appropriately called ‘captive offshoring’ in operations and supply chain management.8 Therefore, close reading of selected works as an extension of critical reading is necessary to ensure exactness with regard to the interpretation of publications and studies, and their representation in a literature review.

If a claim needs to be made beyond what is found in the original study, then reasoning should be provided to ensure transparency. This reasoning can be presented in two ways. In the first instance, the review may provide additional contextual information or explicitly stated reasons why the fragment or a claim is connected to
8 A case in point noting this confusion about the use of the term ‘offshoring’ and related wording is Jahns et al. (2006, pp. 222–3); to support their interpretation, they introduce a matrix to delineate the concepts of offshoring and outsourcing.
the topic at hand. Alternatively, a more extended reasoning or comparison of contexts could be presented. This could be valid for a larger number of studies. A case in point of the latter would be comparing characteristics of organisms and organisations for the validity of evolutionary mechanisms derived from evolutionary biology before discussing literature; Dekkers (2005, pp. 67–75) does so in the context of searching for evolutionary models to describe the interaction between organisations and the environment. This implies that the context of a study being investigated should be related to the objectives of a literature review before inferences are drawn about its usefulness or statements are derived from the source.
3.5.2 Achieving Rigour for Citations-In-Text
Literature reviews consider evidence or reasoning presented in other papers rather than just lifting statements from publications. This should be reflected in adequate paraphrasing, as one form of citations-in-text. Paraphrasing is reformulating a statement, a passage or a fragment of a text in other words; text that is paraphrased may cover more than one page. It is closely related to quoting, which is using text directly. In the case of quoting, it is common to use quotation marks or other typographical marking to indicate that the original text is used. When paraphrasing, no quotation marks or other typographical markings are used, which implies that paraphrasing must meet a few criteria to be correct and meaningful:
• The actual meaning of the fragment in the cited source should not be altered. This means that the context from which the fragment is paraphrased should be taken into consideration in order to have a correct representation of the original wording. However, it is not necessary to include all details from the original passage.
• In addition, one point made in the cited fragment should not be stressed more than others. If one point needs to be stressed more because of the direction of the literature review, then this should be mentioned.
• The text of the paraphrasing should not be longer than the original writing. An exception could be when terms are substituted or need to be explained. Another acceptable reason is when context needs to be added to the paraphrasing.
Inadequate paraphrasing happens frequently; two examples are shown in Box 3.A. This means that paraphrasing requires understanding of the original text for it to be correct and complete.

TIP: STATE PAGES WHEN QUOTING AND PARAPHRASING
For both quoting and paraphrasing, stating the relevant pages helps readers, and if applicable assessors, to find more directly how the statements were derived from a text; even though this is poorly practised, both quoting and paraphrasing refer to a specific fragment of a publication. For example, in the case of Mohammed et al. (2016, p. 700) citing Campbell et al. (2003), see Box 3.B, it is not directly obvious from which fragment of the text they have quoted or paraphrased; thus, the inclusion of page numbers for the citations-in-text would have made it easier to see on which fragments in other studies their citing was based.
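One way of putting these criteria and the tip above into practice is to keep, for each citation-in-text, a small record of the source, the pages, the original fragment and the paraphrase. The following Python sketch is merely a possible aid, not a prescribed template; all field names and the example entry are hypothetical.

```python
# Illustrative only: a small record for keeping track of citations-in-text so
# that pages and the original fragment are preserved alongside the paraphrase.
from dataclasses import dataclass

@dataclass
class CitationInText:
    source: str             # e.g. "Example et al. (2020)" (hypothetical)
    pages: str              # stated for quotes and paraphrases alike
    original_fragment: str  # verbatim text on which the citation is based
    paraphrase: str         # reformulation used in the review ("" if quoted)
    quoted: bool = False    # True when the original wording is used verbatim

    def length_ok(self) -> bool:
        """Flag paraphrases that are longer than the original fragment."""
        return self.quoted or len(self.paraphrase) <= len(self.original_fragment)

entry = CitationInText(
    source="Example et al. (2020)",
    pages="p. 12",
    original_fragment="Close reading precedes critical appraisal of studies.",
    paraphrase="Appraisal of studies is preceded by close reading.",
)
print(entry.length_ok())  # True: the paraphrase is not longer than the original
```

A record like this cannot judge whether the meaning has been preserved, but it keeps the material needed for that judgement in one place.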
Box 3.A Two Examples of Incorrect Paraphrasing

Example 1
In the text of Bodolica and Spraggon (2018, p. 2474) this statement appears:
• ‘Contrary to common beliefs, influential review articles are relatively more difficult to produce and publish than original research papers because they require a certain level of maturity, longstanding expertise and an in-depth understanding of the field of inquiry (Gomez-Mejia et al. 2011).’
This citation-in-text is attributed to Gomez-Mejia et al. (2011), in which this statement or a similar phrasing cannot be found; nowhere do they discuss the production of review articles. It could be that Bodolica and Spraggon (2018, p. 2474) meant to say that the review by Gomez-Mejia et al. (2011) was an influential article; however, this would have required writing an entirely different sentence, for example:
• ‘Contrary to common beliefs, influential review articles such as Gomez-Mejia et al. (2011), are relatively more difficult to produce and publish than original research papers …’

Example 2
Dekkers and Kühnle (2012) are paraphrased by de Moura and Botter (2017, p. 883) for the following three statements:
• ‘Factors such as financial sustainability, ways of relating to their supply chain and customers, reliability and recognised quality of products and service are key points that shall be taken into consideration when making strategic decision for a company to become globally competitive.’
• ‘Thus, it is essential that companies make investment as a way to stand out from competitors and gain recognition.’
• ‘It is also essential to integrate innovative business strategy of a company and its partners.’
However, none of these statements can be traced back to the original text, nor can they be considered interpretations of it.
TIP: AVOID SUMMARY CITATIONS (AKA REFERENCE STACKING)
Another problem that sometimes surfaces is the practice of summary citations when referring to other works; this is sometimes called reference stacking (e.g., McKercher et al., 2007, p. 465). For instance, the text of de Moura and Botter (2017, p. 883), also mentioned in Box 3.A, shows that for the same statements a considerable number of other publications are cited, some repeatedly for the three statements. It is unlikely that many studies have similar findings or reasoning, and thus this indicates that the literature has not been well consulted or is not placed in context. Moreover, the practice of summary citations does little justice to the authors of the works cited. This means that close reading of existing publications and understanding their sometimes subtle differences is a prerequisite for producing high-quality literature reviews.
Box 3.B Example of a Questionable Citation-In-Text
In the text of Mohammed et al. (2016, p. 700) the following statement appears: ‘… whether and how to appraise the quality of included studies, what are the preferred criteria to use, the value of the quality assessment, and whether or not to exclude studies based on their quality assessment [8, 10, 57].’ As can be seen, this is based on three references; the citation path, as shown in the figure, is as follows:
• Mohammed et al.’s statement is based on three sources, of which one is directly quoted. This means that the other two are paraphrased, which is not clear from Mohammed et al.’s sentence.
• For the quoted statement, Atkins et al. (2008, p. 5) refer to two studies, which discuss the quality of individual qualitative studies, but not in the context of qualitative synthesis.
• With regard to paraphrasing Campbell et al. (2003, pp. 672–3), the statement cannot directly be understood from this text.
• Dixon-Woods et al. (2005, p. 52) cite one source for the debate, i.e., Yin and Heald (1975), who present the case survey method in the context of the positivist research paradigm instead of qualitative synthesis.
[The figure in Box 3.B traces the citation path: Mohammed et al. (2016, p. 700) quote directly from Atkins et al. (2008, p. 5) and paraphrase Dixon-Woods et al. (2005, p. 52) and Campbell et al. (2003, pp. 672–3). These sources draw on works about the quality of individual qualitative studies—Murphy et al. (1998), a range of checklists (1996–2000), Spencer et al. (2003) and Mays and Pope (2000) on quality criteria for qualitative research—and, for qualitative synthesis, on Estabrooks et al. (1994, p. 508) regarding the exclusion of weak studies (with reference to Morse 1989); Dixon-Woods et al. cite Yin and Heald (1975) and the case survey method within the positivist case study tradition.]
TIP: CAUTION WHEN CITING CITATIONS-IN-TEXT (AKA SECOND-HAND CITATIONS)
In particular, using citations-in-text from other studies (aka second-hand citations) should be avoided. The example of a statement by Mohammed et al. (2016, p. 700) in Box 3.B shows that the underpinning of the statement can be questioned. None of the sources cited provides a detailed argument about the quality of retrieved studies. Thus, the studies cited in the chain miss the point that aggregation could result in more convincing evidence even when including weaker studies. Had the chain of citations-in-text been investigated by Mohammed et al., they would have paraphrased the studies differently; in addition, had page numbers been added, tracing the chain of statements and citations would have been easier, too. Furthermore, Kennedy (2007, p. 141) points to evidence about mistakes in second-hand citations. A particular study may be cited by one author and then re-cited by other authors who never read the original piece but instead base their citations on the first citation’s description of the study. She found evidence of particular articles being incorrectly cited in the same way by multiple authors, a pattern suggesting that one author’s mistake was copied by several others; these are called third-hand citations. This means that caution should be exercised when citing citations-in-text, and it is often better to check whether these are correct, most definitely when such second-hand citations pertain to the key arguments made in a literature review.
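Purely as an illustration of why such chains are worth tracing, a claim’s citation path can be represented as a small graph and followed back to its original source; any path longer than one step is a second-hand (or third-hand) citation that warrants checking against the original. The data and function below are hypothetical and do not reproduce the actual path in Box 3.B.

```python
# Illustration only: a claim's citation path represented as a small graph and
# followed back to its original source. The data below are hypothetical.

cites_for_claim = {
    "Review X (2021)": ["Study A (2010)"],  # Review X bases the claim on Study A
    "Study A (2010)": ["Study B (1998)"],   # Study A in turn cites Study B
    "Study B (1998)": [],                   # original source of the claim
}

def citation_paths(work, graph, path=None):
    """Return every path from 'work' back to works that cite nothing further."""
    path = (path or []) + [work]
    sources = graph.get(work, [])
    if not sources:
        return [path]
    paths = []
    for source in sources:
        paths.extend(citation_paths(source, graph, path))
    return paths

for path in citation_paths("Review X (2021)", cites_for_claim):
    if len(path) > 2:  # more than one hop between the claim and its origin
        print("Second-hand citation, check the original:", " -> ".join(path))
```

Keeping even an informal note of such paths makes it harder for one author’s mistake to be copied unnoticed into a review.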
3.6 Key Points
• The quality of a literature review is determined by its fitness for purpose, which follows from the purpose of the review. Also, how effectively a literature review informs later empirical studies plays a role in determining its quality.
• For a literature review preceding an empirical study, the archetypes narrative overview, narrative review and systematic literature review are most suitable. The choice of which archetype fits best depends on how narrowly the research objectives for the empirical study are defined, the breadth and depth of scholarly knowledge related to the study and which information is necessary to inform the design of the research methodology. See Figures 2.2 and 3.10 for more information about how literature reviews are connected to empirical studies and how they inform stages of the research process.
• One way to enhance the quality of a literature review is by being aware of how the different processes for the archetypes are conducted. These processes for the specific archetypes are displayed in Figures 3.2, 3.3, 3.4 and 3.5. Although the processes have similar steps in common, there are differences in how relevant literature is covered, how retrieved studies are analysed and how synthesis across studies is achieved.
• Another way of appraising the quality of literature reviews is to be aware of the research paradigm. A distinction has been made between the (post)positivist paradigm, the interpretivist paradigm and the constructivist paradigm in Section 3.3. Related to these paradigms, four sets of criteria have been presented (Tables 3.2, 3.3, 3.4 and 3.5); the criteria for academic mastery sit in-between the interpretivist and constructivist paradigms.
• In case a literature review is connected to an empirical study, three considerations (ontology, epistemology and methodology) play a role in assessing its quality. These are presented in Section 3.4 and depicted in Figure 3.10.
• With regard to practical advice related to evidencing close reading and engagement with scholarly knowledge in literature, the following matters should be taken into account:
  – Consider the context of a study when citing, particularly when paraphrasing.
  – Paraphrase without changing the meaning of the original fragment.
  – Avoid summary citations.
  – Be careful with citing citations-in-text (in other studies), aka second-hand citations and third-hand citations.
3.7 How to …
3.7.1 … Evaluate the Quality of a Literature Review
For evaluating the quality of literature reviews, there are four points to consider. First, the purpose of the literature review—advances in scholarly knowledge or guidance for practice or both—should be clarified, and this purpose should guide the process of the literature review and its outcomes. Second, the archetype is another facet that determines the quality of a literature review. Information about the archetypes and their processes can be found in Section 3.2. Third, quality criteria for specific types of literature reviews and specific research paradigms should be accounted for. Section 3.3 presents three research paradigms that also imply different ways of assessing the quality of a literature review. Fourth, the accuracy of referring to, citing and paraphrasing other works should be high in order to avoid incorrect representations of the works by others. Some practical advice is provided in Section 3.5. Thus, evaluating the quality of a literature review requires paying attention to whether it meets its purpose, which processes have been followed, whether it follows the criteria for research paradigms and whether there is sufficient accuracy when referring to other sources.

The processes for the archetypes differ, and consequently so do the criteria for assessing the quality of a literature review. For the archetypes narrative overview and narrative review, the relevance of the arguments brought forward in relation to the purpose of the literature review is the main criterion. In the case of the
protocol-driven literature reviews—systematic literature reviews and systematic reviews—the protocol for conducting the processes also plays an important role in determining the quality. With regard to content, topical surveys should be avoided, because they hardly exceed the results and findings embedded in retrieved studies. This means that there are different pathways for achieving the purpose of a literature review and each should be assessed on its merits for the topic to ensure the best fit.
3.7.2 … Achieve a Higher Degree of Accuracy in Literature Reviews
The first point for achieving more exactness in literature reviews is ensuring that all relevant studies are included. Relevant studies contain theories, conceptualisations, laws of observed regularities, frameworks, methods, tools, perspectives, etc. In the case of more defined questions for a review or research objectives for an empirical study, constructs, variables, etc. are what is of interest. Search strategies, presented in Chapter 5, should identify the relevant studies, based on the purpose of a literature review, see Section 3.1, and the defining of review questions, see Chapter 4.

A second point for achieving more exactness in literature reviews is ensuring that relevant studies are properly cited and paraphrased. This means paying attention to the purpose of a publication, the context in which it was written and the argumentation provided by the authors. Particularly when paraphrasing a study, the meaning of the fragment should not change. Similarly, caution should be exercised when citing citations-in-text; the evidence chain should be investigated to avoid replicating incorrect citations. Furthermore, summary citations should be avoided. These four concerns have been addressed in Section 3.5.
3.7.3 … Write Literature Reviews
The foundation for writing literature reviews of good quality starts with close reading and appraising individual studies that are considered, before aggregating and synthesising appraisals into insight that transcends summaries of the individual studies. If performed adequately, this process avoids common flaws, such as summary citations and second-hand citations that are incorrect; the latter also requires reading the cited studies, certainly when they pertain to key arguments made. See Section 3.5 for guidance on this matter. The citing of relevant studies should be placed in the context of the purpose of the literature review; this could be either advances in scholarly knowledge or guidance for practice or both; see Section 3.1. Such is also reflected in the archetypes of literature reviews, introduced in Section 2.5, with their typical processes expanded in Section 3.2. Therefore, the
quality of a literature review is determined in the first place by its fitness for purpose and by evidencing close reading of the studies that have been consulted. Depending on the purpose of the study, the quality can be enhanced by paying attention to the processes of the archetypes, the prevalent research paradigm and related criteria. Even though the basic steps for a literature review are similar for all four archetypes, see Section 3.2, they differ in the reasons for which studies are searched, how the coverage and selection of relevant studies take place and the methodological approaches to the analysis and synthesis of retrieved studies. Furthermore, the way a literature review is conducted can be related to research paradigms: positivism, post-positivism, interpretivism and constructivism; see Section 3.3. Ontological, epistemological and methodological considerations also underpin criteria for literature reviews related to empirical studies that can be related to these research paradigms; see Section 3.4. Implicit or explicit use of criteria related to research paradigms and to the processes for the archetypes of literature reviews will result in more robust and credible outcomes, and in this manner contribute to a better articulation of advances in scholarly knowledge and guidance for practice.

When literature reviews are conducted for empirical studies, they serve three explicit outcomes. The first one is establishing the suitability of scholarly knowledge and possible gaps. Second, a review should lead to transforming the research objectives for an empirical study into more refined research questions, hypotheses or propositions; the suitability of research methodologies and methods then depends on to what extent they allow answering the more refined research questions, hypotheses or propositions. Finally, the literature review should identify which theories, conceptualisations, constructs, frameworks, methods, tools and perspectives are taken forward in the empirical study. Paying attention to these three outcomes also ensures that literature reviews effectively connect to an empirical study.
References

Atkins S, Lewin S, Smith H, Engel M, Fretheim A, Volmink J (2008) Conducting a meta-ethnography of qualitative literature: lessons learnt. BMC Med Res Methodol 8(1):21. https://doi.org/10.1186/1471-2288-8-21
Avellar SA, Thomas J, Kleinman R, Sama-Miller E, Woodruff SE, Coughlin R, Westbrook TPR (2017) External validity: the next step for systematic reviews? Eval Rev 41(4):283–325. https://doi.org/10.1177/0193841x16665199
Babakus WS, Thompson JL (2012) Physical activity among South Asian women: a systematic, mixed-methods review. Int J Behav Nutr Phys Act 9(1):150. https://doi.org/10.1186/1479-5868-9-150
Bailey R, Pearce G, Smith C, Sutherland M, Stack N, Winstanley C, Dickenson M (2012) Improving the educational achievement of gifted and talented students: a systematic review. Talent Dev Excel 4(1):33–48
Bearman M (2016) Quality and literature reviews: beyond reporting standards. Med Educ 50(4):382–384. https://doi.org/10.1111/medu.12984
Bengtsson L, Elg U, Lind J-I (1997) Bridging the transatlantic publishing gap: how North American reviewers evaluate European idiographic research. Scand J Manag 13(4):473–492. https://doi.org/10.1016/S0956-5221(97)00022-5
Bergdahl E (2019) Is meta-synthesis turning rich descriptions into thin reductions? A criticism of meta-aggregation as a form of qualitative synthesis. Nurs Inq 26(1):e12273. https://doi.org/10.1111/nin.12273
Berkovich I (2018) Beyond qualitative/quantitative structuralism: the positivist qualitative research and the paradigmatic disclaimer. Qual Quant 52(5):2063–2077. https://doi.org/10.1007/s11135-017-0607-3
Bodolica V, Spraggon M (2018) An end-to-end process of writing and publishing influential literature review articles: do’s and don’ts. Manag Decis 56(11):2472–2486. https://doi.org/10.1108/MD-03-2018-0253
Boell SK, Cecez-Kecmanovic D (2010) Literature reviews and the hermeneutic circle. Aust Acad Res Libr 41(2):129–144. https://doi.org/10.1080/00048623.2010.10721450
Boell SK, Cecez-Kecmanovic D (2014) A hermeneutic approach for conducting literature reviews and literature searches. Commun Assoc Inf Syst 34:257–286. https://doi.org/10.17705/1CAIS.03412
Boell SK, Cecez-Kecmanovic D (2015) On being ‘systematic’ in literature reviews in IS. J Inf Technol 30(2):161–173. https://doi.org/10.1057/jit.2014.26
Bolderston A (2008) Writing an effective literature review. J Med Imaging Radiat Sci 39(2):86–92. https://doi.org/10.1016/j.jmir.2008.04.009
Borras SM, Hall R, Scoones I, White B, Wolford W (2011) Towards a better understanding of global land grabbing: an editorial introduction. J Peasant Stud 38(2):209–216. https://doi.org/10.1080/03066150.2011.559005
Brandstätter M, Baumann U, Borasio GD, Fegg MJ (2012) Systematic review of meaning in life assessment instruments. Psychooncology 21(10):1034–1052. https://doi.org/10.1002/pon.2113
Brereton P, Kitchenham BA, Budgen D, Turner M, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Softw 80(4):571–583. https://doi.org/10.1016/j.jss.2006.07.009
Campbell R, Pound P, Pope C, Britten N, Pill R, Morgan M, Donovan J (2003) Evaluating meta-ethnography: a synthesis of qualitative research on lay experiences of diabetes and diabetes care. Soc Sci Med 56(4):671–684. https://doi.org/10.1016/S0277-9536(02)00064-3
de Groot AD (1969) Methodology: foundations of inference and research in the behavioral sciences. Mouton, The Hague
de Moura DA, Botter RC (2017) Toyota production system—one example to shipbuilding industry. Indep J Manag Prod 8(3):874–897. https://doi.org/10.14807/ijmp.v8i3.626
Dekkers R (2005) (R)Evolution, organizations and the dynamics of the environment. Springer, New York
Dekkers R (2017) Applied systems theory, 2nd edn. Springer, Cham
Dekkers R, Kühnle H (2012) Appraising interdisciplinary contributions to theory for collaborative (manufacturing) networks: still a long way to go? J Manuf Technol Manag 23(8):1090–1128. https://doi.org/10.1108/17410381211276899
Dellinger AB (2005) Validity and the review of literature. Res Sch 12(2):41–54
Denzin NK, Lincoln YS (1994) Handbook of qualitative research. Sage, Thousand Oaks, CA
Dixon-Woods M, Agarwal S, Jones D, Young B, Sutton A (2005) Synthesising qualitative and quantitative evidence: a review of possible methods. J Health Serv Res Policy 10(1):45–53b. https://doi.org/10.1258/1355819052801804
Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan A-W, Cronin E, Williamson PR (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One 3(8):e3081. https://doi.org/10.1371/journal.pone.0003081
Ellaway RH, O’Gorman L, Strasser R, Marsh DC, Graves L, Fink P, Cervin C (2016) A critical hybrid realist-outcomes systematic review of relationships between medical education programmes and communities: BEME Guide No. 35. Med Teach 38(3):229–245. https://doi.org/10.3109/0142159X.2015.1112894
Estabrooks CA, Field PA, Morse JM (1994) Aggregating qualitative findings: an approach to theory development. Qual Health Res 4(4):503–511. https://doi.org/10.1177/104973239400400410
Furunes T (2019) Reflections on systematic reviews: moving golden standards? Scand J Hosp Tour 19(3):227–231. https://doi.org/10.1080/15022250.2019.1584965
Gadamer H-G (1975) The problem of historical consciousness. Grad Fac Philos J 5(1):8–52. https://doi.org/10.5840/gfpj1975512
Galati G, Moessner R (2013) Macroprudential policy—a literature review. J Econ Surv 27(5):846–878. https://doi.org/10.1111/j.1467-6419.2012.00729.x
Gomez-Mejia LR, Cruz C, Berrone P, De Castro J (2011) The bind that ties: socioemotional wealth preservation in family firms. Acad Manag Ann 5(1):653–707. https://doi.org/10.5465/19416520.2011.593320
Gough D, Elbourne D (2002) Systematic research synthesis to inform policy, practice and democratic debate. Soc Policy Soc 1(3):225–236. https://doi.org/10.1017/S147474640200307X
Granello DH (2001) Promoting cognitive complexity in graduate written work: using Bloom’s taxonomy as a pedagogical tool to improve literature reviews. Couns Educ Superv 40(4):292–307. https://doi.org/10.1002/j.1556-6978.2001.tb01261.x
Grondin J (2015) The hermeneutical circle. In: Keane N, Lawn C (eds) The Blackwell companion to hermeneutics. Wiley, Chichester, pp 299–305
Guba EG, Lincoln YS (1994) Competing paradigms in qualitative research. In: Denzin NK, Lincoln YS (eds) Handbook of qualitative research. Sage, Thousand Oaks, CA, pp 105–117
Hagedoorn J, Duysters G (2002) External sources of innovative capabilities: the preferences for strategic alliances or mergers and acquisitions. J Manag Stud 39(2):167–188
Hurlburt RT, Knapp TJ (2006) Münsterberg in 1898, not Allport in 1937, introduced the terms ‘idiographic’ and ‘nomothetic’ to American psychology. Theory Psychol 16(2):287–293. https://doi.org/10.1177/0959354306062541
Jahns C, Hartmann E, Bals L (2006) Offshoring: dimensions and diffusion of a new business concept. J Purch Supply Manag 12(4):218–231. https://doi.org/10.1016/j.pursup.2006.10.001
Janesick VJ (1994) The dance of qualitative research design: metaphor, methodolatry, and meaning. In: Denzin NK, Lincoln YS (eds) Handbook of qualitative research. Sage, Thousand Oaks, CA, pp 209–219
Jungherr A (2016) Twitter use in election campaigns: a systematic literature review. J Inform Tech Polit 13(1):72–91. https://doi.org/10.1080/19331681.2015.1132401
Karakas F (2010) Spirituality and performance in organizations: a literature review. J Bus Ethics 94(1):89–106. https://doi.org/10.1007/s10551-009-0251-5
Kennedy MM (2007) Defining a literature. Educ Res 36(3):139–147. https://doi.org/10.3102/0013189x07299197
Kvale S (1995) The social construction of validity. Qual Inq 1(1):19–40. https://doi.org/10.1177/107780049500100103
Lawrence M, Kerr S, McVey C, Godwin J (2012) The effectiveness of secondary prevention lifestyle interventions designed to change lifestyle behavior following stroke: summary of a systematic review. Int J Stroke 7(3):243–247. https://doi.org/10.1111/j.1747-4949.2012.00771.x
Lin AC (1998) Bridging positivist and interpretivist approaches to qualitative methods. Policy Stud J 26(1):162–180. https://doi.org/10.1111/j.1541-0072.1998.tb01931.x
Long HA, French DP, Brooks JM (2020) Optimising the value of the critical appraisal skills programme (CASP) tool for quality appraisal in qualitative evidence synthesis. Res Methods Med Health Sci 1(1):31–42. https://doi.org/10.1177/2632084320947559
Maier HR (2013) What constitutes a good literature review and why does its quality matter? Environ Model Softw 43:3–4. https://doi.org/10.1016/j.envsoft.2013.02.004
Mays N, Pope C (2000) Assessing quality in qualitative research. BMJ 320(7226):50–52. https://doi.org/10.1136/bmj.320.7226.50
McKercher B, Law R, Weber K, Song H, Hsu C (2007) Why referees reject manuscripts. J Hosp Tour Res 31(4):455–470. https://doi.org/10.1177/1096348007302355
Mohammed MA, Moles RJ, Chen TF (2016) Meta-synthesis of qualitative research: the challenges and opportunities. Int J Clin Pharm 38(3):695–704. https://doi.org/10.1007/s11096-016-0289-2
Montuori A (2005) Literature review as creative inquiry: reframing scholarship as a creative process. J Transform Educ 3(4):374–393. https://doi.org/10.1177/1541344605279381
Moja LP, Telaro E, D’Amico R, Moschetti I, Coe L, Liberati A (2005) Assessment of methodological quality of primary studies by systematic reviews: results of the metaquality cross sectional study. BMJ 330(7499):1053. https://doi.org/10.1136/bmj.38414.515938.8F
Münsterberg H (1899) Psychology and history. Psychol Rev VI(I):1–31. https://doi.org/10.1037/h0071306
Munthe-Kaas H, Nøkleby H, Lewin S, Glenton C (2020) The TRANSFER approach for assessing the transferability of systematic review findings. BMC Med Res Methodol 20(1):11. https://doi.org/10.1186/s12874-019-0834-5
Munthe-Kaas H, Nøkleby H, Nguyen L (2019) Systematic mapping of checklists for assessing transferability. Syst Rev 8(1):22. https://doi.org/10.1186/s13643-018-0893-4
Nakano D, Muniz Jr, J (2018) Writing the literature review for empirical papers. Production 28:e20170086. https://doi.org/10.1590/0103-6513.20170086
Oakes G (1980) History and Natural Science. Hist Theory 19(2):165–168. https://doi.org/10.2307/2504797
Oxman AD, Guyatt GH (1988) Guidelines for reading literature reviews. Can Med Assoc J 138(8):697–703
Petty R, Guthrie J (2000) Intellectual capital literature review: measurement, reporting and management. J Intellect Cap 1(2):155–176. https://doi.org/10.1108/14691930010348731
Pluye P, Gagnon M-P, Griffiths F, Johnson-Lafleur J (2009) A scoring system for appraising mixed methods research, and concomitantly appraising qualitative, quantitative and mixed methods primary studies in mixed studies reviews. Int J Nurs Stud 46(4):529–546. https://doi.org/10.1016/j.ijnurstu.2009.01.009
Popper K (1999) All life is problem solving. Routledge, London
Ribeiro ÍJS, Pereira R, Freire IV, de Oliveira BG, Casotti CA, Boery EN (2018) Stress and quality of life among university students: a systematic literature review. Health Prof Educ 4(2):70–77. https://doi.org/10.1016/j.hpe.2017.03.002
Robinson OC (2011) The idiographic/nomothetic dichotomy: tracing historical origins of contemporary confusions. Hist Philos Psychol 13(2):32–39
Rowley J, Slack F (2004) Conducting a literature review. Manag Res News 27(6):31–39. https://doi.org/10.1108/01409170410784185
Salvatore S, Valsiner J (2010) Between the general and the unique: overcoming the nomothetic versus idiographic opposition. Theory Psychol 20(6):817–833. https://doi.org/10.1177/0959354310381156
Schwandt TA (1994) Constructivist, interpretivist approaches to human inquiry. In: Denzin NK, Lincoln YS (eds) Handbook of qualitative research. Sage, Thousand Oaks, CA, pp 118–137
Shephard K, Rieckmann M, Barth M (2019) Seeking sustainability competence and capability in the ESD and HESD literature: an international philosophical hermeneutic analysis. Environ Educ Res 25(4):532–547. https://doi.org/10.1080/13504622.2018.1490947
Smythe E, Spence D (2012) Re-viewing literature in hermeneutic research. Int J Qual Methods 11(1):12–25. https://doi.org/10.1177/160940691201100102
Snyder H (2019) Literature review as a research methodology: an overview and guidelines. J Bus Res 104:333–339. https://doi.org/10.1016/j.jbusres.2019.07.039
Spencer L, Ritchie J, Lewis J, Dillon L (2003) Quality in qualitative evaluation: a framework for assessing research evidence. Cabinet Office, London
Steenhuis HJ, de Bruijn EJ (2006) Publishing in OM: does scientific paradigm matter? In: Annual Meeting of the Academy of Management, Atlanta, GA, 11–16 August 2006
Strang KD (2015) Articulating a research design ideology. In: Strang KD (ed) The Palgrave handbook of research design in business and management. Palgrave Macmillan, New York, NY, pp 17–30
Tilden T (2020) The idiographic voice in a nomothetic world: why client feedback is essential in our professional knowledge. In: Ochs M, Borcsa M, Schweitzer J (eds) Systemic research in individual, couple, and family therapy and counseling. Springer International Publishing, Cham, pp 385–399
van Laar E, van Deursen AJAM, van Dijk JAGM, de Haan J (2017) The relation between 21st-century skills and digital skills: a systematic literature review. Comput Hum Behav 72:577–588. https://doi.org/10.1016/j.chb.2017.03.010
Wagenmakers E-J, Dutilh G, Sarafoglou A (2018) The creativity-verification cycle in psychological science: new methods to combat old idols. Perspect Psychol Sci 13(4):418–427. https://doi.org/10.1177/1745691618771357
Walsh D, Downe S (2006) Appraising the quality of qualitative research. Midwifery 22(2):108–119. https://doi.org/10.1016/j.midw.2005.05.004
Webster J, Watson RT (2002) Analyzing the past to prepare for the future: writing a literature review. MIS Q 26(2):xiii–xxiii
Yin RK, Bingham E, Heald KA (1976) The difference that quality makes: the case of literature reviews. Sociol Methods Res 5(2):139–156. https://doi.org/10.1177/004912417600500201
Yin RK, Heald KA (1975) Using the case survey method to analyze policy studies. Adm Sci Q 20(3):371–381. https://doi.org/10.2307/2391997
Zhou J, Li X, Mitri HS (2018) Evaluation method of rockburst: state-of-the-art literature review. Tunn Undergr Space Technol 81:632–659. https://doi.org/10.1016/j.tust.2018.08.029
Harm-Jan Steenhuis is Associate Dean and Professor of Management, International Business at the College of Business, Hawaii Pacific University. He has published three books and over 150 refereed articles, book chapters and conference proceedings on international operations, (international) technology transfer and related topics; strategic operations and global supply chains; methodology; and the interface of instructor and student learning. He has a special interest in the aviation industry and additive manufacturing. He is Editor-in-Chief of the Journal of Manufacturing Technology Management and the International Journal of Information and Operations Management Education, and reviews for several other journals. He has served on more than 25 conference scientific committees. He is a board member of academic organisations such as IAMOT and PICMET, and previously was on the Board of Directors of the Spokane Intercollegiate Research and Technology Institute. He participates in the Micro-economics of Competitiveness Network, run by the Harvard Business School’s Institute for Strategy and Competitiveness.
Chapter 4
Developing Review Questions
The point of a literature review, as presented in Chapter 2, is to find out more about what is written about a specific topic, particularly to inform further study and to synthesise evidence across studies; however, this generic aim needs more refinement. Reflecting on what needs to be searched for supports the retrieval of sources and gives more direction to the stage of analysis and synthesis. Therefore, an essential step towards a critical analysis of sources is the development of appropriate questions for a literature review, which is the topic of this chapter. To this end, the chapter presents starting points and describes guidelines for developing review questions. It starts by looking at how research objectives and review questions differ in Section 4.1; the difference between research objectives and research questions is also highlighted in this section. This is followed by Section 4.2, which pays attention to the formulation of good questions for a review and sets out five recommendations for formulating and evaluating review questions. Section 4.3 provides generic starting points for review questions and indicates examples of studies using them. Section 4.4 goes into more detail about a common format for review questions in medicine and nursing, which can also be used in other disciplines; in addition, it discusses the use of models, theories, laws of observed regularities, etc. that support the development of review questions. This is followed by Section 4.5, in which the purpose of and the process for scoping studies and scoping reviews are presented; one of the aims of such studies is developing more focused questions for reviews. By giving thought to what needs to be considered, resulting in review questions, literature reviews become more directed and yield more relevant insight into their topics.
4.1 Differentiating Research Objectives and Review Questions
Looking back at Figures 2.1 and 2.2, the purpose of a literature review is to find out what has been written about a specific topic, with two intentions. The first intention is to find all relevant sources in order to identify and ensure a contribution to knowledge. This is the case for all archetypes of literature reviews mentioned in Section 2.5: narrative overviews, narrative reviews, systematic literature reviews and systematic reviews. In the latter two cases the contribution to knowledge is found in the literature review itself. The second intention is to find knowledge that will inform empirical research; even the two archetypes systematic literature review and systematic review aim to achieve this. Both intentions for the literature review lead to the reformulation of research objectives into refined research questions, hypotheses or propositions, either as part of a study or informing a research agenda. This also means that literature reviews cannot be seen as disconnected from empirical studies. However, the objective of a literature review is to critically evaluate literature so that future research can be directed at addressing gaps in scholarly knowledge, increasing rigour or enhancing validity; see also Sections 2.3 and 3.4. This future research can be the empirical study of a project or other studies, particularly in the case of systematic literature reviews and systematic reviews. Robinson et al. (2011, pp. 1327–8) put forward four reasons for a research gap:
• Information in extant literature is insufficient or imprecise, which is a first reason for further research. If no relevant studies are identified, if only a few studies are identified, or if sample sizes in available studies are too small to allow conclusions about the question of interest, then there is insufficient information. Subsequent empirical studies will then contribute to expanding scholarly knowledge. Imprecision occurs when theoretical constructs or variables are ill-defined, leaving reliability and trustworthiness below acceptable levels. Further research using better defined constructs and variables will aim at remedying this deficiency in scholarly knowledge. In the case that existing information derived from literature is insufficient or imprecise for a specific research objective, further research to fill this gap will be warranted.
• The second reason for further research based on a literature review can be biased information. This can be the case when the sampling of objects or subjects does not sufficiently represent the population for which conclusions are drawn. It can also occur when not all effects and outcomes are measured, or when the possibility of falsification is missing in the design of the research methodology. It can also come to the fore that existing studies have not considered alternative theories or explanations, or that the dominance of a specific research paradigm has steered research. Such a variety of causes may contribute to literature gravitating towards specific conclusions that are not fully warranted; further research to counter this inclination then contributes to rigour, reliability and dependability.
• The third reason for further research is inconsistency or unknown consistency. Inconsistency of results can be a result of bias, but also of sample sizes, the definition of variables and the methods for analysis. Unknown inconsistency occurs when differences across relevant studies are noted but cannot yet be explained or attributed to specific variables. In the first case, further research will look at how the impact of these variances can be better explained, whereas in the second case further studies will aim at finding underlying explanations and causes or at exploring alternative explanations.
• The fourth reason for further research is that the appropriate information is not available in extant literature from the perspective of a specific research objective. It could be that the available information does not make clear whether findings are applicable to other contexts (generalisation of outcomes), that the outcomes of other studies are not complete or that the temporal dimension is limited (for example, long-term effects have not been studied). In this case, further investigations will be directed at expanding scholarly knowledge to other contexts, complementing existing studies or including longer timescales.
These four reasons for identifying research gaps show how literature reviews of any type inform empirical studies, whether part of the same project, other projects or agendas for further research. However, questions for a review aim at excavating extant scholarly knowledge to identify any of the four reasons for further research, whereas empirical studies work towards a contribution to knowledge, likely building on one of the four reasons. This subtle difference has implications for formulating research objectives and review questions. Research objectives are more generic than review questions for the literature review as part of an empirical study; see Section 2.2, and Figures 2.1 and 2.2. In addition, review questions are part of a study determined by research objectives, unless the literature review is a stand-alone study, particularly a systematic literature review or systematic review; even in the latter case, an additional aim is often identifying research gaps. This means that research objectives are aimed at filling a research gap, whereas the primary aim of review questions is identifying knowledge appropriate to the topic at hand and finding such research gaps.
A word of caution is in order for narrative overviews (see Section 2.5). In general, writings using this archetype of literature review tend to hardly change the research objectives, even though a literature review of this type may result in more precise formulations of hypotheses or propositions for the empirical study. Also, this kind of literature review is subject to the selection of studies by the author(s), which leads to publication bias; thus, the review considers only those studies that support the point of view. This implies that narrative overviews as part of empirical studies possibly contribute less to scholarly knowledge than other archetypes of literature reviews, or at least differently.
4.2 What Are Good Questions for a Literature Review?
Developing good questions for a literature review carries some parallels to establishing research objectives; see Figures 2.1 and 2.2 for the latter. Based on these similarities, the next subsections provide more detail on what to consider when formulating questions that give direction to a literature review.
4.2.1 Guiding Collection and Analysis of Literature
A first point to look at when developing questions for a literature review is that such questions should direct the collection of literature. Thus, the nature of the question should clarify what types of literature should be consulted. This could be formulated in terms of the empirical evidence or the perspectives of relevant actors for a specific research objective. When the review question does not clarify this point, more literature needs to be considered than strictly necessary; this could lead to drifting away from the research objective or to more convoluted conjectures, because a too broad base of diverse information has to be considered. A well-formulated review question averts digressing from the literature that needs to be reviewed and implicitly introducing a different scope. Furthermore, a review question should indicate what needs to be found out from the sources that were retrieved. This means that review questions should be stated in such specific terms that identifying relevant information in existing sources becomes possible. Because most research investigates causal relationships, a change of state in an object (or objects) or subject (or subjects) should be part of the review question, or a performance criterion should be present. Both changes of state and (performance) criteria facilitate a critical evaluation of the literature found.
4.2.2 Single Guiding Question as Point of Reference
To support the collection of relevant sources and to direct the analysis, it is advised to have a single question guiding the literature review; this is a first recommendation for review questions. The focus on a single guiding question avoids drift, changes in scope and publication bias when searching for answers in existing literature. It is also less likely to lead to a fragmented discourse in the writing, because multiple review questions may make it harder to define what the actual focus of the literature review is. Sometimes a single guiding question may have corollaries (closely related sub-questions); however, care should be taken that these sub-questions remain closely related and do not sidetrack the discussion of retrieved literature. In general, such a single guiding question should also not be a multi-part question, as multi-part questions may result in relatively short answers.
4.2.3 Narrowly Focused
Even with a single guiding question, attention should be paid to it being narrowly focused, a second recommendation to consider when developing review questions. In general, more broadly set questions will lead to less depth in the analysis. This means that a trade-off between breadth and depth needs to be made in a literature review. However, particularly when considering a systematic literature review or systematic review, a review question needs to be narrowly focused. Such focus can be achieved by defining the phenomenon that is going to be investigated in the literature review, by specifying the set of objects or subjects (population) and by describing outcomes. Also, it could be helpful to consider factors that influence the phenomenon. These considerations for the focus of review questions most likely mean that polishing and reformulation are needed before the actual literature review can start. In this sense it is also helpful to set boundaries for the literature review. These boundaries define what to take and what not to take into account when looking at literature; they appear in Chapter 6 as inclusion and exclusion criteria. By explicitly establishing the boundaries for a specific literature review, it becomes clear what is relevant to consider. A case in point is considering a specific domain for the application of a theory; although the research could be inspired by the broader application of such a theory, the introduction of a specific domain will lead to a more in-depth review by accounting for the specific characteristics of application to that domain. At the same time, setting such boundaries also leads to limitations of a literature review, because perhaps not all relevant conjectures and findings from other studies are examined.
NOTE: BROADER FORMULATED REVIEW QUESTIONS
In the case of more broadly formulated review questions, the corollaries (closely related sub-questions) should be narrowly focused. Seeking answers to these corollaries in the review will then provide depth to the analysis and contribute to a more coherent synthesis, provided the corollaries are sufficiently related.
4.2.4 Clarity of Good Review Questions
To achieve clarity of a guiding review question and its corollaries, another recommendation to consider is that well-defined terms should appear in its formulation. If terms are indistinct, this could lead to finding sources for a literature review that are not or hardly related to the chosen topic. Consequently, this leads to poorly grounded inferences and findings, in addition to questions that may be raised about rigour. Also, indistinct review questions will lead to inefficient search strategies, intensifying the effort to find relevant publications and increasing the chances that relevant studies and sources are overlooked. Thus, defining, or alternatively describing, the terms used in a literature review is a necessary step towards well-formulated review questions. One aspect is that terms should normally be derived from extant literature. An example of undesirable use of terms is a submission to a journal that contained the term ‘in-house outsourcing’ (note the contradiction in terms), submitted almost three decades after research into outsourcing work to other companies and moving facilities to other countries had its upsurge. In this long-standing literature about what is called offshoring and outsourcing, the specific term used is ‘captive offshoring’. Such use of incorrect terms has two effects. First, it shows that the authors most likely have either not read literature related to the topic or not understood the writings of others (see also Section 2.1 about literature sensitivity). Second, it puts into doubt the actual contribution the authors are making, because they have insufficiently engaged with extant literature. Only when terms in the literature do not sufficiently cover a phenomenon may it be helpful to introduce new terms; however, these should be defined and explained. Sometimes it helps to look into other disciplines that may investigate the same topic and borrow terms from those. Therefore, literature sensitivity will avoid the use of non-existent terms and align a review better with extant literature.
4.2.5 Assuming Possibility of Different Outcomes or Opinions
The formulation of a review question should also make it possible to find outcomes and opinions different from those expected, which is the fourth recommendation for review questions. If not, the outcomes and opinions are known beforehand, which implies that the literature review does not add any contribution to scholarly knowledge. This can appear in the form of self-evident statements or, in terms of logic, tautologies. An example of a self-evident statement would be that ‘smaller firms have fewer resources than larger firms.’ An instance of a tautology is that ‘a ball is green or a ball is not green.’ When developing review questions, evaluating whether they are self-evident or tautological may be more intricate than in the examples provided here. This implies that the terms used in review questions, and how they are related, should be scrutinised to avoid any degree of self-evidence or tautology. This argument for appropriate review questions, in terms of assuming the possibility of different outcomes or opinions, is closely linked to the principle of falsification. Applied to literature reviews, this principle means that a criterion of demarcation is needed to distinguish those statements that can come into conflict with observations from those that cannot, but the criterion itself concerns only the logical form of the theory. A case in point is the use of theories for finance in supply chains by Dekkers et al. (2020b); all five theoretical frameworks used in the study incorporate exchange relationships, and therefore, demarcation between the theories
is not possible on this point, either in the literature review or in the empirical study in the work. This means that review questions should be formulated such that the extraction of results, statements and findings from studies allows a priori for classes of outcomes that can refute theoretical foundations or uncover inconsistencies. Therefore, examining whether the possibility of falsification, i.e., different outcomes and opinions, exists for the review questions will actually lead to more direction when appraising literature.
Box 4.A Example of Review Question Using the Five Recommendations
An example of a review question capturing the five recommendations mentioned in Section 4.2 is found in the work of Leary and Kowalski (1990, p. 35) about impression management; this refers to the process by which people control the impressions others form of them, which plays an important role in interpersonal behaviour.
Single Guiding Question as Point of Reference
The review question was to reduce the multitude of variables that affect impression management to the smallest possible set of theoretically meaningful factors. This single guiding question had two corollaries, because the authors split impression management into two distinct processes: impression motivation and impression construction.
Narrowly Focused
The focus is particularly well-defined, because the authors concentrate on reducing the factors for impression management from a theoretical point of view.
Clarity of Good Review Questions
Although impression management was a settled strand of research, the review question is followed by an explanation of the two processes of impression management. Also, the authors (ibid., p. 36) indicate that this distinction had not been made in conceptual analyses before the review.
Assuming Possibility of Different Outcomes or Opinions
Preceding the review question, there is a discussion (ibid., p. 35) about the scope of the review and that the review is limited to self-presentation (to others, as the review notes).
Building on Sound Assumptions
When discussing the scope of the review, the authors also mention that private self-motives may play a role in impression management; they make it clear that they view self-presentation to others (i.e., impression management) and private self-motives as distinct concepts. At the end of the article, when they present their model with three central factors that determine impression motivation and five central factors that determine the mode of impression construction, they (ibid., pp. 43–4) also connect their model to private self-motives.
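To make the logical side of this recommendation concrete, the following minimal sketch (in Python; it is not part of the original text, and the formulas and variable names are purely illustrative) checks by truth-table enumeration whether a candidate statement is a tautology and therefore incapable of yielding different outcomes.

# Illustrative sketch only: detecting tautological statements by enumerating
# all truth-value assignments of their propositions.
from itertools import product

def is_tautology(formula, variables):
    """Return True if `formula` evaluates to True for every assignment of
    truth values to `variables`."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([True, False], repeat=len(variables))
    )

# 'A ball is green or a ball is not green' -> G or not G: always true, so a
# review question built on it cannot yield different outcomes or opinions.
print(is_tautology(lambda v: v["G"] or not v["G"], ["G"]))    # True
# 'A ball is green and heavy' -> G and H: contingent, so it can be refuted.
print(is_tautology(lambda v: v["G"] and v["H"], ["G", "H"]))  # False

In practice, review questions are rarely stated as bare logical formulas, but the same scrutiny applies: if no conceivable body of evidence could contradict the statement behind the question, the question needs reformulation.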
4.2.6 Building on Sound Assumptions
Related to the possibility of different outcomes and opinions are the assumptions used for theories, conceptualisations, methods and artefacts. Such assumptions may occur within specific schools of thought (Alvesson and Sandberg 2011, p. 254); it is likely that these schools of thought within a domain refer poorly to each other. Another possibility is that assumptions are shared across schools of thought and deemed necessary for advancing research (ibid., p. 255). Thus, reflecting on which assumptions review questions are built on is the fifth recommendation for formulating review questions.
NOTE: USING FIVE RECOMMENDATIONS FOR DEVELOPING AND EVALUATING REVIEW QUESTIONS
The five recommendations mentioned in this section can be used for both developing and evaluating review questions. Box 4.A is an example of how a literature review followed these criteria, albeit implicitly in the writing. A minimal checklist based on these recommendations is sketched after the tip below.
TIP: (TOO) BROAD REVIEW QUESTIONS
When review questions are broad, it is recommended to generate more specific questions. Starting points for more specific questions are subsets of objects or subjects, more specific relationships between objects or subjects, more specific variables, criteria, etc. Sometimes it can be helpful to ask peers to generate these more specific questions in a group setting. When it appears during the search for literature that the review question is too broad, the enhancement ‘successive fractions’ for the iterative search strategy (see Section 5.3) could be helpful.
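As announced in the note above, the following minimal sketch (Python, illustrative only; the class, field names and example question are hypothetical) captures the five recommendations as a simple checklist that a drafter can walk through when developing or evaluating a review question.

# Illustrative sketch only: the five recommendations from Section 4.2 as a
# checklist attached to a draft review question.
from dataclasses import dataclass, field

RECOMMENDATIONS = [
    "Single guiding question as point of reference",
    "Narrowly focused",
    "Clarity (well-defined terms derived from extant literature)",
    "Assuming possibility of different outcomes or opinions",
    "Building on sound assumptions",
]

@dataclass
class ReviewQuestion:
    text: str
    corollaries: list = field(default_factory=list)
    # Self-assessment against the five recommendations; False until argued.
    assessment: dict = field(
        default_factory=lambda: {r: False for r in RECOMMENDATIONS}
    )

def unmet_recommendations(question: ReviewQuestion) -> list:
    """Return the recommendations the drafter has not (yet) justified."""
    return [r for r, met in question.assessment.items() if not met]

# Hypothetical usage, loosely following the example discussed in Box 4.A:
rq = ReviewQuestion(
    text="Which theoretically meaningful factors explain impression management?",
    corollaries=["impression motivation", "impression construction"],
)
rq.assessment["Single guiding question as point of reference"] = True
print(unmet_recommendations(rq))  # the four recommendations still to argue

The value of such a checklist lies not in the code itself but in forcing an explicit justification for each recommendation before the search for literature starts.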
4.3 Starting Points for Review Questions
As a starting point for finding and developing review questions, Davis (1971) has written an interesting paper on how to find topics for research, on which some of the guidance below is based. Following his thoughts, in this overview a starting point can also be defined by its opposite; in terms of Davis (ibid., p. 313), an interesting research question is one in which an accepted phenomenological presumption is negated. This requires ontological considerations as well. However, not all points of Davis’ overview and his opposite propositions are captured in the subsections that follow; examples of review questions based on the starting points discussed here are found in Table 4.1.

Table 4.1 Examples of questions guiding literature reviews.

Type of review question: From generic to specific
Typical formulation: Does A apply to case (or context) X?
Examples:
• Identifying research issues for the application of the methodologies for systematic reviews to software engineering (Brereton et al. 2007)
• Validity of the dichotomy exploitation versus exploration for the domain of innovation management (Zhou 2020)

Type of review question: From specific to generic
Typical formulation: Do findings from case (or context) B apply to a wider range of cases (or contexts)?
Examples:
• Application of transformational leadership theory from domains such as organisational psychology, health care and promotion, and education to coaching in youth sport (Turnnidge and Côté 2018)
• Actual effect of interventions based on lean thinking, derived from manufacturing, on outcomes in healthcare (Moraros et al. 2016)

Type of review question: From one case to another
Typical formulation: Do findings from case C apply to case D?

Type of review question: Establishing causation
Examples:
• Overview of independent and dependent variables for decisions on and outcomes of business process outsourcing (Lacity et al. 2014)
• Application of agency theory to supply chain management (Fayezi et al. 2012)
• Validity of technology acceptance model, measuring behavioural intention, for actual usage (Turner et al. 2010)

Type of review question: Theory testing and falsification
Examples:
• Impact of tests of the theory transaction cost economics (Carter and Hodgson 2007)

Type of review question: Spatial
Typical formulation: Are settings in spaces E and F for phenomenon Y similar?
Examples:
• Estimation of reuse of syringes and needles in the absence of sterilisation across 10 regions using literature (Hutin et al. 2003)

Type of review question: Temporal
Typical formulation: Has phenomenon G changed over time?
Examples:
• Impact of daylight saving time on energy consumption (Aries and Newsham 2008)

Type of review question: Artefacts, methods and tools
Typical formulation: Which artefacts or methods or tools are achieving a specified effect?
Examples:
• Evaluation of models for rockburst in tunnels, shafts, caverns and mines (Zhou et al. 2018)
• Appraisal of which tools for diagnosing depression can be used for diagnosis in primary care (Nabbe et al. 2017)

Type of review question: Setting and influencing guidelines and policy
Typical formulation: Which interventions bring about specified effects?
Examples:
• What theoretical concepts of the policy studies’ discipline can contribute to multidisciplinary energy research (Hoppe et al. 2016)

Type of review question: Assumptions
Typical formulation: Which assumptions have underpinned research and to what extent are they valid?
Examples:
• Reactions by employees to the practice of talent management (De Boeck et al. 2018)
• Investigation into unverified assumptions, failing to notice local needs and pro-solution bias for the adoption of solar cookers in Sub-Sahara Africa (Iessa et al. 2017)

Type of review question: Rigour and reliability
Typical formulation: Across research methods for phenomenon Z, is there consistency of results?
Examples:
• Rigour of grounded theory as research method for supply chain management (Denk et al. 2012)
• Reliability of models for prediction of faults in code of software (Hall et al. 2012)

4.3.1 From Generic to Specific and Vice Versa

One possibility to investigate literature (or undertake an empirical study) is to look at whether conceptualisations or scholarly knowledge about a generic phenomenon
applies to specific conditions, contexts or situations. The specificities of a condition or situation may lead to insight into whether a more generic conceptualisation or scholarly knowledge holds. In the context of literature reviews, Brereton et al. (2007) is a case in point: they investigate how the generic principles for systematic reviews, though mostly based on those in the domain of medicine, are applied in the domain of software engineering; one of their conclusions (ibid., pp. 575–6) is that specifying appropriate review questions is of utmost importance to successfully conducting a literature review. Another is the systematic literature review by Zhou (2020, pp. 29–71), in which the validity of an accepted, well-known dichotomy—exploitation versus exploration—by March (1991) is investigated for the domain of innovation management. One conclusion is that this dichotomy has been widely accepted in academic works but not challenged on its premises for this domain. These two examples show that literature reviews applying conceptualisations or scholarly knowledge to a specific domain will yield new insight that leads to further research. Also, the reverse is possible when looking at whether a phenomenon observed in a specific situation can be generalised to a wider range of conditions, contexts or situations. This requires comparing the characteristics of these conditions, contexts or situations with those of the original specific situation. An example is the systematic literature review by Turnnidge and Côté (2018), in which they look at the application of the theory of transformational leadership to the sport coaching of youth; they draw on literature from the domains of organisational psychology, health care and promotion, and education (ibid., p. 330) to establish this generalisation. A specific instance of this generalisation is when the application of a concept in one domain is investigated for another domain. This is the case for the literature review by Moraros et al. (2016), in which they look at lean thinking in healthcare; lean thinking is a concept originating in manufacturing industries with a particular focus on quality improvement and performance. Thus, Moraros et al. investigate whether the application to healthcare has affected patient satisfaction, health outcomes, financial performance, worker satisfaction, patient flow and safety. After looking at 22 studies, they (ibid., p. 163) state that, even though some may believe otherwise, the evidence does not support the improvements often claimed for the application of lean thinking to healthcare settings. These two examples show how generalisation could be examined in literature reviews.
4.3.2 Establishing Causation
Another starting point is whether and how variables, factors or aspects are related to a phenomenon. An example is the systematic literature review by Lacity et al. (2015) about decisions on and outcomes of business process outsourcing. The analysis of retrieved studies results in an overview of independent and dependent variables (ibid., p. 186), which can be used by further studies investigating the
effectiveness of outsourcing decisions. This example shows how causation between independent and dependent variables or factors could be the focus of a literature review. This type of review could also be based on prior knowledge. A case in point is the work by Fayezi et al. (2012) into the application of agency theory (aka principal-agency theory and theory of agency) to supply chain management. After examining 19 publications, one of their outcomes (ibid., p. 566) is that agency theory can be used to inform contractual responses to outcome and behaviour uncertainty of agents (or principals) within supply chain relationships. Thus, the evaluation of theories, conceptualisations, methods, tools and frameworks can also be a starting point for establishing causation through literature reviews. Also, it is possible to look at variables that are thought to be related, but are in fact not. This could be caused by mediating variables or by other variables that have a simultaneous causal relationship to the variables under consideration and to contingencies. An example of this type of systematic literature review is the work by Turner et al. (2010), looking into the well-known technology acceptance model, which measures behavioural intention by users to adopt new technology, but not its actual usage. After analysing 79 relevant empirical studies, they (ibid., p. 471) find that perceived usefulness, and particularly perceived ease of use, do not predict actual technology use well; both variables are part of the four variables identified in the original conceptualisation of the model. Thus, synthesising findings across studies on a specific topic could also demonstrate that variables in an original theory or conceptualisation are not related.
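To illustrate the point about variables that appear related without being directly related, the following short simulation (a sketch using numpy; it is not drawn from the studies cited above and the variables are made up) shows two variables that correlate strongly only because both depend on a third variable; once that third variable is accounted for, the apparent association disappears.

# Illustrative sketch only: a spurious association created by a common cause.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000
z = rng.normal(size=n)                  # third variable (e.g., a contextual factor)
x = z + rng.normal(scale=0.5, size=n)   # 'independent' variable driven by z
y = z + rng.normal(scale=0.5, size=n)   # 'dependent' variable driven by z

print(np.corrcoef(x, y)[0, 1])          # strong correlation (about 0.8)

# Regress z out of both variables and correlate the residuals.
x_res = x - np.polyval(np.polyfit(z, x, 1), z)
y_res = y - np.polyval(np.polyfit(z, y, 1), z)
print(np.corrcoef(x_res, y_res)[0, 1])  # close to zero

A review question that only asks whether X and Y are related would miss this; including candidate mediating or confounding variables in the question directs the appraisal of retrieved studies towards such explanations.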
4.3.3 Testing and Falsification of Theories
Another possibility for review questions is testing the validity of a theory. An instance is the narrative review by Carter and Hodgson (2006) on the validity of the well-established theory of transaction cost economics, which is also used in business and management studies. Looking at influential tests of this theory, 27 studies in total, they (ibid., pp. 473–4) conclude that only a limited number of studies are congruent with the original theoretical framework of transaction cost economics and that it has not been tested against rival theories. Such reviews looking into specific theories not only indicate whether a theory is valid, but also point to further research that should be undertaken to find out whether other theories also provide valid explanations. A remark should be made about inductive logic and falsification in the context of testing theories through literature reviews; this applies to conceptualisations, methods, etc., too. Inductive logic refers to the hypothetico-deductive research cycle (Popper 1966, pp. 52–55; his thoughts are rooted in Selz’s (1913, p. 97) writing), and literature reviews can contribute to forming and testing theory. When aberrations of theories are found during literature reviews, this could lead to forming tentative theory. An example is the mention of the technology acceptance model in Subsection 4.3.2; the finding that two core variables are not related to actual usage of technology implies falsification. As such, this finding should lead to a modification of the technology acceptance model, the forming of a new model or investigations into whether alternative models are more appropriate. Similar to Carter and Hodgson (2006, pp. 473–4), Dekkers et al. (2020a, p. 25) also find that competing theories that inform decision making on outsourcing have been insufficiently explored. Therefore, literature reviews that evaluate the applicability of theories and conceptualisations can lead to their modification, to new theories and conceptualisations, or to conclusions with regard to competing theories and conceptualisations.
4.3.4 Considering the Spatial Dimension
A further starting point for a literature review can be whether a phenomenon in a particular space can also be observed in another space or, alternatively, whether the phenomenon manifests itself in a similar manner in another space. For instance, Hutin et al.’s (2003) review considers estimations derived from studies about the reuse of syringes and needles in the absence of sterilisation across ten clusters of countries. In the context that unsafe healthcare injections can transmit bloodborne pathogens, they (ibid., p. 1075/5) find that overuse of injections is common in developing and transitional countries, and that reuse of injection equipment in the absence of sterilisation occurs in almost one in three injections in these countries. This instance shows how phenomena can be studied in literature reviews by using the spatial dimension.
4.3.5 Considering the Temporal Dimension
An additional starting point for literature reviews is found in considering the temporal dimension. A phenomenon can manifest itself differently across a period, or the consideration of time may reveal changes in behaviour, patterns, etc. A case in point of such a literature review is the study by Aries and Newsham (2008), who look into studies that have investigated the impact of daylight saving time on energy consumption. They (ibid., p. 1864) find that little is known, and that what has been studied is outdated and based on assumptions; furthermore, they note that behavioural patterns have changed across four decades of daylight saving time without this being accounted for in research. This work indicates how research could look into the temporal dimension to establish consistency of findings, changes in patterns and directions for further research.
4.3.6 Artefacts, Methods and Tools
The evaluation of the performance of artefacts, methods and tools is also a worthy topic for undertaking a literature review. In this regard, an interesting example is the comprehensive literature review by Zhou et al. (2018) into models for predicting the phenomenon of rockburst occurring in underground tunnels, shafts, caverns and mines. In its concluding section (ibid., pp. 654–5), the study differentiates itself from earlier literature reviews, indicates recent modelling techniques (such as machine learning), presents an overview based on the timeline of development and indicates the trade-off between accuracy, speed and complexity for the most appropriate models. Such reviews lead to more insight into when particular artefacts, methods and tools can be used for a specific phenomenon, in terms of investigations and of application to processes of new product and service development, and to the design and engineering of various objects. In establishing how and when artefacts, methods and tools can be used most productively, comparison plays a key role. By way of illustration, the study by Nabbe et al. (2017) looks into which tools for diagnosing depression can be used for diagnosis in primary care. One of their conclusions (ibid., p. 104) is that there is a need for further research on reliability and ergonomic data for these tools in order to define the best tools in terms of efficiency, reproducibility, reliability and ergonomics (ease of use) for collaborative research in primary care and psychiatry. Comparing artefacts, methods and tools by means of literature reviews thus plays a key role in establishing their effectiveness and how they can be used. Also, shifting the boundaries for the performance or applicability of artefacts, methods and tools is part of this starting point for literature reviews. A case in point is the work by Chen (2012) on the application of agent-based modelling to architectural design and urban studies; this type of modelling of complex systems aims at detecting emerging patterns. The review includes an overview of programming platforms (ibid., p. 173), and applications to urban studies (ibid., pp. 169–70) and architectural design (ibid., pp. 170–1) are discussed. Although the article asserts in its introduction a long history of the application of computer modelling, it also places agent-based modelling in its modern-day context and evaluates its performance. In this respect, new domains for the application of artefacts, methods and tools, and evaluations of their performance, could be the topic of literature reviews.
4.3.7 Setting and Evaluating Policy
Determining and evaluating (public) policies is also a suitable topic for literature reviews. This could lead to the evidence-based formulation of policies. A well-known example in economic policy is the so-called Bolton Enquiry (Bolton 1971). In addition to data about the relevance of small firms to the economy of the United Kingdom, it also provides insight into existing works. A case in point for these other works is the reference to a study by Christopher Freeman at the University of Sussex on measuring innovation in firms in relation to their size (ibid., p. 52).1 Another example of this topic for a review is the work by Hoppe et al. (2016). They examine how policy studies can contribute to multidisciplinary energy studies’ research, and to what extent research on energy policy actually uses the concepts of policy studies. At the end of the article (ibid., p. 22), they conclude that the discipline of policy studies offers a wide array of concepts, heuristics and methods that can assist energy researchers and energy policy makers in the development of models for the interrelation between stakeholders’ interests and the public interest; this extends to knowledge (and research agendas) on policy diffusion, evidence-based policy and responsible innovation policy. These two works exemplify how literature reviews can evaluate and inform (public) policies.
1 Note that the report by Freeman, referred to by Bolton (1971, p. 52), could not be found and is not dated in the inquiry either. Professor Christopher Freeman (1921–2010) was a well-known scientist on innovation policy at the Science Policy Research Unit, located at the University of Sussex.
4.3.8 Investigating Assumptions
Another starting point for looking at literature is the assumptions that are embedded in theories, laws of observed regularities and conceptualisations. It involves testing ideas that are assumed by scholars to be true for theories, conceptualisations and empirical studies. A case in point is the study by De Boeck et al. (2018) into the reactions of employees to the practice of talent management. This systematic review finds that the basic assumption, namely that those selected for talent management programmes will always react positively to them whereas those not selected will react negatively, appears, at best, to lack nuance and, at worst, to be simply incorrect (ibid., p. 211). They also intimate that, although social exchange theory provides an appropriate framework, scholars in talent management ignore the role of uncertainty in this theory. Another work, by Iessa et al. (2017), looks at why the actual benefits of solar cookers, advocated as beneficial for developing countries in Sub-Sahara Africa, are reported as being only modest. They find a pro-innovation bias, which is only occasionally contested by other scholars, a solution bias and ignorance of local needs in studies (ibid., p. 103). These selected works challenging assumptions found in empirical research show how this type of research can be conducted. For making assumptions more explicit, Alvesson and Sandberg (2011, pp. 255–60) introduce a typification:
• In-house assumptions. In-house assumptions exist within a particular school of thought in the sense that they are shared and accepted as unproblematic by its advocates.
• Root metaphors. These assumptions are associated with broader images of a particular subject matter.
• Paradigmatic assumptions. Ontological, epistemological and methodological assumptions that underlie a specific literature can be characterised as paradigmatic assumptions.
• Ideological assumptions. These include various political, moral and gender-related assumptions held about the subject matter.
• Field assumptions. These assumptions, sometimes also called domain assumptions, are a broader set of assumptions about a specific subject matter that are shared by several different schools of thought within a paradigm, and sometimes even across paradigms and disciplines.
Although their article is written for business and management studies, it applies to a wider range of disciplines and thus could provide inspiration for literature reviews.
4.3.9 Rigour and Reliability
Also, rigour and reliability can be studied in literature reviews. An instance is the study by Denk et al. (2012) into the rigour of studies on supply chain management that use grounded theory as research method. They (ibid., pp. 756–7) conclude that researchers should have knowledge about the methodological roots of grounded theory with the aim of increasing trustworthiness. They find that researchers often violate one or more of the analytic tenets of the methodology. In addition, they (ibid., p. 757) call for increasing transparency when researchers describe their research; in particular, this can be done by addressing six dimensions (emergence and researcher distance; theory development; specific, non-optional procedures; core category; coding procedures; evaluation criteria) to reflect a deep knowledge of the applied methodology. A second example is the systematic literature review by Hall et al. (2012) into the reliability of models for predicting faults in software code. This leads them to the conclusion that most studies report insufficient contextual and methodological information to enable full understanding of a model (ibid., p. 1292); this makes it difficult for potential model users to select a model to match their context. In addition, they present their own criteria to make these models more useful, based on prediction, context, model building and data (ibid., pp. 1278–81, 1292–3). These two cases show how literature reviews on rigour and reliability can contribute to scholarly knowledge and, in some studies, can have practical value, too.
NOTE
Although many examples provided in this section take the form of systematic literature reviews and systematic reviews, insight can also be provided by narrative reviews. A case in point is the literature review by Liao et al. (2017), in which they look at consumer preferences for the purchase of electric vehicles. They use a conceptual
model (ibid., p. 254) for reviewing literature on financial attributes, technical attributes, infrastructure attributes, policy attributes, dynamic preferences and factors for heterogeneous preferences; for conceptual models and their relation to literature reviews, see Section 4.4. This informs the conclusion that for some attributes the retrieved studies reach the same conclusion, but not for others (ibid., p. 268). Also, a research agenda is presented. Another instance is the literature review by Jacobi (1991) on the mentoring of undergraduate students and attainment. It notes a lack of coherent definitions across studies (ibid., pp. 506–8), which hinders coherent research findings, and notes that neither empirical nor theoretical research has kept pace with programme development at educational institutes (ibid., p. 526). These two extensive literature reviews show that making advances for both scholarly knowledge and practical value is not limited to systematic literature reviews and systematic reviews.
TIP: SOURCES FOR REVIEW QUESTIONS
• Guidance on how to think about research objectives, and thus also about review questions, can be found in a few publications that serve as sources of inspiration. In addition to Davis (1971), these publications, albeit written with specific domains in mind, are valid for a wide range of domains:
• Alvesson and Sandberg (2011).
• Chow and Harrison (2002).
• Lewis and Grimes (1999).
• Other sources for review questions are narrative reviews, systematic literature reviews, systematic reviews and propositional papers.
• Also, the concluding sections of academic publications often contain suggestions for further research.
4.4 Population-Intervention-[Comparison]-Outcome
A format that is commonly used in systematic reviews for evidence-based practice and furthering research, such as in medicine and nursing, is population-intervention-outcome; it is applicable to other disciplines, too. The next subsections describe this format and how to use it.
4.4.1 Root Format Population-Intervention-Outcome
The format population-intervention-outcome (aka by its abbreviation PIO) uses three terms to describe a review question. The term ‘population’ stands for the objects or subjects of a study; sometimes, ‘participant’, ‘patient’ or ‘problem’ is used instead of population. It should include relevant characteristics and information that may influence the outcome. The ‘intervention’ is an induced change of state of a subject or object. The ‘outcome’ represents the effect of the intervention in the specified population. An example of population-intervention-outcome could be the attitude towards lay support during labour; see Figure 4.1. In this example, doula stands for a trained companion, typically without formal obstetric training, who supports another individual through a significant health-related experience, such as childbirth. Note that the description of the population and intervention should be specific, bearing in mind that if either or both are described too narrowly, it may be difficult to find relevant studies or sufficient data to reach reliable conjectures and conclusions.
Fig. 4.1 Example of population-intervention-outcome (Population: Childbirth; Intervention: Doulas; Outcome: Attitudes). In this hypothetical case for a systematic literature review, the population is women giving childbirth, the intervention is the support by a doula and the outcome is the attitude of women towards this type of support.
NOTE
The impact of the intervention, being a change of state in the objects or subjects of study, can be related to generic concepts of systems theories. Anderson et al. (2011, p. 34) explicitly make references to systems thinking by mentioning elements and relationships. In Dekkers (2017, pp. 32–4, 117–9) a change in the state of a system and its elements is described as a change in properties or a change in the relationships between elements and the environment of a system.
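As a hedged illustration (not taken from the book), the population-intervention-[comparison]-outcome elements can be captured in a small data structure and turned into a Boolean search string by combining synonyms with OR within an element and AND across elements, a common way of operationalising such a question for database searches. The class name, the synonym lists and the doula example below are illustrative only, not a validated search strategy.

# Illustrative sketch only: a PIO/PICO question and a simple Boolean search string.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PICOQuestion:
    population: List[str]
    intervention: List[str]
    outcome: List[str]
    comparison: Optional[List[str]] = None  # omit for the plain PIO format

    def search_string(self) -> str:
        """Combine synonyms with OR within an element and AND across elements."""
        blocks = [self.population, self.intervention, self.outcome]
        if self.comparison:
            blocks.insert(2, self.comparison)  # keep the P-I-C-O order
        return " AND ".join(
            "(" + " OR ".join(f'"{term}"' for term in block) + ")"
            for block in blocks
        )

# Hypothetical use of the example in Figure 4.1 (doulas and childbirth):
question = PICOQuestion(
    population=["childbirth", "labour", "women giving birth"],
    intervention=["doula", "lay support"],
    outcome=["attitude", "satisfaction"],
)
print(question.search_string())
# prints one line: ("childbirth" OR "labour" OR "women giving birth") AND
# ("doula" OR "lay support") AND ("attitude" OR "satisfaction")

Thinking in such blocks makes explicit which element of the question each search term serves, which in turn supports the inclusion and exclusion criteria discussed in Chapter 6.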
4.4.2 Format Population-Intervention-Comparison-Outcome and Other Variants
The format population-intervention-outcome has been extended in four ways, see Table 4.2, with the first addition being comparison; sometimes it is called comparator or control. Known by the acronym PICO, its extension refers to a pre-intervention, post-intervention or control group used for developing the review question; control groups can also be intragroup, and between groups. This means that the main alternative or alternatives are considered for the intervention. In the case of the example in Figure 4.1 it would be obstetric care. The adding of a comparison allows to determine how effective the intervention is. A second extension to population-intervention-outcome is time (or timing), known by its acronym PICOT. It reflects the period over which the outcomes are assessed. Using the same example in Figure 4.1, this could include the attitude
124
4
Developing Review Questions
Table 4.2 Common extensions to population-intervention-outcome. In this table the basic format population-intervention-outcome (PIO) for review questions is compared with some of its variants: population-intervention-comparison-outcome (PICO), population-intervention-comparison-outcome-time (PICOT), population-intervention-comparison-outcome-study (PICOS) and population-intervention-context (PICo); note that the latter two are also known by other acronyms.

PIO (basic format)
Example of study: Risk associated with children entering public care (Simkiss et al. 2013, p. 629):
• Population: infants, children, adolescents, young person
• Intervention: association, determinant, epidemiology
• Outcome: public care, ‘looked after’, foster care, out-of-home care, substitute care

PICO
Extension: Comparison (comparator, control)
Description: Addition of alternative interventions or control groups
Example of study: Review into cost estimation studies for software engineering (Kitchenham et al. 2007, p. 318):
• Population: software, Web, project
• Intervention: cross-company, project, effort, estimation, model
• Comparison: single-company, project, effort, estimation, model
• Outcomes: prediction, estimate, accuracy

PICOT
Extension: Time
Description: Addition of change over time
Example of study: Adding aprepitant to antiemetic therapy for postoperative nausea and vomiting prophylaxis (Milnes et al. 2015, p. 407):
• Population: adult patients undergoing general anesthesia
• Intervention: aprepitant
• Comparison: other antiemetic therapy or a placebo
• Outcomes: postoperative nausea and vomiting
• Time: post-operatively

PICOS*
Extension: (Type of) study
Description: Addition of specific types of research methods and designs of research methodology
Example of study: Action plans for asthma provided by emergency departments (Villa-Roel et al. 2018, p. 189):
• Population: adults presenting to emergency departments with asthma exacerbations
• Intervention: emergency department directed educational interventions involving provision of individualised asthma action plans
• Comparison: usual care
• Outcomes: reduction of the proportion of asthma relapses after an asthma exacerbation
• Study: randomised controlled trials

PICo or PICOC**
Extension: Context
Description: Addition of specific circumstances or settings
Example of study: Interventions supporting self-management of life tasks for youth with high functioning autism spectrum disorder (Munsell and Coster 2018, p. 3):
• Population: middle school or high school aged youth (aged 11–18 years) who have a diagnosis of autism spectrum disorder
• Intervention: aiming at developing the ability to self-manage daily life tasks
• Context: interventions selected for review either took place in specified school settings

* The addition of type of study also appears as PICOTS (combined with time)
** Also the acronym PICO (population-intervention-context-outcome) is used
before and after childbirth or how the attitude has changed over the past decades. The addition of the temporal dimension allows studies to include how developments have influenced outcomes. In a generic manner, this could be (scholarly) knowledge about phenomena, changes in culture, legislation, new methods, discoveries, etc. Therefore, the inclusion of time in the format allows a more dynamic view on a topic to be created. A third extension is the type of study that is considered in the review, commonly known by its acronym PICOS. This refers to specific methods used in the study or the design of the research methodology. An example of the latter is the randomised controlled trial, common in medicine. For the example in Figure 4.1 about childbirth it could mean the inclusion of only qualitative studies. Note that sometimes the acronym PICOT is used where the ‘T’ stands for ‘type of study’; for clarity, this convention has not been followed here to avoid confusion with the acronym for the extension with time. The inclusion of the type of studies in the format could be related to the nature of the review question, the type of evidence sought or the quality of the evidence. Sometimes context is added as a fourth extension; this is described by Squires et al. (2013, p. 1216) as a ‘fifth consideration to the framework’. Examples are a particular region, country, group of countries, a particular setting (public versus private sector employment) or type of environment (hospitals). Stern et al. (2014, p. 54) state that it is perhaps better to use PICo (population-intervention-context) for qualitative reviews, because these focus on the engagement of a participant in a study with the intervention; they argue that quantitative reviews isolate the intervention from the happenings and influences of the participants. In the case of the example in Figure 4.1, the context could be national cultures that may influence the attitudes towards lay support for childbirth. Thus, the context provides an
additional dimension to the format population-intervention-outcome for forming review questions.
NOTE: LIMITATION AND MORE VARIANTS OF POPULATION-INTERVENTION-OUTCOME The use of the format population-intervention-outcome for literature reviews assumes that all reporting in existing studies is sufficiently coherent and comparable; this is not always the case. For example, Vereenooghe et al. (2018) find that for their study about interventions for mental health problems in children and adults with severe intellectual disabilities a quantitative systematic review could not be conducted. According to them (ibid., p. e021911/4), this is caused by the incomparability of studies in terms of study design, interventions and outcomes. More specifically, they state that no study addressed the same mental health problem using a similar intervention. This example indicates that using overly specific terms in the format for review questions may limit the scope of studies identified during a literature review and that working with a format does not directly lead to coherence and comparability across the available evidence for a given topic. In addition to the four extensions to the format population-intervention-outcome presented here, more variants exist. For example, Davies (2011) discusses additional variants of population-intervention-comparison-outcome and some other formats, such as ECLIPSE (expectation-client group-location-impact-professionals-service) and SPICE (setting-perspective-intervention-comparison-evaluation). And Cooke et al. (2012) put forward the format SPIDER (sample-phenomenon of interest-design-evaluation-research type), targeting reviews consulting qualitative and mixed methods research. Furthermore, Booth et al. (2019, pp. e001107/4–5) propose for reviews into complex interventions with qualitative evidence the format PerSPEcTiF (perspective-setting-phenomenon-environment-comparison-time-findings). Therefore, researchers should consider what type of studies will be appraised in order to use an appropriate format for the review question.
TIP: USE FORMAT OF PIO TO CREATE TITLE OF STUDY Eldawlatly et al. (2018) indicate that it is beneficial for citations by other works to include the elements of the format population-intervention-comparison-outcome in the title of published studies. This highlights the relevance of appropriate review questions and clarity about what studies are about. The advice applies to titles of studies in all cases, including empirical studies.
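Because the elements of these formats often double as search terms (see the Kitchenham et al. example in Table 4.2), they can be combined mechanically into a draft search string. The sketch below is a generic illustration only: it is not tied to the syntax of any particular database, and the helper function and term lists are assumptions for demonstration.

```python
def boolean_query(element_terms: dict[str, list[str]]) -> str:
    """Combine synonyms within each PICO element with OR and the elements with AND."""
    groups = []
    for terms in element_terms.values():
        quoted = [f'"{t}"' if " " in t else t for t in terms]  # quote multi-word phrases
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

# Terms loosely based on the Kitchenham et al. (2007) example in Table 4.2.
print(boolean_query({
    "population": ["software", "web", "project"],
    "intervention": ["cross-company", "effort estimation", "estimation model"],
    "comparison": ["single-company"],
    "outcome": ["prediction", "estimate", "accuracy"],
}))
```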
4.4.3
Enhancing Population-Intervention-Outcome by Using Models
An appropriately descriptive format, whether population-intervention-outcome (PIO), population-intervention-comparison-outcome (PICO) or any of its variants, can be derived from a logic model. Anderson et al. (2011) make
this case; they see logic models as a graphical representation of elements and relationships in a system (ibid., p. 34). Particularly, their attention goes to conceptual models and causal diagrams. An example they (ibid., p. 37) used is found in Figure 4.2; as can be seen, the model informs in more detail the format population-intervention-outcome underpinning the review question. The detailed ‘thinking through’, description of processes and identification of contexts it involves are valuable in understanding and describing an event, intervention, phenomenon, policy, practice or treatment, eventually moving beyond statements of whether something works towards a richer explanation of why and when it does or does not work. The richer description facilitated by the inclusion of logic models leads to more detailed statements than simple statements about effectiveness or ineffectiveness, or about the need for more evidence. Thus, logic models could be used to inform the conceptualisation and development of systematic reviews, particularly for setting out review questions and the analysis of studies. An additional way of looking at and creating these models is found in Dekkers (2017, p. 72), based on the previously mentioned relationship to systems theories by Anderson et al. (2011, p. 34). In this overview, models are used for problems related to specified (sub)systems and aspects; this division into subsystems and aspectsystems is shown in Figure 4.3. Aspects describe specific types of
[Fig. 4.2 comprises the following boxes of the logic model:]
Intervention: programme goals and activities; programme implementation; programme structure and resources; programme staff; collaborations between school and community; student attendance.
Immediate changes: student supervision; student academic support; participation in enhancing activities.
Intermediate outcomes (behavioural): homework completion; television viewing; school attendance; effort; discipline; risk-taking behaviour.
Intermediate outcomes (social/emotional): safety; self-esteem; attachment to school; improved peer and adult interactions; future aspirations.
Longer-term academic outcomes: grades; test scores; educational attainment.
Parental outcomes: work hours; concern over child.
Context: student characteristics; student prior academic achievement; family background; school and community characteristics.
Fig. 4.2 Example of logic model for population-intervention-comparison-outcome. The model (Goerlich Zief et al. 2006, p. 35)* supports a literature review into after-school programming on youth context (i.e., student location, supervision and safety), participation in activities, and behavioural, social and emotional and academic outcomes. The model identifies the intervention, the immediate changes it brings about and the intermediate and longer-term outcomes. Also, it takes into account the context. In this sense, the format of the related review question is population-intervention-outcome-context. *Reproduced under the Creative Commons Attribution License.
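One way to work with such a logic model is to encode it as a small directed graph and then ask which outcomes are reachable from the intervention, which is essentially what the review question must cover. The sketch below is only illustrative: the node names are shortened from the boxes in Fig. 4.2, and the edges are an assumption based on the box titles rather than the exact arrows of the original figure.

```python
# Edges point from each box to the boxes it is assumed to influence.
logic_model = {
    "intervention": ["immediate changes"],
    "immediate changes": ["behavioural outcomes", "social/emotional outcomes"],
    "behavioural outcomes": ["longer-term academic outcomes"],
    "social/emotional outcomes": ["longer-term academic outcomes"],
    "context": ["intervention", "immediate changes"],
}

def downstream(model: dict[str, list[str]], node: str) -> set[str]:
    """Collect every element reachable from `node`: the outcomes that a review
    question built on this model should cover."""
    seen: set[str] = set()
    stack = list(model.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(model.get(current, []))
    return seen

print(downstream(logic_model, "intervention"))
```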
[Fig. 4.3 consists of three panels showing a system with elements A to K and their interrelationships, a subsystem obtained by focusing on specific elements, and an aspectsystem obtained by focusing on specific relationships.]
Fig. 4.3 Distinction between subsystems and aspects. Two principles for investigating a system (Dekkers 2017, p. 38)*. When focusing on specific elements, a subsystem may be distinguished; in this case, this subsystem consists of elements A, C and J. When focusing on a specific relationship only, the elements that have interrelationships of this type are looked at; in the figure these elements of the aspectsystem are A, B, I and J. Note that elements in the system that do not have this particular type of relationship with other elements are omitted. This also means that element D is now part of the environment of the aspectsystem. *Reproduced with permission from the author.
relationships, for example, financial-economic and social relationships. Events are external changes (environment) in relationships that cause changes in internal relationships or attributes of the system or its elements; examples are interventions, policies and uncontrolled events (such as disasters). From the perspective of systems theories, the first step in the selection or creation of a model is identifying the (sub)systems to be investigated—objects or subjects of study—, which types of relationships to consider (aspectsystem) and which events cause changes in the attributes of the system and its elements. Furthermore, the models are divided along two dimensions in Figure 4.4. The first dimension (depicted horizontally) sets the extent to which models predict: are they just descriptive, explanatory or could they predict outcomes? The second dimension (shown vertically in the figure) shows whether they are qualitative or quantitative; it also suggests that qualitative models precede quantitative models. Both types of models can be derived from theories. Where a model fits in this overview depends on the objectives of the study and relevant scholarly knowledge; the latter points to the necessity to examine studies and other reporting to figure out whether scholarly knowledge is sufficient for the objectives of a study.
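The distinction in Fig. 4.3 between focusing on elements and focusing on relationships can be made concrete with a few lines of code. In the sketch below the relationship types and the particular element pairings are made up for illustration; only the idea of filtering a system by elements (subsystem) or by relationship type (aspectsystem) follows the figure.

```python
# A system as a set of elements plus typed relationships between them.
example_system = {
    "elements": {"A", "B", "C", "D", "E", "F", "H", "I", "J", "K"},
    "relations": [("A", "C", "financial"), ("A", "J", "financial"),
                  ("C", "D", "financial"), ("A", "B", "social"),
                  ("B", "I", "social"), ("I", "J", "social")],
}

def subsystem(system: dict, keep: set[str]) -> dict:
    """Focus on specific elements: keep only relations among the chosen elements."""
    return {"elements": keep,
            "relations": [r for r in system["relations"] if r[0] in keep and r[1] in keep]}

def aspectsystem(system: dict, rel_type: str) -> dict:
    """Focus on one type of relationship: keep only elements involved in it."""
    relations = [r for r in system["relations"] if r[2] == rel_type]
    elements = {e for r in relations for e in (r[0], r[1])}
    return {"elements": elements, "relations": relations}

print(subsystem(example_system, {"A", "C", "J"}))
print(aspectsystem(example_system, "social"))
```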
[Fig. 4.4 arranges models along two dimensions: the objectives of the review (descriptive, explanatory or predictive) and the purpose of the model following the selection of aspect- and subsystems for examination, with qualitative models (classification, conceptual models) preceding quantitative models (sampling, iconic, analogue and symbolic models).]
Fig. 4.4 Overview of models based on systems theories. The problem definition points to the aspects and subsystems to be considered (Dekkers 2017, p. 72)*. Models have two dimensions: (1) being descriptive, explanatory or predictive and (2) being qualitative or quantitative. Qualitative models are further divided into classifications and conceptual models. The four basic quantitative models are: sampling models, iconic models, analogue models and symbolic models. *Reproduced with permission from the author.
Classification, as one of the two types of qualitative models in the overview, is the act of placing an object, system, element or concept into a set or sets of categories (such as a subject index), based on its properties. It assumes that all objects, systems, elements or concepts in a specific set (or class) have similar properties from a certain perspective, at the aggregated level of the objects, systems, elements or concepts. This means that a person may classify the object or concept according to an ontology, which is the description or categorisation of entities or the positing of basic categories within an overarching framework. Developing conceptual models, as the second type of qualitative models, means specifying:
• Essential elements of the (sub)systems being studied.
• Relationships of the elements that pertain to the objectives of a study (i.e., aspects).
• Changes in the elements (i.e., content of a system) or their relationships (i.e., structure) that affect the functioning of the system – and in what ways. These could be triggered by events (or factors) in the environment of a system.
• Objectives and methods of research (or investigation).
The example in Figure 4.2 is a conceptual model. Also, the format population-intervention-outcome and its variants are conceptual models, albeit for review questions. Some would see the development of taxonomies of events, interventions, phenomena, policies, practices, treatments, etc. as conceptual models; in the scheme presented here this is seen as classification, which will support the extraction of information from studies in a review. In this sense, conceptual models are often representations of causal relationships between constructs, objects or subjects. Therefore, conceptual models, in addition to classification, support the development of review questions.
In the realm of quantitative models four kinds are distinguished (three of these are derived from Ackoff (1962, p. 109)); see Figure 4.4:
• Sampling models consist of a mere subset of mutually exclusive systems taken from a larger set of systems. A sampling model resembles classification but differs in the sense that not all relevant properties have been identified yet. In the case of a literature review, this could be taking one particular study or one particular object as a point of reference for other studies on the same topic, object, context or type of study. However, sampling for reviews could benefit from other criteria for selection based on the case study methodology. In this perspective, Flyvbjerg (2006, p. 230) sets out that these cases may include extreme or deviant instances, cases with maximum variation or critical instances; this could be applied to literature reviews, too.
• Iconic models look like the real system but sometimes employ a change of scale or materials. They are used principally to communicate (design) ideas—for example, to the designer, to a customer (e.g. sketches, 3D prototypes) or to users. Iconic models represent real systems by, for example, scale models, photographs and graphical representations of networks. Iconic models can be used for reviews, depending on the topic of the review, and are directed more at representations of the object of study; an example would be a literature review into the design of hospitals.
• Analogue models explore particular features of an idea by stripping away detail and focusing, via a suitable analogous representation, on just a few key elements (e.g., flow diagrams and circuit diagrams). The representations in analogue models do not aim at looking like the real systems and are intended primarily to examine functions and behaviour (of one aspect) rather than communicate appearances. An example is using the analogy between the evolution of organisms and organisations, as found in Dekkers (2005, pp. 67–75), as a foundation for better understanding the interaction between the dynamics of the environment and organisations.
• Symbolic models represent ideas by means of a code (for instance, numbers, mathematical formulae, words and musical notation). These models are very
useful at analysing performance and predicting events. Symbolic models are an abstraction of reality. In symbolic models the set of objects is represented by symbols and the relations are expressed in the form of algebraic, computational or algorithmic statements exhibiting no behaviour of their own. For literature reviews, normally, these are mathematical expressions of relationships between variables, but other types of symbolic models could also be considered.
Even though these four types of quantitative representation differ quite substantially, all build on conceptual qualitative models (whether implicitly or explicitly) and add detail that may lead to a loss of accuracy compared to the original qualitative model. In the thinking presented in Figure 4.4 qualitative models precede quantitative models. The rationale is that classifications and conceptual models first need to be developed before quantitative modelling can take place. This allows researchers to explore and define how relationships between elements and aspects should be accounted for. Causal relationships and events that trigger changes in attributes of elements and modification of relationships are among them. In most cases, quantitative models are more detailed than qualitative models, because they require specifying attributes of elements (and systems) and relationships into variables that can be observed or measured. Also, multiple variables could be used to describe the specific systems, elements, aspects, relationships and events. Studies, including reviews, may limit the number of variables to be considered, thus leading to a less fine-grained analysis, or focus on essential factors, such as the review by Leary and Kowalski (1990) on impression management mentioned in Box 4.A. For these reasons, the forming of qualitative models normally comes before the development of quantitative models.
4.4.4
Using Theories and Laws of Observed Regularities
Another possibility for modelling and capturing a review question is using theories. Taking Wacker’s (1998, p. 362) perspective, theories provide frameworks for analysis, contribute to developing fields of interest for research and present clear explanations for practitioners. This means that theories can be used to develop models, propositions and frameworks for literature reviews. An example is the systematic literature review by Riebl et al. (2015) in which they investigate the theory of planned behaviour as an effective framework to identify and understand child and adolescent nutrition-related behaviours. The review (ibid., p. 176) concludes that the theory may be useful in determining youth’s underlying attitudes, norms and perceptions of control that can be employed in customised interventions and programmes to address dietary behaviours. The example shows how literature on specific theories can be investigated and also result in findings relevant to practitioners. Furthermore, laws of observed regularities can be used to the same purpose. In the context of literature review, laws of observed regularities should be understood
as precise descriptions of observed patterns in how constructs relate to each other. An instance of such a law is the so-called experience or learning curve, which stipulates that an increase in productivity is related to the cumulative volume of a production system; this is often described by logarithmic equations. Anzanello and Fogliatto’s (2011) narrative review provides an overview of equations and mathematical approaches, and Glock et al.’s (2019) systematic literature review combines a similar outline with an investigation into trends found in the publications on the learning curve; both reviews set out a research agenda and also deliberate on applications. Thus, laws of observed regularities as described here can be investigated in a similar way to theories and inform the formulation of review questions.
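As a concrete example of such a law, the classic formulation attributed to Wright expresses the learning curve as a power law in cumulative output. The sketch below uses this generic textbook form purely to illustrate what a ‘law of observed regularities’ looks like; it is not taken from the reviews cited above, and the parameter values are invented.

```python
import math

def wright_unit_time(first_unit_time: float, unit_number: int, learning_rate: float) -> float:
    """Time (or cost) of the n-th unit under Wright's learning curve: it falls by a
    fixed percentage every time cumulative output doubles (learning_rate=0.8 is an
    '80% curve')."""
    b = math.log(learning_rate) / math.log(2)  # learning exponent, negative for rates below 1
    return first_unit_time * unit_number ** b

# With an 80% curve and 100 hours for the first unit, the 100th unit takes roughly 23 hours.
print(round(wright_unit_time(100.0, 100, 0.8), 1))
```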
4.5
Scoping Study for Review Questions
In addition to modelling for underpinning the development of appropriate review questions, literature reviews for scoping may be necessary; these are also called scoping reviews and scoping studies, with the differences between them explained in the later subsections. According to Grant and Booth (2009, p. 101), scoping reviews are a preliminary assessment of the potential volume and scope of available scholarly literature; the aim is identifying the nature and extent of evidence (usually including ongoing research). More specifically, Munn et al. (2018, p. 143/2) mention the following purposes for conducting a scoping review:
• To identify the types of available evidence in a given field.
• To clarify key concepts and definitions in literature.
• To examine how research is conducted on a certain topic or field.
• To identify key characteristics or factors related to a concept.
• To identify and analyse gaps in scholarly knowledge.
These broad-ranging objectives help to identify how further, more detailed studies could take place. Typically, such a scoping review precedes a full systematic literature review or systematic review. Literature reviews for scoping as a formal approach are mainly found in domains such as healthcare, education and psychology, but are hardly used in other domains, with business and management studies being a case in point. The key feature of such studies is getting an overview of research in the domain so that next steps, including systematic literature reviews and systematic reviews, can have a more specific focus.
4.5.1
Scoping Review as Protocol-Driven Literature Review
Scoping reviews tend to follow defined protocols and checklists. For example, Arksey and O’Malley (2005), Levac et al. (2010) and Tricco et al. (2018) provide detailed guidelines on how to conduct these types of literature reviews. Box 4.B informs about the steps of a scoping review. Since the guidelines have developed
over the course of time, it seems that scoping reviews in some aspects carry similarities to systematic reviews.
Box 4.B Guidelines for Scoping Reviews
According to Arksey and O’Malley (2005, pp. 22–9) and Levac et al. (2010, pp. 3–7), scoping reviews consist of five steps:
• Identifying the review question for the scoping review. The questions are relatively broad as a study seeks breadth of coverage on a specific topic.
• Identifying relevant studies. This stage aims at finding relevant studies by using a variety of search strategies: databases, search engines, lists of references, hand searching of journals, etc. Chapter 5 provides more detail on search strategies.
• Selection of studies. When reading retrieved studies and sources, inclusion and exclusion criteria are developed. Chapter 6 addresses inclusion and exclusion criteria in more detail.
• Charting and tabulating data. A ‘narrative review’ or ‘descriptive analytical’ method is used to extract contextual or process-oriented information from each study, resulting in tabulated and charted data.
• Collating, summarising and reporting results. An analytic framework or thematic construction is used to provide an overview of the breadth of the literature, but not necessarily a synthesis of results and findings.
Sometimes a sixth step, though optional, is added: consultation. Engagement with stakeholders offers additional sources of information, perspectives, meaning and applicability of the results and findings of the scoping review.
Table 4.3 contains more information about scoping reviews and how they differ from systematic reviews, although similar in some respects.
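For readers who like to track progress explicitly, the five steps (plus the optional consultation step) can be kept as a simple checklist. The sketch below is not part of the cited guidelines; the class and step labels are merely one possible way of recording which stages of a scoping review protocol have been completed.

```python
from dataclasses import dataclass, field

@dataclass
class ScopingReviewProtocol:
    review_question: str
    steps_completed: dict[str, bool] = field(default_factory=lambda: {
        "identify the review question": False,
        "identify relevant studies": False,
        "select studies (inclusion and exclusion criteria)": False,
        "chart and tabulate data": False,
        "collate, summarise and report results": False,
        "consultation (optional)": False,
    })

    def outstanding(self) -> list[str]:
        # Steps of the protocol that still have to be carried out.
        return [step for step, done in self.steps_completed.items() if not done]

protocol = ScopingReviewProtocol("What evidence exists on lay support during labour?")
protocol.steps_completed["identify the review question"] = True
print(protocol.outstanding())
```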
The predefined methods also raise the question of how these scoping reviews differ from systematic reviews. Table 4.3 compares the scoping review with narrative reviews, systematic literature reviews and systematic reviews. This comparison indicates that in the approaches to scoping reviews the search strategy for publications and their initial analysis are similar to systematic literature reviews and systematic reviews, whereas the analysis and synthesis of findings is less structured. Therefore, scoping reviews can lead to exploring an extensive body of scholarly knowledge, but are less effective in generating findings and recommendations for practice. With regard to developing review questions for the archetype systematic review, undertaking a scoping review could be helpful; see Figure 4.5. First, one of its objectives is to identify gaps in scholarly knowledge, i.e., the existing literature. From these gaps, directions for both literature reviews and empirical research can be derived. Second, during a scoping study key concepts in literature can be found. Possibly, this can lead to models, see Section 4.4, and characteristics or factors relevant to what is studied.
Table 4.3 Comparison of scoping review with three archetypes. Scoping reviews are positioned between narrative reviews and systematic (literature) reviews. They have a protocol—covering retrieval and data extraction—in common with systematic (literature) reviews, but do not aim at an unbiased appraisal and synthesised findings. Narrative reviews typically lack any of the characteristics. The table is an expanded version of Munn et al.’s (2018, p. 143/6) overview and refers to archetypes for literature reviews found in Section 2.5.

Characteristic | Narrative reviews* | Scoping reviews | Systematic literature reviews | Systematic reviews
A priori review protocol | No | Yes | Yes | Yes
Registration of review protocol | No | No | No | Yes
Explicit, transparent, peer-reviewed search strategy | No | Yes | Yes | Yes
Standardised data extraction (forms) | No | Yes | Yes | Yes
Mandatory critical appraisal (risk of bias assessment) | No | No | No | Yes
Synthesising findings from individual studies | No | No | Yes | Yes
Generation of ‘summary’ findings | No | No | Yes | Yes
*The category has been changed from ‘traditional literature reviews’ to ‘narrative reviews’
Third, the available evidence is considered; this can include the type of studies that are available for reviews, the context in which they took place and how they built on extant scholarly knowledge. This can lead to a particular focus on studies or contexts. However, because of its nature a scoping review may also introduce undue bias when compared with systematic reviews. Therefore, in three ways scoping reviews can help to develop review questions for more specific, detailed reviews.
NOTE: POOR REASONS TO UNDERTAKE A SCOPING REVIEW
• To assert that there is no literature on a specific topic is a poor reason to undertake a scoping review. On most topics there is a wealth of literature; some may be of relevance in some way, at least. Or, at worst, there is a lot of related literature that can be connected to the topic at hand. Therefore, a scoping review will always identify some kind of relevant literature being available.
• Insufficient time available, and therefore substituting a systematic (literature) review with a scoping review, is another poor reason. Scoping reviews are protocol-driven and require substantial effort to identify, retrieve and analyse studies. Also, scoping reviews have a relatively broad focus, which may lead to more superficial findings rather than conjectures and recommendations based on detailed analysis; therefore, considerable attention needs to be paid to conducting a scoping review so that meaningful conclusions can be drawn on the suitability of scholarly knowledge and its practical relevance.
[Fig. 4.5 depicts, within the context of a systematic review, the process for the archetype systematic review: developing review questions (review questions and, where applicable, a format such as PIO) and identifying or developing models, checked for appropriateness against causal relationships, laws of observed regularities, models and theories; detailed questions for the analysis of studies; keywords and databases; inclusion and exclusion criteria; retrieval of studies; extraction of quantitative and qualitative data; quantitative and qualitative analysis of studies; and synthesis of findings. The review questions and models are derived from a protocol-driven scoping review.]
Fig. 4.5 Positioning scoping reviews for the archetype systematic review. The position of a scoping review has been added to the process for the archetype systematic review in Figure 3.5. A first purpose of the scoping review could be to develop specific review questions based on a topic of interest and a broad range of publications and studies being available; this includes the consideration of different contexts and different types of studies written on a topic. The second purpose is to identify or develop models so that the analysis and synthesis of studies are directed more towards findings. A third purpose is to find out which keywords and subject terms are best used to identify publications, and which databases are best used; see Section 5.4. A final purpose is to set guidelines for inclusion and exclusion criteria; see Chapter 6.
TIP: AVOID UNDUE BIAS BY A SCOPING REVIEW When undertaking a systematic review derived from a scoping review, undue bias might be introduced. Because of its broader approach and its orientation towards a topical survey, the quality of evidence (see Section 6.4) may not be appropriately assessed during a scoping review. This could lead to findings and recommendations by a scoping review that are based on studies that were weakly conducted, did not disclose full information or were biased. Therefore, when conducting a systematic review originating from a scoping review, during the development of the protocol either re-assessment of the specific findings and recommendations by the scoping
review is necessary or sampling of studies within the scope may be necessary to gain additional insight with regard to the quality of evidence.
4.5.2
Scoping Study for Topical Mapping
Although not following a protocol, a scoping study will help to set out how to search for literature in the case of the archetypes systematic literature review and systematic review. Whereas a scoping review assesses the potential volume and scope of available scholarly literature, and thus may result in the guidance to undertake a number of systematic reviews, a scoping study merely aims at finding out what publications are available on a specific topic so that a literature review can take place; see Figure 4.6 for how a scoping study connects to the archetype systematic literature review. In this spirit, Brereton et al. (2007, p. 576) state that a scoping study, which they call a pre-review mapping study, may help in defining research questions. Such scoping studies typically provide an overview of topics: which topics are well covered, including the discovery of already existing literature reviews, which topics need more attention and the type of studies that have been conducted. The recap of literature may also include how findings of studies are related. Because they do not necessarily provide an in-depth analysis of retrieved studies, sometimes they can also be seen as topical surveys.2 The discovery of relevant literature and setting review questions may also benefit from the enhancement ‘successive fractions’ for the iterative search strategy in Section 5.3. Typically, these are of the archetypes narrative overviews and narrative reviews, because their purpose is not to be exhaustive, but to provide direction for a protocol-driven literature review. Thus, the purpose of a scoping study is guiding a specific literature review that follows so that review questions can be set out, a search strategy defined, and inclusion and exclusion criteria determined.
4.6
Key Points
• Review questions for evaluating literature should be derived from research objectives in the case of empirical research and can be developed on their own in the case of stand-alone literature reviews. They can be informed by scoping reviews in the case of the archetype systematic review and scoping studies in the case of the archetypes systematic literature review and systematic review; scoping studies are topical surveys to inform a specific literature review, whereas scoping reviews have a broader remit as they seek to explore the extent of scholarly knowledge and are protocol-driven.
2 See Section 11.1 for further remarks on topical surveys in the context of qualitative synthesis.
[Fig. 4.6 depicts the process for the archetype systematic literature review: defining research objectives, with a scoping study informing the purpose of the systematic literature review (review questions and the five recommendations); setting review questions and identifying or developing models, checked for appropriateness against causal relationships, laws of observed regularities, models, perspectives and theories; questions or themes for the literature review; keywords and databases; inclusion and exclusion criteria; retrieval of studies; quantification of retrieved studies; quantitative and qualitative analysis of studies; and synthesis of findings.]
Fig. 4.6 Positioning scoping studies for the archetype systematic literature review. The position of a scoping study has been added to the process for the archetype systematic literature review in Figure 3.4. Similarly to, but more narrowly focused than, a scoping review, the scoping study supports the development of specific review questions; it provides insight into the range of publications and studies available, including their contexts and types of studies; and, if applicable, it supports the identification or development of models, the selection of keywords and subject terms to find publications, and the choice of databases; for the latter, see Section 5.4. A final purpose is to set guidelines for inclusion and exclusion criteria; see Chapter 6.
• Key to a literature review is developing appropriate review questions; the five recommendations for forming them include:
• Guided by a single question, which may have corollaries, so that it provides direction for the retrieval of relevant sources and their appraisal.
• Narrowly focused to achieve in-depth analysis of the relevant sources, while paying attention to the breadth of the literature. In some cases, such as reviews in healthcare, the guiding question could be broader when corollaries are narrowly focused.
• Clearly formulated to avoid finding sources for a literature review that are not or hardly related to its topic.
• Phrased in such a manner that different outcomes or opinions are possible.
• Built on sound assumptions.
• Scoping studies and scoping reviews have as their purpose providing an overview or map of the evidence on a broader topic, before a more detailed systematic literature review or systematic review is conducted. Their outcomes also include the identification of key concepts, definitions and gaps in scholarly knowledge. In the case of a scoping review, it can result in setting an agenda for multiple, more detailed reviews, each with an appropriate review question. Particularly in healthcare, medicine and nursing, scoping reviews follow a protocol that is similar to those for systematic reviews with regard to the retrieval of sources and extraction of data. Caution should be exercised when basing further literature reviews on scoping reviews due to potential bias and lack of assessment of the quality of evidence.
• For formulating review questions, the format population-intervention-outcome can be used; particularly, this is of interest for evidence-based interventions and practice. Extensions of this format, found in Section 4.4 and Table 4.2, include:
• Comparison, which adds alternative interventions or control groups, so that the effectiveness of the intervention can be evaluated.
• Time, which adds changes that may have occurred in the aspects that are considered for the review.
• (Type of) Study, which refers to particular types of research methods and designs of research methodology to be considered for analysis.
• Context, which covers the circumstances that form the setting for an event, statement or concept; this may be of particular interest to qualitative literature reviews.
• The identification of models, theories and laws of observed regularities can support the development of appropriate review questions. Not only do they support a more detailed analysis of retrieved studies, but they also enable developing a more profound theoretical base for reviews; this could include comparing models, theories and laws of observed regularities in a review. More recently, it has been advised to do this when a review is using the format population-intervention-outcome and its variants.
4.7
How to …?
4.7.1
… Develop Review Questions That Are Worthwhile
To determine whether review questions are worthwhile, they need to be considered in terms of both their contribution to scholarly knowledge and their practical value; it should be noted that review questions can be of practical utility, a contribution to scholarly knowledge, or both. The review questions should be guided by a single
question, with closely related subquestions if necessary. Section 4.2 presents five recommendations for formulating review questions. And Section 4.3 indicates a number of starting points for finding these types of guiding questions. The recommendations and the starting points can be used for all four archetypes of literature reviews. Furthermore, such a guiding question should be expressed with clarity and be narrowly focused; this will allow review questions with scholarly contributions and practical merit to guide the evaluation of literature. For systematic reviews into evidence-based interventions, practices and treatments, it is helpful to use the format population-intervention-outcome and its extensions (comparison, time, type of study and context) for setting review questions; see Section 4.4 for more detail. These can be developed further using modelling, theories and laws of observed regularities to inform the key concepts and the logic between these; see Sections 4.4.3 and 4.4.4.
4.7.2
… Conduct and Time a Scoping Review or Scoping Study
A scoping review involves setting out a review question, which can be broad, retrieving studies and sources in a systematic way, and capturing data in a structured manner; such a review supports the development of review questions for more narrowly focused systematic reviews. In addition to finding out the extent of literature available on a specific topic, it also leads to the identification of key concepts, models, theories and laws of observed regularities that could inform the review study. To achieve these purposes, a scoping review is a protocol-driven literature review and follows the steps of a systematic review; see Table 4.3 for similarities and differences between scoping reviews and systematic reviews. A scoping review is particularly helpful when the literature on a topic is broad and its nature needs to be investigated before a protocol-driven systematic review, or more likely a number of reviews, with a more narrowly focused review question can be effectively undertaken.
Different from a scoping review, a scoping study normally precedes a focused literature review, particularly the archetypes systematic literature review and systematic review. First, a scoping study reveals the extent of the literature available on the specific topic and identifies gaps in scholarly knowledge, whether theoretical or as guidance for practitioners. Second, a scoping study leads to the identification of key concepts, relevant objects or subjects of study and aspects to be considered for an effective literature review. Third, the scoping study may lead to determining inclusion and exclusion criteria. A scoping study for protocol-driven literature reviews is helpful when the extent of the scholarly knowledge is not yet known and an effective protocol for a literature review needs to be set out.
4.7.3
… Write a Literature Review
There are two ways of embedding the review questions in the writing of a literature review, depending on its archetype. The first one is clearly stating why the literature is investigated and what needs to be obtained from it. Typically, this is suitable for the archetypes narrative overview, narrative review, and sometimes for the archetype systematic literature review when it is part of an empirical study. The second one is explicitly formulating review questions using the guidance provided in this chapter. This is appropriate for narrative reviews as independent publications, systematic literature reviews and systematic reviews. Both require a clear articulation of the purpose of investigating literature, preferably as a review question or review questions; Section 4.2 provides five generic recommendations for formulating review questions, and Section 4.4 provides guidance on using the format population-intervention-outcome and its variations. For both literature reviews as part of an empirical study and literature reviews as an independent study, the formulation of review questions avoids diversion and supports the later analysis of retrieved studies. Formulating review questions can be further supported by modelling (see Figure 4.4 for an overview), laws of observed regularities and theories. This leads to a more coherent view on key concepts, the aspects to be considered and their interrelationships. Consequently, such underpinning leads to more clarity about the review questions and the concepts related to them.
References
Ackoff RL (1962) Scientific method: optimizing applied research decisions. Wiley, New York Alvesson M, Sandberg J (2011) Generating research questions through problematization. Acad Manag Rev 36(2):241–271. https://doi.org/10.5465/amr.2009.0188 Anderson LM, Petticrew M, Rehfuess E, Armstrong R, Ueffing E, Baker P, Tugwell P (2011) Using logic models to capture complexity in systematic reviews. Res Syn Methods 2(1):33–42. https://doi.org/10.1002/jrsm.32 Anzanello MJ, Fogliatto FS (2011) Learning curve models and applications: literature review and research directions. Int J Ind Ergon 41(5):573–583. https://doi.org/10.1016/j.ergon.2011.05.001 Aries MBC, Newsham GR (2008) Effect of daylight saving time on lighting energy use: a literature review. Energy Policy 36(6):1858–1866. https://doi.org/10.1016/j.enpol.2007.05.021 Arksey H, O’Malley L (2005) Scoping studies: towards a methodological framework. Int J Soc Res Methodol 8(1):19–32 Bolton JE (1971) Small firms: report of the committee of inquiry on small firms. Chairman JE Bolton, Vol Cmnd. Her Majesty’s Stationary Office, London Booth A, Noyes J, Flemming K, Moore G, Tunçalp Ö, Shakibazadeh E (2019) Formulating questions to explore complex interventions within qualitative evidence synthesis. BMJ Glob Health 4(Suppl 1):e001107. https://doi.org/10.1136/bmjgh-2018-001107 Brereton P, Kitchenham BA, Budgen D, Turner M, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Softw 80(4):571–583. https://doi.org/10.1016/j.jss.2006.07.009
Carter R, Hodgson GM (2006) The impact of empirical tests of transaction cost economics on the debate on the nature of the firm. Strateg Manag J 27(5):461–476. https://doi.org/10.1002/smj. 531 Chen L (2012) Agent-based modeling in urban and architectural research: a brief literature review. Front Architect Res 1(2):166–177. https://doi.org/10.1016/j.foar.2012.03.003 Chow CW, Harrison PD (2002) Identifying meaningful and significant topics for research and publication: a sharing of experiences and insights by ‘influential’ accounting authors. J Account Educ 20(3):183–203. https://doi.org/10.1016/S0748-5751(02)00008-8 Cooke A, Smith D, Booth A (2012) Beyond PICO: the SPIDER tool for qualitative evidence synthesis. Qual Health Res 22(10):1435–1443. https://doi.org/10.1177/1049732312452938 Davies KS (2011) Formulating the evidence based practice question: a review of the frameworks. Evid Based Libr Inf Pract 6(2):75–80. https://doi.org/10.18438/B8WS5N Davis MS (1971) That’s interesting! – towards a phenomenology of sociology and a sociology of phenomenology. Philos Soc Sci 1(2):309–344. https://doi.org/10.1177/004839317100100211 De Boeck G, Meyers MC, Dries N (2018) Employee reactions to talent management: assumptions versus evidence. J Organ Behav 39(2):199–213. https://doi.org/10.1002/job.2254 Dekkers R (2005) (R)Evolution, organizations and the dynamics of the environment. Springer, New York Dekkers R, Barlow A, Chaudhuri A, Saranga H (2020a) Theory informing decision-making on outsourcing: a review of four ‘five-year’ snapshots spanning 47 years. Glasgow: Adam Smith Business School. Accessed from https://ssrn.com/abstract=3691983 Dekkers R, de Boer R, Gelsomino LM, de Goeij C, Steeman M, Zhou Q, Souter V (2020b) Evaluating theoretical conceptualisations for supply chain and finance integration: a Scottish focus group. Int J Product Econ 220:107451. https://doi.org/10.1016/j.ijpe.2019.07.024 Dekkers R (2017) Applied systems theory, 2nd edn. Springer, Cham Denk N, Kaufmann L, Carter Craig R (2012) Increasing the rigor of grounded theory research – a review of the SCM literature. Int J Phys Distrib Logist Manag 42(8/9):742–763. https://doi.org/ 10.1108/09600031211269730 Eldawlatly A, Alshehri H, Alqahtani A, Ahmad A, Al-Dammas F, Marzouk A (2018) Appearance of population, intervention, comparison, and outcome as research question in the title of articles of three different anesthesia journals: a pilot study. Saudi J Anaesth 12(2):283–286. https://doi. org/10.4103/sja.SJA_767_17 Fayezi S, O’Loughlin A, Zutshi A (2012) Agency theory and supply chain management: a structured literature review. Supply Chain Manag Int J 17(5):556–570. https://doi.org/10.1108/ 13598541211258618 Flyvbjerg B (2006) Five misunderstandings about case-study research. Qual Inq 12(2):219–245. https://doi.org/10.1177/1077800405284363 Glock CH, Grosse EH, Jaber MY, Smunt TL (2019) Applications of learning curves in production and operations management: a systematic literature review. Comput Ind Eng 131:422–441. https://doi.org/10.1016/j.cie.2018.10.030 Goerlich Zief S, Lauver S, Maynard RA (2006) Impacts of after-school programs on student outcomes. Campbell Syst Rev 2(1):1–51. https://doi.org/10.4073/csr.2006.3 Grant MJ, Booth A (2009) A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info Libr J 26(2):91–108. https://doi.org/10.1111/j.1471-1842.2009. 
00848.x Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276–1304. https://doi.org/10.1109/TSE.2011.103 Hoppe T, Coenen F, van den Berg M (2016) Illustrating the use of concepts from the discipline of policy studies in energy research: an explorative literature review. Energy Res Soc Sci 21:12– 32. https://doi.org/10.1016/j.erss.2016.06.006 Hutin YJF, Hauri AM, Armstrong GL (2003) Use of injections in healthcare settings worldwide, 2000: literature review and regional estimates. BMJ 327(7423):1075. https://doi.org/10.1136/ bmj.327.7423.1075
Iessa L, De Vries YA, Swinkels CE, Smits M, Butijn CAA (2017) What’s cooking? Unverified assumptions, overlooking of local needs and pro-solution biases in the solar cooking literature. Energy Res Soc Sci 28:98–108. https://doi.org/10.1016/j.erss.2017.04.007 Jacobi M (1991) Mentoring and undergraduate academic success: a literature review. Rev Educ Res 61(4):505–532. https://doi.org/10.3102/00346543061004505 Kitchenham BA, Mendes E, Travassos GH (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329. https://doi.org/10.1109/ TSE.2007.1001 Lacity MC, Solomon S, Yan A, Willcocks LP (2015) Business process outsourcing studies: a critical review and research directions. In: Willcocks LP, Sauer C, Lacity MC (eds) Formulating research methods for information systems, vol 2. Palgrave Macmillan, London, pp 169–251 Leary MR, Kowalski RM (1990) Impression management: a literature review and two-component model. Psychol Bull 107(1):34–47. https://doi.org/10.1037/0033-2909.107.1 Levac D, Colquhoun H, O’Brien KK (2010) Scoping studies: advancing the methodology. Implement Sci 5(69) Lewis MW, Grimes AJ (1999) Metatriangulation: building theory from multiple paradigms. Acad Manag Rev 24(4):672–690 Liao F, Molin E, van Wee B (2017) Consumer preferences for electric vehicles: a literature review. Transp Rev 37(3):252–275. https://doi.org/10.1080/01441647.2016.1230794 March JG (1991) Exploration and exploitation in organizational learning. Organ Sci 2(1):71–87. https://doi.org/10.1287/orsc.2.1.71 Milnes V, Gonzalez A, Amos V (2015) Aprepitant: a new modality for the prevention of postoperative nausea and vomiting: an evidence-based review. J Perianesth Nurs 30(5):406– 417. https://doi.org/10.1016/j.jopan.2014.11.013 Moraros J, Lemstra M, Nwankwo C (2016) Lean interventions in healthcare: do they actually work? A systematic literature review. Int J Qual Health Care 28(2):150–165. https://doi.org/10. 1093/intqhc/mzv123 Munn Z, Peters MDJ, Stern C, Tufanaru C, McArthur A, Aromataris E (2018) Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med Res Methodol 18(1):143. https://doi.org/10.1186/s12874-0180611-x Munsell EGS, Coster WJ (2018) Scoping review of interventions supporting self-management of life tasks for youth with high functioning ASD. Exceptionality 1–14. https://doi.org/10.1080/ 09362835.2018.1480949 Nabbe P, Le Reste JY, Guillou-Landreat M, Munoz Perez MA, Argyriadou S, Claveria A, Van Royen P (2017) Which DSM validated tools for diagnosing depression are usable in primary care research? A systematic literature review. Eur Psych 39:99–105. https://doi.org/10.1016/j. eurpsy.2016.08.004 Popper KR (1966) Logik der Forschung. J.C.B. Mohr, Tübingen Riebl SK, Estabrooks PA, Dunsmore JC, Savla J, Frisard MI, Dietrich AM, Davy BM (2015) A systematic literature review and meta-analysis: the theory of planned behavior’s application to understand and predict nutrition—related behaviors in youth. Eat Behav 18:160–178. https:// doi.org/10.1016/j.eatbeh.2015.05.016 Robinson KA, Saldanha IJ, McKoy NA (2011) Development of a framework to identify research gaps from systematic reviews. J Clin Epidemiol 64(12):1325–1330. https://doi.org/10.1016/j. jclinepi.2011.06.009 Selz O (1913) Über die Gesetze des geordneten Denkverlaufs, erster Teil. 
Spemann, Stuttgart Simkiss DE, Stallard N, Thorogood M (2013) A systematic literature review of the risk factors associated with children entering public care. Child Care Health Develop 39(5):628–642. https://doi.org/10.1111/cch.12010 Squires JE, Valentine JC, Grimshaw JM (2013) Systematic reviews of complex interventions: framing the review question. J Clin Epidemiol 66(11):1215–1222. https://doi.org/10.1016/j. jclinepi.2013.05.013
Stern C, Jordan Z, McArthur A (2014) Developing the review question and inclusion criteria. Am J Nurs 114(4):53–56. https://doi.org/10.1097/01.Naj.0000445689.67800.86 Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, Straus SE (2018) PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Int Med 169 (7):467–473. https://doi.org/10.7326/m18-0850 Turner M, Kitchenham B, Brereton P, Charters S, Budgen D (2010) Does the technology acceptance model predict actual use? A systematic literature review. Inf Softw Technol 52 (5):463–479. https://doi.org/10.1016/j.infsof.2009.11.005 Turnnidge J, Côté J (2018) Applying transformational leadership theory to coaching research in youth sport: a systematic literature review. Int J Sport Exercise Psychol 16(3):327–342. https:// doi.org/10.1080/1612197X.2016.1189948 Vereenooghe L, Flynn S, Hastings RP, Adams D, Chauhan U, Cooper S-A, Waite J (2018) Interventions for mental health problems in children and adults with severe intellectual disabilities: a systematic review. BMJ Open 8(6):e021911. https://doi.org/10.1136/bmjopen2018-021911https://doi.org/10.1136/bmjopen-2018-021911 Villa-Roel C, Voaklander B, Ospina MB, Nikel T, Campbell S, Rowe BH (2018) Effectiveness of written action plans for acute asthma: a systematic review. J Asthma 55(2):188–195. https:// doi.org/10.1080/02770903.2017.1318142 Wacker JG (1998) A definition of theory: research guidelines for different theory—building research methods in operations management. J Oper Manag 16(4):361–385. https://doi.org/10. 1016/S0272-6963(98)00019-9 Zhou J, Li X, Mitri HS (2018) Evaluation method of rockburst: state-of-the-art literature review. Tunn Undergr Space Technol 81:632–659. https://doi.org/10.1016/j.tust.2018.08.029 Zhou Q (2020) Challenging the exploration and exploitation dichotomy: towards theory building in innovation management. Doctoral thesis, University of Glasgow, Glasgow
Chapter 5
Search Strategies for [Systematic] Literature Reviews
co-authored by Lynn Irvine
After setting review questions as discussed in the previous chapter, the search for relevant publications is the next step of a literature review. No matter the archetype of literature review (see Section 2.5), these searches aim at finding relevant publications. In the case of systematic approaches, the objective is to find all or almost all relevant publications; if this cannot be achieved, then the retrieved publications should present a representative sample of literature on the subject matter. For the purpose of finding relevant publications or a representative sample, this chapter goes into more detail about how to find these writings. It starts with the criteria for search strategies in Section 5.1; this includes a discussion about the breadth and depth of retrieving studies and sources. What type of publications and sources to consider appears in Section 5.2; the generic guidance about the inclusion of specific studies paves the way for how a search can be conducted. The classic search strategy, with its advantages and setbacks, is found in Section 5.3. This section also describes four ways to enhance this search strategy. Section 5.4 presents a search strategy based on the use of keywords, controlled vocabulary and databases. This strategy is often thought of in connection with protocol-driven literature reviews, but some of its features can also be used in conjunction with the iterative search strategy. Six other search strategies are introduced in Section 5.5. The enhancement of search strategies is the topic of Section 5.6. It covers how to determine the effectiveness of search strategies and trade-offs that may emerge when conducting a search. Also, how search strategies can be used in conjunction with each other to increase the effectiveness of the search is brought up. Section 5.7 looks at so-called grey literature and approaches to retrieving this type of publication. Finally, Section 5.8 pays attention to keeping records of searches. Furthermore, it provides suggestions for how to deal with multiple sources reporting the same study and how to manage large-volume searches. Thus, the chapter provides a comprehensive overview of different ways to search for relevant publications, gives guidance about how to make a search effective and how to consider different sources.
5.1
Criteria for Retrieving Publications
Finding all relevant publications or a representative sample related to review questions, or alternatively research objectives when a literature review is part of an empirical study, is the purpose of a search strategy. A search strategy is the method by which relevant sources are found, usually by searching selected databases and search engines using a mix of keywords, controlled vocabulary and search operators. Relevance is determined by a review question for which guidelines can be found in Sections 4.2 and 4.3. When conducting a literature review in the context of an empirical study, relevance is determined by the extent to which studies inform the research design and data collection; see Section 3.5 for more detail. Thus, the relevance of a retrieved publication is determined by how it relates to a review question or research objective. In principle, this requires ensuring that the search identifies studies and works with conceptualisations, laws of observed regularities, methods, perspectives, theories, tools, etc. that fall within the depth and breadth of the objectives of the review or empirical study. To search for and locate all relevant works and studies is sometimes not possible due to practical constraints. When it is not possible to retrieve all relevant studies and works, then ideally the search strategy should aim to find almost all relevant studies. If this is not possible, then sampling could suffice, but evidence needs to be provided of how the sampling represents the larger body of relevant literature. However, practical constraints should not be confused with lack of diligence. For instance, university libraries often offer services for researchers to request research monographs, articles and chapters in edited books that are not immediately available to them. Jennex (2015, p. 142) notes that in some cases researchers take a convenience stance; this should be avoided (see guidance at the end of this section). Practical reasons are often that the publication is no longer available or bibliographical details are insufficient to locate the source. A case in point is the study into innovation by Freeman, mentioned in the so-called Bolton Enquiry (1971, p. 52); this report was not accessible even after many paths were pursued to get hold of it. Thus, practical difficulties can be a factor in not locating relevant publications, but are only acceptable reasons for omission when researchers have tried all avenues at their disposal to locate them. In addition, in some cases the review question may not warrant investigating all sources that contain information about a specific topic, for example, when the nature of the literature review does not require all evidence to be considered. In particular this may be the case for works on conceptualisations, methods, theories, tools, etc. A case in point is the renewed interpretation of the resource-based view by Priem and Butler (2001). In the publication the authors put forward limitations of this popular theory in business and management studies, often used for topics related to strategic management. Based on the limitations, its conceptualisation is renewed and the theory extended. To do this, it was not necessary to include all studies until the date of writing, because the literature review only needed to demonstrate how the resource-based view was used by preceding studies. In this respect,
Finfgeld-Connett and Johnson (2013, p. 199), referring to Bates (1989, 2007), advocate a 'berry-picking' search strategy for exploring which sources and studies are relevant to a topic. This is an evolving search strategy suitable for knowledge building and theory generation. It may also be associated with the exploratory stage of conducting a literature review (Bates 1989), akin to scoping studies (see Section 4.5). Thus, the nature of a literature review, particularly the generation of conceptualisations, methods, patterns, theories, tools, etc., may warrant that not all but a more limited set of studies and sources are considered; however, such literature reviews need to demonstrate that they are sufficiently comprehensive and that no studies have been omitted for reasons of convenience.
In contrast to using a limited set of studies and sources, finding an abundance of relevant material is also problematic; retrieving too broad a range of literature related to a specific topic mostly has two causes. The first is that the topic has been well studied. If this is the case, there is less likelihood that a literature review will identify gaps in scholarly knowledge; one way to examine this body of literature is to ask whether assumptions or foundations have been investigated, which is raised in Section 4.3 as a possible route to worthwhile review questions. The second reason is that the review question is too broadly formulated. This will result in many different strands of research coming together during the search for relevant literature. In this scenario the reading of retrieved studies and sources may lead to new searches, increasing the scope of literature and the number of studies to be considered during analysis and synthesis. Even if a topic is well defined with set boundaries, the retrieval of many sources and studies may hinder finding the relevant ones amongst them. This means that a search strategy should be carefully planned to find relevant publications as effectively as possible, which implies that the review question should be narrowly focused (see Section 4.2); the use of scoping studies (see Section 4.5) and initial searches will help to avoid this scenario.
The approach to retrieving relevant studies and sources can be expressed by two concepts: sensitivity and specificity (the latter is sometimes called precision). Sensitivity expresses how effective a search strategy is at finding relevant studies; it is commonly measured as the number of relevant studies and sources identified divided by the total number of relevant studies in existence. Specificity (or precision) indicates how well the search strategy avoids retrieving irrelevant material; it is measured as the number of relevant studies and sources identified divided by the total number of studies and sources retrieved by the search. A proper search strategy will strive to balance sensitivity (comprehensiveness) with specificity (precision). When searching, there can be diminishing returns for efforts; after a certain stage, each additional search may return fewer additional references that are relevant to the review. Consequently, there comes a point where the rewards of further searching may not be worth the effort required to identify the additional references.
This means that for evolving or stepwise search strategies an increase in sensitivity may come at the expense of specificity.
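To make these two measures concrete, the following minimal Python sketch computes them from simple counts. The function names and the figure for the total number of relevant studies in existence are illustrative assumptions (in practice that total can only be estimated); the retrieval counts are loosely borrowed from the worked example in Box 5.B.

```python
def sensitivity(relevant_retrieved: int, relevant_in_existence: int) -> float:
    """Proportion of all relevant studies in existence that the search identified."""
    return relevant_retrieved / relevant_in_existence


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Proportion of retrieved studies that turned out to be relevant (specificity/precision)."""
    return relevant_retrieved / total_retrieved


# Hypothetical figures: 57 relevant studies identified among 205 records screened,
# against an assumed 80 relevant studies in existence.
print(f"sensitivity: {sensitivity(57, 80):.2f}")   # 0.71
print(f"precision:   {precision(57, 205):.2f}")    # 0.28
```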
TIPS: EFFECTIVENESS OF SEARCH STRATEGIES
• The effectiveness of a search can be judged on whether key references that should appear in any search are found; for example, vom Brocke et al. (2015, pp. 215–6) highlight the importance of identifying seminal works, as they call them. When a search strategy does not retrieve these key references for the topic, then it must be evaluated and changed in order to be more effective. The key references or seminal works for a specific review question can be identified in a number of ways:
• Consultation of colleagues, peers and supervisors. By engaging with any of these scholars, studies and sources pertinent to the topic may be identified. After retrieval, these key sources should still be assessed on their merits, for example, by using the method in Box 2.A.
• Extraction from publications. By reading through studies and sources, references related to the core arguments and discourse can be found. However, references that are used regularly do not guarantee that those individual studies have fully accounted for the content of seminal works.
• Using search engines and databases. There is a likelihood that papers that are cited more frequently are considered seminal works in a domain or for a specific topic. However, the same warning applies as made previously; these works should be scrutinised to evaluate their appropriateness, scholarly credibility and the assumptions they may be based on. A case in point for examining a conceptualisation in depth is the popular notion of open innovation in business and management studies; see Box 5.A. Despite its popularity, only a few have challenged the conceptualisation of open innovation. Consequently, referring to a specific publication or source that is well cited does not warrant that the actual contribution to knowledge of a review is assured.
• When focusing on specific applications or domains, pivotal papers for other applications or domains could contribute insight. This approach can be useful when specific assumptions or starting points are questioned less, or when theories, conceptualisations, etc. have been scarcely introduced for a specific application or domain. Care should be taken that sufficient reasoning is provided. An example is the use of techniques of operations management for processes of product design and engineering. Shishank and Dekkers (2013, p. 319) provide a brief overview of characteristics of the processes of design and engineering before looking into how literature has dealt with decisions on outsourcing of manufacturing activities in this process. This example shows the relevance of adequate reasoning before borrowing theories and conceptualisations from another domain or application.
• If access to relevant studies and sources is limited, for example when a university library does not provide access to specific publications, alternative routes for obtaining these should be considered. Routes for getting access to relevant publications, in addition to access provided by libraries, include:
• Payment for sources (most publishers offer the possibility to purchase individual articles).
• Consultation of peers (without infringing on intellectual property rights).
• Inclusion of co-authors in the study who have access to a wider range of sources (of course, they are also expected to contribute to the content of the study).
Box 5.A Seminal or Not? The Case of 'Open Innovation' Attributed to Chesbrough (2003), open innovation has become a popular conceptualisation for studies into innovation management. Often referred to as a paradigm, it is contrasted with closed innovation. Others have noted that its underlying thinking existed long before. For example, Trott and Hartmann (2009) call it an old paradigm and question its credibility, Dekkers et al. (2019, p. 215) demonstrate that other factors dominate collaboration between companies in the Scottish context, and even Chesbrough (2012, p. 20) himself downplays the relevance of the notion. Furthermore, its foundations have been contested, for instance in the writing by Isckia and Lescop (2011). Notwithstanding this criticism, others have called Henry Chesbrough the 'father of open innovation', with Onetti (2019, p. 51), and Piller and West (2014, p. 29) being among them. It is up to the reader to decide what to make of these points with regard to the contribution to knowledge by this specific conceptualisation. It shows that searching for information about a theory, construct or conceptualisation also requires looking into its origins and what it brings to the table.
NOTE: RELEVANCE SHOULD NOT BE CONFUSED WITH CITATION RATES A common mistake is to conflate the number of citations of an article, even when excluding self-citations, with its relevance. For example, Dieste et al. (2009, p. 537) state that focusing on citation rates causes a considerable number of relevant studies to slip through the net. In this respect, it should be noted that studies and sources are cited for a variety of reasons. They can be mentioned because they are used to introduce and justify a topic of study, because reviewers suggested them or because authors want to add credibility to their study, in addition to the fact that they may contain relevant theories, conceptualisations, constructs, etc. Thus, assuring the relevance of highly cited studies would require removing all citations that do not actually use the content of a study. This would be impractical, particularly when citation rates are high. Citation rates are only a coarse indication of how others have built on an article. Therefore, unless the topic of study entails investigating how thought has developed, the multitude of reasons for citing works besides their relevant content makes the number of citations a very poor indicator of relevance.
5.2 Types of Sources
When looking for relevant publications it is important to consider what types of sources are suitable for the information needed in relation to the review question. For example, when searching for the origins of concepts, these could have been initially communicated through presentations rather than formal academic publications. When seeking evidence about mitigation of effects from premature births, scientific publications may be the only sources providing the reliable and rigorous evidence necessary for investigating interventions. Therefore, the purpose of a literature review determines what needs to be included as sources in a study. When looking at studies and sources, these may be considered in terms of generic criteria for evaluation, i.e. content, structure and credibility. The suitability of the content is determined by how much a source relates to the topic of the literature review, but also by whether it reflects state-of-the-art knowledge at the time of its publication and whether there is acknowledgement of other works that informed the writing. The structure of a source informs how the arguments are organised and presented. Some journals, such as Nature, place findings early in the text, whereas others expect these to be positioned nearer the end of the text. Credibility refers to the confidence in the findings, particularly whether there is explicit reference to how data was collected and how it was analysed. The following subsections provide a brief overview of primary sources, secondary sources, propositional writings, professional publications and other relevant material that can be used, discussed in relation to the three criteria for evaluation of suitability: content, structure and credibility; see also Table 5.1.
5.2.1 Primary Sources
As the first generic type of source in the context of literature reviews and scholarly works, primary sources present results, findings and conclusions of empirical studies. They can be published as articles in journals, chapters in edited books, monographs, proceedings of conferences, working papers and reports by research institutes. Reports and studies by the Organisation for Economic Co-operation and Development (OECD) are cases in point for the latter. Often, this type of source follows canonical formats for presenting the findings and conclusions based on generic processes for research; see Section 2.2 and Figure 2.2 for the generic processes of research. Normally, it includes a literature review that precedes the empirical study; see Section 3.5 and Figure 3.10 for more detail on how a literature review informs an empirical study. Further credence can follow from editorial policies for acceptance of manuscripts for publication, peer review and authors' profiles. Because of the explicit reasoning throughout such writings, it should also be possible to distil key thoughts and comments. Thus, primary sources can offer more insight than just findings derived from analysis of collected data.
Table 5.1 Overview of types of publications and their suitability for literature reviews.

Primary sources
• Typical resource: academic journals; reports by research institutes; chapters in edited books; monographs; doctoral theses
• Content: empirical data, results, conjectures, findings; preceding literature
• Structure: canonical formats; methodological details in conjunction with explicit reasoning
• Credibility: rigour; peer review (journals); editorial policy of journal; authors' profile
• Extraction for literature review: data (if presented); (major) findings; recommendations; key thoughts
• Notes: principally all relevant studies should be included, no matter rankings; clusters of studies might represent strands of thought

Secondary sources
• Typical resource: academic journals; chapters in edited books; evidence-based reports
• Content: stand-alone literature reviews of studies
• Structure: canonical formats; methodological details for review method; explicit reasoning
• Credibility: rigour; peer review (journals); editorial policy of journal; authors' profile
• Extraction for literature review: extracted data (if presented); (major) findings; recommendations; key thoughts
• Notes: extends to umbrella reviews; attention needs to be paid to how authors reframed evidence from other sources

Propositional writings
• Typical resource: academic journals; reports by research institutes; chapters in edited books; professional publications
• Content: in most instances selective use of sources and studies; new theories, conceptualisations, methods, tools, etc.
• Structure: aiming at providing arguments
• Credibility: peer review (journals); authors' profile
• Extraction for literature review: key thoughts, assumptions, beliefs, etc.
• Notes: selective use of sources for justification may lead to incomplete reasoning and consideration of evidence

Professional publications
• Typical resource: professional magazines; business reports; manuals, handbooks
• Content: varied content about practices, methods and tools
• Credibility: authors' profile
• Extraction for literature review: key thoughts, assumptions, beliefs, etc.; evidence of implementation and practices
• Notes: may be poorly referenced; may be biased by personal interests or commercial benefits

Other types of study
• Typical resource: presentations; articles in news outlets
• Content: varied content about practices, methods and tools; points of view (personal or organisational)
• Credibility: authors' profile
• Extraction for literature review: key thoughts, assumptions, beliefs, etc.; implementation and practices
• Notes: may be biased by personal interests or commercial benefits
However, even when there is a relatively strong focus for the literature review, there could be a broad diversity in research objectives, research methods, samples and analysis across retrieved studies. This is often related to the search for contributions to knowledge, where papers and studies may differ substantially. The variety may pose additional challenges for conducting literature reviews; Section 6.5, and Chapters 7, 9, 10, 11 and 12 provide guidance on how to incorporate differences across studies. Whereas the variety in retrieved studies may impede drawing more definite conclusions towards coherence in scholarly knowledge, it also provides opportunities to search for additional explanations.
5.2.2 Secondary Sources
The purpose of secondary sources is mainly capturing knowledge present in primary sources; as such, they do not offer new evidence but may reframe existing evidence. Stand-alone literature reviews are the dominant form in this category, albeit with differing purposes. These range from identifying gaps in scholarly knowledge to synthesising literature to study the effects of interventions, policies, practices and treatments aiming at specified outcomes. The diversity in approaches within secondary sources may hinder interpretation. This book offers insight into how literature reviews could be conducted, and also provides information on how to evaluate them. The inclusion of secondary sources can be evaluated better when the extraction of data, related to its analysis and synthesis, is adequately reported; for more information on this, see Chapters 13 and 14. Even though secondary sources reframe existing evidence and thus cannot reach beyond the publications they consider, they may still contribute to generating new insight, depending on the perspective taken. In particular, the inclusion of secondary sources could be helpful for evidence-based interventions, policies, practices and treatments in some cases, and for advances in theoretical insight.
5.2.3 Tertiary Sources
A further category, though not classified in Table 5.1, are umbrella reviews (see also Section 2.5), aka tertiary sources; these are best described as ‘reviews of literature reviews.’ They are principally reviews of systematic reviews, but sometimes appear also as part of systematic literature reviews. Examples of umbrella reviews are the studies by Grosso et al. (2017) and Poole et al. (2017) on the consumption of coffee and its effect on health outcomes. Even though these two publications have similar objectives for the study, they consider a different set of papers; both consider 141 studies reporting meta-analysis, though the distribution across observational studies and randomised controlled trials is different. Whereas the factors considered are also similar, the studies differ in the way they detail physiological mechanisms, and which findings they highlight. This means that even tertiary sources, though similar, may express nuanced differences in outcomes and findings, which could be useful to establish which further research is necessary.
5.2.4 Propositional Writings
Different from primary and secondary sources, propositional writings introduce revised, extended or new theories, conceptualisations, methods and tools. These can be found in more diverse types of publications. In addition to being found as articles in journals, chapters in edited books, monographs, proceedings of conferences, working papers and reports by research institutes, they also appear as presentations, commercial reports, etc. They may use selective studies and sources to present arguments, and sometimes, these are incomplete. If the questions for the review are related to revised, extended or new theories, conceptualisations, methods and tools, then these sources and studies can be used, though in general cautiously. Typically, they do not analyse newly generated data, results and findings, but build on the works done by others. This means that when a literature review seeks to investigate origins of conceptualisations and scholarly knowledge, to look for advances in theoretical insight and to find arguments related to assumptions, hypotheses, postulations and propositions, propositional writings can be considered.
5.2.5 Professional Publications
A further potential source of evidence and conceptualisations are professional publications, also called trade publications or professional literature. These can take the form of reports, articles in professional magazines, articles in trade magazines, manuals, handbooks, etc. An example is the set of professional publications by Koninklijke Philips (aka Royal Philips), a multinational conglomerate, in the 1980s. These publications focused on specific topics, for example the implementation of the Japanese approach to just-in-time production. The manual contained descriptions of concepts, case studies and also other publications from a wide variety of sources. It was very helpful for those who were responsible for improving manufacturing, and at the same time it provided an overview of state-of-the-art practical knowledge and some relevant scholarly knowledge. However, such publications must also be considered on their merits; sometimes, there is bias towards specific perspectives, instigated by either personal interests or commercial benefits. Thus, depending on the review question and its topic, professional publications could contain evidence and points of view that are of interest, as long as appropriate caution is exercised.
5.2.6 Other Types of Publications and Sources
There are other sources, such as presentations and publications in news outlets. Their quality may vary, depending on the perspective taken. Akin to professional publications, there could be bias towards specific perspectives, either from personal interests or for commercial benefits. It depends on the review question for a
literature review whether these presentations and publications (magazines, newspapers, etc.) are of interest for generating scholarly or practical knowledge.
NOTE: TYPES OF SOURCES DEPENDENT ON REVIEW QUESTIONS Which sources to consider is strongly dependent on the nature of the review question. For evidence-based interventions, policies, practices and treatments, primary sources of any kind could be of interest. However, if the aim of a literature review includes advances in theoretical developments, then secondary sources could also be worthwhile. When the objective of a review is to look at the origins and development of a conceptualisation, sources such as presentations and articles in professional magazines could be of interest. An example of the latter is a literature review into the origins and conceptualisation of lean product development by Salgado and Dekkers (2018). This review considered monographs, chapters in edited books, doctoral theses, presentations, professional publications, reports and working papers. Especially in the early stages of the development of conceptualisations, methods and tools, such sources could provide additional insight not fully captured in scholarly studies.
NOTE: TO USE OR NOT TO USE WIKIPEDIA Undergraduate and postgraduate students in particular might use Wikipedia as a first port of call. This can be helpful for finding information about a concept for orientation, akin to consulting an encyclopaedia. However, the nature of this online resource does not warrant that explanations are adequate; sometimes they are strongly influenced by the views of those who write the entry in Wikipedia. And, similar to an encyclopaedia, entries provide basic information about a concept, theory, etc., but do not necessarily discuss underpinnings, relations to other theoretical conceptualisations and critical reviews. Therefore, students should consult academic textbooks, academic journals and edited books to find a more fine-grained description and analysis of concepts, theories, etc.; an additional advantage is that consulting scholarly publications also demonstrates a more genuine effort at learning than relying on Wikipedia, other online resources and textbooks.
5.3 Iterative Search Strategy
The classic search strategy for a literature review is iterative in nature, because after looking at initial relevant papers the search is revised depending on information found in studies; see Figure 5.1. This search strategy often starts with a review question or research objective in mind, which is not necessarily explicitly formulated, as opposed to the guidance in Sections 4.2, 4.3 and 4.4. For the literature review there is an initial conception about which papers and writings are relevant to the topic of the study. Reading this initial set of papers, often found using keywords, will lead to identifying other works, ideas and additional keywords. This induces searches for other sets of papers, or searching based on the amended set of concepts and keywords.
[Figure 5.1 depicts the iterative cycle: conception about state-of-the-art in literature → search terms and search for studies → key publications → obtaining key concepts, ideas and perspectives → drafting the literature review → search for additional works → finalising the literature review]
Fig. 5.1 Iterative search strategy for literature reviews. In a search strategy that is iterative, a starting point is defined by what the state-of-the-art for a specific topic is. This determines which key publications are of interest. Based on the key publications and the search for further literature most likely new concepts, ideas and perspectives appear. These lead to a revised view on what key publications are and an additional search for studies supporting the revised view and providing relevant evidence. Also, the drafting of the literature review will trigger further searches for key publications and additional works. These iterations continue until the literature review is finalised, caused either by review (or assessment) or by not finding additional arguments, concepts, ideas and perspectives anymore.
Sometimes, this is supported by the drafting of a literature review to make search terms for the search strategy more explicit. However, during the drafting a renewed look at the papers evaluated previously may also trigger further searching for additional works. Consequently, an iterative search strategy is flexible, because the approach to literature is adapted to the scholarly knowledge obtained during the search; it results in progressive insight and may also uncover arguments that were not known or considered beforehand. An initial search of literature for a proposal for a doctoral study or postgraduate dissertation may look like this. Also, when preparing an empirical study this search strategy could be very helpful. Thus, the iterative search strategy, particularly helpful for initial stages of empirical studies and cases where scholarly knowledge for a domain is less extensive, provides a flexible approach to finding, analysing and synthesising literature. The challenge for a more structured iterative search strategy is to keep the focus of the review in mind and avoid drifting away. As Rudestam and Newton (1992, p. 49) express it: the aim is to 'build an argument, not a library.' The likelihood that it leads to an ever-expanding set of publications with increasingly less relation to the
original review question or research objective is one of the challenges for the iterative search strategy. It may encourage searching almost endlessly for additional works and new avenues for exploring the topic. Consequently, the topic of the study may change due to this progressive insight. Alternatively, the scope of the topic could become extensive, which then also results in a more superficial treatment of concepts and themes because of the extent of literature found. As already noted, this could have the positive result of finding novel ways of addressing the topic. The experience of most novice researchers, particularly students, is that they tend to drift away from the topic or find too many interesting papers and concepts to engage with. To avoid this resulting in additional complexity and extensive literature to be looked at, the review question should always be kept in mind. Section 4.2 provides considerations for questions that guide the literature review to avoid drifting away and ending up with an overly extensive body of results. Notwithstanding this advice about focusing on a specific review question, the search should also be open to the wider context of a topic, which contributes to literature sensitivity, as it is called in Section 2.1. To conclude, drifting away and an overly ambitious scope during the iterative search strategy can only be avoided by sticking to the topic of a review, whilst keeping an open mind to new and emergent perspectives for a study.
The iterative search strategy can be enhanced in four ways, particularly for the archetypes narrative overview and narrative review. These enhancements of the iterative search strategy have been adapted from Rowley and Slack (2004, pp. 35–6):
• Citation pearl growing. This modification to the iterative search strategy uses relevant sources, or citations, to find more relevant sources on a topic. Usually, it begins with an initial document on a topic that holds all information needed or matches all the keywords for a search. From this document, the reviewer is able to identify other keywords, descriptors and themes for further searches. This is a relatively easy approach to use.
• Briefsearch. During a briefsearch a few documents are retrieved crudely and quickly, without necessarily considering whether they present all aspects and perspectives on a topic. It is often a good starting point for further work using the iterative search strategy, to have a look at what is available relating to a topic.
• Building blocks. This form of the iterative search strategy takes the concepts found during an initial search and extends them by using synonyms and related terms. A more thorough, but possibly lengthy, search is then conducted to retrieve a comprehensive set of documents.
• Successive fractions. This approach to the iterative search strategy can be used to reduce a large set of studies and works that were retrieved in earlier stages of the literature review. The set of documents can be divided into applications, aspects, concepts, themes, topics or similar; note that some studies and writings may fall into more than one category. The next step consists of either discarding specific topics to narrow down the focus, or eliminating less relevant or useful documents.
These variations on the iterative search strategy aim to aid finding what is needed more effectively, but again, they could also result in the researcher drifting away
from the original topic and retrieving a broad range of studies when the focus during the search is not kept on the leading question of the study. This may particularly be the case when they are used in combination.
NOTE: ON WRITING WHILE READING It is also important to see the writing stage as part of the research process, not as something that happens after finishing the reading of the retrieved literature; the latter is a common practice and perception among those with less experience in research or writing. Wellington et al. (2005, p. 80) suggest 'writing while you collect and collecting while you write.' This advice is often given by supervisors to undergraduate, postgraduate and doctoral students. The thinking behind the advice is that writing also contributes to knowledge and skills by making deliberations on a topic more explicit. This increases knowledge and helps identify additional keywords for searching for arguments, conceptualisations, methods, theories, etc. However, at the same time, this can be a double-edged sword. The advice assumes that relevant studies and works are at hand so that the writing process is directional. In many instances, the topic may be either not fully defined or ill-defined, or relevant literature appears only during later stages of the search process. An example is looking into the origins of the theory of agency, a well-known theory for business and management studies, and economics. Nowadays, many take Eisenhardt's (1989) extensive description as starting point. Retrieving older studies on which she relies (e.g., Jensen and Meckling 1976; Ross 1973), but interprets in a specific manner, may paint a picture that differs from the underpinnings of the theory and how it should be conceptualised. But these studies and others are harder to obtain, sometimes hidden away in archives and libraries, and may not necessarily be available electronically. An example is the work by Mitnick (1973), introducing what is now known as the principal-agent problem. Thus, taking Eisenhardt (1989) as starting point for initial writing may result in a biased understanding of the theory of agency. The example shows that writing supports the literature review only when core concepts have been sufficiently explored and analysed, and key works assessed on their merits (Figure 5.2).
Further advice, often given to undergraduate and postgraduate students, is to determine in advance the length of a literature review in terms of the number of words. For example, if the maximum number of words for a master's dissertation is 10,000, then the literature review should be around 2,500 words. Starting to write too early in the literature review will result in topics being covered that turn out to be less relevant later; this makes the early writing efforts a relatively meaningless exercise, except for practising writing. Additionally, more relevant literature, emerging later in the process, may be treated in a more superficial way than it deserves. Occasionally, those who have reached the required number of words for the literature review may halt their review, assuming 'mission accomplished'. Moreover, sometimes it may turn out that the literature needs more attention than the empirical study, so that the number of words for the literature review may necessarily be more than planned.
Therefore, it is better to first understand the structure of the relevant literature and its position within an empirical study before starting the writing of the literature review; this allows formative feedback to be a more effective support.
[Figure 5.2 is a concept map linking, amongst others: propositions ('if ... then ...'), context and levels of analysis, theories, laws, methodologies, tools, cases and themes, assessment of saturation and robustness of findings, cases, embedded units of analysis, subsystems/aspects, and abstraction mechanisms such as classification (which could be model-driven)]
Fig. 5.2 Stylised concept map for writing about case study methodology. The figure depicts the concepts, and how they are related, for the writing of a propositional paper by Dekkers and Hicks (2019). This map facilitated the search for studies to provide arguments for the points being made. For example, for the assessment of saturation different methods were found in the literature. Also, it shows logical connections between key concepts of the case study methodology as a research method. This facilitated making a coherent argument in the paper.
TIP: AVOIDING DIVERGENCE OF RESEARCH TOPICS Two additional approaches to the iterative search strategy that help avoid divergence of the research topic, once a topic for a literature review is relatively well-defined (see Section 4.2), are described here:
• The first approach to complement the iterative search strategy is mapping (sometimes also called mind mapping) and modelling. Mapping concepts helps understanding of how concepts, perspectives, theories, etc. found in the literature are related to each other. An example is the concept map for a paper about saturation and the case study methodology (Dekkers and Hicks 2019); the initial map used for the writing is displayed in Figure 5.2. Mind mapping, though different in its application, is a similar technique, but caution should be applied; when applied to reviews, this technique originally aims only at obtaining an overview of what is found in the literature. Mapping concepts and mind mapping do not equate to the formal modelling that has been presented in Section 4.4.
• The second approach is discussion with peers and experts. Receiving formative feedback and engaging with other perspectives helps to advance insight that can be used for further searching. Note that it may result in divergence, because peers and experts may have their own interpretations of scholarly knowledge and give feedback that can lead to digression as well. All feedback and shared perspectives should be considered on their merits for the topic of a literature review.
5.4 Keywords, Controlled Vocabulary and Database Search Strategies
Different from the iterative search strategy, the structured approach of the keywords, controlled vocabulary and database search strategy aims at finding all relevant sources in one go; search engines can be used in addition to, or instead of, databases. There could be iteration, caused by the discovery of additional terms when reading and analysing retrieved studies. The process for this search strategy is shown in Figure 5.3. It is often associated with protocol-driven literature reviews, such as the archetypes systematic literature review and systematic review, but is not necessarily limited to these; this search strategy can also be used for narrative literature reviews, albeit in a more iterative manner. The next subsections describe how to define keywords for the search, how to use controlled vocabulary, and how to select appropriate databases and search engines.
5.4.1 Defining Keywords as Search Terms
The key elements of a search strategy are relevant search terms and appropriate discovery tools. This is essential whether conducting a narrative literature review or a protocol-driven review. Search terms should reflect the review question, including the scope of the review, the types of information sought (study characteristics) and any date, language, publication status or geographical limits. Search terms will be a mixture of keywords and subject terms (the latter are also called controlled vocabulary when integrated in a search engine or database). The keywords and subject terms used to search, along with the database or discovery tool, determine which studies will be identified as relevant to the review question or research objective. In searching for (as opposed to finding) relevant studies, there is a direct correlation between the search terms and the outcomes of the search. In the case of a protocol-driven literature review, the search strategy strives to achieve maximum sensitivity, limiting bias in the search process, in order to identify as many studies that meet the eligibility criteria as possible, whilst recognising possible time constraints and the need for specificity. A well-designed search strategy will help achieve this aim. It will combine the inclusion of all appropriate, related search terms reflecting the main concepts being examined in the review with search operators to ensure that no relevant studies are missed, using selected databases and structured searches to enable precision in retrieval. In terms of defining keywords, the starting point should be the concepts being examined in the review question. These may have been clarified using the format population-intervention-outcome, its variants or another format, derived from a scoping review, scoping study or initial searches; see Section 4.4 for more detail on the format population-intervention-outcome and its variants for review questions.
[Figure 5.3 depicts the process: setting review questions → deriving initial keywords → complementing keywords → selection of databases → retrieval of studies (with additional keywords feeding back) → analysis of studies → synthesis of findings]
Fig. 5.3 Search strategy based on keywords and databases. The figure indicates that initial keywords are derived from the review questions. These keywords are complemented to consider synonyms or other related subject terms for identifying similar publications. During the retrieval of studies and early stages of analysis it may become apparent that other keywords and search terms complement the earlier set of keywords and search terms. This will result in additional searches to identify and retrieve relevant studies.
An interesting work on this matter is Methley et al. (2014), who demonstrate that when using either PICO (population-intervention-comparison-outcome), PICOS (population-intervention-comparison-outcome-study design) or SPIDER (sample, phenomenon of interest, design, evaluation, research type) as formats for the keyword search, different studies are found; the formats also performed differently in terms of specificity and sensitivity. Therefore, caution should be exercised when developing review questions and the related concepts, because this may influence which studies are found, and consequently, influence the outcomes of a literature review.
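As a minimal illustration of how such formats structure the concepts from which keywords are derived, the sketch below represents the same hypothetical topic in both the PICO and SPIDER formats as plain Python dictionaries; the PICO terms are taken from the doula example in Figure 5.4, whereas the SPIDER entries and the dictionary keys are illustrative assumptions.

```python
# Review question structured in the PICO format (population-intervention-comparison-outcome);
# no comparison element is used in this hypothetical example.
pico = {
    "population":   ["childbirth", "labour"],
    "intervention": ["doulas", "lay support"],
    "comparison":   [],
    "outcome":      ["attitudes", "views", "opinions"],
}

# The same topic sketched in the SPIDER format (sample, phenomenon of interest,
# design, evaluation, research type); these entries are illustrative only.
spider = {
    "sample":                 ["women giving birth"],
    "phenomenon_of_interest": ["doula support", "lay support"],
    "design":                 ["interviews", "surveys"],
    "evaluation":             ["attitudes", "views", "opinions"],
    "research_type":          ["qualitative", "mixed methods"],
}

# Different formats foreground different concepts, and hence different keywords,
# which is one reason why the formats retrieve different sets of studies.
```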
Keywords as search terms should represent the appropriate main concepts or elements of the review. Examples are concepts for the population (study participants, group in society, health condition, etc.), intervention (policy, treatment, etc.), study design (randomised controlled trials, anti-bullying programmes, etc.) and outcome measures. Publications retrieved in initial searches are also useful for generating an initial set of keywords. Keywords can be identified from various parts of a document (title, abstract, methodology, discussion and conclusions) and, at the point of searching, key search terms can be matched against these parts of a document or record. Doing so provides more specificity to a search; in a search which values sensitivity over specificity, key search terms might be matched against a whole document. The decision whether to aim for more specificity or sensitivity will depend on the topic of the review, the extent of relevant keywords and any time or resource constraints. In Figure 5.4 an example is provided for a hypothetical review; it shows that for the constructs in its review question multiple keywords are common in publications. To ensure that no relevant material is missed, keywords should be comprehensive, including all possible related terms, e.g., schoolchildren, pupils, adolescents, 'school-aged children.' Note that different sources and disciplines may use different terminology. As depicted in Figure 5.4, for the constructs there are alternative terms that need to be used to find relevant studies. Thus, it is necessary to complement the constructs in a review question with keywords obtained from reading documents, consulting classification systems and searching for related terms. To make the search more effective, it is common to use keywords and terms in combination with Boolean operators and to reduce the number of search terms with other operators. For alternative terms and keywords the Boolean operator OR is used in a search to capture all relevant works; these Boolean operators are normally capitalised. Adding alternative related terms is not necessarily adding new concepts or ideas; it is merely aiming to capture all relevant information and not miss key works. This is an important distinction. The Boolean operator AND is used to find articles that mention all of the searched topics. Often, the operators must be used in conjunction, as Figure 5.4 demonstrates.
[Figure 5.4: population (Childbirth): Childbirth OR Labour; AND intervention (Doulas): Doulas OR Lay support; AND outcome (Attitudes): Attitudes OR Views OR Opinions]
Fig. 5.4 Example of keywords based on population-intervention-outcome. This is an extension of the example in Figure 4.1. This hypothetical case for a systematic review shows that each concept in the format population-intervention-outcome has alternative keywords found in relevant studies. The use of Boolean operators aids in searching studies based on different combinations of these keywords.
The Boolean operator NOT excludes a term or concept from a search. This operator could be helpful when a search term or keyword has a different meaning in different domains. A case in point is the word 'moderation', which can mean restraint from excess but, in teaching, also the review of graded exams and results. Nevertheless, this operator should be used with caution as it may inadvertently exclude relevant references. The use of the Boolean operators AND and NOT in a search for literature increases specificity, whereas OR increases sensitivity.
There could be a degree of iteration with regard to the keywords identified for the search; see Figure 5.3. This is mostly caused by the discovery of new keywords and terms after an initial search. Box 5.B gives a worked example of such an iteration. After the initial search it appeared that other strands of research relevant to the topic were available; however, they were not fully identified during the first search. A modified search led to finding another set of relevant papers; the modification was the reduction of keywords and changes to the Boolean operators. In total, 71 papers were classified according to their content, with categories A and F relevant to the doctoral study. The analysis of these fifteen papers resulted in finding five relevant methods for measuring productivity, and also in the conclusion that none of them was underpinned by empirical evidence. However, when there is cause for another iteration of the search instigated by inadequacy of search terms, then the setting of the topic for the review and its review questions need to be evaluated. Well-formulated scoping studies and search protocols, see Sections 4.5 and 5.9, should help avoid such a scenario. Thus, the need for an iteration normally arises from finding additional keywords and terms that were initially overlooked and could lead to more studies being retrieved for analysis and synthesis; however, too many additional keywords and terms could indicate a poorly phrased topic and review question, or in the worst case, a poor understanding of scholarly knowledge related to the topic.
TIP: SPELLING VARIATIONS A common consideration for keywords is spelling variation between US and UK English. Simply use both spellings with the OR operator or use a wildcard operator (see subsection 'Search Operators') to search both terms in a single search.
TIP: TOO MANY KEYWORDS FOR A CONSTRUCT OR CONCEPT When there are many keywords for a specific construct in a search of databases, this might indicate that the topic is too broad. Such a situation can be mitigated by:
• Investigating why different keywords for the same concept or construct are used. Sometimes this is caused by different strands of research or by lower proficiency of authors with regard to keywords in a domain.
• Focusing on a subset of the specific keywords after investigating the delineation of keywords. This can be supported by ranking the returns for specific keywords; however, it is necessary to check whether a subset is sufficiently representative of the topic or whether narrowing down the topic should be considered.
• Writing a narrative review instead of a protocol-driven literature review. Such a narrative review could pave the way for both empirical studies and more specific systematic literature reviews and systematic reviews. The narrative review in this instance has similarities to a scoping study; see Sections 4.5 and 5.9.
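To illustrate how alternative keywords per concept are combined with Boolean operators, the sketch below assembles a search string in the style of Figure 5.4. The helper function and the automatic quoting of multi-word terms are illustrative assumptions; whether the resulting string is accepted verbatim depends on the database or platform (see Table 5.2).

```python
def build_boolean_query(concept_groups):
    """Join alternative keywords within a concept with OR, and the concepts with AND.

    Multi-word terms are wrapped in quotation marks as a phrase search; whether such a
    phrase is treated as exact or loose depends on the database (see Table 5.2).
    """
    def quote(term):
        return f'"{term}"' if " " in term else term

    groups = ["(" + " OR ".join(quote(term) for term in terms) + ")"
              for terms in concept_groups]
    return " AND ".join(groups)


# Keyword groups following the population-intervention-outcome example of Figure 5.4.
query = build_boolean_query([
    ["childbirth", "labour"],           # population
    ["doulas", "lay support"],          # intervention
    ["attitudes", "views", "opinions"], # outcome
])
print(query)
# (childbirth OR labour) AND (doulas OR "lay support") AND (attitudes OR views OR opinions)
```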
Box 5.B Worked Example for Search Strategy of Protocol-Driven Literature Review
A doctoral student, following the combined degree programme of the Hochschule der Medien and the University of the West of Scotland in the period 2011–2014, was interested in the measurement of productivity when product mixes are dynamic across time intervals. Such dynamics could be caused by a company manufacturing different types of goods in one year compared to another year, which is called a changing output mix.
Initial search
• The review question was formulated as: 'Is there any empirical evidence showing effectiveness of available measurement methods with regard to a changing output mix?'
• Example for keyword search used in Google Scholar: [“productivity measurement” OR “productivity measures” OR “productivity measure” OR “measuring productivity”] AND [“product-mix” OR “product-structure” OR “output-mix”] AND [“manufacturing” OR “product-level” OR “make-to-order”].
• Search period from 1961 to 2011: 1,180 results.
• First 205 inspected on title and abstract: 57 articles remained.
Modified search
• Modified search to identify articles from other strands of research: [“productivity measurement” OR “productivity measures” OR “productivity measure” OR “measuring productivity”] AND [“product-mix” OR “product-structure” OR “output-mix” OR “make-to-order”].
• 1,800 search results: first 150 inspected (title and abstract).
• 14 articles remained after filter.
• Total: 71 relevant articles after two-stage selection process.
Outcomes
Cluster (N)
A. Productivity measurement approach considering product mix change presented (12)
B. Impact of product mix change on productivity measurement described (22)
C. Productivity measurement approach does not consider product mix change (21)
D. Article does not contain any relevant content but contains other related content (5)
E. No relevant content on productivity/productivity measurement and product mix (9)
F. Relevant measurement approach from another discipline than manufacturing (3)
G. Information about impact of product mix change on productivity/profitability/performance (10)
Total (71)
5.4.2 Search Operators
In addition to Boolean operators, there are several search operators used to give searches more specificity and sensitivity:
• Phrase searching. Enclosing words in quotation marks instructs a database or search engine to retrieve records with the words as a phrase. It is a common operator used in almost all discovery tools and usually represents an exact phrase search, in which only records that contain the exact words in the specified order will be retrieved. There are a couple of notable exceptions. For example, in the Scopus database an exact phrase search uses braces (curly brackets); quotation marks are used for 'loose' phrases. Loose phrases are those that allow for spelling variations and plurals. Google Scholar requires quotation marks for exact word searches as well as phrase searching. See Table 5.2 for variations in search operators used in databases and platforms. When searching for a phrase as a list of words without using a phrase operator, most search engines will assume the Boolean operator AND between the words and retrieve all records with these words whether they appear in a phrase or not. This can make a vast difference to the specificity of search results. As with all other search operators, a phrase search should only be used where appropriate, as it will limit a search if used incorrectly.
• Proximity searching. Adjacency or proximity operators are ways to retrieve records where terms appear near to one another. They are useful when searching across the full text of a document and for narrowing down search results. Proximity operators focus the search and work well when a phrase search would be too restrictive or is not appropriate. The most common way to express a proximity operator is with ADJ/n or NEAR/n, where n is replaced by a number. The number represents the number of words allowed between the terms. Proximity searching is very effective when using large indexes as it provides specificity without the tight focus of a phrase or a field search. As a general rule of thumb, a value of 15 for n finds terms in the same sentence and a value of 50 finds terms in the same paragraph.
Check the search rules for each database to understand how the search operators are used, as each has specific rules for retrieval which will affect search results. Table 5.2 summarises the common search operators used in some key databases and search platforms.
TIP: ADVICE FOR SEARCHING
• Further advice is to check the database help pages for more complex operators used in advanced searching.
• Consulting a librarian or information specialist will be helpful as they will have experience in complex searching using operators and filters.
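The illustrative strings below contrast phrase and proximity searching for one hypothetical topic (absence from work, as used in the worked example of Figure 5.6); they are shown as Python string constants with comments, and the exact operator syntax accepted differs per database or platform (see Table 5.2).

```python
# Exact phrase: only records containing the words in exactly this order are retrieved.
phrase_search = '"absence from work"'

# Proximity, EBSCOhost-style N operator: 'absence' and 'work' within 15 words of each
# other in any order -- roughly the same sentence, per the rule of thumb above.
proximity_sentence = "absence N15 work"

# Proximity, ProQuest-style NEAR operator: terms within 50 words in any order --
# roughly the same paragraph.
proximity_paragraph = "absence NEAR/50 work"

# Without a phrase operator, most search engines assume AND between the words and
# retrieve records containing both words anywhere, which lowers specificity.
default_search = "absence work"
```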
Table 5.2 Search operators for five generic databases and platforms. This selection of generic databases shows which search operators are used by them. In addition, the table mentions some limitations with regard to their use.

Databases on EBSCOhost
• Boolean operators: and, or, not
• Phrase searching: use quotation marks (“ ”) for a phrase
• Truncation: * represents any number of characters
• Wildcard (one character): #
• Wildcard (zero or one character): ?
• Proximity: Nx for terms within x words in any order; Wx for terms within x words in the order specified

Databases on Ovid®
• Boolean operators: AND, OR, NOT
• Phrase searching: use quotation marks (“ ”) for an exact phrase
• Truncation: $n represents a specified number n of characters
• Wildcard (one character): #
• Wildcard (zero or one character): ?
• Proximity: ADJn for terms within n words in any order, where n can be any number from 1 to 99

Databases on ProQuest
• Boolean operators: AND, OR, NOT
• Phrase searching: use quotation marks (“ ”) for a phrase
• Truncation: * standard truncation, where the asterisk replaces up to 5 characters after the stem of a word; *n defined truncation, where n represents up to the number of characters after the stem (maximum 20)
• Wildcard (one character): ? (use multiple wildcards, e.g. ???, to represent multiple characters)
• Wildcard (zero or one character): ?
• Proximity: NEAR/n or N/n for terms within n words in any order; PRE/n or P/n for terms within n words in a specific order

Scopus
• Boolean operators: AND, OR, AND NOT
• Phrase searching: use quotation marks (“ ”) for loose phrases, allowing for plurals and spelling variants (wildcards can be used); use braces ({ }) for exact phrases (wildcards cannot be used)
• Truncation: * represents zero or more characters
• Wildcard (one character): ?
• Proximity: W/n for terms within n words in any order; PRE/n for terms within n words in a specific order

Web of Science
• Boolean operators: AND, OR, NOT (limited to 49 operators per search query)
• Phrase searching: use quotation marks (“ ”) for a phrase; wildcards, including truncation, can be used within the phrase search
• Truncation: * represents any number of characters
• Wildcard (one character): ?
• Wildcard (zero or one character): $
• Proximity: NEAR/x, where x represents the number of words between each term
5.4.3 Field Searching
'Field searching' provides specificity by matching combinations of search terms and search operators against fields in an indexed record. Records in a database for book chapters, journal articles, conference proceedings, etc. are indexed using fields, and each field is given a field code. Fields are sections of a record, most commonly:
• Author.
• Title.
• Abstract.
• Subject heading (including MeSH [Medical Subject Headings]).
• Source or publication title.
• Author affiliation.
• Full text.
Using the 'Advanced Search' screen in databases will match search terms against specific fields or combinations of fields, providing more precision and relevance in the results. It is possible to perform multiple field searches in a single search by using combined fields in a guided search (see Figure 5.5) or by using combined field codes in a search string. The latter is more efficient, especially for a lengthy, complex search. For example, in EBSCOhost, using the default search, the search string TI “systematic review” OR AB “systematic review” OR KW “systematic review” matches search terms against the article title, abstract and author-supplied keywords. In Scopus, TITLE-ABS-KEY (“systematic review”) will match search terms against the article title, abstract and keyword fields. In Scopus, the keyword field includes author-supplied keywords, controlled index terms, and chemical and trade names. This means that settings and outcomes of searches across different databases may differ in detail.
Fig. 5.5 Example of a multiple field (title, abstract, keyword) phrase search on EBSCOhost using the guided style. Use the + to the right of the search box to add more rows. Usually, multiple rows can be added to build a search as an alternative to search strings or command searches. A note of caution: not all records or documents in a database use every field, so searching in specific fields could mean relevant records are not retrieved. Using an OR operator and searching multiple fields can mitigate this. Note: Reproduced with the permission of EBSCOhost.
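As a small sketch of the field-code searches quoted above, the helpers below compose the EBSCOhost and Scopus search strings for a given phrase; the function names are illustrative assumptions, while the field codes (TI, AB, KW and TITLE-ABS-KEY) are those mentioned in this subsection.

```python
def ebsco_field_search(phrase: str, fields=("TI", "AB", "KW")) -> str:
    """EBSCOhost-style search string matching a phrase against several fields with OR,
    so that records lacking one field (e.g. no abstract) can still be retrieved."""
    return " OR ".join(f'{code} "{phrase}"' for code in fields)


def scopus_field_search(phrase: str) -> str:
    """Scopus combined title-abstract-keyword field search for a phrase."""
    return f'TITLE-ABS-KEY ("{phrase}")'


print(ebsco_field_search("systematic review"))
# TI "systematic review" OR AB "systematic review" OR KW "systematic review"
print(scopus_field_search("systematic review"))
# TITLE-ABS-KEY ("systematic review")
```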
5.4.4 Controlled Vocabulary
Another approach to searching is to use controlled vocabulary; sometimes, this is also called thesaurus searching. Controlled vocabularies are standardised subject headings that are assigned to publication records by indexers. Controlled terms connect related literature and provide consistency and control in a search. The searcher can have confidence that retrieved records are relevant when using this approach, though it is important for the researcher to look at the thesaurus terms, to understand how they are being applied in specific databases and to understand how narrower and related terms are handled. There is still some level of interpretation in applying index terms to a record, so thorough searches will combine both thesaurus and keyword approaches. In some subject areas, controlled vocabulary is essential, for example when trying to retrieve non-textual sources such as images or music. Research (e.g., Gross et al. 2015) has shown that there are a number of subject disciplines where controlled vocabulary should be used in searching, including branches of engineering, healthcare, medicine and nursing. A large number of journals assign pre-defined keywords to papers to facilitate discovery by readers; a case in point is the JEL classification system1 for economics, which contains standardised codes that are used as a substitute for keywords. Also, there are usually author-assigned keywords in the document, which may be useful if topics do not fall under more widely used keywords. Therefore, controlled vocabulary, using standardised subject headings, plays an essential role in the identification of relevant studies when conducting a search for a literature review, particularly for specific disciplines, with economics, engineering, healthcare, medicine and nursing among them.
These controlled vocabularies are often used in databases and search engines relevant to specific disciplines. For example, MeSH (Medical Subject Headings) are used in medical and healthcare databases, and are required in Cochrane Reviews and Campbell Systematic Reviews. MeSH headings are produced by the National Library of Medicine annually. They are arranged in a hierarchical structure. MeSH was conceived to enable the effective retrieval of medical literature, in recognition that the language in the field is complex. Since MeSH headings differ in some databases, e.g., in MEDLINE and in CINAHL, as stated elsewhere in this chapter, search strategies must be adapted to each database. Many databases have the function to 'explode' MeSH and other subject headings, which means that narrower or more specific headings are also included in the search. Within databases, subject terms are listed in a thesaurus or in a subject tree. Exploding can be useful to ensure that all potentially relevant studies are found, but it may also result in high volumes of less relevant studies being retrieved. A look at the hierarchical index in the database will aid the decision on whether it is appropriate to explode a subject term in a search.
1 JEL is the abbreviation of the ‘Journal of Economic Literature’, published by the American Economic Association, which launched this coding system.
A related feature is the ‘Major Topic’, ‘Focus’ or ‘Major Concept’ option; this feature matches subject headings against articles in which a specific heading is defined as a major focus of the article. It will prioritise precision (specificity) over comprehensiveness; it is therefore not recommended for Cochrane Reviews, but it may be helpful in other reviews. With controlled vocabulary being a feature of specific databases and search engines, and regularly used in specific disciplines, care must be taken that this feature does not result in restricting the search unduly. Not all subject databases have controlled vocabulary, but for those that do, it is strongly recommended to use it when conducting thorough reviews; systematic reviews should employ both keyword searching and controlled vocabulary searching as required. Some other examples of controlled vocabularies are the APA Thesaurus of Psychological Index Terms, the CINAHL Subject Headings (including some MeSH), the Business Thesaurus (a subset of the EBSCO Comprehensive Subject Index) and the INSPEC Controlled Index. Therefore, the selection of keywords and subject terms for a search as part of a literature review could be related to which databases and search engines are consulted, because controlled vocabulary may result in a different set of retrieved studies than keyword searching; hence, the recommendation to use both keyword and controlled vocabulary searching.

A worked example in Figure 5.6 for the EBSCO Business Thesaurus demonstrates a search exploring organisational culture, absenteeism and employee morale; the search uses a combination of controlled vocabulary, proximity, default keyword and selected field searches. Separate searches are conducted with keywords, controlled vocabulary and operators to facilitate the search and keep an overview of results. Whereas individual keyword or subject term searches yield high numbers of potentially relevant publications, the combinations of the individual searches have a lower but more specific output. Though not shown here, the results can be limited to specific source types (academic journal articles, trade publications, etc.), to peer-reviewed material or by date as required. The overview also supports reviewing the degree of saturation for the search strategy, discussed in Section 5.6. This example shows how the different approaches to searching in databases with controlled vocabulary can be used to set out an effective search strategy.
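As a small illustration of the difference between controlled vocabulary and free-text keyword searching, the Python sketch below queries PubMed/MEDLINE through NCBI's public E-utilities 'esearch' endpoint and compares record counts for an exploded MeSH search, a major-topic MeSH search and a title/abstract keyword search. This is a minimal sketch with an illustrative topic; it is not part of the worked example in Figure 5.6, and the terms and field tags would need to be replaced by those of an actual review.

```python
# Minimal sketch (illustrative topic and terms): comparing a controlled vocabulary
# (MeSH) search with a free-text title/abstract search in PubMed/MEDLINE via the
# public NCBI E-utilities 'esearch' endpoint. MeSH terms are exploded by default;
# [majr] restricts matches to records where the heading is a major topic.
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

queries = {
    "MeSH, exploded": '"dementia"[mh]',
    "MeSH, major topic only": '"dementia"[majr]',
    "Free-text keywords (title/abstract)": 'dementia[tiab] OR "cognitive decline"[tiab]',
}

for label, term in queries.items():
    response = requests.get(ESEARCH, params={"db": "pubmed", "term": term, "retmode": "json"})
    count = response.json()["esearchresult"]["count"]  # number of matching records
    print(f"{label}: {count} records")
```

Comparing the counts retrieved by the controlled vocabulary and keyword variants gives a first impression of how differently the two approaches behave for a given topic.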
5.4.5 Selecting Databases
When selecting databases, the aim is to find all relevant publications for the topic and related review question, including its corollaries. This means that two main aspects should be taken into consideration when selecting a database: its coverage and the accuracy of its data. Coverage relates to the extent to which the sources indexed by a database cover the written scholarly literature in a field. As such, coverage must not be biased towards particular countries, languages or publishers (Neuhaus and Daniel 2008, p. 204).
Fig. 5.6 Example of a search with controlled vocabulary using the EBSCO Business Thesaurus. The search uses default (S2, S9), thesaurus (S1, S4, S7), proximity (S5) and all-text (S8) searches. Thesaurus terms are shown with the prefix DE, the proximity operator is indicated by N and the operator for an all-text search is TX. Add controlled terms using the Thesaurus option on the advanced search screen, choosing subject terms and selecting narrower, broader or related terms as required. Combine searches using Boolean operators for sensitivity and specificity (S3, S6, S10, S11). Search results are shown in square brackets after each description. Search 1 (S1) is a thesaurus search using the main term corporate culture and related terms selected from the hierarchical subject index [111,879]. S2 is a phrase search using the default field search [65,196]. The default search in EBSCO databases matches search terms against the author, title, abstract, keyword and subject fields. S3 combines these using the Boolean OR option [121,315]. S5 is a proximity search, retrieving records where the terms absence and work appear within 15 words of each other. S6 combines the proximity search and the thesaurus search S4 [5,554]. S7 is a thesaurus search on job satisfaction [43,350]. S8 is an all-text search for terms related to job satisfaction and employee morale [70,424]. An all-text search matches terms to all fields, including bibliographic information. S9 is a default search using the same search terms, included for comparison with the all-text search in S8 [28,893]. S10 combines the thesaurus (S7) and all-text (S8) searches [88,083]. S11 is the combined results set [141].
It could also include which types of sources are available through search mechanisms and which records of publications are held; see Section 5.2 for the types of sources that may be considered for a literature review. As regards the accuracy of data, this refers to the absence of inconsistencies and
erroneous spellings of author names, or a lack of standardisation with respect to journal titles and affiliations (ibid., p. 205). Furthermore, Gusenbauer and Haddaway (2020, p. 212) distinguish between crawler-based search engines, i.e., platforms that search the Internet for sources, such as Google Scholar and Microsoft Academic Search, database providers, for example, ProQuest and EBSCOhost, and publisher platforms; to these can be added major abstract and indexing databases, for instance Scopus, a category not mentioned by Gusenbauer and Haddaway (ibid.). All of these have different characteristics in terms of coverage and accuracy of the data they use. Thus, the selection of an appropriate database (or databases) depends on its coverage in terms of sources, their relevance and the accuracy of bibliographical details, in addition to how well it supports the search strategy with keywords and subject terms.
5.4.6 Using Databases and Search Engines
With regard to searching for publications and studies for a literature review, there are generic databases and search engines; these have been enumerated in Appendix A. Among them are Scopus (Elsevier), Web of Science (Clarivate Analytics) and major platforms such as EBSCOhost and ProQuest. These large multidisciplinary databases and platforms provide access to publications and studies for a broad range of disciplines. However, using them may result in the search initially finding many publications, of which few are relevant to the topic. Thus, the specificity of using these generic platforms and large multidisciplinary databases could be low. This can be mitigated by having a defined search strategy and a clear understanding of what cross-searching databases in platforms such as ProQuest or EBSCOhost and using large indexes will achieve. These can be useful in initial scoping searches to help identify which specific databases may be of most use. For example, both ProQuest and EBSCOhost have a database filter which shows the number of retrieved documents in each database matching the search terms. Web of Science and Scopus also have features which analyse search results by author, source, publication year, document type and subject area. This can be useful in initial searches to help identify useful sources, subject areas (and therefore subject-specific databases) and potentially key authors and works. There are also specific databases for disciplines; these have been listed in Appendix A, too. Examples are MEDLINE for medicine and PsycINFO for psychology. Also, for other disciplines, there are recommendations on which databases to use; a case in point is Schryen (2015, p. 299), who recommends the AIS Electronic Library and the digital library of IFIP (International Federation for Information Processing), but also the Web of Science and other search engines, for reviews into information systems. The advantage of discipline-specific or specialist databases is that they give direct access to relevant journals and other sources, thus increasing the specificity of a search. However, some studies may have appeared in other
sources, which are not directly discovered through discipline-specific and specialist databases and search engines. Furthermore, some of these databases, such as MEDLINE, use controlled vocabulary; see Subsection 5.4.4. This may ease the search for relevant studies and other sources, since they are already classified. It also depends on the newness of a topic; the newer it is, the less likely it is captured by keywords in a controlled vocabulary. Thus, discipline-specific databases and search engines may have a higher specificity than generic databases and search engines, and in some of them the search is supported by controlled vocabulary or other thesauri.

NOTE: USING GOOGLE SCHOLAR
Google Scholar in particular has drawn attention as a potentially suitable search engine for literature reviews. Some are of the opinion that it is. Aguillo (2012, p. 348) draws attention to Google Scholar containing links to informal material, draft papers, unpublished reports or academic handouts. This makes it a more difficult source for reviews that depend on data imported from a search engine, as will be the case for bibliometric analysis (see Section 9.4 for this method). Contrastingly, Harzing and van der Wal (2008, p. 72) find that Google Scholar is suitable for the same purpose in business and management studies, because it provides better coverage. Gusenbauer and Haddaway (2020, p. 208) find that Google Scholar does not allow reproducible queries. The Cochrane Collaboration also caution researchers on the disadvantages of Google Scholar as a discovery tool, recognising issues of duplicate citations, lack of transparency and difficulties in reproducing searches. More positively, they cite evidence by Levay et al. (2016) that where researchers do not have access to either Scopus or Web of Science, Google Scholar has sufficient citations to make it a viable alternative. Therefore, there are contrasting views on the suitability of Google Scholar as a search engine; its usefulness may depend on the perspectives of specific disciplines and topics.

TIP: AT LEAST TWO DATABASES
It is advised that at least two databases are used in the search strategy, according to Green et al. (2006, p. 107) and recently reiterated by Harari et al. (2020, pp. 103,377/7–8). No single database is likely to index all journals and source types relevant to a review question, so it will be necessary to search across at least two databases. It should be noted that different databases need a different operationalisation of the search strategy due to slight differences in functionality and in thesauri, if subject headings are used. For example, Wilczynski and Haynes (2007) show how to optimise the search strategy for Embase, but this will not be directly transferable to another database. The principles of searching are, however, similar, so a good understanding of the functions of key search operators (truncation, wildcard and Boolean) will help. This means that using two or more databases also comes with minor differences in searching, which should normally be reported in a literature review.
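To make the need for re-operationalisation concrete, the sketch below shows, purely as an assumed illustration, roughly equivalent searches written for an EBSCOhost database and for Ovid MEDLINE; the topic, field codes and operators are indicative and should be verified against each platform's current help documentation before use.

```python
# Illustrative only: roughly equivalent searches written for two platforms, showing
# why a strategy has to be re-operationalised per database. Field codes and operators
# (DE, TI, AB, N3 for EBSCOhost; exp .../, .ti,ab., adj3 for Ovid MEDLINE) should be
# checked against each platform's current help pages before use.
equivalent_searches = {
    "EBSCOhost": '(DE "DEMENTIA") OR TI dementia OR AB dementia OR (cognitive N3 decline)',
    "Ovid MEDLINE": 'exp Dementia/ OR dementia.ti,ab. OR (cognitive adj3 decline).ti,ab.',
}

for platform, query in equivalent_searches.items():
    print(f"{platform}:\n  {query}\n")
```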
5.4.7 On Using Publishers’ Databases
It has also become customary to use publishers’ databases. In order to retrieve almost all, or as many as possible, relevant studies, it would be necessary to have access to all search engines used by publishers. This is almost impossible, because all relevant publishers would have to be known to the researchers conducting such a search strategy. Furthermore, Neuhaus and Daniel (2008, p. 204) point out that the search strategy should not be biased towards particular publishers, among other sources of bias they mention. Finally, Harari et al. (2020, p. 103,377/8) find that publishers’ databases perform poorly, and thus, should be avoided. Therefore, it is strongly advised not to consult publishers’ databases and search engines, but instead to use generic databases and search engines. Moreover, some of those undertaking literature reviews confuse publishers’ platforms with the generic databases related to them; an example is ScienceDirect versus Scopus. For instance, Marangunić and Granić (2015) use ScienceDirect and six other generic databases and search engines. However, ScienceDirect returns only publications in journals published by Elsevier, whereas Scopus is the large multidisciplinary abstracting and indexing tool produced by Elsevier, containing not only Elsevier’s journals but also those of other academic publishers. There are many others seemingly unaware of this, with Bernardo et al. (2015) and de la Torre Díez et al. (2016) being cases in point. Therefore, authors should check thoroughly the suitability and coverage of databases and search engines before using them.
5.5 Other Search Strategies
In addition to the iterative search strategy and a search strategy using keywords, controlled vocabulary and databases, there are also other search strategies; among them are hand searching, snowballing, backward and forward searching, root and branch searches, and citation pearl growing; in addition, the use of expert panels will be briefly addressed. Note that some of the methods overlap and also go by different labels. The next subsections detail these strategies along with some practical advice.
5.5.1 Hand Searching
The first additional search strategy is called hand searching. It is a process involving the manual checking of complete journal issues or conference proceedings, or inspecting individual entries in specific databases, to identify studies and reports relevant to a review. It is akin to scanning selected journals from cover to cover, page by page, for relevant articles and other writing, whether this is done
manually or electronically. Hand searching is a methodical process of searching journal contents page by page (and by hand), including articles, editorials, letters from readers, etc., to identify relevant studies. Since not all reports or studies will be included in database indexes, particularly those that are very recent, hand searching provides a method of retrieving such content. Published reports or studies may not be easily retrieved in database searches if key terms in abstracts or titles do not easily identify such reports or studies. Hand searching may also be useful for researchers whose institutions have access to subscribed journals but who do not have access to databases indexing those journals. It is also appropriate when needing to find material that may pre-date the indexed content in databases. It may be necessary to consider hand searching when relevant studies and reports are harder to find through keyword and database searching; this could be caused by the diversity of subject terms, indexing and methods used for studies of any kind. Therefore, hand searching is an alternative method, albeit intensive in terms of effort, when keyword, controlled vocabulary and database search strategies are more difficult to apply.

Some have looked at the effectiveness of hand searching. For example, a Cochrane Methodology Review (Hopewell et al. 2007) compared hand searching with database searches to establish the effectiveness of these methods in retrieving randomised controlled trials. Hand searching retrieved 92–100% of relevant reports. Results for database searches varied depending on the databases used and the sensitivity of the search strategy; well-constructed complex search strategies retrieved 82% of relevant reports. Thus, hand searching is an effective search strategy with a high or relatively high yield. Thorough hand searching will involve looking in articles, abstracts, news columns, editorials or even letters. It needs to be carried out by an experienced researcher and necessitates an initial identification of the conference proceedings or journals relevant to the review. Cochrane Reviews do not mandate hand searching to supplement sensitive and well-designed database searches, but it will be valuable for studies of health interventions published in languages other than English and those published before 1991, as Hopewell et al.’s (2007) study showed. The Campbell Collaboration suggest using hand searching in a more targeted way, by hand searching only the most recent issues of journals in which a large number of eligible studies has been found and which will not yet be indexed in any databases. The decision on whether to include hand searching in a search strategy will depend on:
• The scope of the review.
• The date range of the published content that is relevant to the review question.
• Whether the review includes material in languages other than English.
The first step will be to identify key journals and conference proceedings that are most likely to publish material relevant to a review. If these are not indexed in any databases, or only in databases that cannot be accessed, then hand searching may be required.
TIP: USE STUDIES INDICATING RELEVANT JOURNALS
The effectiveness of hand searching depends on whether journals that contain relevant studies are used; sometimes, their scope and relevance have been the object of study. A case in point is Linton and Thongpapanl (2004), who look at the ranking of journals in the domain of technology and innovation management. Their analysis differs at points from other rankings that are in use, but it provides guidance for hand searching in this domain.
5.5.2 Snowballing
A second alternative search strategy is called snowballing; it is also known as citation chaining or reference chaining. This method is a way of finding literature by using a key document on the subject of the review, or a set of key works, as a starting point. Nowadays, more often than not, it refers to looking at all documents that were discovered when searching. By both reading the text and consulting the list of references of such a document, other potentially relevant articles can be identified. The bibliographies of these new publications are used to find yet more relevant titles. The advantage of snowballing is that it can find a lot of literature about a subject quickly and relatively easily. In particular, it can help to spot sources that were used early on to describe a phenomenon. However, it can also lead to finding incongruences in the citing of literature, as demonstrated in the example in Box 3.B. The disadvantage of this method is that searching happens retrospectively, so each source will be older than the previous one. Therefore, snowballing is an effective search strategy for finding related articles, but it could end up looking at older publications. For some disciplines the technique of snowballing is recommended. For example, Webster and Watson (2002, p. xvi), though they call it ‘going backward’, and Wohlin and Prikladnicki (2013) recommend this search strategy. Note that both also differentiate between backward and forward snowballing, in which the latter is identifying relevant articles that have cited a particular study; these are by definition more recent than the study they cite. Forward snowballing requires the use of databases and search engines (Rewhorn 2018, p. 144). Snowballing in both directions also goes by other names, such as backward and forward searching, but in this book it is seen as distinct from the latter search strategy.
5.5.3 Backward and Forward Searching
Another strategy for searching is called backward and forward searching, akin to backward and forward snowballing but more extensive. The process of backward searching in literature can be divided into three specific steps: searching backward for references, backward searching for authors’ bibliographies and searching for previously used keywords. Backward searching for references, the first step,
reviews the references of the articles retrieved from a keyword search. This is complemented by backward searching for authors’ bibliographies to identify what relevant materials the authors have published prior to a specific study. The third step of backward searching is looking for keywords that were used previously; this could initiate further searching using these associated keywords or subject terms. Forward searching in literature can be divided into two specific steps: searching forward for references and forward searching for authors’ bibliographies. The first step of forward searching is looking for studies and works that have cited a specific article. The second step, forward searching for authors’ bibliographies, refers to reviewing what the authors have published following the specific article. Scopus and the Cited Reference Search within the Web of Science database are suitable tools for finding citations of specific studies. If repeated, this process may start looking like the iterative search strategy described in Section 5.3. Thus, the aim of backward and forward searching is to find all relevant studies related to a specific publication, or a set of publications, by looking at sources that have been consulted and at studies that have used these specific publications; in addition, during backward searching for references, additional keywords and subject terms could be identified that trigger further searches for publications not yet identified. An example of using backward and forward searching is found in MacSuga-Gage and Simonsen (2015). Their systematic literature review examines studies into the instructional practice of providing all students in a classroom with frequent and varied opportunities to respond, on which feedback is given; the feedback can be provided during direct instruction by a teacher or by peers, or facilitated by mediated interfaces such as computer games (ibid., p. 212). In their method for the retrieval of studies, backward and forward searching complemented a database search which had yielded seven eligible studies for the analysis. Backward and forward searching yielded 100 abstracts (compared to 427 in the database search) and, after selection, eight more articles for analysis; their description indicates that the actual method used is closer to backward and forward snowballing. This example shows that this method may have a high specificity and sensitivity, though relevant studies need to be identified first.
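As a purely illustrative sketch of how backward and forward searching for references can be supported by software, the Python fragment below uses the freely accessible OpenAlex API (an assumption of this example; Scopus and Web of Science, mentioned above, offer comparable but subscription-based interfaces). The work identifier is a placeholder that would be replaced by a study actually retrieved for the review.

```python
# Minimal sketch (assumption: the free OpenAlex API is used instead of Scopus or
# Web of Science) of backward and forward searching around one known relevant work.
import requests

BASE = "https://api.openalex.org"
work_id = "W2741809807"  # placeholder identifier of a known relevant study

# Backward: the references cited by the work itself
work = requests.get(f"{BASE}/works/{work_id}").json()
backward_references = work.get("referenced_works", [])

# Forward: works that cite the work
citing = requests.get(f"{BASE}/works", params={"filter": f"cites:{work_id}"}).json()
forward_count = citing["meta"]["count"]

print(f"Backward: {len(backward_references)} cited references")
print(f"Forward: {forward_count} citing works")
```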
5.5.4 Root and Branch Searches
A root and branch search, the fourth alternative strategy, typically occurs after a large pool of potential publications obtained from a keyword and database search has received an initial review. As described by Swift and Wampold (2018, p. 359), through the initial review a smaller subset of likely relevant articles can be identified. These studies represent the ‘trunk’, or the core set of studies that inform the topic of a literature review. Each article included in a reference list of any of the articles that belong to the ‘trunk’ stands for the ‘roots’; the roots are seen as the
foundation on which the body of knowledge is built. Each paper that cites any of the ‘trunk’ articles constitutes the ‘branches’; these branches provide additional evidence of how the core studies from the trunk have been used. Publications found in the root and branch search should be reviewed to identify additional studies that were missed through the keyword and database search. However, this strategy may be more difficult to execute when the body of knowledge has accumulated over time. In principle, the method for the root and branch search does not fundamentally differ from snowballing.
5.5.5 Citation Pearl Growing
Citation pearl growing, the fifth alternative search strategy, is a technique used to ensure all relevant articles are included. Pearl growing involves identifying a primary article that meets the inclusion criteria for the review; see also Section 5.3. From this primary article the researcher works backwards to find all the articles cited in the bibliography and checks them for eligibility for inclusion in the review. The researcher then works forwards to search for any articles that have cited the primary article. This procedure of searching is akin to snowballing, albeit that in this case only one document serves as the starting point. The original document can also be used to find other keywords, subject terms and themes for subsequent searches. With regard to its effectiveness there are conflicting accounts. For example, Papaioannou et al. (2010, p. 118) suspended pearl growing, because identified pearls were dispersed across numerous databases, with individual databases indexing only a few pearls. Contrastingly, Schlosser et al. (2006) provide a detailed description of this method for searching. They distinguish between classical2 pearl growing, briefly described in Section 5.3, and comprehensive pearl growing; the latter refers to the searcher beginning with a compilation of studies from a relevant narrative review or a topical survey.3 Thus, citation pearl growing can be seen as similar to snowballing, whether it is classical pearl growing starting with one study or comprehensive pearl growing with multiple studies as the point of departure, and it also has some resemblance to backward and forward searching.
2 Actually, Schlosser et al. (2006, p. 571 ff.) call it ‘traditional pearl growing.’ The term ‘classical’ pearl growing has been adopted to ensure consistency throughout the book.
3 The wording ‘topical bibliography’ by Schlosser et al. (2006, p. 574) has been replaced with ‘topical survey’ in order to connect better to the terminology in this book.
5.5.6 Expert Panels
Finally, experts can be consulted to provide additional studies, for instance by contacting them individually. In the study by Swift et al. (2018, p. 1930) on client preference in psychotherapy, the individual consultation of experts contributed two out of 53 studies, although details of how many experts were approached, and how, are not provided. Also, Ogilvie (2007, pp. 1204/1–2) invited experts to put forward studies of interest for their review into interventions to promote walking in individuals and populations. Thus, the consultation of experts, individually or as an expert panel, can yield additional publications not discovered during the keyword and database search strategy.

NOTE: ALTERNATIVE SEARCH STRATEGIES OFTEN MIXED UP
In addition to the variety in nomenclature for the second to fifth alternative search strategies, authors often mistakenly label one as the other. In this book, these labels have been applied consistently; Table 5.3 captures the approaches and the differences between the methods.
Table 5.3 Overview of methods for four closely related alternative search strategies. The often subtle differences between four of the alternative search strategies are displayed in this table. They have in common looking at references cited in the texts and bibliographies of studies and at studies that cite initially found works. In this sense, snowballing is the basic approach, with the other three search strategies being variations.

Snowballing
• Backward snowballing: searching for relevant studies by reading the text and examining the list of references of retrieved studies
• Forward snowballing: searching for relevant studies by looking at studies that cite retrieved works
• Sometimes when studies mention snowballing, it may only refer to backward snowballing

Backward and forward searching
• Distinction between backward and forward searching identical to snowballing
• Examining bibliographies of authors
• Searching for previously used keywords

Root and branch searches
• After the keyword and database search strategy, identifying a set of key articles (called the ‘trunk’)
• Search for ‘roots’ and ‘branches’ identical to snowballing

Citation pearl growing
• Classical pearl growing: only one document identified in the search leads to snowballing
• Comprehensive pearl growing: a set of relevant studies found in the initial search leads to snowballing (akin to root and branch searches; in contrast to root and branch searches, the initial search strategy is not set)
5.6 Enhancing Effectiveness of Search Strategies
Although the keyword and database search strategy has a structured approach, the difficulty is whether all relevant studies are retrieved. Therefore, the specificity and sensitivity of a search strategy need to be determined, and if necessary, the search needs to be enhanced. This leads to considering saturation, trading off specificity and sensitivity, complementary search strategies and the role of expert panels; these topics are covered in the next subsections.
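For reference, and as an assumption of common usage in the search-filter literature rather than a definition taken from the studies discussed in Subsection 5.6.2, sensitivity and specificity (precision) of a search are typically quantified as follows:

\[
\text{sensitivity} = \frac{\text{relevant records retrieved}}{\text{all relevant records in existence}},
\qquad
\text{precision (specificity)} = \frac{\text{relevant records retrieved}}{\text{all records retrieved}}
\]

A highly sensitive search misses few relevant studies but typically retrieves many irrelevant ones; a highly precise (specific) search retrieves fewer irrelevant records but risks missing relevant studies. This is the trade-off discussed in Subsection 5.6.2.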
5.6.1 Determining Saturation When Searching
To determine whether all or nearly all studies have been retrieved, the principle of saturation can be used for the search strategy. Saturation applied to searching for studies indicates that further iterations of a search will yield no more, or very few, studies and works of relevance. This applies particularly to the keywords, controlled vocabulary and database search strategy, because the iterative search strategy naturally leads to discovery during each cycle, even though the benefits of such cycles decrease over time. For example, saturation will be demonstrated when adding an appropriate search engine or database leads to finding only a few studies in addition to those already retrieved. In the same vein, adding new relevant subject terms or keywords to the search strategy should indicate whether these result in additional studies; if these are relatively low in number, then saturation is achieved. To find the degree of saturation, it is best to undertake the actual search in stages and to record the yield of each stage separately. The smaller the number of studies found in an iteration, the higher the degree of saturation. When the degree of saturation for studies and works becomes high, the search can be stopped. In the example of Box 5.B this is the case; the second search led to only 14 articles compared to the 57 in the first search. Therefore, the likelihood that a third search would discover more relevant articles is relatively low, while the effort of further relaxing the subject terms and keywords would increase. In the case of Box 5.B the second search also did not identify any new keywords or subject terms. A further instance is presented in Box 5.C; this example shows that additional searches in databases led to the discovery of new relevant publications. However, in this instance the third database used (EBSCOhost) yielded only one more publication to be included in the set, so adding a fourth database would have been unlikely to identify additional relevant studies. Thus, saturation for the retrieval of studies can be demonstrated by recording how many new relevant studies are added in each iteration; when this number is relatively low, the actual search can be stopped.
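The bookkeeping behind this principle can be kept very simple. The sketch below, with entirely hypothetical record identifiers, counts how many records each stage of a search adds that were not retrieved in earlier stages; a shrinking count of new records per stage indicates an increasing degree of saturation.

```python
# Minimal sketch with hypothetical record identifiers: counting, per search stage,
# how many retrieved records were not found in earlier stages. A shrinking number
# of new records per stage indicates an increasing degree of saturation.
iterations = {
    "Stage 1: Scopus, initial keywords": {"doi:a", "doi:b", "doi:c", "doi:d"},
    "Stage 2: Web of Science, same keywords": {"doi:b", "doi:c", "doi:e"},
    "Stage 3: EBSCOhost, added subject terms": {"doi:c", "doi:f"},
}

seen = set()
for stage, records in iterations.items():
    new = records - seen           # records not retrieved in any earlier stage
    seen |= records
    print(f"{stage}: {len(records)} retrieved, {len(new)} new")
```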
Though often referred to in publications, the degree of duplication is an indicator, but not an absolute measure, of saturation; duplication refers to finding the same study or studies in different databases, in searches with different or modified combinations of keywords, or through alternative search strategies. The reporting of protocol-driven literature reviews using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) format regularly refers to it; see Section 13.4. In this format, it is disclosed how many retrieved studies were removed due to duplication. This indicates the specificity of the search to some degree, but it does not indicate whether broadening subject terms and keywords would have resulted in discovering more relevant studies. Thus, at best the degree of duplication indicates in some way whether all relevant studies are found, but it is logically not a direct measure of saturation for the retrieval of studies.

Related to the saturation for the retrieval of studies is the saturation for synthesis. Finding additional studies does not necessarily mean that new findings will emerge. However, findings are the outcomes of the phases of analysis and synthesis, which are positioned after the actual search and selection of studies. This means that only after analysis and synthesis of relevant studies can one assess whether saturation for synthesis has been achieved. Also, it should be noted that adding relevant studies following tests for saturation for the retrieval of studies will at least broaden the evidence base; for example, if more studies confirm the same finding, then the quality of the evidence improves. Again, for these reasons, saturation for synthesis can only be determined in hindsight and is more difficult to integrate into the search strategy.

Box 5.C Example of Saturation for Retrieval of Studies
An example of mentioning saturation for the retrieval of studies is found in Salgado and Dekkers (2018, pp. 905–6). After explaining their search strategy using three databases and a set of keywords, they present a table with how publications were found; see below. They (ibid., p. 906) also remark that only one additional study was found by using the third database: EBSCOhost. The table also shows how many relevant studies were found through snowballing (17) and serendipitous searching (1). Finally, there is an indication of how many publications each additional subject term yielded, albeit not explicitly discussed in the review.

Keyword | EBSCOhost | Google Scholar | Scopus | Total
Lean product development | 14 | 104 | 93 | 141
Lean engineering | 10 | 75 | 43 | 83
Lean product and process dev | 7 | 38 | 25 | 39
Subtotal protocol-driven | 16 | 145 | 111 | 189
Snowballing (backwards) | | | | 17
Serendipitous searching | | | | 1
Total | | | | 207
5.6.2 Trading Off Specificity and Sensitivity
Another way of enhancing search strategies is by explicitly trading off specificity and sensitivity. This has specifically been looked at for databases; Table 5.4 provides an overview of studies for this type of optimisation. An example is the study by Wilczynski and Haynes (2007), looking into optimal search strategies for Embase. These studies show how keywords can be used in specific databases to find relevant studies; such sets of keywords, together with Boolean operators, are also called search filters. Furthermore, comparisons of databases have been made with regard to specificity and sensitivity. A case in point is Rogers et al. (2018), who compare search strategies in CINAHL, Embase, MEDLINE and PsycINFO. For instance, they (ibid., p. 585) find that Embase and MEDLINE offer little benefit for locating qualitative dementia studies if CINAHL and PsycINFO are also searched. Another example is the work by Tanon et al. (2010) into developing optimal search strategies for CINAHL, Embase and MEDLINE; they test strategies using different keywords to see how effective a search could be. In this respect, Bardia et al. (2006, p. 204) note that their search was more cumbersome than necessary because the indexing of relevant papers did not consistently use the term ‘complementary medicine’. Moreover, Beynon et al. (2013, p. 13) note that none of the search filters they looked into sufficiently combined high sensitivity, required for systematic reviews, with a reasonable degree of specificity (precision). Again, this indicates that searches are also strongly dependent on the use of keywords, for which approaches have been outlined in Section 5.4, and on specific search filters defined in a protocol. Thus, the studies into specific databases that balance specificity and sensitivity measure the yield of search strategies against the notional effort needed for searching databases.

NOTE: ‘OBJECTIVE APPROACH’ TO KEYWORDS RESULTS IN HIGHER YIELD WHEN NO SEARCH FILTERS AVAILABLE
There are two approaches to finding keywords when the topic of a review, or part of it, has no search filters available in the literature; these are called the conceptual and the objective approach. The first is based on using different sources to identify key terms and their synonyms related to the concepts of the research question; this is similar to how the identification of keywords and subject terms is described in Section 5.4. The objective approach, as described by Hausner et al. (2012, pp. 19/4–5), consists of the following steps (a sketch of the validation step appears at the end of this subsection):
• Generation of a test set of studies that represent the topic to be investigated (for example, relevant references from systematic reviews).
• Division of the test set into a development set and a set for the later validation of the search strategy (and search filters).
• Development of the search strategy with references from the development set (analysing information derived from the titles, abstracts and subject headings of relevant references).
• Validation of the search strategy (checking whether references from the validation set can be identified with the search strategy developed beforehand).
Table 5.4 Examples of studies on search filters related to databases. Each of the studies looks into how to meet specificity (precision) and sensitivity criteria when conducting a search for relevant works. They are often related to specific domains or specific types of studies (such as observational studies). Some of the studies in the table have developed search filters for a range of databases.

Database: CINAHL
Studies: Lokker et al. (2010); Mc Elhinney et al. (2016); Rietjens et al. (2019); Rosumeck et al. (2020); Wilczynski et al. (2007); Wong et al. (2006)
Domains: knowledge translation (nursing); child protection issues related to pregnant women; palliative care; healthcare; healthcare; therapy studies

Database: Embase
Studies: Glanville et al. (2009); Hildebrand et al. (2014); Li et al. (2019); Rietjens et al. (2019); Wilczynski and Haynes (2007)
Domains: economic evaluations (healthcare); acute kidney injury; observational studies; palliative care; general

Database: MEDLINE
Studies: Boluyt et al. (2008); Glanville et al. (2009); Hildebrand et al. (2014); Li et al. (2019); Mc Elhinney et al. (2016); Rietjens et al. (2019); Zhang et al. (2006)
Domains: randomised controlled trials; palliative care; child protection issues related to pregnant women; observational studies; acute kidney injury; economic evaluations (healthcare); pediatrics

Database: PsycINFO
Studies: Eady et al. (2008); Rietjens et al. (2019); Mc Elhinney et al. (2016); Rosumeck et al. (2020)
Domains: therapy studies; palliative care; child protection issues related to pregnant women; healthcare
In a later study, they (Hausner et al. 2016, p. 122) compare the conceptual approach with the objective approach and find that the latter has a higher sensitivity, while both have a similar specificity. The objective approach has similarities to scoping reviews and scoping studies for setting out search strategies, see Section 5.8, and is congruent with the iterations of the keywords, controlled vocabulary and databases search strategy mentioned in Section 5.4.

NOTE: ONLY OPTIMISING SEARCH STRATEGIES IN DATABASES INSUFFICIENT
Relevo (2012, p. S29) notes that some search filters may be effective, but not all of them warrant finding all relevant studies. She advises complementing filters based on controlled vocabulary with additional keywords used in texts.
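The validation step of the objective approach described in the note above can be illustrated with a minimal sketch; the identifiers and the set of ‘retrieved’ records are hypothetical placeholders for the output of an actual draft search strategy.

```python
# Minimal sketch of the validation step of the 'objective approach' with hypothetical
# identifiers: split a test set of known relevant records into a development set and
# a validation set, then measure the relative recall of a draft search strategy.
import random

known_relevant = [f"pmid:{i}" for i in range(1, 41)]   # hypothetical gold-standard set
random.seed(42)
random.shuffle(known_relevant)
development_set, validation_set = known_relevant[:20], known_relevant[20:]

# 'retrieved' would be the output of running the draft strategy (built only from the
# development set) in the target database; it is hard-coded here for illustration.
retrieved = set(known_relevant[5:35])

relative_recall = len(retrieved & set(validation_set)) / len(validation_set)
print(f"The draft strategy retrieves {relative_recall:.0%} of the validation set")
```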
5.6.3 Complementary Search Strategies
The final note in the previous subsection about the precision of search filters brings in a third approach to enhancing the effectiveness of search strategies: the use of complementary search strategies. In this respect, Greenhalgh and Peacock (2005) demonstrate that only about 30% of relevant sources were found through searching databases and hand searching. For the other 70%, snowballing complemented with reference tracking accounted for ca. 51% of the sources, with the remainder found through personal knowledge and serendipitous searching. Their findings indicate that relying on one search strategy may yield limited results, particularly with regard to searching databases. Others have found similar results when investigating the effectiveness of search strategies, with Harari et al. (2020, pp. 103,377/8–9) for applied psychology and Savoie et al. (2003, p. 176) for randomised clinical trials being cases in point. Moreover, Papaioannou et al. (2010, p. 121) and Pearson et al. (2011, p. 305) suggest that the use of complementary search strategies is a more effective use of resources for a review than increasing the sensitivity of a search; the latter option may result in more studies and sources having to be evaluated for their relevance than an additional search strategy would. For some disciplines this is even explicitly advised. For example, Webster and Watson (2002, p. xvi) recommend snowballing4 after concluding a database search when conducting a review in the domain of information systems. Their advice for snowballing in particular has been picked up by others, such as Wohlin and Prikladnicki (2013), for guidelines regarding systematic literature reviews in this domain. Furthermore, others, for instance Stevinson and Lawlor (2004, pp. 230–1), find that searching in generic databases is best complemented with searching specialist databases, snowballing and contacting experts. The points made for the saturation for the retrieval of studies also apply to these search strategies; thus, the effectiveness of the complete search strategy can be evidenced by how many relevant studies and sources each additional search strategy yields. These considerations indicate that a single search strategy may be ineffective and inadequate, implying that those undertaking a review should use complementary search strategies to ensure finding as many relevant publications as possible and report the degree of saturation for the retrieval of studies.
4 Webster and Watson (2002, p. xvi) call it forward searching and backward searching rather than snowballing. See Table 5.3 for the nomenclature used in the book for search strategies.
Box 5.D Example of Staged Systematic Literature Review Combined with Delphi Study
The systematic literature review (Hinckeldeyn et al. 2015) preceded an empirical study into the productivity of product design and engineering processes (this can also be called ‘new product development’). It covered a three-staged literature review, for which the third stage was complemented with a Delphi study.
I. Umbrella review
First, an umbrella review was carried out, called the first step in this publication. This yielded five reviews into productivity for product design and engineering processes. It turned out that these reviews did not address productivity thoroughly, in addition to being outdated.
II. Narrative overview
A further search for a narrative overview using an iterative search strategy yielded 23 publications. An overview of methods to improve the productivity of product design and engineering processes could also not be extracted from this search, even though reference is often made to concepts directly related to productivity.
IIIA. Systematic literature review
The systematic literature review in the third stage was conducted by performing a keyword search in eight selected journals and in Google Scholar. In total this yielded 31 publications, in which 20 methods for improving the productivity of product design and engineering processes were found.
IIIB. Complementing with Delphi study
The systematic literature review was complemented by a Delphi study. Initially, 39 experts were approached based on their profile with regard to both practical and theoretical knowledge. In the first round only 13 questionnaires for the Delphi study were returned. From these, 11 participants took part in the second and third rounds; in the final round agreement was reached.
Outcomes
The combination of the systematic literature review and the Delphi study resulted in 27 methods found for improving the productivity of product design and engineering processes:
• 20 methods identified through the systematic literature review.
• 12 methods put forward by the experts, of which 5 were also found in the systematic literature review.
5.6.4 Expert Panels
A final method to enhance the effectiveness of search strategies is consulting experts, for which there are three approaches. The first one is the use of PRESS (Peer Review of Electronic Search Strategies), a set of guidelines for the evaluation of search strategies by experts, written by McGowan et al. (2016); in this case, experts contribute by peer reviewing the proposed search strategy of a review. Second, the use of experts is also suggested by others for specific methods, particularly those for qualitative synthesis. An instance is the description of developing research protocols by Booth et al. (2018, p. 47), in which one of the points to consider is the need for expert input throughout the conduct of a review. Third, the engagement with experts could also complement the outcomes of a literature review. This can take the form of consultations, focus groups or Delphi studies. McManus et al. (1998, p. 1562) show that the consultation of experts accounted for 23 out of the 75 unique articles identified, thus complementing searching in electronic databases and hand searching; this is similar to the mention of expert panels in Section 5.5. Differently, in Frederick et al. (2007), an expert panel was used for oversight of the systematic review. For the use of a Delphi study, a worked example is provided in Box 5.D. Thus, expert panels can be used for suggesting additional studies to be included, developing protocols, supplementing findings from a review (particularly using a Delphi study), peer reviewing search strategies and providing oversight of a review; in some cases, the use of expert panels is advocated, in other cases it complements the search strategy.

TIP: INVOLVE LIBRARIANS FOR MORE EFFECTIVE SEARCH STRATEGIES
Even though few take advantage of the expertise held by librarians, as demonstrated by Harari et al. (2020, p. 103,377/3) and Koffel and Rethlefsen (2016, p. e0163309/12), with the latter reporting a higher percentage, their involvement could enhance search strategies. Koffel (2015, p. e0125931/9) and Rethlefsen et al. (2015, p. 624) reported higher quality search strategies when a librarian is involved, whereas Koffel and Rethlefsen (2016, p. e0163309/12) find that such involvement leads to more reproducible search strategies. This advice holds for those who are new to setting out search strategies as well as for those with more experience.
5.7 Grey Literature
So far, the focus has been on studies and works that are found in typical academic outlets, such as journals and indexed conference proceedings, but there is also a large body of knowledge that is called ‘grey literature.’ This can cover conference proceedings (those not directly publicly available), lecture notes, presentations, reports, unpublished studies, white papers and working papers that contain both
conceptual information and evidence on topics relevant to a literature review. Sometimes, grey literature precedes scholarly publications, such as conference proceedings and working papers. In general, it may lack a systematic means of distribution and collection, in contrast to journals and indexed conference proceedings. Depending on the domain and the objective of the review, other sources, such as professional publications, may fall into this category; see also Section 5.2 for the different types of sources and their suitability for literature reviews. The relative importance of grey literature to literature reviews is largely dependent on research disciplines and subjects, on methodological approaches and on the sources they use. In some disciplines, especially in the life sciences and medicine, there has been a traditional preference for only using peer-reviewed academic journals, whereas in others, such as agriculture, aeronautics and the engineering sciences in general, grey literature is also seen as a credible source for literature reviews. Although this type of literature could be relevant to a particular literature review, its quality may vary. For instance, Benzies et al. (2006, p. 58) find that about one third of the grey literature did not hold sufficient information to determine the quality of the evidence. This is something to be accounted for during a review; see Sections 6.4 and 6.5 for methods to assess the quality of evidence in studies of any type. In this respect, Haddaway and Bayliss (2015, p. 829) propose to make a distinction between ‘file drawer’ studies and ‘practitioner-generated’ literature; Table 5.5 shows reasons why these could be included in a literature review. Their stance is that these two distinct types of grey literature need to be assessed separately and also require different search strategies. Notwithstanding that grey literature is more diverse, more cumbersome to find and more varied in its reporting, it is seen as a valuable source for some literature reviews.

The reasons to include grey literature in a review have been highlighted in several studies. For example, Conn et al. (2003, p. 258) state that leaving out grey literature in a meta-analysis will lead to the over-sampling of studies with statistically significant effects, resulting in overestimation of the effects and less variety being considered in how outcomes are achieved. Also, the consideration of contextual aspects for included studies is enhanced by the inclusion of grey literature (Benzies et al. 2006, p. 58). Though grey literature is generally not found in generic databases for scholarly work, its inclusion will lead to a more solid base of conceptualisations and evidence, particularly for estimating effect sizes, and to a broader variety in methods and contextual aspects.

For finding grey literature, different search strategies need to be deployed than for finding scholarly publications and similar works. One source of grey literature can be the lists of references in studies; while lists of references typically point to conventionally published studies in scholarly outlets, such as academic journals and indexed conference proceedings, other sources may also find their way into them. Another search strategy for discovering grey literature is communication with experts and researchers. Benzies et al. (2006, p.
57) mention that they approached national and international experts, identified from a workshop, Internet searches, conference presentations and reference lists, by sending a ‘Letter of Request for Information’. Another form of interacting with experts is the use of workshops, focus groups, or perhaps even Delphi studies; see also Section 5.5 for this specific point.
Table 5.5 Rationales for inclusion of ‘file drawer’ and ‘practitioner-generated’ literature. Adapted from Haddaway and Bayliss (2015, p. 829)*, this table shows two types of grey literature. So-called file drawer studies are works that are not or not yet available as publications; this includes master’s dissertations, doctoral studies, conference proceedings, reports about funded research, submissions of evidence to relevant scientific communities, etc. The category practitioner-generated reports covers policy-related documents, reports by organisations, professional publications, etc. * Reproduced with permission from Elsevier.

File drawer studies
• When the subject area is relatively novel and may not be widely used in practice
• When little evidence exists but theory is still widely supported
• When challenges to accepted dogma are unlikely to be readily accepted
• When competition within an academic discipline is such that risks of questionable research practices are high
• When research is likely to be undertaken by students and unlikely to be written up

Practitioner-generated reports
• When data collection by practitioners or policy-makers has low resource requirements
• When access to sampling systems is not a challenge
• When data is readily recorded for other purposes or as routine measurements from permanent sampling regimes
• When practitioners are unlikely to restrict access to research (i.e., no on-going commercial interests)
A third strategy is searching through conference programmes. Harari et al. (2020, p. 103,377/5) report that searching through conference programmes was the most frequently used method in their sample of reviews. A fourth strategy for searching is using announcements (e.g., Listserv postings); this strategy is used by Swift and Wampold (2018, p. 360), though the yield was low in their case. A fifth strategy for locating grey literature is searching websites of various kinds. This is one of the most common strategies, where it should be noted that the specific websites may vary per topic, as demonstrated in the two examples in the next paragraph. A sixth strategy is using Google Scholar for finding grey literature. Although it contains a fair amount of grey literature, Haddaway et al. (2015, p. e0138237/14) caution that it does not find all grey literature, and therefore, cannot be used as a stand-alone tool. A final search strategy for identifying grey literature is using specialised databases for this purpose. For example, Godin et al. (2015, pp. 138/3–4) use this as one of their methods; in their case, for school-based breakfast programmes in Canada, they used the Canadian Research Index, the Canadian Public Policy Collection and the Canadian Health Research Collection, the latter two both hosted by the Canadian Electronic Library. Therefore, these seven search strategies, though not necessarily all of them, should be used in conjunction to find relevant grey literature.

Moreover, literature reviews sometimes focus solely on grey literature. A case in point is the review by Soldani et al. (2018) into grey literature on so-called microservices as part of software architecture. Their rationale (ibid., pp. 215–6) is
that this relatively new approach, positioned by them in 2014, shows a gap between academic research and industry practices, and they intend to extract insight into the benefits and challenges this approach brings for firms. To this purpose, they (ibid., pp. 216–8) have used general web search engines as the search strategy and retrieved 51 blog posts, videos and white papers. One of the challenges for industry they (ibid., p. 227) note is access control, which is hindered by the intensely distributed nature of microservice-based applications. One of the benefits they report is the degree of freedom when developing a microservice compared to more centralised software architectures. Another instance is the study by Piggott-McKellar et al. (2019) into community-based adaptation projects that aim to respond effectively and sustainably to the impacts of climate change, with a particular focus on people’s livelihoods. Their search strategy (ibid., p. 378), which yielded 25 reports (containing 69 projects), covered searching websites of international, multilateral and bilateral development organisations and agencies, searching in Google and following up by contacting members of organisations to provide reports, where necessary. One of their findings (ibid., pp. 386–7) is the concern that top-down approaches continue to be used under the guise of being ‘community-based’, something already noted in the literature. The two examples demonstrate the more intensive effort needed to gain access to grey literature in comparison to more traditional scholarly writings, but also that insight can be gained that would otherwise be more difficult to obtain.

NOTE: TOWER OF BABEL – GREY LITERATURE EXTENDS TO DIVERSITY IN LANGUAGES
Not only are the studies and reports of grey literature seen as more difficult to access than publications in scholarly journals, but studies published in other languages can also be cumbersome to identify and retrieve. With English being the current dominant language of scholarship, authors tend to ignore publications in other languages; this is evidenced by Jackson and Kuriyama (2019), who find that only 22% of the reviews they examined included non-English trials, and that these represented only 2% of the total studies used in these reviews. Some studies, such as Egger et al. (1997) and Grégoire et al. (1995), report bias for systematic reviews related to language restrictions and unpublished works. Differently, Morrison et al. (2012, p. 143) claim they did not find evidence of systematic bias from the use of language restrictions in systematic reviews into conventional medicines, and Jüni et al. (2002, pp. 118–20) also disclose a relatively limited impact of language restrictions. Therefore, the inclusion of studies drawn from languages other than English can be useful to give more confidence in the outcomes of reviews and to examine a broader variety of study designs and perspectives, though this is an assertion subject to contention.

TIP: INVOLVEMENT OF LIBRARIANS FOR SEARCHING AND RETRIEVING GREY LITERATURE
With regard to support by librarians, as also noted in Section 5.6, this is useful for searching grey literature. For instance, librarians can apply their knowledge of searching unstructured websites to retrieve relevant information. Increasingly, libraries are offering services to access grey literature, as illustrated by Tillett and
Newbold (2006) for the British Library. The publication by Pappas and Williams (2011) outlines some sources for grey literature pertaining to medicine and other health sciences. In addition, the support of librarians is regularly mentioned for the search into grey literature. For example, Benzies et al. (2006, p. 57) refer to the support provided by librarians and Mahood et al. (2014, p. 233) acknowledge the librarians’ support.
5.8 Undertaking the Search and Recording Results
In addition to setting out and documenting a search strategy for identifying relevant studies, a log of the actual search should be kept in a journal; there are multiple reasons to do this. First, during the execution of the actual search, keywords, subject terms and settings may have to be adapted to increase either the specificity or sensitivity. Such changes affect the outcome and may also lead to a deviation from the search strategy. This is the second reason for keeping a journal about the actual search: changes and deviations from the search strategy are poorly reported. In this respect, Silagy et al. (2002, p. 2834) note that these changes should be reported to find out what effect they may have had on the results of a review; see also Section 13.2. Third, to a certain extent, the description of the search strategy as part of the methodology should lead to the retrieval of studies being reproducible. Fourth, last but not least, during the actual search, the reviewers may note points of interest for the analysis and synthesis. It is helpful to have such notes available during later stages of the literature review. This means that logging the actual search in a journal will support the reporting of the literature review, the impact of changes on results and findings from the review, and the analysis and synthesis. A particular point of attention for recording results is the repetition of sources already found. This can appear in the form of duplications, which means that the same source is found in another database or search engine, or in another search using different keywords and subject terms. Increasing duplication can indicate that saturation is more likely; see Section 5.6 for a discussion of saturation in the context of searching. It can also happen that the same study appears in different forms. This could be the result of a conference paper preceding the publication of the study in a journal, or of different versions discussing different aspects of an empirical study. For example, Kennedy (2007, p. 141) indicates that publications describing the same study and redundant citations reduced the yield of 465 records to 420 unique ones. The redundancy could also be caused by self-plagiarism, as written about in Section 2.8. Thus, there are various reasons for repetition of sources already found, and depending on the purpose of the review and the nature of those repetitions, informed decisions should be made to include or exclude them. When managing large-volume literature searches and working in teams, attention needs to be paid to the execution of the search strategy, limitations of supporting software, keeping track of changes and consistency throughout the
entire search process. This is pointed out by Havill et al. (2014) when discussing their experiences with managing the 67,698 records they initially identified. To manage this search, supported by reference software, they had to conduct the search in stages, customise the search for each database, record retrievals separately for each database, combine these and filter out duplicates manually. This example shows that systematic reviews having to process a large volume of records are not only labour-intensive, but also require adequate management of this volume. For these reasons, some authors, including O’Mara-Eves et al. (2015), have suggested using text mining to reduce workloads. According to them, this is more effective during the initial stages of a review than during later stages, where more detail for evaluating the suitability of sources is required. And Lu (2011) points to the support that specialist
websites and web-based tools can offer for accessing the database PubMed. Notwithstanding the availability of text mining and web-based tools, undertaking a large-volume search poses additional challenges for managing records, which should be taken into account when setting out a search strategy.

Fig. 5.7 Scoping study for search strategy of archetype systematic literature reviews. Building on the position of the scoping study in Figure 4.6, this figure depicts how a scoping study informs the search strategy for the protocol of systematic literature reviews. This includes the keywords and subject terms, relevant key works that a search strategy should identify, databases that are appropriate and complementary search strategies.

TIP: USE REFERENCE SOFTWARE
Reference software can be used in two ways. First, these applications provide a consistent way for recording the bibliographical details of sources. Nowadays, many of these details can be downloaded from publishers’ websites and imported into these applications. Second, some applications are directly connected to search engines and databases. This makes it possible to directly find potentially relevant studies and import these into the reference software. However, not all applications have the same features and they work slightly differently. Also, academic institutions may support different reference software. Therefore, the choice of reference software should be based on how it is going to be used and which applications are available for a literature review.
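As a minimal sketch of the record keeping discussed in this section, the following Python fragment merges exports from two databases, flags likely duplicates by DOI or normalised title, and appends a line to a search log. The field names, the example records and the file name search_log.csv are assumptions made purely for illustration; they are not features of any particular reference software or database.

# Minimal sketch (with assumed field names) for merging database exports,
# flagging likely duplicates and logging each executed search.
import csv
import re
from datetime import date

def normalise_title(title: str) -> str:
    """Lower-case a title and strip punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def merge_records(batches):
    """Merge per-database result lists, keeping one record per DOI or title."""
    seen, unique, duplicates = set(), [], 0
    for database, records in batches:
        for record in records:
            key = record.get("doi") or normalise_title(record.get("title", ""))
            if key in seen:
                duplicates += 1                  # same study found again elsewhere
            else:
                seen.add(key)
                record["found_in"] = database    # note where it was first retrieved
                unique.append(record)
    return unique, duplicates

def log_search(path, database, query, hits, duplicates):
    """Append one line per executed search, supporting later reporting of the search."""
    with open(path, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([date.today().isoformat(), database, query, hits, duplicates])

# Hypothetical use with two small exports:
batches = [
    ("Scopus", [{"doi": "10.1000/xyz", "title": "Lean product development"}]),
    ("Web of Science", [{"doi": "10.1000/xyz", "title": "Lean Product Development."}]),
]
unique, duplicates = merge_records(batches)
log_search("search_log.csv", "combined", 'lean AND "product development"', len(unique), duplicates)

Kept alongside the journal described above, such a log makes it easier to report deviations from the search strategy and the handling of duplicates.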
5.9 Scoping Reviews and Scoping Studies for Search Strategy
Scoping studies, introduced in Section 4.5, can also support the development of an appropriate search strategy for protocol-driven literature reviews; for the archetype systematic literature review this is depicted in Figure 5.7. In addition to specifying the review question and modelling to support the analysis and synthesis, a scoping study should point out which key publications are of interest (taking into account the note at the end of Section 5.1), which keywords and subject terms should be used, which databases are suitable and which complementary search strategies should accompany the keywords, controlled vocabulary and databases search strategy. These findings from the scoping study can be translated into a protocol for the search strategy. For the archetype systematic review, the scoping study is similar to that for the archetype systematic literature review; however, it can also be conducted in a more formal manner, called a scoping review, as mentioned in Section 4.5. Again, this will result in a protocol. Normally, these protocols are published, with Rosenstock et al. (2016) being a case in point; they consider themes, practices and outcomes of climate-smart agriculture. Furthermore, they present their search strategy, including the databases and search strings they intend to use. Although not mentioned by Rosenstock et al. (ibid.), a scoping review could also indicate which complementary search strategies are best to use. Thus, the scoping review—see Figure 5.8—may set out which search filters are appropriate for the review questions in conjunction with which databases to use and indicate complementary search strategies for the systematic review to be effective.
Fig. 5.8 Scoping review for search strategy of archetype systematic reviews. The scoping review should not only indicate review questions for systematic reviews, as found in Figure 4.5, but also which search filters to use for specific databases (keywords, subject terms, Boolean operators) and which complementary search strategies to the keywords, controlled vocabulary and databases search strategy are appropriate.
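To illustrate how the findings of a scoping study or scoping review could be translated into a protocol for the search strategy, the short Python sketch below records the protocol elements named in Figures 5.7 and 5.8 as structured data. The concrete values are invented placeholders for illustration only and do not reproduce the protocol of Rosenstock et al. (2016).

# Illustrative structure for a search protocol derived from a scoping study or
# scoping review; all concrete values below are invented placeholders.
from dataclasses import dataclass, field

@dataclass
class SearchProtocol:
    review_question: str
    keywords: list[str]
    subject_terms: list[str]              # controlled vocabulary, e.g. thesaurus terms
    databases: list[str]
    complementary_strategies: list[str]   # e.g. snowballing, hand searching
    key_works: list[str] = field(default_factory=list)  # works the search should retrieve

    def search_string(self) -> str:
        """Combine keywords and subject terms into a simple Boolean OR string."""
        return " OR ".join(f'"{term}"' for term in self.keywords + self.subject_terms)

protocol = SearchProtocol(
    review_question="Which practices are reported for climate-smart agriculture?",
    keywords=["climate-smart agriculture", "climate-resilient farming"],
    subject_terms=["agricultural innovation"],
    databases=["Scopus", "Web of Science"],
    complementary_strategies=["backward and forward searching", "consultation of experts"],
)
print(protocol.search_string())

One simple check before running the full search is whether the key works listed in such a protocol are indeed retrieved by the resulting search string, which gives an early indication of sensitivity.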
5.10 Key Points
• Search strategies should aim at finding relevant studies, works and reports that provide either conceptualisations or evidence; this could extend to theories, perspectives, methods and tools, depending on the purpose of a literature review. The effectiveness of search strategies is determined by sensitivity and specificity (the latter is also known as precision); see Section 5.1 for more detail, and the brief worked example following this list.
• There are different types of sources that can be used for literature reviews, dependent on their purpose. In any case, the credibility of such publications and works needs to be assessed to judge whether inclusion is justified. Table 5.1 provides an overview of different sources with points to pay attention to. Generally, literature reviews as scholarly works rely on publications of studies in journals and conference proceedings; practices vary by discipline on this point. However, citation rates are not necessarily a reflection of the relevancy of studies for inclusion in a literature review.
• Special attention should be given to whether or not to consider including grey literature. This refers to publications that are not found through traditional publishing channels. This type of literature may hold valuable information, but it is often more difficult to trace. Generally, including grey literature leads to more reliable findings in literature reviews and could also lead to better understanding in specific situations (i.e., idiographic research, see Section 3.3).
• There are two distinct search strategies:
  • The iterative search strategy is more suitable for the archetypes narrative overview and narrative review. The iterative search strategy can be enhanced by using complementary techniques such as citation pearl growing, briefsearch, building blocks and successive fractions; these are described in Section 5.3. This iterative type of searching, consulting databases with keywords, supports finding appropriate sources for a literature review.
  • The keywords, controlled vocabulary and databases search strategy is used for protocol-driven literature reviews, i.e., systematic literature reviews and systematic reviews, but can also support the other two archetypes. Section 5.4 provides detailed guidance on how to conduct this search strategy, options for using databases and which caveats could occur.
• Other search strategies, which are also used as complementary search strategies to the keywords, controlled vocabulary and databases search strategy, are:
  • Hand searching.
  • Snowballing.
  • Backward and forward searching.
  • Root and branch searches.
  • Citation pearl growing.
  • Consultation of experts, including use of expert panels.
  • Four of these search strategies (snowballing, backward and forward searching, root and branch searches and citation pearl growing) are summarised in Table 5.3.
• How many relevant studies from all studies are retrieved can be determined by the degree of saturation. This means that when each search of another database or using other sets of keywords yields marginal increases of identified relevant studies, then saturation is achieved. This process is described in Section 5.6. Duplication when finding relevant sources and studies is a poor indicator of saturation.
• To identify relevant studies, search filters have been developed for specific topics, domains and applications. These filters aim at achieving a high degree of sensitivity and specificity, but should be applied with caution; see Section 5.6.
• The conduct of scoping reviews and scoping studies will outline elements of the protocol, including search filters (keywords, subject terms, Boolean operators), databases, relevant key works and which complementary search strategies to use.
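As a brief worked example of the first key point, with invented numbers (in practice the total number of relevant studies can only be estimated, for instance from a reference set of known key works): if a search retrieves 400 records of which 80 are relevant, while the total number of relevant studies is estimated at 100, then

\text{sensitivity} = \frac{\text{relevant studies retrieved}}{\text{all relevant studies}} = \frac{80}{100} = 0.80, \qquad \text{specificity (precision)} = \frac{\text{relevant studies retrieved}}{\text{all records retrieved}} = \frac{80}{400} = 0.20.

Broadening keywords typically raises sensitivity at the expense of specificity, which is one reason for monitoring the marginal yield of additional searches, as discussed under saturation in Section 5.6.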
5.11 How to …?
5.11.1 … Set an Appropriate Search Strategy

Which search strategy to apply depends on the archetype for the literature review. For narrative overviews and narrative reviews the iterative search strategy is a more appropriate choice, although the keywords, controlled vocabulary and databases search strategy can also be appropriate; see Section 5.3. For the protocol-driven archetypes systematic literature reviews and systematic reviews a search strategy based on keywords, controlled vocabulary and databases is the standard; see Section 5.4. However, for the latter search strategy, complementary search strategies are normally necessary, as discussed in Section 5.6. This means that the search strategies more or less follow the classification into narrative literature reviews and protocol-driven literature reviews. When to halt either of the two main search strategies (the iterative search strategy, and the keywords, controlled vocabulary and databases search strategy) depends on saturation. In the iterative search strategy, saturation is achieved when the reviewer is satisfied that all key works or all key constructs for the topic have been included. For the keywords, controlled vocabulary and databases search strategy, it needs to be determined explicitly by looking at the yield of additional searches in other databases or using different sets of keywords; when there is only a marginal yield, the search can be halted.
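To make the judgement about marginal yield explicit, a reviewer could track the number of new relevant studies that each successive search contributes; the Python sketch below is one possible way of doing so, with the 5% threshold being an arbitrary assumption chosen for illustration rather than a recommended cut-off.

# Sketch: judge whether additional searches still add relevant studies.
# The 5% threshold is an illustrative assumption, not a recommended cut-off.
def saturation_reached(new_relevant_per_search: list[int], threshold: float = 0.05) -> bool:
    """new_relevant_per_search holds the number of new relevant studies found in
    each successive database or keyword-set search, in the order they were run."""
    total = sum(new_relevant_per_search)
    if total == 0 or len(new_relevant_per_search) < 2:
        return False                      # too little information to judge saturation
    marginal_share = new_relevant_per_search[-1] / total
    return marginal_share <= threshold

# Example: three successive searches yielded 34, 12 and 1 new relevant studies.
print(saturation_reached([34, 12, 1]))    # True: the last search added about 2% of the total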
5.11.2 … Determine Which Type of Sources to Consider

Which sources to consider for a literature review depends entirely on the topic of the study; see Section 5.2 for types of sources. A clearly expressed question for the review, as outlined in Sections 4.2 and 4.4, supports assessing which types of sources are appropriate. For some literature reviews the relevant sources can extend to professional publications, whereas reviews into evidence-based interventions, policies, practices and treatments tend to rely on empirical studies. The type of sources used may influence outcomes. This is particularly relevant for grey literature (see Section 5.7), though its actual influence is still a topic of debate.
5.11.3 … Write a Literature Review

In the case of protocol-driven literature reviews (systematic literature reviews and systematic reviews), the study needs to contain explicit statements about how the search was conducted. This includes the mention of search filters, including the keywords, controlled vocabulary, databases searched, which complementary search
strategies were used and how saturation was achieved. When grey literature and other types of non-scholarly publications are consulted, then the way they were obtained should be disclosed. Also, whether a scoping study or review preceded the literature review should be revealed. Normally, these points are addressed in a description of the review methodology (some call this ‘research methodology’ and others name it the protocol). In the case of narrative literature reviews (narrative overviews and narrative reviews), the search strategy is normally not disclosed. An exception is the hermeneutic approach to literature reviews (Section 3.3). However, if the purpose is to provide clarity about how the literature review was done, then some description of it may help the reader to better understand the outcomes of narrative literature reviews; this advice may apply better to narrative reviews than to narrative overviews.
References Aguillo IF (2012) Is Google Scholar useful for bibliometrics? A webometric analysis. Scientometrics 91(2):343–351. https://doi.org/10.1007/s11192-011-0582-8 Bardia A, Wahner-Roedler DL, Erwin PL, Sood A (2006) Search strategies for retrieving complementary and alternative medicine clinical trials in oncology. Integr Cancer Ther 5 (3):202–205. https://doi.org/10.1177/1534735406292146 Bates MJ (1989) The design of browsing and berrypicking techniques for the online search interface. Online Rev 13(5):407–424 Bates MJ (2007) What is browsing—really? A model drawing from behavioural science research. Inform Res 20(4). http://informationr.net/ir/12-4/paper330.html Benzies KM, Premji S, Hayden KA, Serrett K (2006) State-of-the-evidence reviews: advantages and challenges of including grey literature. Worldviews Evid Based Nurs 3(2):55–61. https:// doi.org/10.1111/j.1741-6787.2006.00051.x Bernardo M, Simon A, Tarí JJ, Molina-Azorín JF (2015) Benefits of management systems integration: a literature review. J Clean Prod 94:260–267. https://doi.org/10.1016/j.jclepro. 2015.01.075 Beynon R, Leeflang MM, McDonald S, Eisinga A, Mitchell RL, Whiting P, Glanville JM (2013) Search strategies to identify diagnostic accuracy studies in MEDLINE and EMBASE. Cochrane Database Syst Rev (9). https://doi.org/10.1002/14651858.MR000022.pub3 Bolton JE (1971) Small firms—report of the committee of inquiry on small firms (4811). London Boluyt N, Tjosvold L, Lefebvre C, Klassen TP, Offringa M (2008) Usefulness of systematic review search strategies in finding child health systematic reviews in MEDLINE. Arch Pediatr Adolesc Med 162(2):111–116. https://doi.org/10.1001/archpediatrics.2007.40 Booth A, Noyes J, Flemming K, Gerhardus A, Wahlster P, van der Wilt GJ, Rehfuess E (2018) Structured methodology review identified seven (RETREAT) criteria for selecting qualitative evidence synthesis approaches. J Clinic Epidemiol 99:41–52. https://doi.org/10.1016/j.jclinepi. 2018.03.003 Chesbrough H (2012) Open innovation: where we’ve been and where we’re going. Res Technol Manag 55(4):20–27. https://doi.org/10.5437/08956308X5504085 Chesbrough HW (2003) Open innovation: the new imperative for creating and profiting from technology. Harvard Business School Press, Boston Conn VS, Valentine JC, Cooper HM, Rantz MJ (2003) Grey literature in meta-analyses. Nurs Res 52(4):256–261
de la Torre Díez I, Cosgaya HM, Garcia-Zapirain B, López-Coronado M (2016) Big data in health: a literature review from the year 2005. J Med Syst 40(9):209. https://doi.org/10.1007/s10916016-0565-7 Dekkers R, Hicks C (2019) How many cases do you need for studies into operations management? Guidance based on saturation. In: Paper presented at the 26th EurOMA conference, Helsinki Dekkers R, Koukou MI, Mitchell S, Sinclair S (2019) Engaging with open innovation: a Scottish perspective on its opportunities, challenges and risks. J Innov Econ Manag 28(1):193–226. https://doi.org/10.3917/jie.028.0187 Dieste O, Grimán A, Juristo N (2009) Developing search strategies for detecting relevant experiments. Empir Softw Eng 14(5):513–539. https://doi.org/10.1007/s10664-008-9091-7 Eady AM, Wilczynski NL, Haynes RB (2008) PsycINFO search strategies identified methodologically sound therapy studies and review articles for use by clinicians and researchers. J Clin Epidemiol 61(1):34–40. https://doi.org/10.1016/j.jclinepi.2006.09.016 Egger M, Zellweger-Zähner T, Schneider M, Junker C, Lengeler C, Antes G (1997) Language bias in randomised controlled trials published in English and German. The Lancet 350(9074):326– 329. https://doi.org/10.1016/S0140-6736(97)02419-7 Eisenhardt KM (1989) Agency theory: an assessment and review. Acad Manag Rev 14(1):57–74. https://doi.org/10.5465/AMR.1989.4279003 Finfgeld-Connett D, Johnson ED (2013) Literature search strategies for conducting knowledge-building and theory-generating qualitative systematic reviews. J Adv Nurs 69 (1):194–204. https://doi.org/10.1111/j.1365-2648.2012.06037.x Frederick JT, Steinman LE, Prohaska T, Satariano WA, Bruce M, Bryant L, Snowden M (2007) Community-based treatment of late life depression: an expert panel-informed literature review. Am J Prev Med 33(3):222–249. https://doi.org/10.1016/j.amepre.2007.04.035 Glanville J, Kaunelis D, Mensinkai S (2009) How well do search filters perform in identifying economic evaluations in MEDLINE and EMBASE. Int J Technol Assess Health Care 25 (4):522–529. https://doi.org/10.1017/S0266462309990523 Godin K, Stapleton J, Kirkpatrick SI, Hanning RM, Leatherdale ST (2015) Applying systematic review search methods to the grey literature: a case study examining guidelines for school-based breakfast programs in Canada. Syst Rev 4(1):138. https://doi.org/10.1186/ s13643-015-0125-0 Green BN, Johnson CD, Adams A (2006) Writing narrative literature reviews for peer-reviewed journals: secrets of the trade. J Chiropr Med 5(3):101–117. https://doi.org/10.1016/S0899-3467 (07)60142-6 Greenhalgh T, Peacock R (2005) Effectiveness and efficiency of search methods in systematic reviews of complex evidence: audit of primary sources. BMJ 331(7524):1064–1065. https:// doi.org/10.1136/bmj.38636.593461.68 Grégoire G, Derderian F, le Lorier J (1995) Selecting the language of the publications included in a meta-analysis: is there a Tower of Babel bias? J Clin Epidemiol 48(1):159–163 Gross T, Taylor AG, Joudrey DN (2015) Still a lot to lose: the role of controlled vocabulary in keyword searching. Catalog Classific Q 53(1):1–39. https://doi.org/10.1080/01639374.2014. 917447 Grosso G, Godos J, Galvano F, Giovannucci EL (2017) Coffee, caffeine, and health outcomes: an umbrella review. Annu Rev Nutr 37(1):131–156. https://doi.org/10.1146/annurev-nutr071816-064941 Gusenbauer M, Haddaway NR (2020) Which academic search systems are suitable for systematic reviews or meta-analyses? 
Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res Synthesis Methods 11(2):181–217. https://doi.org/10.1002/jrsm.1378 Haddaway NR, Bayliss HR (2015) Shades of grey: two forms of grey literature important for reviews in conservation. Biol Cons 191:827–829. https://doi.org/10.1016/j.biocon.2015.08.018 Haddaway NR, Collins AM, Coughlin D, Kirk S (2015) The role of Google Scholar in evidence reviews and its applicability to grey literature searching. PLoS One 10(9):e0138237. https:// doi.org/10.1371/journal.pone.0138237
Harari MB, Parola HR, Hartwell CJ, Riegelman A (2020) Literature searches in systematic reviews and meta-analyses: a review, evaluation, and recommendations. J Vocat Behav 118:103377. https://doi.org/10.1016/j.jvb.2020.103377 Harzing A-WK, van der Wal R (2008) Google Scholar as a new source for citation analysis. Ethics Sci Environ Politics 8(1):61–73. https://doi.org/10.3354/esep00076 Hausner E, Guddat C, Hermanns T, Lampert U, Waffenschmidt S (2016) Prospective comparison of search strategies for systematic reviews: an objective approach yielded higher sensitivity than a conceptual one. J Clin Epidemiol 77:118–124. https://doi.org/10.1016/j.jclinepi.2016. 05.002 Hausner E, Waffenschmidt S, Kaiser T, Simon M (2012) Routine development of objectively derived search strategies. Syst Rev 1(1):19. https://doi.org/10.1186/2046-4053-1-19 Havill NL, Leeman J, Shaw-Kokot J, Knafl K, Crandell J, Sandelowski M (2014) Managing large-volume literature searches in research synthesis studies. Nurs Outlook 62(2):112–118. https://doi.org/10.1016/j.outlook.2013.11.002 Hildebrand AM, Iansavichus AV, Haynes RB, Wilczynski NL, Mehta RL, Parikh CR, Garg AX (2014) High-performance information search filters for acute kidney injury content in PubMed, Ovid Medline and Embase. Nephrol Dial Transplant 29(4):823–832. https://doi.org/10.1093/ ndt/gft531 Hinckeldeyn J, Dekkers R, Kreutzfeldt J (2015) Productivity of product design and engineering processes—unexplored territory for production management techniques? Int J Oper Prod Manag 35(4):458–486. https://doi.org/10.1108/IJOPM-03-2013-0101 Hopewell S, Clarke M, Lefebvre C, Scherer R (2007) Handsearching versus electronic searching to identify reports of randomized trials. Cochrane Database Syst Rev (2):MR000001. https:// doi.org/10.1002/14651858.mr000001.pub2 Isckia T, Lescop D (2011) Une analyse critique des fondements de l’innovation ouverte. Rev Fr Gest 210(1):87–98 Jackson JL, Kuriyama A (2019) How often do systematic reviews exclude articles not published in English? J Gen Intern Med 34(8):1388–1389. https://doi.org/10.1007/s11606-019-04976-x Jennex ME (2015) Literature reviews and the review process: an editor-in-chief’s perspective. Commun Assoc Inf Syst 36:139–146. https://doi.org/10.17705/1CAIS.03608 Jensen MC, Meckling WH (1976) Theory of the firm: managerial behavior, agency costs and ownership structure. J Financ Econ 3(4):305–360. https://doi.org/10.1016/0304-405X(76) 90026-X Jüni P, Holenstein F, Sterne J, Bartlett C, Egger M (2002) Direction and impact of language bias in meta-analyses of controlled trials: empirical study. Int J Epidemiol 31(1):115–123. https://doi. org/10.1093/ije/31.1.115 Kennedy MM (2007) Defining a literature. Educ Res 36(3):139. https://doi.org/10.3102/ 0013189x07299197 Koffel JB (2015) Use of recommended search strategies in systematic reviews and the impact of librarian involvement: a cross-sectional survey of recent authors. PLoS One 10(5):e0125931. https://doi.org/10.1371/journal.pone.0125931 Koffel JB, Rethlefsen ML (2016) Reproducibility of search strategies is poor in systematic reviews published in high-impact pediatrics, cardiology and surgery journals: a cross-sectional study. PLoS One 11(9):e0163309. https://doi.org/10.1371/journal.pone.0163309 Lawal AK, Rotter T, Kinsman L, Sari N, Harrison L, Jeffery C, Flynn R (2014) Lean management in health care: definition, concepts, methodology and effects reported (systematic review protocol). Syst Rev 3(1):103. 
https://doi.org/10.1186/2046-4053-3-103 Levay P, Ainsworth N, Kettle R, Morgan A (2016) Identifying evidence for public health guidance: a comparison of citation searching with Web of Science and Google Scholar. Res Synthesis Methods 7(1):34–45. https://doi.org/10.1002/jrsm.1158 Li L, Smith HE, Atun R, Tudor Car L (2019) Search strategies to identify observational studies in MEDLINE and Embase. Cochrane Database Syst Rev (3). https://doi.org/10.1002/14651858. MR000041.pub2
Linton JD, Thongpapanl NT (2004) Ranking the technology innovation management journals. J Prod Innov Manag 21(2):123–139. https://doi.org/10.1111/j.0737-6782.2004.00062.x Lokker C, McKibbon KA, Wilczynski NL, Haynes RB, Ciliska D, Dobbins M, Straus SE (2010) Finding knowledge translation articles in CINAHL. Studies Health Technol Inform 160 (2):1179–1183 Lu Z (2011) PubMed and beyond: a survey of web tools for searching biomedical literature. Database. https://doi.org/10.1093/database/baq036 MacSuga-Gage AS, Simonsen B (2015) Examining the effects of teacher—directed opportunities to respond on student outcomes: a systematic review of the literature. Educ Treat Child 38 (2):211–239. https://doi.org/10.1353/etc.2015.0009 Mahood Q, Van Eerd D, Irvin E (2014) Searching for grey literature for systematic reviews: challenges and benefits. Res Synthesis Methods 5(3):221–234. https://doi.org/10.1002/jrsm. 1106 Marangunić N, Granić A (2015) Technology acceptance model: a literature review from 1986 to 2013. Univ Access Inf Soc 14(1):81–95. https://doi.org/10.1007/s10209-014-0348-1 Mc Elhinney H, Taylor B, Sinclair M, Holman MR (2016) Sensitivity and specificity of electronic databases: the example of searching for evidence on child protection issues related to pregnant women. Evid Based Midwifery 14(1):29–34 McGowan J, Sampson M, Salzwedel DM, Cogo E, Foerster V, Lefebvre C (2016) PRESS peer review of electronic search strategies: 2015 guideline statement. J Clin Epidemiol 75:40–46. https://doi.org/10.1016/j.jclinepi.2016.01.021 McManus RJ, Wilson S, Delaney BC, Fitzmaurice DA, Hyde CJ, Tobias S, Hobbs FDR (1998) Review of the usefulness of contacting other experts when conducting a literature search for systematic reviews. Br Med J 317(7172):1562–1563https://doi.org/10.1136/bmj.317.7172. 1562 Methley AM, Campbell S, Chew-Graham C, McNally R, Cheraghi-Sohi (2014) PICO, PICOS and SPIDER: a comparison study of specificity and sensitivity in three search tools for qualitative systematic reviews. BMC Health Serv Res 14(1):579. https://doi.org/10.1186/s12913-0140579-0 Mitnick BM (1973) Fiduciary rationality and public policy: the theory of agency and some consequences. In: Paper presented at the annual meeting of the American political science association, New Orleans, LA Morrison A, Polisena J, Husereau D, Moulton K, Clark M, Fiander M, Rabb D (2012) The effect of English-language restriction on systematic review-based meta-analyses: a systematic review of empirical studies. Int J Technol Assess Health Care 28(2):138–144. https://doi.org/10.1017/ S0266462312000086 Neuhaus C, Daniel HD (2008) Data sources for performing citation analysis: an overview. J Document 64(2):193–210. https://doi.org/10.1108/00220410810858010 O’Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S (2015) Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev 4(1):5. https://doi.org/10.1186/2046-4053-4-5 Ogilvie D, Foster CE, Rothnie H, Cavill N, Hamilton V, Fitzsimons CF, Mutrie N (2007) Interventions to promote walking: systematic review. BMJ 334(7605):1204. https://doi.org/10. 1136/bmj.39198.722720.BE Onetti A (2019) Turning open innovation into practice: trends in European corporates. J Bus Strateg 42(1):51–58. https://doi.org/10.1108/JBS-07-2019-0138 Papaioannou D, Sutton A, Carroll C, Booth A, Wong R (2010) Literature searching for social science systematic reviews: consideration of a range of search techniques. Health Info Libr J 27 (2):114–122. 
https://doi.org/10.1111/j.1471-1842.2009.00863.x Pappas C, Williams I (2011) Grey literature: its emerging importance. J Hosp Librariansh 11 (3):228–234. https://doi.org/10.1080/15323269.2011.587100 Pearson M, Moxham T, Ashton K (2011) Effectiveness of search strategies for qualitative research about barriers and facilitators of program delivery. Eval Health Prof 34(3):297–308. https://doi. org/10.1177/0163278710388029
Piggott-McKellar AE, McNamara KE, Nunn PD, Watson JEM (2019) What are the barriers to successful community-based climate change adaptation? A review of grey literature. Local Environ 24(4):374–390. https://doi.org/10.1080/13549839.2019.1580688 Piller F, West J (2014) Firms, users, and innovations: an interactive model of coupled innovation. In: Chesbrough HW, Vanhaverbeke W, West J (eds) New frontiers in open innovation. Oxford University Press, Oxford, pp 29–49 Poole R, Kennedy OJ, Roderick P, Fallowfield JA, Hayes PC, Parkes J (2017) Coffee consumption and health: umbrella review of meta-analyses of multiple health outcomes. BMJ 359:j5024. https://doi.org/10.1136/bmj.j5024 Priem RL, Butler JE (2001) Is the resource-based “view” a useful perspective for strategic management research? Acad Manag Rev 26(1):22–40. https://doi.org/10.5465/amr.2001. 4011928 Relevo R (2012) Effective search strategies for systematic reviews of medical tests. J Gener Internal Med 27(1):S28–S32. https://doi.org/10.1007/s11606-011-1873-8 Rethlefsen ML, Farrell AM, Osterhaus Trzasko LC, Brigham TJ (2015) Librarian co-authors correlated with higher quality reported search strategies in general internal medicine systematic reviews. J Clin Epidemiol 68(6):617–626. https://doi.org/10.1016/j.jclinepi.2014.11.025 Rewhorn S (2018) Writing your successful literature review. J Geogr High Educ 42(1):143–147. https://doi.org/10.1080/03098265.2017.1337732 Rietjens JA, Bramer WM, Geijteman EC, van der Heide A, Oldenmenger WH (2019) Development and validation of search filters to find articles on palliative care in bibliographic databases. Palliat Med 33(4):470–474. https://doi.org/10.1177/ 0269216318824275 Rogers M, Bethel A, Abbott R (2018) Locating qualitative studies in dementia on MEDLINE, EMBASE, CINAHL, and PsycINFO: a comparison of search strategies. Res Synthesis Methods 9(4):579–586. https://doi.org/10.1002/jrsm.1280 Rosenstock TS, Lamanna C, Chesterman S, Bell P, Arslan A, Richards M, Zhou W (2016) The scientific basis of climate-smart agriculture: a systematic review protocol. CGIAR, Copenhagen Ross SA (1973) The economic theory of agency: the principal’s problem. Am Econ Rev 63 (2):134–139 Rosumeck S, Wagner M, Wallraf S, Euler U (2020) A validation study revealed differences in design and performance of search filters for qualitative research in PsycINFO and CINAHL. J Clin Epidemiol 128:101–108. https://doi.org/10.1016/j.jclinepi.2020.09.031 Rowley J, Slack F (2004) Conducting a literature review. Manag Res News 27(6):31–39. https:// doi.org/10.1108/01409170410784185 Rudestam K, Newton R (1992) Surviving your dissertation: a comprehensive guide to content and process. Sage, London Salgado EG, Dekkers R (2018) Lean product development: nothing new under the sun? Int J Manag Rev 20(4):903–933. https://doi.org/10.1111/ijmr.12169 Savoie I, Helmer D, Green CJ, Kazanjian A (2003) BEYOND MEDLINE: reducing bias through extended systematic review search. Int J Technol Assess Health Care 19(1):168–178. https:// doi.org/10.1017/S0266462303000163 Schlosser RW, Wendt O, Bhavnani S et al (2006) Use of information-seeking strategies for developing systematic reviews and engaging in evidence-based practice: the application of traditional and comprehensive Pearl growing. A review. Int J Language Commun Disorders 41 (5):567–582. https://doi.org/10.1080/13682820600742190 Schryen G (2015) Writing qualitative IS literature reviews—guidelines for synthesis, interpretation, and guidance of research. Commun Assoc Inf Syst 34:286–325. 
https://doi. org/10.17705/1CAIS.03712 Shishank S, Dekkers R (2013) Outsourcing: decision-making methods and criteria during design and engineering. Product Plan Control Manage Oper 24(4–5):318–336. https://doi.org/10. 1080/09537287.2011.648544
Silagy CA, Middleton P, Hopewell S (2002) Publishing protocols of systematic reviews comparing what was done to what was planned. JAMA 287(21):2831–2834. https://doi.org/10. 1001/jama.287.21.2831 Soldani J, Tamburri DA, Van Den Heuvel W-J (2018) The pains and gains of microservices: a systematic grey literature review. J Syst Softw 146:215–232. https://doi.org/10.1016/j.jss.2018. 09.082 Stevinson C, Lawlor DA (2004) Searching multiple databases for systematic reviews: added value or diminishing returns? Complement Ther Med 12(4):228–232. https://doi.org/10.1016/j.ctim. 2004.09.003 Swift JK, Wampold BE (2018) Inclusion and exclusion strategies for conducting meta-analyses. Psychother Res 28(3):356–366. https://doi.org/10.1080/10503307.2017.1405169 Swift JK, Callahan JL, Cooper M, Parkin SR (2018) The impact of accommodating client preference in psychotherapy: a meta-analysis. J Clin Psychol 74(11):1924–1937. https://doi. org/10.1002/jclp.22680 Tanon AA, Champagne F, Contandriopoulos A-P, Pomey M-P, Vadeboncoeur A, Nguyen H (2010) Patient safety and systematic reviews: finding papers indexed in MEDLINE, EMBASE and CINAHL. Qual Saf Health Care 19(5):452–461. https://doi.org/10.1136/qshc.2008.031401 Tillett S, Newbold E (2006) Grey literature at the British library: revealing a hidden resource. Interlend Document Supply 34(2):70–73. https://doi.org/10.1108/02641610610669769 Trott P, Hartmann D (2009) Why ‘open innovation’ is old wine in new bottles. Int J Innov Manag 13(4):715–736. https://doi.org/10.1142/S1363919609002509 vom Brocke J, Simons A, Riemer K, Niehaves B, Plattfaut R, Cleven A (2015) Standing on the shoulders of giants: challenges and recommendations of literature search in information systems research. Commun Assoc Inf Syst 37:205–224. https://doi.org/10.17705/1CAIS.03709 Webster J, Watson RT (2002) Analyzing the past to prepare for the future: writing a literature review. MIS Q 26(2):xiii–xxiii Wellington JJ, Bathmaker A, Hunt C, McCulloch G, Sikes P (2005) Succeeding with your doctorate. Sage, Thousand Oaks Wilczynski NL, Haynes RB (2007) EMBASE search strategies achieved high sensitivity and specificity for retrieving methodologically sound systematic reviews. J Clin Epidemiol 60(1): 29–33. https://doi.org/10.1016/j.jclinepi.2006.04.001 Wilczynski NL, Marks S, Haynes RB (2007) Search strategies for identifying qualitative studies in CINAHL. Qual Health Res 17(5):705–710. https://doi.org/10.1177/1049732306294515 Wohlin C, Prikladnicki R (2013) Systematic literature reviews in software engineering. Inf Softw Technol 55(6):919–920. https://doi.org/10.1016/j.infsof.2013.02.002 Wong SS-L, Wilczynski NL, Haynes RB (2006) Optimal CINAHL search strategies for identifying therapy studies and review articles. J Nurs Scholarsh 38(2):194–199. https://doi. org/10.1111/j.1547-5069.2006.00100.x Zhang L, Ajiferuke I, Sampson M (2006) Optimizing search strategies to identify randomized controlled trials in MEDLINE. BMC Med Res Methodol 6(1):23. https://doi.org/10.1186/ 1471-2288-6-23
Lynn Irvine is the College Librarian for the College of Social Sciences at the University of Glasgow. She is a qualified librarian with an MA joint honours degree from the University of Glasgow and a postgraduate qualification in Information Science and Librarianship from the University of Strathclyde. She is an Associate Fellow of the Higher Education Academy. Lynn has been an academic librarian in several roles in Higher Education since 1996 and in this time was also able to squeeze in a period managing an art bookshop for a contemporary art centre in Glasgow. She has recently co-authored a book on legal information skills (2017) and contributed to a book on research skills for legal practitioners (2019). Her
research interests include learning spaces, information literacy and information seeking behaviour. She regularly does peer review for the journal Global Knowledge, Memory and Communication.
Chapter 6
Setting Inclusion and Exclusion Criteria
After retrieving studies and sources based on the review question (see Chapter 4) and the search strategy (see Chapter 5), the next step is considering which sources to include and which ones to exclude. This is necessary because the actual search using keywords and controlled vocabulary may result in many sources found due to the way databases and search engines work, thus not necessarily retrieving only those relevant to the review question. Therefore, the step of inclusion and exclusion prevents sources that provide no information pertaining to the review question from being taken into consideration and analysed. This chapter describes how to ensure that sources are relevant and how to undertake this selection process. Section 6.1 describes how to determine whether sources are relevant after they have been found. This is based on the scanning of sources on title, abstract and content. In Section 6.2 common criteria for inclusion are presented. This is followed by a brief discussion of exclusion criteria in Section 6.3, which also presents two examples. How to include studies in a literature review based on the quality of evidence is found in Section 6.4, particularly for the archetype systematic review. The methods include how to assess whether the quality of evidence influences findings and recommendations. Section 6.5 considers the quality of evidence for the other archetypes of literature reviews. The final section of this chapter (Section 6.6) completes the purpose of scoping reviews and scoping studies by relating them to setting the inclusion and exclusion criteria, and considering the level of evidence; this extends the introduction of scoping reviews and studies for the development of review questions in Section 4.5, and the setting out of a search strategy in Section 5.9. Therefore, this chapter complements the search strategies described in Chapter 5 by focusing on which of the retrieved studies should be included in the overview and how to deal with variety in the quality of evidence found in these selected studies; this is particularly of interest to protocol-driven literature reviews, but some points made can also be applied to narrative literature reviews.
6.1 Filtering for Relevant Sources
Having a question for the literature review, and if applicable, a succeeding empirical study, all sources that are relevant to the topic and how it is defined should be taken into account. Although a search strategy normally includes keywords, the outcomes of the retrieval do not warrant that a particular study is relevant or worthwhile. This has many reasons. One of the reasons is that sometimes incorrect wording or labelling is used. An example of such could be the confusion about the terms offshoring and outsourcing, which are used interchangeably, but may point to entirely different concepts. Another reason for finding more than just relevant sources is that incorrect keywords have been attributed to the indexing of the study. Finally, keywords that are related to very different disciplines or topics may have been used during the search for sources. These reasons make it necessary to check whether the actual content of a source or study matches the focus of the literature review. To avoid taking in sources that are irrelevant to the topic, filtering sources based on title and abstract is a first step. After retrieval of studies, see Sections 5.3 and 5.4 on search strategies, this stage is necessary, although titles and abstracts are not necessarily informative. Yitzhaki (1997, p. 220) highlights the importance of informative titles, but also notes that often sufficient substantive words—those that accurately describe the content—are lacking, even though this may depend on practices in disciplines (ibid., p. 227). Also, Haggan (2004) points to effective titles and ineffective titles from a linguistic point of view, thus implicitly indicating that titles are not always sufficiently informative. This compels looking at the abstract when it is not clear what the title covers or to verify whether the title adequately covers the content. However, abstracts also vary in their informativeness. For example, Rosen et al. (2005) examine abstracts of papers on cost-effectiveness analysis using four elements: population, comparator, intervention and perspective, akin to the format population-intervention-outcome for the review question (see Section 4.4 for this format). They (ibid., p. 426) find that only 20% of the sample provided information on all four elements they considered. Postman and Kateman (1992, p. 151) also note that only 20% of articles are complete with regard to reporting analytical methods for chemical experiments. This lack of adequate abstracts has also been observed for other disciplines, such as physical therapy (Richter et al. 2016). Thus, when checking the title and abstract of a source, it has to be kept in mind that these do not necessarily reflect the actual content; particularly, when studies are excluded from further analysis based on titles and abstracts, this may lead to fewer studies being considered than should have been. Because of insufficient clarity in titles and abstracts of studies, it may be necessary to filter sources on content, which requires examining the full text. The evaluation of whether a source is suitable depends on how its contents provide useful information for the review question. Also, other factors, such as the rigour of the article, could be taken into account. This implies that the evaluation of content in studies is a more intense step in terms of effort; however, it is necessary to avoid discarding sources that hold relevant information for the review question.
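Filtering on title and abstract remains a human judgement, precisely because titles and abstracts can be uninformative; still, a simple helper can flag records whose title and abstract contain none of the core terms, so that their full text is checked before exclusion. The Python sketch below is a hypothetical illustration; the core terms and the example record are invented.

# Sketch: flag records whose title and abstract mention none of the core terms,
# so that the full text is checked before the record is excluded.
# Screening decisions remain a human judgement; this only assists it.
def flag_for_full_text_check(record: dict, core_terms: list[str]) -> bool:
    """Return True when no core term appears in the title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return not any(term.lower() in text for term in core_terms)

core_terms = ["outsourcing", "decision-making"]             # invented example terms
record = {"title": "Offshoring choices in engineering firms", "abstract": ""}
print(flag_for_full_text_check(record, core_terms))         # True: check the full text first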
NOTE: SOMETIMES ONLY PARTS OF STUDIES NEED TO BE CONSIDERED
Whereas typically systematic reviews consider a study holistically, the other three archetypes—narrative overviews, narrative reviews and systematic literature reviews—may look at parts of a study. A case in point is the systematic literature review in the doctoral study by Koukou (2020, pp. 14–41), when examining the effects of customer involvement in new product development, where some works appraised did not have this customer involvement as the main topic of study; this example is elaborated further in Section 6.4.
6.2 Inclusion Criteria
There are seven categories of common inclusion criteria that could be considered for determining whether studies should be taken to the stage of analysis: content, date of publication, language, source, research design and method, sampling, and data analysis. The choice of inclusion (and exclusion) criteria should logically follow from the review question (e.g., Meline 2006, p. 21). However, how broad or narrow the selection process is made becomes important, because it determines which studies will be analysed. Consequently, it influences the outcomes of the literature review, specifically the scope and validity of conclusions to be drawn later. Therefore, the inclusion of sources based on criteria leads to setting the boundaries for studies to be analysed further; since the selection has an impact on the analysis and outcomes, each of these possible inclusion criteria will be discussed in the following subsections.
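Since the chosen criteria set the boundaries for the studies to be analysed, it can help to record each criterion together with its category and a short rationale so that decisions remain traceable. The Python sketch below shows one possible way of doing this; the example criteria are invented and do not stem from any study discussed in this chapter.

# Sketch: recording inclusion criteria by category so that decisions stay traceable.
# The example criteria are invented and only illustrate the seven categories.
from dataclasses import dataclass

@dataclass
class InclusionCriterion:
    category: str     # content, date of publication, language, source,
                      # research design and method, sampling, or data analysis
    statement: str
    rationale: str

criteria = [
    InclusionCriterion("content", "Study addresses customer involvement in new product development",
                       "Follows directly from the review question"),
    InclusionCriterion("language", "Published in English or German",
                       "Languages the review team can appraise"),
    InclusionCriterion("date of publication", "Published in or after 1996",
                       "Concept not described under this name before 1996"),
]
for criterion in criteria:
    print(f"[{criterion.category}] {criterion.statement} (rationale: {criterion.rationale})")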
6.2.1 Content
The inclusion of sources and studies based on their content requires looking into studies in more detail, as suggested in Section 6.1. Particularly, this is the case when titles and abstracts poorly reflect content, the description of studies is poor, or the authors use ambiguous or uncommon terms. It is not necessarily the case that authors do such on purpose, but they might not have benefited from appropriate peer review when preparing a study. This requires close reading of studies that may seem to fall in the intended scope of a literature review. When such happens, a note should be made so that others besides the authors of a literature review can track how the review reached its conclusion with regard to specific studies. Such a note could also record how terms used in a study have been interpreted in the actual literature review. An example would be the use of the term ‘lean strategy’ in retrieved studies for a literature review. The wording ‘lean production’ (Womack et al. 1991) and ‘lean thinking’ (Womack and Jones 1996) emerged from the study into Japanese manufacturing practices (Holweg 2007, p. 427), specifically
into what is called the ‘Toyota Production System’ (the production system of an automotive manufacturer). In this context, does ‘lean strategy’ mean that the strategy itself is lean, the adoption of the principles of lean thinking to the process of strategy formation (e.g., Atkinson 2004, p. 18 ff.) or the adoption of tools and practices associated with lean production to the contents of strategy (for instance, Antony et al. 2003, p. 40)? A literature review needs to sort out ambiguous use of subject terms, position how loosely defined terms relate to each other and clarify what the authors of a review are taking forward. Thus, close reading of studies (see also Section 3.6) leads to taking note of what is relevant in a study concerning the topic of the review, identifying different uses of terminology and providing clarity about what is learned from a study or studies and how this influences findings.
NOTE: BALANCING INCLUSION AND EXCLUSION
Although an inclination may exist to include studies directly related to the review question, particularly when having a narrow focus, variation also needs to be considered. When studies close to the focus of a literature review are included, it can be established whether an intervention is effective or a theory, law of observed regularity, conceptualisation, etc. is supported. However, from a scholarly perspective it is also relevant to find out what causes variation across studies and other works. This requires balancing how a wider scope, expressed in broader inclusion criteria with regard to content, may be beneficial for findings related to further research and how rigour in the case of quantitative reviews and trustworthiness for qualitative reviews can be achieved at the same time.
6.2.2 Date of Publication
The date of publication as an inclusion criterion may appear for three reasons. The first one is that an intervention, theory, law of observed regularity, method, etc. was not available before a specific date. Sometimes care has to be taken that a concept or theory is available under a different name. Going back to the example about lean production, this was commonly known in the 1980s as just-in-time production; the renaming of this approach into lean production was merely an extension rather than a new typification. The second reason can be that an existing review more or less covers the topic up until a specific date. Third, specifying a date of publication based on availability of research methods used could be a reason for considering studies. A case in point is the consolidated criteria for reporting qualitative studies (aka COREQ), which was published in 2007 (Tong et al.); its availability implies that only studies thereafter could refer to this tool for assessing the reporting of qualitative research. Therefore, the availability of specific scholarly knowledge (e.g., theories, laws of observed regularities, interventions, policies), existing reviews and advances in research methods are among the reasons to specify a date of publication for the inclusion of studies.
6.2.3 Language
Another inclusion criterion is language in which sources are published. Often literature reviews opt for studies written in English, with cases in point being the scoping study of Genet et al. (2011, p. 207/2)1 into the provision of home care in Europe and the impact of billboards on the visual behaviour of drivers by Decker et al. (2015, p. 238). However, to prevent bias studies in other languages may need to be included. In this respect, some studies (e.g., Grégoire et al. 1995) have found a bias, which is called the ‘Tower of Babel’, referring to the biblical story; see also Section 5.7. It means that literature reviews only focusing on publications in English tend to report more statistically significant results and to favour more positive outcomes; for instance, see Egger et al. (1997, p. 329), and Thornton and Lee (2000, p. 209), who both confirm this observation. An example of this criterion for inclusion is the work by van Ham et al. (2006, p. 175) who take in studies in five languages for evaluating job satisfaction among general practitioners. However, there is some conflicting evidence; a case in point is the work by Morrison et al. (2012, p. 143) who indicate that the language bias may play a lesser role, though they notice a difference in methodology and reporting of studies; also, Moher et al. (2000) find that language restrictions do not lead to biased estimates of the effectiveness of interventions. Generally speaking, this means that the inclusion of other languages leads to more studies to be considered, which could lead to improving the internal and external validity of findings in a review. In addition to increased access to more studies, sometimes, works written in another language offer a differing perspective. A well-known case in point is the publications by Joseph Alois Schumpeter. His original work (1911) was written in German, but the later published translation into English (1934) has content that did not appear in the original work; this is relevant to those that are interested in how his thoughts developed. Therefore, including studies and works in other languages could lead to different insight, albeit the reasons may differ for individual instances. TIP: TOWER OF BABEL The inclusion of a greater diversity in studies for a literature review is seen as beneficial as discussed in Section 5.7, where also its controversy is mentioned. With regard to including studies in another language, the following advice applies: • Include in a protocol how studies in other languages will be found and translated. A common measure for accuracy is to translate studies back from English or the language of the review into the original language by independent researchers or translators, so that the accuracy of the translation can be checked. • If possible and practical, provide translations. Not all readers and assessors of works may master the original language of a work. Providing the translation as
used may make it easier for others to fully understand the work that has been done. If the full translation is not possible, it could be considered to only make the relevant fragments available.

1 Although the authors label the study as a ‘systematic literature review’, it actually has the characteristics of a scoping study (see Section 4.5 for the latter).
6.2.4 Types of Source
Also, the type of publication could lead to consideration of inclusion; see Section 5.2 for a typology of sources. More typically, this is the case for narrative reviews and systematic literature reviews due to their more qualitative nature. Especially, this is relevant when a review looks at the development of thoughts and concepts that are not necessarily led by academics, or when a concept has developed in a less observable manner. For example, the systematic literature review by Salgado and Dekkers (2018) looks at the origins of concepts found in lean product development, an approach to new product development that emerged in the 1990s; to this purpose, they appraised conference proceedings, doctoral theses, monographs, presentations, professional publications, reports and working papers in addition to published studies in scholarly journals. Some of their findings would have not been possible without taking these other sources into account. This example shows that depending on the review question the inclusion of a broader scope of publications than those in academic journals could yield insight that otherwise would be more difficult to obtain; therefore, the consideration of which type of sources are suitable for the review question plays a crucial role in some, if not most, literature reviews.
6.2.5 Research Design and Method
The design of the research, including the methods used, could be a reason for inclusion of studies in the analysis of a literature review. This is common in evidence-based interventions, policies, practices and treatments, but also in other literature reviews the inclusion of different research designs could lead to either broadening the evidence base or confounding due to diversity. This may be different for the archetypes of literature reviews. For systematic reviews, the hierarchy of evidence and frameworks, such as GRADE (Grading of Recommendations, Assessment, Development and Evaluations), are used to account for differences in research designs; see Section 6.4. For qualitative studies other methods have to be used for comparing studies, and thus, including a broader range of studies; see Section 6.5. Thus, setting criteria which type of research designs and methods in studies will be considered for analysis clarifies the comparison of studies to each other, and could lead to a broader range of studies, and consequently, a broader evidence base to be part of a review.
It could be that sometimes a literature review focuses on one type of research design. This is the case when specific outcomes are sought, which often links to the research paradigms (see Section 2.7). If perspectives and views of participants are at the centre of a literature review, then qualitative studies will be more appropriate to consider; the related research methods are then action research, case studies, focus groups, interviews among others depending on the precise formulation of the review question. This implies that when including studies with specific research designs, the review question guides this inclusion criterion.
6.2.6 Sampling
A further inclusion criterion is which objects or subjects are included in a literature review. In systematic reviews aiming at evidence-based interventions, practices and treatments in healthcare, these are commonly age groups or groups of patients with specific conditions. Therefore, the setting of this inclusion criterion is mostly related to the population in the format population-intervention-outcome and its variants for formulating review questions (see Section 4.4). Also in other types of literature reviews sampling can be a criterion for inclusion. An instance could be the inclusion of specific industries when looking at concepts for strategic management for technology cycles; these cycles differ vastly across industries, such as food and utilities. Therefore, specifying characteristics for the type of objects and subjects to be considered for analysis is crucial to an effective review.
6.2.7 Data Analysis
A final criterion for inclusion is the conduct of data analysis. This can refer to specific measurement instruments or equipment used as well as methods for analysis. An example is the study by Connor et al. (2002, p. 254), who included only studies that used a rating scale for aggressive behaviour in youths in the context of attention-deficit/hyperactivity disorder; see Box 6.A. However, a study could also come to the conclusion that there is a variety of instruments used across studies, which may make comparing outcomes more cumbersome. For this reason, it could be helpful to specify methods for data analysis, depending on the purpose of the review and the review question.

TIP: OTHER INCLUSION (AND EXCLUSION) CRITERIA
More criteria exist than the seven mentioned here, though the ones in this chapter represent the most common ones. The determination of which criteria should be used depends on the review question and the purpose of the review; for instance, when a literature review aims at contributing to scholarly knowledge only, a wider range of studies could be taken into account. With regard to which criteria to use, the format of the review question provides insight. For example, the criterion 'setting' can be used when the format population-intervention-outcome for review questions is extended (see Section 4.4).

Box 6.A Example I of Inclusion and Exclusion Criteria (Systematic Review)
Connor et al. (2002) performed a meta-analysis into the effects of stimulants on aggression-related behaviours within the context of attention-deficit/hyperactivity disorder (ADHD); they focus on youths for their systematic review. They applied the inclusion and exclusion criteria (ibid., p. 254) mentioned below; the category of each criterion is indicated in brackets behind it. Their use of inclusion and exclusion criteria resulted in only 28 out of more than 200 studies found in the search being usable for analysis.

Inclusion criteria
• Studies reporting quantitative data on independent stimulant effects for defined aggression-related behaviours within the context of ADHD. [Content]
• Studies should have been published in a peer-reviewed scientific journal. [Source]
• The research method included a placebo control, either in a crossover or parallel-groups methodological design. [Research design and method]
• The participants in studies should have a mean sample age of less than 18 years. [Sampling]
• Studies should have used a rating scale or method of observation to assess aggression-related behaviours in youths with ADHD. [Data analysis]

Exclusion criteria
• Reports that only include effects of stimulants on the core symptoms of ADHD. [Content]
• Open studies, case reports or review articles. [Source]
6.3 Exclusion Criteria
Exclusion criteria can be seen as the opposite of inclusion criteria. Similar to inclusion criteria, their use is related to the review question and practical reasons for conducting a literature review. For example, the language criterion that studies should be only in English in a hypothetical review can be formulated in two ways:
• Studies in English are considered (formulation as inclusion criterion).
• Studies in languages other than English have been discarded (formulation as exclusion criterion).

This means that, in principle, it does not matter whether the criteria are written as inclusion or exclusion criteria, as long as they clarify what content and characteristics of studies will be considered during the actual appraisal of retrieved studies. Defining exclusion criteria should not lead to essential and relevant studies and sources being discarded. The purpose of retrieving studies and works, and of the use of exclusion criteria, is to find as many relevant sources as possible. The use of exclusion criteria should only lead to the removal of those sources whose content and other characteristics do not pertain to the review question. This still leaves open how to deal with studies and works that contribute in varying ways to the review question; this is the topic of the next two sections, 6.4 and 6.5.

Before going into more detail about how to deal with variance across studies, two examples of the use of inclusion and exclusion criteria are presented here. The first example, in Box 6.A, is the meta-analysis by Connor et al. (2002) into the effects of stimulants on aggression-related behaviours of youths within the context of attention-deficit/hyperactivity disorder, mentioned in the previous section. For instance, they use both an inclusion criterion and an exclusion criterion for evaluating the content of retrieved studies. The second example concerns the systematic literature review by Muccini et al. (2016), who look into the conceptualisation and application of self-adaptation for cyber-physical systems; self-adaptation refers to systems being self-aware, context-aware and goal-aware, and cyber-physical systems are large-scale distributed systems of software and hardware; see Box 6.B. In their case, two inclusion criteria are used for evaluating the content. In both examples, the use of inclusion and exclusion criteria leads to a considerable reduction of the studies that are taken into further analysis; see the boxes for detail. Also, both reflect on the use of inclusion and exclusion criteria. Connor et al. (2002, p. 259) hint at this when they note that some of their findings may be due to chance, whereas Muccini et al. (2016, p. 80) state that one of their limitations was whether a specific study did apply self-adaptation, as it seems that not all studies were explicit in its use, despite it being a main feature of cyber-physical systems. Although both examples of the use of inclusion and exclusion criteria show the effect on studies taken to the actual appraisal stage of a literature review, they also contain indications about potential deficiencies in the search strategy and in the use of inclusion and exclusion criteria.

TIP: OUTLINE BORDERLINE CASES FOR SOURCES
Particularly useful for qualitative systematic literature reviews is to indicate which studies were on the borderline for exclusion or inclusion. Such a disclosure will assist readers of a review in obtaining better insight into how the criteria for inclusion and exclusion were actually applied during the literature review. An example is the
study by Salgado and Dekkers (2018, p. 905), who list examples of studies that were excluded because of content, type of study, source and specific instances of conference proceedings.

Box 6.B Example II of Inclusion and Exclusion Criteria (Systematic Literature Review)
Muccini et al. (2016) look into the conceptualisation and application of self-adaptation for cyber-physical systems (CPS); self-adaptation refers to systems being self-aware, context-aware and goal-aware, and cyber-physical systems are large-scale distributed systems of software and hardware. They applied the following inclusion and exclusion criteria (ibid., p. 76); the specific type of criterion is found behind each criterion.

Inclusion criteria
• Studies that are proposing, leveraging or analysing an architectural solution, architectural method or technique specific for CPSs. [Content]
• Studies in which self-adaptation is explicitly used as an instrument to design and construct a CPS. [Content]
• Studies subject to peer review (e.g., journal papers, papers published as part of conference proceedings). [Source: type of publication]
• Studies published in or after 2006. [Date of publication]

Exclusion criteria
• Studies that are written in a language other than English. [Language]
• Studies that are not available in full-text. [Source]
• Secondary studies (e.g., systematic literature reviews, surveys, etc.). [Source: type of publication]

The application of these criteria yielded 42 studies to be considered for further analysis out of 783 found during the search.
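Where retrieved records are available as structured metadata (for instance, exported from a bibliographic database), screening against criteria such as those in Boxes 6.A and 6.B can be supported by a short script. The following sketch in Python is purely illustrative: the record fields ('year', 'language', 'peer_reviewed', 'full_text_available', 'study_type') and the sample entries are hypothetical and only loosely echo the criteria reported by Muccini et al. (2016); it does not reproduce either protocol from the boxes.

from dataclasses import dataclass

@dataclass
class Record:
    title: str
    year: int
    language: str
    peer_reviewed: bool
    full_text_available: bool
    study_type: str  # e.g. 'primary' or 'secondary'

def include(r: Record) -> bool:
    # Hypothetical inclusion criteria: date of publication and source type
    return r.year >= 2006 and r.peer_reviewed

def exclude(r: Record) -> bool:
    # Hypothetical exclusion criteria: language, availability and secondary studies
    return (r.language != 'English'
            or not r.full_text_available
            or r.study_type == 'secondary')

def screen(records):
    # Keep records that meet the inclusion criteria and none of the exclusion criteria
    return [r for r in records if include(r) and not exclude(r)]

sample = [
    Record('Self-adaptive CPS architecture', 2014, 'English', True, True, 'primary'),
    Record('Survey of CPS research', 2018, 'English', True, True, 'secondary'),
    Record('Early CPS work (in German)', 2004, 'German', True, False, 'primary'),
]
print(len(screen(sample)), 'of', len(sample), 'records retained for appraisal')

Note that such scripted screening only handles bibliographic metadata; the content-related criteria in both boxes still require reading the studies themselves.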
6.4 Quality of Evidence
A particular point related to the inclusion or exclusion of studies, and to how the evidence provided in a study is evaluated, is the so-called 'quality of evidence'. The assessment of the quality of evidence can be used for the quantitative or qualitative analysis of studies and the synthesis of findings. This section elaborates on this matter, particularly for the archetype of systematic reviews.
6.4.1 Hierarchy of Evidence Pyramid—Systematic Reviews
For evidence-based interventions, policies, practices and treatments, particularly in healthcare and medicine, the quality of evidence is viewed as a hierarchy based on the type of study; commonly, it is depicted as a pyramid, as shown in Figure 6.1. It is derived from the recommendations of a working group (Guyatt et al. 1995) and the work by Greenhalgh (1997), although variations of this pyramid exist. In this pyramid, the quality of evidence depends on the type of study undertaken; the levels are discussed here from the lowest to the highest quality of evidence:
• Editorials and expert opinions are ranked lowest in this hierarchy, because of potential bias by authors and selective use of evidence.
• Physiological studies have more credence, because they focus on biochemical mechanisms. However, outcomes of these studies do not necessarily lead to effective interventions and practices.
• Case reports describe the medical history of a single participant in a study in narrative form. These can also be part of a series of case reports for a particular condition, specific intervention or treatment. Therefore, they have more credibility than the previous two levels.
[Figure 6.1 shows a pyramid with, from bottom to top: editorials and expert opinions; physiological studies; case reports and case studies; cross-sectional studies and surveys; case–control studies; cohort studies; randomised controlled trials; and meta-analyses, systematic reviews and critical appraisals, with the quality of evidence increasing towards the top.]
Fig. 6.1 Representation of traditional hierarchy of evidence for interventions as pyramid. This heuristic is used for comparing the relative strength of outcomes obtained from empirical studies, particularly in evidence-based interventions and practices in healthcare. The research design of a work, such as a case report for an individual instance or a randomised controlled trial, and the outcomes measured, for instance survival rate or quality of life, determine the strength of the evidence. Typically, systematic reviews of completed, high-quality randomised controlled trials rank as the highest quality of evidence above observational studies, while expert opinions and editorials are found at the lowest level for quality of evidence in this hierarchy of evidence pyramid.
• Cross-sectional studies and surveys focus on clinical questions amongst a sample of participants but may also refer to a recount of the past. Because of the scale of sampling, they may have credibility.
• Case–control studies pinpoint patients with a particular disease or condition who are 'matched' with controls (patients with some other disease, the general population, neighbours or relatives). Data are then collected (for example, by searching back through these people's medical records or by asking them to recall their own history) on past exposure to a possible causal agent for the disease.
• Cohort studies select two (or more) groups of people on the basis of differences in their exposure to a particular agent (such as a vaccine, a drug or an environmental toxin); these are followed up to see how many in each group develop a particular disease or any other outcome. These are typically longitudinal studies.
• Randomised controlled trials randomly allocate participants by a process equivalent to flipping a coin to either one intervention (such as a drug) or another (such as a placebo treatment or a different drug). Both groups are followed up for a specified period and analysed in terms of outcomes defined at the outset (mortality, heart attack, serum cholesterol level, etc.).
• Meta-analyses, systematic reviews and critical appraisals are found at the top of the pyramid. Meta-analyses and systematic reviews are protocol-driven literature reviews that weigh evidence from multiple sources to find the evidence for specific outcomes. Critical appraisals are also found here, because they add an evaluation of the outcomes to a specific study. There are two types of critical appraisals in this pyramid:
  • Critically appraised articles. Works of this type evaluate and summarise individual empirical studies, and provide additional notes with regard to the credibility of the outcomes.
  • Critically appraised topics. Writings of this type evaluate and synthesise multiple research studies.
Systematic reviews and meta-analyses carry a higher credibility than critical appraisals. Therefore, by distinguishing levels of evidence depending on the research design, the pyramid offers an overview of the reliability of studies when they are considered during the stages of analysis and synthesis for a literature review.

There are some modifications of this hierarchy for the quality of the level of evidence, of which two instances are presented here. First, Murad et al. (2016) based their hierarchy of evidence, see Figure 6.2, on two alterations to the traditional hierarchy of evidence for evidence-based medicine. The first reshaping (ibid., p. 127) is found in the quality of studies, which may vary even if they are of a similar type; this relates to the topic of grading individual studies aggregated in a review, which is part of the next subsection.
[Figure 6.2 shows a modified pyramid in which case reports and case studies, case–control studies, cohort studies and randomised controlled trials form layers with permeable borders, while meta-analyses and systematic reviews are depicted as a lens for filtering information from these studies.]
Fig. 6.2 Representation of modified hierarchy of evidence for interventions. Based on Murad et al. (2016, p. 126), this hierarchy of evidence for evidence-based interventions and practices in healthcare and medicine shows that, because of variances in the conduct of individual studies, the borderlines between classes of studies become permeable. Furthermore, meta-analyses and systematic reviews are seen as lenses, defined by critical objectives, through which evidence from empirical studies is viewed.
The second reshaping is that systematic reviews and meta-analyses provide a lens for filtering information from studies. Although not contesting the pyramid for the hierarchy of evidence as such, the representation by Murad et al. offers a different look at the traditional pyramid. The second instance of a modification is the evidence-based healthcare pyramid 5.0 by Alper and Haynes (2016), a further development of earlier models; see Figure 6.3 for an adaptation of their model for the hierarchy consistent with the approaches in this book. It simplifies those earlier models by having only five layers, and focuses on guidelines and recommendations derived from evidence in studies. The two instances of modification show that different versions of the hierarchy exist, sometimes serving different purposes than the traditional conceptualisation of the hierarchy of evidence pyramid.
6.4.2 GRADE: Grading of Recommendations, Assessment, Development and Evaluations
In addition to the hierarchy of evidence, an assessment method for the quality of evidence called GRADE (Grading of Recommendations, Assessment, Development and Evaluations) is used, particularly in evidence-based interventions, practices and treatments. It aims at providing a transparent framework for developing and presenting summaries of evidence with a systematic approach for making clinical practice recommendations (Guyatt et al. 2008). With this method, the aggregated quality of evidence across the studies considered for a literature review is categorised, see Box 6.C; the quality of evidence is also called certainty in evidence.
[Figure 6.3 shows a five-layer pyramid with, from bottom to top: studies, systematic reviews, systematically derived recommendations (guidelines), synthesised summaries for clinical reference, and systems. The lower layers carry annotations such as 'Summaries aggregating appraisals of three lower layers', 'Synthesis (aggregated multiple appraisals)', 'Synopsis (extracted and appraised)', 'Aggregation (extracted)' and 'Search filter: (peer-)reviewed'.]
Fig. 6.3 Evidence-based healthcare pyramid for finding pre-appraised evidence and guidance. This pyramid for the hierarchy of evidence, modified from Alper and Haynes (2016, p. 124)*, distinguishes five levels: empirical studies, systematic reviews, systematically derived recommendations (guidelines), synthesised summaries for clinical reference and systems (the latter refers to, for example, computerised decision support systems integrated with electronic health records). Each level should build systematically from lower levels and provide substantially more useful information for guiding clinical decision-making. Within the bottom three levels, critically appraised content includes filtered (pre-appraised) collections of original reports and synopses of original reports (appraisal and extraction of key content); and the levels rely on syntheses combining multiple original reports and synopses. Synthesised summaries for clinical reference are resources that include all three lower layers and integrate the content to meet clinical reference needs. *Adapted to fit with the content of the book and terminology for reviews.
Looking at Figures 6.1 and 6.2, evidence from randomised controlled trials starts at high quality and, because of potential measurement errors (residual confounding), evidence that includes observational data starts at low quality. Therefore, rating the quality of evidence, both for individual studies in a systematic review and for the outcomes of a systematic review, provides insight into whether recommendations are robust and whether further studies are necessary. The quality of evidence should be related to the individual outcomes of studies. The reason for applying the assessment to each separate outcome (Balshem et al. 2011, p. 404) is that the quality of evidence often varies across outcomes. In addition, not every study in a systematic review may report all outcomes. Furthermore, an overall GRADE quality rating is derived from a body of evidence across outcomes, usually by taking the lowest quality of evidence from all of the outcomes that are critical to decision making (Guyatt et al. 2013, p. 155). This means that, in a systematic (literature) review to inform evidence-based interventions and practices, the evidence for each outcome is considered separately, and for all outcomes critical to decision making the outcome with the lowest level of evidence forms the benchmark.
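As a minimal illustration of this 'lowest critical outcome' rule, the short Python sketch below places the four GRADE categories (see Box 6.C) on an ordinal scale and derives an overall rating from per-outcome ratings; the outcome names and their ratings are invented for the example.

GRADE = {'very low': 1, 'low': 2, 'moderate': 3, 'high': 4}

# Hypothetical per-outcome certainty ratings from a review
ratings = {'mortality': 'moderate', 'quality of life': 'low', 'adverse events': 'high'}
critical = {'mortality', 'quality of life'}  # outcomes critical to decision making

# Overall rating: the lowest certainty among the critical outcomes
overall = min((ratings[o] for o in critical), key=GRADE.get)
print(overall)  # prints: low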
Box 6.C Categories for Quality of Evidence (GRADE)
The GRADE method for evaluating the quality of evidence distinguishes four categories (Guyatt et al. 2008, p. 926):
• High quality: further research is very unlikely to change the confidence in the estimate of the effect of an intervention or practice.
• Moderate quality: further studies are likely to have an important impact on the confidence in the estimate of an effect and may change the estimate.
• Low quality: further inquiries are very likely to have an important impact on the confidence in the estimate of effect and are likely to change the estimate.
• Very low quality: any estimate of the effect of an intervention is very uncertain.

When considering the quality of evidence, the confidence is influenced by, but not limited to, the following factors:
• Limitations of a study.
• Inconsistency of results (within a study and across studies).
• Indirectness of evidence (for example, the use of proxy variables).
• Imprecision (for instance, reporting of temperature rather than measuring).
• Publication bias (within a study and across studies).
Table 6.1 Factors that may lead to upgrading or downgrading in the GRADE framework. This table provides an overview of factors that lead to downgrading or upgrading the quality of evidence for findings and recommendations, based on Figure 6.1 in Djulbegovic and Guyatt (2017, p. 417). When rating an individual study, first the quality of the evidence is determined by the research method or content of the work, as depicted in Figure 6.1. Then the content of the work is evaluated and the rating level, see Box 6.C, may be downgraded or upgraded according to the factors mentioned in the table.

Downgrading (change in levels):
• Risk of bias in conduct of empirical study (1 or 2 levels)
• Inconsistencies between studies (1 or 2 levels)
• Indirectness (1 or 2 levels)
• Imprecision (1 or 2 levels)
• Likely publication bias (1 or 2 levels)

Upgrading (change in levels):
• Large effect size (1 or 2 levels)
• Dose–response gradient (cause–effect) (1 or 2 levels)
• Plausible confounding would reduce demonstrated effect (1 level)
• Possible confounding would suggest spurious effect when actual results show no effect (1 level)
Since studies may differ in their reporting, even when similar in design and method, the rating may need to be adjusted. Based on the four categories for the quality of evidence in Box 6.C, for each study the certainty in the evidence can be downgraded for five main reasons (see Table 6.1 for an overview):
• Risk of bias in the conduct of empirical studies. This type of bias occurs when systematic flaws or limitations in the design, conduct or analysis of a study or studies used for the review distort the results. This may result in a study not representing the truth because of inherent limitations in its design or conduct. During the undertaking of a systematic review, it is difficult to know to what degree potential biases influence the results; therefore, certainty in the estimated effect is lower if the studies informing the estimated effect could be biased. This means that the risk of bias is assessed rather than whether bias occurred. Several tools have been developed to evaluate the risk of bias for randomised trials. For example, Higgins et al. (2011) present a method used for reviews in the Cochrane Collaboration, Shea et al. (2007) introduce AMSTAR (A Measurement Tool to Assess Systematic Reviews) for systematic reviews and Schulz et al. (1995) propose a method later known as the Schulz approach; AMSTAR was updated to AMSTAR-2 (Shea et al. 2017) with the purpose of including observational studies. There are also other tools for appraising the quality of observational studies; some are mentioned here:
  • COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) (Mokkink et al. 2010a, b).
  • Jadad scale (Jadad et al. 1996).
  • Newcastle–Ottawa Scale (Wells et al. 2011).
  • ROBINS-I (Risk Of Bias In Non-randomised Studies of Interventions) (Sterne et al. 2016).
  • RTI item bank (Viswanathan and Berkman 2012), named after the institute RTI International at which it was developed.
  • SYRCLE's risk of bias tool (Hooijmans et al. 2014) for animal studies.
Several publications have evaluated and compared these methods; for example, Hartling et al. (2009) look at the tool used by the Cochrane Collaboration (see also Chapter 8), the Jadad scale and the Schulz approach, and Margulis et al. (2014) compare the Newcastle–Ottawa Scale with the RTI item bank. Such writings could be used to justify the choice of a specific method for appraising the quality of evidence in studies. In a systematic review, these judgements about the risk of bias in individual studies need to be collated into the robustness of findings across studies. Guyatt, Oxman, Vist et al. (2011, p. 412) suggest the following. The appraisal of the risk of bias across studies is not merely averaging across studies, but rather a deliberation on the contribution by each study. In addition, it requires evaluating how each study contributes to a specific outcome; a conservative approach is advised to ensure robustness and trustworthiness. The risk of bias across studies should be related to the other factors for the quality of evidence (imprecision, inconsistency of results, indirectness of evidence and publication bias). This should lead to a balanced assessment of the influence of the risk of bias across studies.
• Imprecision. The measure of imprecision refers to the statistical significance of outcomes; these are typically influenced by factors such as accrued sample size, required or optimal information size ('sample size' across studies), confidence intervals for the overall effect, and specified critical margins of 'no effect', 'important benefit' or 'important harm' (Castellini et al. 2018, p. 110/2). For the assessment of imprecision, two approaches can be used. The GRADE approach to rating imprecision focuses on the 95% confidence interval around the best estimate of the absolute effect (Guyatt, Oxman, Kunz, Brozek et al. 2011, p. 1284). Certainty is lower if the clinical decision is likely to be different when the true effect is at the upper rather than the lower end of the confidence interval. Since the effect estimate may come from only one or two small studies or few events, Walsh et al. (2014, p. 628) propose using a fragility index. In addition to using the GRADE framework, Castellini et al. (2018) suggest trial sequential analysis as a complementary measure of imprecision. In this method, the trials are placed in sequential order to judge when the outcomes become statistically significant; this is generally achieved when the number of participants in a trial meets the threshold for confidence intervals. Thus, the judgement of imprecision contributes to the level of confidence in observed outcomes.
• Inconsistency of results. Because the studies in a systematic review are conducted independently, even though aiming at the same intervention or practice, it is unlikely that all aspects of these studies are similar; this may result in inconsistency of results. Such heterogeneity is likely to arise through diversity in actual interventions, lengths of follow-up, study quality and inclusion criteria for participants (Higgins et al. 2003, p. 557). However, certainty across a body of evidence is highest when there are several studies that show consistent effects. When considering whether or not certainty should be rated down for inconsistency, the similarity of point estimates, the extent of overlap of confidence intervals, and statistical criteria including tests of heterogeneity should be taken into account (Guyatt, Oxman, Kunz, Woodcock, Brozek, Helfand, Alonso-Coello, Glasziou et al. 2011, p. 1295). If there is heterogeneity, it can be related to factors in the conduct of particular studies or clusters of studies.
• Indirectness of evidence. Another reason for downgrading the certainty of evidence is when measures of the outcome or the population of interest vary across studies (see Guyatt, Oxman, Kunz, Woodcock, Brozek, Helfand, Alonso-Coello, Falck-Ytter et al. 2011, p. 1304); this is called indirectness of evidence. The use of a measure different from the outcome, sometimes called a 'surrogate outcome', can be instigated by relevance; for example, when patients find specific outcomes more important. It could also be that the outcome is more difficult to measure. These are called differences in outcome. The certainty about evidence can also be rated down if the participants studied are different from those to whom the intervention applies or if the setting differs; these are differences in population. An example of the latter is studying a new surgical procedure in a
highly specialised treatment centre, which only indirectly applies to centres with less experience. For the latter, principles of generalisation (Dekkers 2017, pp. 50–2), and homomorphism2 and isomorphism3, can be used; also, generalisation based on principles for case studies (for example, Evers and Wu 2006) is informative for this matter; abductive reasoning is a key process for generalisation. This implies that indirectness plays an important role in the assessment of the quality of evidence and that principles of generalisation, including abductive reasoning, can achieve more confidence across studies, particularly observational studies.
• Publication bias. This type of bias is seen as perhaps the most troubling for assessing the quality of evidence, because it requires making inferences about incomplete evidence. Guyatt, Oxman, Montori et al. (2011, p. 1279) state several reasons why publication bias may happen, related to initial studies being small, the motivation of authors to publish and the processes for review of manuscripts by journals. It occurs more frequently with observational studies and when published studies are funded by industry. A common method to investigate publication bias is the so-called funnel plot; see Section 7.7 for a generic discussion of this plot and Section 8.2 for an example drawn from the Cochrane Collaboration.

In addition to downgrading, studies can also be upgraded in the certainty of evidence for four reasons (see Table 6.1):
• First, when there is a very large magnitude of effect, it might be more certain that there is at least a small effect. However, such a judgement should be balanced against inconsistency across studies.
• Second, when there is a clear dose–response gradient (or intervention–outcome gradient). Such a relationship would indicate that the intensity of an intervention is related to the outcomes that need to be achieved.
• Third, when there is plausible confounding which would reduce the demonstrated effect or outcome. In this case, confounding in studies by contingencies, variables or combined treatments leads to the actual effect not being fully observed.
• Fourth, when possible confounding would suggest a spurious effect when the actual results show no effect. This means that an effect is suggested in studies, but that actually no effect or correlation exists; normally, it is caused by other factors manifesting the same effect.
2 Homomorphism (Dekkers 2017, pp. 64–5) indicates that some of the elements and relationships (structure) are not identical, but the remaining ones are sufficiently relevant for making comparisons.
3 Isomorphism (Dekkers 2017, p. 64) means that elements and relationships (structure) are the same with regard to the purpose of the study. In this respect, Norbert Wiener (cited in Dekkers 2017, p. 63) famously said that 'The best material model of a cat is another, or preferably the same, cat.' This means that in practice comparisons are always limited, that some degree of homomorphism is to be expected and that, when making a comparison, thought should be given to which elements, subjects or objects and their relationships are essential to the topic so that the comparison is plausible.
This is derived from the reasons for upgrading provided by Guyatt, Oxman, Sultan et al. (2011, p. 1312). It implies that studies of a specific type can sometimes provide better quality of evidence when the researchers or authors have taken care in how the study was conducted and reported.

NOTES
• It is strongly advised to pay explicit attention to reporting the risk of bias in systematic reviews. Katikireddi et al. (2015, p. 193) find that the reporting of findings lacks transparency with regard to bias, and consequently, robustness cannot be fully assessed in most cases. For the purpose of increasing transparency, they suggest performing a sensitivity analysis or reporting the findings from more robust studies separately.
• When outcomes are not independent as assumed in GRADE, but are correlated, it will be more difficult to attribute the quality of evidence to specific recommendations. This can be considered akin to imprecision. It requires assessing the individual outcomes and the confounding by the degree of their correlation. This implies that the quality of evidence for factors and contingencies related to the correlations between multiple outcomes should be separated from the quality of evidence for factors and contingencies that affect only one specific outcome.

TIP: ASSESSING QUALITY OF CASE STUDIES IN EVIDENCE-BASED INTERVENTIONS AND PRACTICE
Although case studies do not provide rigorous evidence in terms of a population of patients or participants, the quality of reporting plays a vital role; on this matter, a few noteworthy publications have been written that may assist in assessing the quality of reporting. Among them is the publication by Runeson and Höst (2008).
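To make the interplay of starting level, downgrading and upgrading concrete, the following sketch applies the logic of Box 6.C and Table 6.1 to a single hypothetical outcome in Python. The starting levels follow the convention described above (randomised designs start at 'high', bodies of observational evidence at 'low'); the study data are invented, and an actual GRADE assessment rests on structured judgement rather than on mechanical counting of levels.

LEVELS = ['very low', 'low', 'moderate', 'high']

def rate_outcome(design, downgrades, upgrades):
    # Start from the study design, then shift by the levels recorded
    # for each downgrading or upgrading factor (cf. Table 6.1)
    start = 3 if design == 'randomised' else 1  # 'high' vs 'low'
    score = start - sum(downgrades) + sum(upgrades)
    score = max(0, min(score, len(LEVELS) - 1))  # clamp to the scale
    return LEVELS[score]

# Hypothetical observational evidence for one outcome: downgraded one level
# for risk of bias, upgraded one level for a large effect size
print(rate_outcome('observational', downgrades=[1], upgrades=[1]))  # prints: low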
6.4.3 Quality of Evidence for Qualitative Analysis and Synthesis
Whereas GRADE and related tools serve quantitative approaches to systematic reviews, there are also tools for assessing the quality of evidence in the case of qualitative analysis and synthesis. For example, Carmona et al. (2021, pp. 496–7) find that ten out of 17 qualitative studies discussed methodological quality. They further put forward that appropriate tools for assessing the confidence in findings are:
• CASP (Critical Appraisal Skills Programme) (an early reference for its use in qualitative synthesis is Dixon-Woods et al. [2007]).
• Cochrane Handbook (Higgins et al. 2019, Chapter 20).
• GRADE CERQual (Confidence in the Evidence from Reviews of Qualitative research) (Lewin et al. 2015).
• Quality in Qualitative Evaluation (Spencer et al. 2003).
• Critical Appraisal Checklist for Qualitative Research (The Joanna Briggs Institute 2017).

Some, for example, Dixon-Woods et al. (2007) and Purssell (2020), have looked at whether some of these tools can be used together and how they compare. Purssell (ibid., p. 1088) finds that CASP can be used in a complementary manner to GRADE. Thus, similar to the use of GRADE and other tools for quantitative systematic reviews, it is of paramount importance that tools are used for assessing the quality of evidence for findings and recommendations when conducting qualitative analysis and synthesis.

NOTE: APPRAISING QUALITATIVE ANALYSIS AND SYNTHESIS
In Section 10.3, guidance can be found for ensuring and assessing the quality of qualitative synthesis. Table 6.2 provides an overview of tools for this purpose by Majid and Vanstone (2018, pp. 2120–3). Some methods mentioned in the table reappear in the context of qualitative synthesis in Section 10.3.
6.5 Determining Level of Evidence for Other Archetypes
Whereas the hierarchy of evidence and the GRADE framework are directly related to systematic reviews, particularly in healthcare and medicine, this leaves open what to do when conducting any other archetype of literature review; especially when conducting a systematic literature review, this needs to be considered. Since there is greater variety in reviews other than those for evidence-based interventions, practices and treatments, there is a lack of frameworks suitable for evaluating the quality of evidence. Therefore, researchers need to develop their own frameworks for assessing the quality of studies considered during a literature review.

This development of frameworks could be based on converting existing frameworks into ones that are suitable for a specific domain. A case in point is the development of a critical appraisal tool for software engineering by bin Ali and Usman (2019); it is derived from AMSTAR-2. Whereas using existing standards can be helpful, they may also be limited in their assessment for domains other than the original domain or discipline for which they were developed.

When existing frameworks for determining the level of evidence are not suitable, another way is creating a hierarchy of evidence similar to the ones used in systematic reviews. An example is found in the doctoral study by Koukou (2020, pp. 37–9) on end-user involvement during new product development processes; see Box 6.D. Although many of the 72 studies focused on end-user involvement in some way, only seven of these met the highest level for the quality of evidence provided. Such a finding indicates that the actual evidence in studies on customer involvement during new product development is relatively limited and also does not allow for sufficient granularity regarding how new product development is conducted.
Table 6.2 Methods for assessing quality of evidence in qualitative literature reviews. This is a summary of the overview by Majid and Vanstone (2018, pp. 2120–3). For each of the eight methods, a brief description, strengths and noted weaknesses are shown. The method QF has been renamed QQE to fit better with its original source.

CASP (critical appraisal skills programme) — Checklist with 10 questions.
Strengths: • Easy to understand and administer • Most commonly used method for qualitative synthesis
Noted weaknesses: • Weaker in evaluation of methodological quality • Adaptations of tool harder to use • Favours studies with better methodological quality above contributions to domain

COREQ (consolidated criteria for reporting qualitative research) — 32 structured statements.
Strengths: • Widely used and endorsed by many journals
Noted weaknesses: • Lacks influence of research philosophy • Only applicable to interviews and focus groups • Lacks guidelines for non-traditional data collection

ETQS (evaluation tool for qualitative studies) — 38 unstructured questions.
Strengths: • Very comprehensive • Evaluative abstract serves as summary of study
Noted weaknesses: • Lack of congruity between philosophy and method • Requires use by expert in qualitative research • Time-consuming

JBI (The Joanna Briggs Institute) — Checklist with 10 questions.
Strengths: • On-line software tool streamlines appraisal • Brevity and clarity for less experienced users • Better assessment of study details than other methods
Noted weaknesses: • Main emphasis is on congruity between philosophy, methodology and methods

Popay's markers — 8 'markers'.
Strengths: • Seen as seminal formulation of quality criteria • Highlights differences between qualitative and quantitative research
Noted weaknesses: • Quality criteria have evolved and been expanded • No detailed guidelines on how to use and score

QQE (quality in qualitative evaluation) — 18 structured appraisal questions.
Strengths: • Transparent guidelines • Very comprehensive
Noted weaknesses: • Time-consuming • Lower reliability between appraisers than CASP • Applies only to traditional data collection methods • Lack of guidelines for distinctive methodologies

SRQR (standards for reporting qualitative research) — 21 structured statements.
Strengths: • Very transparent in purpose • Claims to consider all qualitative methodologies
Noted weaknesses: • Emergent appraisal tool • Coverage of all methodologies unestablished

Walsh's statements — 12 essential criteria statements.
Strengths: • Use of 'berry-picking' approach to find existing appraisal tools • Mimics how appraisers would locate tools
Noted weaknesses: • 'Berry-picking' approach is non-systematic • Use of criteria 'imaginatively' instead of prescriptively makes consistent reporting of quality difficult
Box 6.D Quality of Evidence for End-User Involvement During New Product Development
As part of the systematic literature review in her doctoral study, Koukou (2020, pp. 37–9) needed to assess the evidence provided in publications about the involvement of end users during new product development (NPD) in firms. This was necessary due to the descriptive nature of this topic in studies and the assumption in studies that the presence of end users during new product development also implied involvement of some kind. To this purpose, she developed four levels for the hierarchy of evidence:
• Level I. Descriptive papers that focus on how end users get involved in the NPD process or on the type of end user that is best to be used according to the phase of new product development; no evidence supporting the impact.
• Level II. Comments by authors on end-user involvement that are based on the analysis of the results from case studies and surveys. The focus of the studies is not explicitly on end-user involvement. Some indicators, but not strong or clear evidence.
• Level III. Evidence based on comments from the studied companies (managers, designers, employees). No clear evidence.
• Level IV. Strong evidence supported by facts and outcomes related to the impact of end-user involvement in NPD.
Two examples of the classification are provided here. This classification places the study by Filieri (2013) at Level II (Koukou 2020, Appendix II). The study by Filieri (2013) is more focused on the processes within a case study than on the actual impact of end-user involvement; for example, there is no evidence of how customer involvement led to the success of new product development. The study by Hauser (1993) demonstrates the relationship between customer involvement and the successful launch of products into the market for the case study; therefore, the quality of the level of evidence is graded as Level IV.

This example about developing a hierarchy of evidence shows that a typology can be created close to the ones used in systematic reviews. A third way is using other studies with classifications. An example is how theory is used. Zahra and Newey (2009, pp. 1066–70) propose a classification: Mode 1 represents studies that borrow and replicate existing theory, Mode 2 consists of research that borrows and extends theories, and Mode 3 comprises works that transform the core of existing theories and propose new theoretical perspectives. Dekkers and Kühnle (2012, p. 1104) use this classification for assessing interdisciplinary contributions to collaborative manufacturing networks; they find that only five out of 202 papers analysed make a contribution in terms of Mode 3. This means that existing classifications that pertain to the topic of study could be used for assessing the quality of evidence in retrieved studies, although they may focus on specific points relevant to a review.
A fourth approach is using generic quality criteria for research, particularly for empirical studies. Generally, these criteria are internal validity, external validity, rigour and reliability for positivist studies; note that these are sometimes replaced with craftsmanship, trustworthiness, dependability and confirmability for qualitative research. See Sections 3.3 and 3.4 for more detail on these criteria. In addition to generic criteria for research, specific checklists have also been produced. A case in point is the extensive checklist for reporting qualitative research by Tong et al. (2007). Another work helpful for evaluating the quality of evidence is that by Majid and Vanstone (2018), written from a healthcare perspective; they present eight tools together with strengths and weaknesses for assessing the quality of studies (ibid., pp. 2120–3). The tools are captured in Table 6.2. Thus, quality criteria for empirical studies, whether generic or more detailed, could serve as a framework for assessing the quality of the evidence, albeit they may have to be adapted to the specific review question and domain.

Whereas the four approaches presented in this section are indicative, see Figure 6.4, setting levels of evidence is of paramount importance for conducting systematic literature reviews and, to an extent, also narrative reviews. It is necessary to express the confidence in the studies that are being reviewed in order to arrive at warranted results and findings. This also implies that, if no framework is directly available, scholars should assess the quality of the evidence related to the review questions in their studies and report on it in their writing.
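As a sketch of how a bespoke hierarchy, such as the one in Box 6.D, might be recorded and applied, the Python fragment below defines a four-level rubric and tallies how retrieved studies fall across its levels. The level descriptions only loosely paraphrase Koukou's (2020) hierarchy, and the study assignments are invented for illustration.

from collections import Counter

# Hypothetical rubric, loosely modelled on the four levels in Box 6.D
RUBRIC = {
    'I': 'descriptive only, no evidence of impact',
    'II': 'indirect indications from case studies or surveys',
    'III': 'comments from studied organisations, no clear evidence',
    'IV': 'strong evidence linking involvement to outcomes',
}

# Invented assignments of retrieved studies to levels
assignments = {'Study A': 'II', 'Study B': 'I', 'Study C': 'IV', 'Study D': 'II'}

tally = Counter(assignments.values())
for level, description in RUBRIC.items():
    print('Level', level, '(' + description + '):', tally.get(level, 0), 'studies')

Reporting such a tally alongside the rubric makes transparent how much of the retrieved literature actually reaches the higher levels of evidence, which is the kind of observation Koukou (2020) draws from her 72 studies.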
6.6 Scoping Reviews and Scoping Studies for Setting Inclusion and Exclusion Criteria
In addition to paving the way for setting review questions (Section 4.5) and determining the search strategy (Section 5.9), another purpose of a scoping study or scoping review is to set the criteria for which studies should be included or excluded; see Figure 6.5 for the archetype systematic literature review and Figure 6.6 for the archetype systematic review. In the case of a scoping review, inclusion and exclusion criteria are also used for examining a broad range of literature. Consequently, a scoping review results not only in discovering what is out there, but also in what the impact will be of limiting the studies and sources to be considered for more specific review questions. Furthermore, the development of a model or the selection of an appropriate theory, law of observed regularity, perspective, conceptualisation, etc., see Section 4.4, could inform inclusion and exclusion criteria. Since a broad range of literature is looked at during a scoping study or scoping review, specific inclusion and exclusion criteria related to the review question follow from it, so that publications can be examined in more detail during the actual literature review.
[Figure 6.4 is a decision diagram leading from the review question and archetype to methods for assessing the quality of evidence: for systematic reviews, meta-analyses of randomised controlled trials can draw on AMSTAR, GRADE, the Jadad scale, the Schulz approach and SYRCLE, and observational studies on AMSTAR-2, GRADE, the Jadad scale, the Newcastle–Ottawa scale, ROBINS-I, the RTI item bank and SYRCLE; for systematic literature reviews with qualitative analysis, the options are converting such methods into specific levels of evidence, using methods for qualitative research (CASP, COREQ, ETQS, JBI, Popay, QF, SRQR, Walsh), using existing classifications (for example, for building theory) or developing a classification.]
Fig. 6.4 Overview of methods for assessing quality of evidence. This figure shows, for each archetype of literature review, which methods for appraising the quality of evidence in studies can be used. Some methods for systematic reviews are mentioned in Section 6.4. The existing methods for evaluating qualitative studies are found in Table 6.2. When assessing the quality of evidence in systematic literature reviews, the first option is converting an existing method into a suitable model for the quality of evidence. The second option is using methods for assessing qualitative research. The third approach is to use existing classifications, if appropriate. If none of these works, then the only possibility is to develop a specific classification.
[Figure 6.5 depicts the process for the archetype systematic literature review with the following elements: defining research objectives; purpose of systematic literature review; a scoping study informing the setting of review questions and the identifying or developing of models; a protocol including inclusion and exclusion criteria; questions or themes for the literature review; keywords and databases; inclusion and exclusion criteria; retrieval of studies; quantification of retrieved studies; quantitative analysis of studies; qualitative analysis of studies; and synthesis of findings.]
Fig. 6.5 Scoping study for setting inclusion and exclusion criteria for archetype systematic literature review. Building on the position of the scoping study in Figure 4.6, this figure depicts that a scoping study informs the inclusion and exclusion criteria that should be integrated in the protocol.
In addition to finding out more about the use of inclusion and exclusion criteria for specific review questions, a scoping review could also cover a proposed method for assessing the quality of evidence; see Figure 6.6. This is particularly the case when the later systematic review is directed at evidence-based interventions, policies and practices. Normally, the scoping review should reveal the breadth and depth of literature related to the topic of the systematic review. This coverage informs which type of studies and research designs are appropriate for specific review questions. Knowing which studies and research designs will be considered during the systematic review informs which methods for assessing the quality of evidence are appropriate; see Figure 6.4 for guidance.
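A scoping review's tally of study designs can thus be translated into a provisional shortlist of appraisal tools, along the lines of Figure 6.4. The Python mapping below is a simplified illustration under assumed counts; the tool names come from Sections 6.4 and 6.5, but which tool is appropriate for a given review remains a methodological judgement to be recorded in the protocol.

# Hypothetical design counts from a scoping review
design_counts = {'randomised controlled trial': 2, 'observational': 11, 'qualitative': 7}

# Simplified mapping of design types to candidate appraisal tools (cf. Figure 6.4)
TOOLS = {
    'randomised controlled trial': ['GRADE', 'Jadad scale'],
    'observational': ['GRADE', 'Newcastle-Ottawa Scale', 'ROBINS-I'],
    'qualitative': ['CASP', 'GRADE CERQual', 'JBI checklist'],
}

candidates = sorted({tool for design, n in design_counts.items() if n > 0
                     for tool in TOOLS[design]})
print('Candidate appraisal tools:', ', '.join(candidates))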
[Figure 6.6 depicts the process for the archetype systematic review, derived from a (protocol-driven) scoping review, with the following elements: context of systematic review; developing review questions; identifying or developing models; a protocol covering inclusion criteria, exclusion criteria and tools for assessing the quality of evidence; detailed questions for the analysis of studies; keywords and databases; inclusion and exclusion criteria; retrieval of studies; extraction of quantitative and qualitative data; quantitative and qualitative analysis of studies; and synthesis of findings.]
Fig. 6.6 Scoping review for setting inclusion and exclusion criteria for archetype systematic reviews. The scoping review should not only indicate review questions, as found in Figure 4.5, and search strategies, as presented in Figure 5.7, but also which inclusion and exclusion criteria to use. Furthermore, the outcomes of a scoping review indicate how broad the evidence base is and how diverse the types of studies are; this determines which tools for assessing the quality of evidence are most appropriate.
6.7 Key Points
• A key criterion for the inclusion of studies and sources is the relevance of their content towards the purpose of a literature review and what the review covers. In addition to setting a search strategy, this requires determining inclusion and exclusion criteria, specifically for protocol-driven literature reviews.
• The content and relevance are regularly difficult to extract from the title, keywords and abstract of studies and sources due to incompleteness in writing and presentation. Therefore, close reading of sources becomes necessary, unless it quickly becomes apparent that a specific source is relevant to the topic of a literature review.
• Particularly for the archetypes systematic literature review and systematic review, it is necessary to explicitly state inclusion and exclusion criteria; these two types of criteria are the opposite of each other. In addition to relevance of content, typical criteria concern:
  • Date of publication.
  • Language.
  • Type of source.
  • Research design and method.
  • Sampling.
  • Data collection and analysis.
• Assessing the quality of evidence for retrieved studies serves two purposes:
  • The appraisal leads to what evidence is available and to what extent it is appropriate for the purpose of a literature review.
  • The appraisal can also result in the inclusion of more different types of studies.
• For evidence-based interventions, practices and treatments a hierarchy of evidence is commonly used; see Figure 6.1.
• For evidence-based interventions, practices and treatments the quality of retrieved studies is assessed by using the GRADE method (Grading of Recommendations, Assessment, Development and Evaluations). In addition to the hierarchy of evidence, the grading of individual studies is downgraded or upgraded based on the following points to be considered:
  • Risk of bias.
  • Imprecision.
  • Inconsistency of results.
  • Indirectness of evidence.
  • Publication bias.
  • Large magnitude of effect.
  • Dose–response gradient.
  • Plausible confounding which would reduce demonstrated effects.
  • Possible confounding which would suggest spurious effects when actual results show no effect.
• For systematic reviews and systematic literature reviews other than those for evidence-based interventions, practices and treatments, it is recommended to create a specific hierarchy of evidence derived from methods available in literature or to develop a method for appraising the quality of evidence. See Figure 6.4 for how this can be drawn from literature on methods and reporting of qualitative and quantitative studies.
• The outcomes of a scoping review may lead to (a) defining inclusion and exclusion criteria for specific review questions that follow from these studies, and (b) a proposed method for the assessment of the quality of evidence.
6.8 How to …?

6.8.1 … Set Effective Inclusion and Exclusion Criteria
The use of exclusion and inclusion criteria aimed at finding relevant works should not lead to purposely disregarding studies and sources, because of the potential bias caused by selection. At the same time, the number of sources and studies to be considered should be kept within bounds, given the methods for evaluating the quality of evidence and analysis, and the available resources, such as the number of researchers working on the review. The latter also implies that access to studies and the languages of publications could be points of attention for the inclusion and exclusion of studies. Other criteria that are normally taken into account for inclusion and exclusion are: research design and method, sampling, and data collection and analysis. The purpose of the criteria is to find the broadest range of sources and studies to be considered during the analysis, while ensuring that all studies relevant with regard to content are part of the review.
6.8.2 … Evaluate the Quality of Evidence in Studies
Using a framework for assessing the quality of studies may lead to considering a wider range of study types, and therefore increase the robustness of findings for a specific review question; to this purpose, for evidence-based interventions, practices and treatments, the GRADE framework (Section 6.4) can be used. There are also other methods available, for which the selection depends on the type of studies that are considered during a systematic review or the specific domain of the review. Choosing the most appropriate tool depends on the purpose of the review, the effort required and the requirements of specific protocols. For literature reviews that are qualitative-oriented, there are other options available; see Table 6.2. Some methods for the evaluation of the quality of evidence can be derived from those used for observational studies, whereas for other studies a framework for assessing the quality of evidence needs to be developed; see Figure 6.4. This means that the selection or development of a method for assessing the quality of evidence in a literature review depends on the purpose of the specific review and which type of evidence is available in studies.
6.8.3 … Write a Literature Review
Writing a literature review, particularly a systematic literature review or systematic review, requires mentioning explicitly the inclusion and exclusion criteria used. Clearly stating these and how they were used will contribute to the
rigour and reproducibility of reviews. What to include depends primarily on the review question (see Chapter 4), which thus determines which content of studies is of interest. Other aspects to be considered for inclusion and exclusion criteria are date of publication, language, type of source, research design and method, sampling, and data collection and analysis. When defining inclusion and exclusion criteria, a balance should be sought so as not to exclude studies relevant to the review question, and thus create bias by selection, while keeping the number of studies for in-depth analysis within bounds of feasibility.

Furthermore, the assessment of the quality of evidence in studies serves two purposes when writing literature reviews. In the first place, the use of methods for assessing the quality of evidence in studies allows a broader range of studies to be taken into account. This will possibly contribute to a more robust body of evidence informing the findings of a literature review. The second purpose is that assessing the quality of evidence makes it possible to identify the contribution to knowledge by each study, and thus facilitates aggregating evidence. The methods for assessing the quality of evidence are found in Sections 6.4 and 6.5, with an overview in Figure 6.4; the methods cover both systematic reviews and systematic literature reviews. The inclusion of a broader range of studies may also result in giving fuller consideration to imprecision, inconsistency of results, indirectness of evidence and publication bias, caused by the variety in the conduct of empirical studies. Therefore, a broader range of literature also results in more attention to the consistent perusal of individual studies with regard to their contribution to knowledge, findings and recommendations.
Part II
Quantitative Analysis and Synthesis
Illustration reproduced under the Creative Commons License, courtesy of Hilda Bastian
Chapter 7
Principles of Meta-Analysis
Meta-analysis is a common feature of quantitative synthesis for systematic reviews, one of the four archetypes in this book. The term 'meta-analysis' was coined by Glass (1976, p. 3 ff.) for combining results from multiple studies, though the use of statistical methods for this purpose dates from centuries earlier, as described in Section 1.2. Its application is found in a range of disciplines, with economics, healthcare, medicine and psychology among them. Furthermore, there have been advances in how best to conduct this type of systematic review; for example, see Section 4.4 for formats of review questions, Sections 5.4, 5.5 and 5.6 for search strategies, Sections 6.2 and 6.3 for inclusion and exclusion criteria, and Section 6.4 for quality of evidence. Building on this previous guidance, the focus here will be on conducting meta-analysis with retrieved studies. Given the extent of approaches, methods and tools, this chapter introduces its principles and applications. Section 7.1 starts off by going into more detail about what meta-analysis encompasses. This is followed by Section 7.2, in which the basics of meta-analysis are explained, including the diversity of studies, the data to be used and the process for conducting this type of review. Regarding the extraction of data from retrieved studies, Section 7.3 provides detail. It covers the different types of data and attributes, and how this step can be structured. An overview of mathematical models for estimating effect sizes from a collection of studies is found in Section 7.4; it also discusses determining the confidence interval. Which measures of effect sizes are commonly used is described in Section 7.5. This is followed by Section 7.6, which goes into more detail about which methods are applicable to different review questions and to different measures of the effect size extracted from studies. Section 7.7 discusses causes for heterogeneity and presents common methods for explaining variation across studies. In Section 7.8 attention is paid to publication bias, a common source of heterogeneity. In this context, the sections also indicate how to undertake sensitivity analysis. How to assess the quality of a systematic review using meta-analysis is the topic of Section 7.9. By addressing these points, this chapter provides insight into what conducting a meta-analysis involves and what its starting points are.
7.1 Introduction to Meta-Analysis
Meta-analysis is a form of quantitative synthesis for systematic reviews that uses statistical analysis to combine results of multiple studies. The main idea is that the aggregation of studies leads to higher statistical power and a more robust estimate of effects than is possible from measures derived from any individual study. This means that there should be multiple scholarly studies addressing the same question, or at least that the same measure (or measures) appears across the studies retrieved. Each individual study reporting measurements is expected to have some degree of error, creating uncertainty about the effect. By using statistical methods as a meta-analytic approach, an estimate can be derived that is closest to the unknown common truth; this includes determining the uncertainty across studies. This means that the application of meta-analysis is mainly directed at effects of interventions, policies, practices and treatments. A systematic review using meta-analysis then leads to acquiring an estimate of the change in effect by an intervention, policy, practice or treatment. It also results in finding out how certain it is that this estimate of the effect can be achieved. Thus, recommendations for practitioners and policymakers can be derived from the implications of the effect size and its related (un)certainty. Typically, these interventions, policies, practices or treatments are found in economics, education, healthcare, medicine and policy studies, but also in other domains, such as business and management studies, and software engineering. An example is the work by Doucouliagos and Ulubaşoğlu (2008) on whether political democracy has an effect on the economic growth of nations. It combines data from 84 studies, on which several analyses are performed. They (ibid., p. 78) find no accumulated evidence of democracy being detrimental to economic growth. However, they discover that it has indirect effects, such as a favourable impact on human capital formation, level of economic freedom, inflation and political instability, that stimulate economic growth. They also state that at least a third of the differences in reported results can be attributed to differences in the research designs of studies and econometric specification. This example demonstrates the main points of meta-analysis: bringing results from studies together, subjecting them to statistical analysis, searching for explanations and exploring causes of variance. In addition to providing foundations for evidence-based interventions, policies, practices and treatments, meta-analysis can be used for testing theories, laws of observed regularities and causal relationships. A case in point is the study by Tang and Hall (1995) into the overjustification effect in educational settings. This effect is stated as: a person who initially performs an activity for no reward (because the activity is liked) will become less likely to perform the activity for no reward after being rewarded for its performance. To this purpose, they use the data from 50 studies that contained 256 comparisons. These comparisons are subjected to statistical analysis and a homogeneity test. They (ibid., pp. 379, 384) generally find support for the effect to occur where it should, but ponder by stating (ibid., p. 385) that 'the real value of studies on the overjustification effect may have been to call
attention to the importance of intrinsic motivation in the educational endeavour.' Another instance of meta-analysis for testing theory is the investigation by De Wolff and van Ijzendoorn (1997) into parental antecedents of infant attachment. Their meta-analysis of 66 studies is preceded by an expert consultation to sort concepts for the later statistical analysis. They (ibid., p. 578) use a correlation coefficient as a proxy for effect size. The conclusion (ibid., p. 585) is that maternal sensitivity—the ability to respond appropriately and promptly to the signals of the infant—plays a moderate but affirmative role, while noting that more persuasive effects found in a famous study on this matter could not be substantiated. A third example is McShane and Böckenholt (2017), who demonstrate the use of meta-analysis for theory testing in the domain of consumer behaviour by revisiting three studies. These examples show how meta-analysis may support the development of theory, corresponding to literature reviews aiming at testing theory and aggregation as depicted in Figure 3.8. This also means that the outcomes of meta-analysis may trigger the forming of tentative theories that could lead to further research. In this sense, meta-analysis has the role of testing theories and aggregating outcomes of other studies to further scholarly knowledge, albeit that this depends on the domain of application. However, the application of meta-analysis to examine theoretical foundations is contested. For example, Chow (1987, p. 268) rejects it outright, based on the argument that studies aiming at testing theory or using theory for a specific application are diverse; too varied to allow a meaningful meta-analysis to take place. A qualitative discourse is suggested instead, including the appropriation of alternative theories. Furthermore, Aguinis et al. (2011, pp. 316–8) argue that meta-analysis is not an instrument to discover causal relationships, a necessary foundation for developing theories. They attribute this to the passive nature of this type of statistical analysis; passive here refers to studies using only existing relationships between variables in studies. Another point is that primary studies with a theoretical perspective are often cross-sectional, and thus of limited suitability for studying cause–effect relationships. An example is the time needed before a patent generates revenue through products or services sold; the development of products and services takes considerable time and so may their market introduction, and therefore, there is a time lag between a patent being filed and the revenue it may generate. Perhaps such a situation can be indicatively captured by a cross-sectional study, but the real effect of patents on revenue requires a longitudinal study. However, almost all studies relevant to this matter are cross-sectional, thus making it impossible to conduct an appropriate meta-analysis with a plausible theoretical foundation. For these three reasons, meta-analysis can test theories, laws of observed regularities and causal relationships only when there is sufficient similarity between studies (see also the next section); otherwise a qualitative literature review, systematic or not, is necessary. The range of domains in which meta-analysis is applied is vast. Already mentioned were economics, education, healthcare, medicine and policy studies, but it is also found in domains such as the built environment (for example, Ewing & Cervero, 2010), business and management studies (e.g., Hoobler et al. 2018) and
physics (for example, Animasaun et al. 2019). Sometimes, these applications have resulted in the development of approaches specific to a domain. For example, Stanley (2001) develops the method of meta-regression analysis for the domain of economics to address specific issues with regard to theory. This is followed by the issuing of guidelines for this type of meta-analysis (Stanley et al. 2013). These points and examples indicate that the application of meta-analysis and further development of methods and tools can be found across an array of disciplines, focusing on estimating effects and their uncertainty. NOTE: CONFUSING META-ANALYSIS WITH SYSTEMATIC REVIEWS The term meta-analysis is sometimes confused with systematic reviews. However, meta-analysis is a non-essential component of a systematic review. The archetype of systematic reviews follows protocols for finding and retrieving relevant studies, and a structured approach to analysis and synthesis. One of these approaches is meta-analysis as statistical analysis to estimate an effect size and its uncertainty.
7.2 Basics of Meta-Analysis
Thus, meta-analysis can be considered to focus on the direction and magnitude of an effect, seen as the dependent variable. It does not consider the statistical significance of the relationship between an independent variable and a dependent variable. However, it also aims to quantify the uncertainty around the summary estimate. The question is then under which conditions meta-analysis can be applied, how data are used and what its generic process is.
7.2.1 Conditions for Applicability
Meta-analysis can be applied when four conditions are met. First, studies should contain quantitative results rather than qualitative findings; this is necessary because means and variances are needed to determine the direction and size of the effect. Second, this implies that the retrieved studies have findings that can be configured in a comparable statistical form, e.g., effect sizes, correlation coefficients and odds ratios. Third, it should be possible to compute, approximate or estimate the effect size, which is often abbreviated as ES. Ideally, studies should have the same or similar outcome measures. Fourth, these conditions lead meta-analysis to examine the (same) constructs and relationships across studies that are 'comparable' given the review question. Thus, the four conditions signify that meta-analysis looks at studies that are relatively well-related and contain quantitative data in some form that can be amalgamated into a meaningful statistical analysis.
The degree of similarity in retrieved studies is often placed in the context of the replication continuum.1 The contrast between the terms 'strict replication' and 'conceptual replication' appears in the propositional work of Hendrick2 (1990) and the empirical study by Howard and Maxwell (1980, p. 816 ff.). Strict replication, also called pure replication, means that the methods for data collection and analysis of primary studies are identical. For example, the software used for analysing data is the same across studies. However, it is unlikely that a larger number of studies will have identical research designs, and use fully identical methods and tools. On the other side of the spectrum, there is conceptual replication. This indicates that the constructs and variables used across a set of studies are similar, but not necessarily the research designs, methods and tools. Even for measuring the effect or coding phenomena, differences may occur. This leads to a higher level of abstraction to be considered during extraction of data, analysis and synthesis. An instance of high-level abstraction is the relationship between the provision of food to students in primary schools and academic attainment; the underlying approaches to providing food will vary across studies and academic attainment is measured using different scales. Limiting the review question to specific modes of provision and particular scales for attainment will perhaps enable meta-analysis, but limits the outcomes of the review to very specific situations. Thus, as depicted in Figure 7.1, the diversity in research designs and methods increases when moving from strict replication to conceptual replication on the replication continuum. The position of a collection of studies on the replication continuum also determines which type of synthesis is most suitable; see Figure 7.1. Meta-analysis is only possible when the retrieved studies for the synthesis are relatively similar with regard to the measure of the effect size, research design and data collection; in this respect, Allen and Preiss (1993) point to the reciprocal relationship between replication across studies and meta-analysis. An increased variance across the studies under consideration may 'flip' the most suitable approach to mixed-methods synthesis (see Chapter 12) or qualitative synthesis (see Chapters 10 and 11); see Figure 7.1. However, it is difficult to put an exact point on the scale of the replication continuum where the change to the most suitable approach to synthesis happens. The only guidance here is that the closer to strict replications a collection of studies is, the easier it is to argue comparability for conducting meta-analysis. The tipping point where meta-analysis becomes impossible is when the studies are so varied that detecting directions and sizes of effects across studies in a reliable manner is no
1 By some, the replication continuum is attributed to the work of Lipsey and Wilson (1993). However, there is no mention of it. Also, the statement 'the closer to pure replications your collection of studies, the easier it is to argue comparability' does not appear in the text of Lipsey and Wilson, nor can it be interpreted as a paraphrased statement. This means caution is required when looking for the origins of the replication continuum.
2 Hendrick (1990) refers to a working paper written by him in 1974 about the dichotomy 'strict replication' and 'conceptual replication.'
Fig. 7.1 Replication continuum and appropriateness of the type of synthesis. On the left-hand side of the replication continuum, studies are found that are so-called strict replications. They use an identical design of the research methodology. On the right-hand side of the scale, studies are positioned for which replication is only possible on a conceptual level (for example, high-level theories or conceptual models, see Section 4.4). Meta-analysis is only possible when studies are, in some sense, relatively strict replications. If variety increases, then mixed-methods synthesis must be considered (i.e. combining quantitative and qualitative synthesis) (see Chapter 12). If variety increases further, then only qualitative synthesis is possible (see Chapters 10 and 11).
longer warranted. Thus, whether a meta-analysis is still a valid approach to synthesis of a collection of studies depends on the position of the collection on the replication continuum.
7.2.2 Use of Data in Meta-Analysis
There are two approaches in meta-analysis to the use of data. The first possibility is that original data from studies can be extracted and amalgamated to determine the direction and size of the effect, and its uncertainty; see Figure 7.2. This can only happen when the studies are similar; in terms of the replication continuum, this is when the collection of studies consists of strict replications. Furthermore, the original data should have been made available; sometimes the data are found in a repository (see Sections 14.2, 14.3 and 14.4), and sometimes the authors of studies have to be approached to get access to the original data. Also, there are cases where authors have conducted an empirical study and synthesised it with other studies later. The second possibility is that the outcomes of studies are statistically analysed, also to determine the direction and size of the effect, and its uncertainty. This means that not the
Fig. 7.2 Pooled data versus meta-analysis. In the case of pooled data, original data from different studies are extracted and combined into one data set for analysis. This requires studies to be strict replications of each other. When original data are not or insufficiently available, then it is more appropriate to use summary data about the effect size and its distribution for generating an estimate.
Fig. 7.3 Positioning pooled data on the replication continuum. This figure complements Figure 7.2 by showing that the use of pooled data is only possible when studies are strict replications. Meta-analysis can be used, too, for studies that qualify for pooled data, but then only summary data of studies will be used instead of amalgamating original data from studies.
original data are used, but aggregated data, sometimes called summary data, are extracted from the individual studies of a set for a systematic review. Most meta-analyses are of this type. Again, this can be linked to the replication continuum; see Figure 7.3. The use of pooled data is only possible when there is strict replication, whereas meta-analysis can be applied when there is a higher diversity of studies in a collection.
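As an illustration only (not taken from the book), the following minimal sketch contrasts the two uses of data with hypothetical raw outcomes from three strictly replicated studies; when the study means are weighted by their sample sizes, combining summary data reproduces the mean of the pooled original data.

```python
# Illustrative contrast between pooling original data and combining summary data.
# Hypothetical raw outcomes from three strictly replicated studies.
import statistics

study_data = {
    "A": [2.1, 2.4, 1.9, 2.2],
    "B": [2.0, 2.3, 2.5],
    "C": [1.8, 2.2, 2.1, 2.0, 2.4],
}

# Approach 1: pool the original data into one data set.
pooled = [x for xs in study_data.values() for x in xs]
pooled_mean = statistics.mean(pooled)

# Approach 2: use only summary data (per-study mean and sample size),
# weighting each study mean by its number of observations.
summaries = {k: (statistics.mean(v), len(v)) for k, v in study_data.items()}
weighted_mean = sum(m * n for m, n in summaries.values()) / sum(n for _, n in summaries.values())

print(round(pooled_mean, 3), round(weighted_mean, 3))  # both 2.158
```

For measures other than the mean, or for studies that are not strict replications, the two approaches do not coincide so simply, which is why most meta-analyses work with summary data and explicit weighting schemes.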
A further point of attention is how methods for meta-analysis address errors found in studies. In general, meta-analysis can reduce the impact of random error, but not of systematic errors (i.e., bias). Hence, the methodological rigour of the primary studies is an important determinant of the reliability of a meta-analysis. This also requires adequate reporting by studies so that this point can be assessed by scholars, readers and users.
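To illustrate this point, the short simulation below (hypothetical values, not from the book) pools a large number of study estimates: the random error largely averages out, but a shared systematic error (bias) of 0.10 remains in the pooled estimate.

```python
# Illustrative simulation: pooling many studies shrinks random error but leaves
# a shared systematic error (bias) untouched. Values are hypothetical.
import random

random.seed(1)
true_effect, bias = 0.50, 0.10  # every study over-estimates by 0.10
study_estimates = [true_effect + bias + random.gauss(0, 0.2) for _ in range(200)]
pooled = sum(study_estimates) / len(study_estimates)
print(round(pooled, 2))  # close to 0.60 = true effect + bias, not 0.50
```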
7.2.3 Process for Meta-Analysis
The process for meta-analysis follows the process for systematic reviews, described in Section 3.2; see also Figure 7.4. The context for the meta-analysis determines the appropriateness of review questions. Normally, they follow the format of population-intervention-outcome and its variants, as described in Section 4.4. These review questions may have corollaries. Furthermore, they could be supported or formed by models, theories, laws of observed regularities, etc. The third step of the process for meta-analysis is identifying relevant studies and retrieving those; more information about search strategies is found in Chapter 5. This step also includes setting inclusion and exclusion criteria; see Sections 6.2 and 6.3. The retrieved studies should be screened on their eligibility for inclusion, which is the fourth step. A fifth step is extracting and consolidating data from relevant studies. It covers collecting relevant characteristics of studies and the (experimental) covariates used. In addition, the quality of studies is assessed, either by categorisation or by attributes; Section 6.4 holds information on this matter. This is followed by data analysis, which comprises selecting a statistical model for meta-analysis (see Section 7.4), performing relevant transformations of data into comparable statistical form, computing the effect size and evaluating the extent of between-study inconsistency (heterogeneity). Determining the summary measure and confidence interval also belongs to this step. Furthermore, the analysis may be directed at exploring sources of heterogeneity, differences in effect sizes and confidence intervals for subgroups and clusters of studies, and contextual variables or contingencies that influence the effect size and confidence intervals. The findings from the analysis result in recommendations in the case of evidence-based interventions, policies, practices and treatments, and in conjectures for further research in the case of more theoretically-oriented meta-analyses. Using the collected information about the quality and diversity of studies, the quality of evidence should also be considered. Thus, the process-oriented approach to meta-analysis aims at creating rigour and reliability with respect to aggregated results, findings, conclusions and recommendations. An important aspect of meta-analysis is which studies to include for the purpose of aggregation; this matter also appears in Section 6.4 in the context of the quality of evidence for recommendations. Studies suitable for inclusion in
[Figure 7.4 flowchart: Context for Meta-Analysis → Developing Review Questions → Inclusion and Exclusion Criteria → Keywords and Databases → Retrieval of Studies → Extraction of Quantitative Data → Meta-Analysis of Studies → Synthesis of Findings]
Fig. 7.4 Process for a systematic review using meta-analysis. In the case of using meta-analysis for a systematic review, first review questions are developed, which also depends on the context for undertaking the review; the context depends on whether the effectiveness of an intervention, policy, practice or treatment is its aim or whether the emphasis is on capturing scholarly knowledge. The next step is defining the search strategy, including which databases are searched and which types of studies are included. After retrieval of studies, data are extracted for analysis. The results of the meta-analysis with regard to an estimate of effect direction and size complemented with analysis of variety leads to the synthesis of findings.
meta-analysis could contain between-groups contrasts, individual differences, prevalence rates and averages, statistical association between variables and within-groups contrasts. Concerning research designs, the inclusion can cover experimental research designs, measurement research (e.g., reliability and validity), natural research designs, non-experimental research designs and pre-post research
designs, among others. Most important is the degree of similarity between studies, as symbolically shown in Figure 7.1. This means that, across the studies in a collection for meta-analysis, relatively comparable measures of effects, means and variables should allow a meaningful statistical analysis to be conducted, including consideration of the variety in research designs. When conducting a meta-analysis, estimating the direction and size of the effect is as important as examining its variance across studies. Variance may occur across subgroups or clusters of studies. Another reason for variance can be differences in the design of research methodologies in the studies being analysed. This may extend to different factors and variables that are considered or reported. Also, the way the effect is measured could result in variances in the outcomes of studies. This extends to conducting a sensitivity analysis to determine how changes in factors and variables will affect the direction and size of the effect; see Section 7.5 for more detail. Thus, variances may be inherent to research designs, subgroups or variables considered, all of which normally should be part of a systematic review using meta-analysis.
NOTE: INDEPENDENCE OF STUDIES As Schmid et al. (1991, p. 106) note, the use of statistical approaches to estimate effect size and variance assumes that studies included in the meta-analysis are independent of each other. In practice, this will be more difficult because of studies building on each other and peer networks that may influence how a study has been undertaken. To this purpose, they suggest analysing results according to the periods in which they took place, research institutes and researchers; it is suggested here that considering periods could be based on points in time where significant steps were made in acquiring scholarly knowledge, and developing conceptualisations and theoretical foundations.
NOTE: SIMPSON'S PARADOX FOR META-ANALYSIS When conducting statistical analysis of subgroups and clusters of studies during meta-analysis, Simpson's paradox3 may occur; it also goes by the names amalgamation paradox, reversal paradox, Simpson's reversal and Yule-Simpson effect. This paradox states that a trend may appear in several subgroups of data but disappear or reverse when the subgroups are amalgamated. In such a case, adding up effects from retrieved studies may lead to an incorrect aggregated estimate for the effect size. Bravata and Olkin (2001) demonstrate this for the case of meta-analysis. This leads to the advice to examine whether directions and sizes of effects differ on different aggregation strata of objects and subjects of the meta-analysis.
3 The term Simpson’s paradox was introduced by Blyth (1972), inspired by Simpson (1951). However, notions by Pearson et al. (1899, p. 278) and Yule (1903, pp. 132–4) about combining data seem to predate Simpson (1951).
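As an illustration of the paradox (hypothetical counts, not taken from Bravata and Olkin), the sketch below shows a treatment that has the higher success rate within each of two subgroups yet the lower rate once the subgroups are amalgamated.

```python
# Hypothetical counts illustrating Simpson's paradox: success rates of two
# treatments (A, B) within two subgroups and after amalgamation.

groups = {
    "subgroup 1": {"A": (81, 87),   "B": (234, 270)},  # (successes, total)
    "subgroup 2": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes: int, total: int) -> float:
    return successes / total

for name, g in groups.items():
    print(name, {t: round(rate(*g[t]), 2) for t in ("A", "B")})
# subgroup 1: A 0.93 > B 0.87 ; subgroup 2: A 0.73 > B 0.69

totals = {t: tuple(map(sum, zip(*[g[t] for g in groups.values()]))) for t in ("A", "B")}
print({t: round(rate(*totals[t]), 2) for t in ("A", "B")})
# amalgamated: A 0.78 < B 0.83 — the direction reverses
```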
7.3 Identifying and Coding Variables and Attributes for Inclusion in Meta-Analysis
A key feature of meta-analysis is the extraction of data from retrieved studies for the statistical analysis to follow. The relevance of these data is related to the review question. In this sense, the format for review questions population-intervention-outcome (PIO) and its variants could support identifying the attributes and variables of interest; Figure 7.5 displays this for population-intervention-comparison-outcome (PICO). The effect, also designated outcome, is normally the dependent variable. Attributes of the population, its constituents and the parameters of the intervention are the independent variables. The mediating variables—variables that link the independent and dependent variables—are normally found as attributes of the comparison. In statistics there are also moderating variables—variables that affect the direction or strength of the effect on the dependent variable without having a causal relationship with it; an example would be age affecting the body's response to a virus, while age is not a cause of contracting the virus. Models, theories and laws of observed regularities, see Section 4.4, can support identification of dependent, independent and mediating variables in a more structured manner. Thus, the review question determines the relevant dependent, independent and mediating variables for which data have to be extracted from the retrieved studies. Further sources of attributes and variables relevant to the study can be the examination of the characteristics of the studies retrieved. When appraising individual studies, it is possible to identify which dependent, independent, mediating and moderating variables have been used. Because it is unlikely that all studies are identical, this may result in a range of variables that should be assessed on their relevance to the review question. It could be that some variables have been deemed unrelated or statistically insignificant. For this point, it is also important to consider potential changes over time related to advancements in scholarly knowledge. Later studies may contain different variables and factors than earlier studies. Potentially, further information on relevant attributes and variables can be found in the discussion of findings and conclusions of individual studies; limitations and suggestions for further research may point to issues that were not considered but could influence the outcome. Such points are also helpful for collecting information needed for assessing the quality of evidence; see Section 6.4. In this respect, evaluating the studies' design, methods and tools is necessary, because these might influence both the quality of data and the reliability of the study. Thus, appraising studies on the attributes and variables used, and on the design of their research methodology, could complement the structured approach followed when identifying them from the review question. For the statistical analysis, variables can take different forms, which determines how data are extracted from retrieved studies. The following forms of data are commonly distinguished:
Fig. 7.5 Symbolic representation for identifying attributes and variables for meta-analysis. This figure demonstrates how the PICO (population-intervention-comparison-outcome) format can be used for identifying independent, moderating, mediating and dependent variables. Attributes of the population being studied are divided into independent and moderating variables. This division appears, too, for the intervention. The comparison of an intervention (or policy, practice or treatment) introduces mediating variables in addition to independent variables. The description of the effect to be studied results in dependent variables.
• Dichotomous and binary data. Dichotomous data have only two values, where each entity's variable falls into one of only two possible categories. Binary data, a subclass of dichotomous data, have only the value 0 or 1. An instance of dichotomous data is receiving a pass or fail for an exam (for example, typical for a doctoral study), and an instance of binary data is a response to a yes–no question recorded as 0 or 1.
• Ordinal data. This type of data is a categorical, statistical data type where variables have ordered categories and the distances between categories are not known; normally, there are more categories, distinct from dichotomous and binary data. An example is the variable colour when no measurement devices are used, but its measurement is a visual assessment by a person. For such data, well-defined measurement scales are a key factor for improving reliability.
• Continuous data. This type of data reflects each entity's outcome as a measurement of a numerical quantity. Principally, it can be measured on an infinite scale, meaning that in practice it can take any value between two numbers, no matter how small the difference. The measurement of a person's height would be an example of this type of data.
• Counts and rates. This type of data is calculated from counting the number of events experienced by each entity. The observations can take only non-negative integer values, and these integers arise from counting rather than ranking. Instances are the number of commutes per month and the number of sunny periods per time interval (year, month).
• Ratios. A ratio indicates how many times one number contains another. An instance is the profitability of a firm expressed as a percentage of its revenue in a certain period.
• Time-to-event. Length of time until the occurrence of a well-defined end point of interest. However, it could be that not all entities in the study experience the
event due to limitations in the observation period; this is called censoring. An example is how long it takes before a patient experiences a stroke. Consequently, the data for time-to-event variables may not be normally distributed.
When collecting these data from retrieved studies, categorisations and measurements may vary across studies. It could be that data are measured using different scales, with the height of a person in terms of metres or inches being an obvious case in point. It could also be that measurement scales have different intervals; dividing patients into different age groups could be an example of this. Also, measurements and their accuracy may depend on the instruments used; instruments include surveys, questionnaires and checklists. An instance would be measuring heart rate by taking a pulse on the wrist or by using electrocardiography. In addition to different types of data, there are also different ways in which variables are measured and categorised. Data extraction and its abstraction into variables for meta-analysis need to happen in a structured manner. The collection of data from retrieved studies implies an a priori structure, which can take the form of using documents (hard copy or electronic), spreadsheets and tools. Also, the conversion of data and variables should be adequately recorded to improve reliability and transparency. To this purpose, it should be considered what tools to use. For example, Pedder et al. (2016, p. 212/2) suggest spreadsheets and databases with limited functionality for smaller datasets, and specific statistical software for larger datasets and more extensive data manipulation. Their DECiMAL guide provides further detail on what to consider when extracting data and converting these into datasets. Also, there have been developments to extend statistical software to be more suitable for meta-analysis. An example is the extension METAGEAR for the software R (Lajeunesse, 2016). Furthermore, Brown et al. (2003, pp. 207–8) suggest developing a codebook for the extraction of data; such a protocol is helpful for ordinal data and other data that need to be converted, including codes for the study design. Thus, using appropriate tools and codebooks supports the appropriate extraction of data from retrieved studies.
TIP: RULE OF SIGNIFICANT DIGITS The rule of significant digits applied to the extraction of data from retrieved studies implies that results cannot be of a higher accuracy than the measurements. For example, a study with a sample of 100 entities cannot have an accuracy of the effect size of more than one percent; so, if a calculated parameter is 0.123456, it suggests a higher degree of accuracy than warranted by the size of the sample. Based on the sample size, the effect size should have been recorded as 0.12. A particular point of contention for the rule of significant digits are so-called Likert scales, commonly used for surveys and questionnaires. These often have a 5-, 7- or 9-point scale for respondents. Principally, the accuracy is determined by the intervals, and thus, for a 9-point scale, the uncertainty is about 13 percent (depending on the calculation and the significant digits). This means that the rule of significant digits may lead to revisiting data and their conversion for use in meta-analysis.
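As an illustration only (not part of the book's guidance), the sketch below shows what a codebook-style extraction record might look like, with the reported effect size rounded in line with the rule of significant digits; all field names and the mapping from sample size to recorded precision are assumptions for illustration.

```python
# Illustrative sketch of a codebook-style extraction record, with the reported
# effect size rounded to a precision no finer than the sample size warrants.
# Field names and the precision rule below are assumptions for illustration.
import math
from dataclasses import dataclass

def round_to_sample_precision(value: float, sample_size: int) -> float:
    # e.g. n = 100 -> 2 decimal places; n = 1000 -> 3 decimal places
    decimals = max(1, int(math.floor(math.log10(sample_size))))
    return round(value, decimals)

@dataclass
class ExtractionRecord:
    study_id: str
    design: str            # coded study design, e.g. "RCT", "pre-post"
    outcome_measure: str   # how the effect was measured in the primary study
    sample_size: int
    effect_size: float     # as reported, before applying the precision rule

    def recorded_effect_size(self) -> float:
        return round_to_sample_precision(self.effect_size, self.sample_size)

rec = ExtractionRecord("Smith2019", "RCT", "exam score", 100, 0.123456)
print(rec.recorded_effect_size())  # 0.12
```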
TIP: IMPROVING QUALITY OF DATA EXTRACTION There are a few ways to improve the quality of the data extraction. A longstanding advice is to have at least two independent reviewers per retrieved study; this is called double-data extraction. This is different from single-data extraction, where a second reviewer (or observer) verifies the data extraction by a first reviewer. Generic guidance here (e.g., Munn et al. 2014, p. 50; Wanous et al. 1989, p. 263) is that data extraction for systematic reviews with meta-analysis should be undertaken independently by reviewers before the results are merged into the data for the meta-analysis, i.e., double-data extraction. Wanous et al. (ibid., p. 263) conclude that differences can occur during the stages of deciding which studies to include and how data are extracted, based on their comparison of a few meta-analyses on similar topics. They advise reporting how judgement calls were made, with the aim of providing transparency and increasing reliability. A step further would be to analyse and report the inter-rater reliability, i.e., the degree of agreement in ratings between different reviewers. Sometimes, the intra-rater reliability—the consistency in ratings given by the same person across multiple instances—is also considered. In this case, if various raters (reviewers of data extraction from primary studies) do not agree, then either the scale of rating is inappropriate, or the protocol for rating may need revisiting, or the instructions to reviewers may need clarification. Both the intra-rater and inter-rater reliability can be evaluated using statistical tools. Thus, the quality of extraction can be enhanced by having independent reviewers, considering intra-rater and inter-rater reliability, and adequate reporting.
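As one way to evaluate inter-rater reliability with statistical tools, the sketch below computes Cohen's kappa for two reviewers' include/exclude decisions on the same records; the data are hypothetical, and the choice of kappa is an assumption for illustration, as other agreement statistics exist.

```python
# Minimal sketch: Cohen's kappa for agreement between two reviewers'
# include/exclude decisions on the same set of records (hypothetical data).
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

a = ["include", "include", "exclude", "include", "exclude", "exclude"]
b = ["include", "exclude", "exclude", "include", "exclude", "include"]
print(round(cohens_kappa(a, b), 2))  # 0.33: agreement beyond chance is modest
```

Low values would prompt revisiting the rating scale, the protocol or the instructions to reviewers, as noted in the tip above.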
7.4 Models for Calculating Effect Sizes
In practice, calculating effect sizes depends on the data being analysed and the statistical model chosen. The four main statistical models—common-effect, fixed-effects, random-effects and mixed-effects models—differ in their assumptions about the population of studies that are being analysed. Particularly for common-effect and fixed-effects models, there is sometimes confusion about the nomenclature and the application of the appropriate models, as noted, for example, by Rice et al. (2018, pp. 207, 209). The methods for calculating effect sizes differ from statistical approaches to the correlation of variables. For the latter, common measures such as r² (i.e., the proportion of the variation in the dependent variable that is predictable from the independent variable) and Chi-squared tests (i.e., the degree to which expected frequencies of occurrences relate to observed frequencies) are of little value, because they do not determine the direction and size of the effect. In this respect, the emphasis is on estimating two quantities:
• the effect size, denoted here as $\hat{\theta}$;
• its variance, denoted here as $\hat{S}^2$.
This means that common-effect, fixed-effects, random-effects and mixed-effects models, which are explained in the next four subsections, use specific approaches to meta-analysis with the purpose of determining the direction, size and uncertainty of the effect from a set of studies.
7.4.1 Common-Effect Models
The common-effect model can be used when the studies in a collection for meta-analysis are replicates or near-replicates of each other; it is also, confusingly, denoted as the fixed-effects model (Rice et al. 2018, p. 209), with McKenzie et al. (2016, p. 628 ff.) being a case in point. For the common-effect model it can be assumed that the studies have used an identical parameter for measuring the effect of an intervention or treatment. Because all measure the independent variables and the effect as dependent variable in a similar manner, there is no or negligible variance across studies; this is akin to pooled data in Figure 7.3, but because original data are not available for the meta-analysis, only reported summary data can be extracted from all or most retrieved studies. Thus, this model is only applicable to a set of retrieved studies that are (almost) identical in their study design with respect to the review question for the meta-analysis, or to a cluster of studies within a collection that meets the same condition. For calculating the effect size in the common-effect model, a frequently used method is the weighted mean average. This assumes that each study has only a measurement error related to the effect size:

$y_i = \theta_{\text{common}} + \varepsilon_{s,i}$    (7.1)

where
• $y_i$ is the observed effect size in the ith study;
• $\theta_{\text{common}}$ is the effect size of the pooled studies;
• $\varepsilon_{s,i}$ represents the measurement error of the ith study.

Principally, a systematic measurement error implies that the larger the number of observations pooled together, the smaller its effect is. Therefore, averaging the effect sizes of the individual studies should lead to a reliable estimate of the mean effect size:

$\hat{\theta}_{\text{common}} = \dfrac{\sum_{i=1}^{k} n_i\, y_i}{\sum_{i=1}^{k} n_i}$    (7.2)

where
• $n_i$ is the number of observations in the ith study;
• $k$ is the number of studies, with $1 \le i \le k$ and $k > 2$.

This means that studies with a larger number of observations weigh more than studies with fewer observations in the common-effect model.
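A minimal sketch of Eq. (7.2) follows, using hypothetical summary data (reported effect size and number of observations per study); it is illustrative only and not code from the book.

```python
# Minimal sketch of Eq. (7.2): sample-size-weighted mean of reported effect sizes
# from k studies. Hypothetical summary data: effect size y_i and sample size n_i.

studies = [
    {"y": 0.30, "n": 40},
    {"y": 0.25, "n": 120},
    {"y": 0.35, "n": 60},
]

theta_common = sum(s["y"] * s["n"] for s in studies) / sum(s["n"] for s in studies)
print(round(theta_common, 3))  # 0.286
```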
7.4.2 Fixed-Effects Models
The fixed-effects model builds on the assumption that there is one true effect size that applies to all studies in the meta-analysis. Therefore, all differences in observed effects are due to sampling errors:

$\hat{y}_i = \theta_{\text{fixed}} + e_i$  (7.3)
where
• $\theta_{\text{fixed}}$ is the effect size of the pooled studies;
• $e_i$ represents the study-specific random error of the ith study.
These assumptions are easy to understand but may be unrealistic, as research is often prone to several sources of variance; for example, treatment effects may differ according to locale, dosage levels and study conditions. In this sense, the fixed-effects model differs from the common-effect model, where there is greater homogeneity across studies due to a highly similar method for measuring the effect size; symbolically, this is displayed in Figure 7.6. This means that the fixed-effects model brings together studies on the same phenomena, but with variances inherent to sampling and independent variables.
Fig. 7.6 Symbolic representation of common-effect, fixed-effects and random-effects models. The three different approaches to mathematical models for measuring effects are shown in this adaptation of the figure by Hartung et al. (2011, Chapter 4, cited by Rice et al. (2018, p. 209)). In the common-effect model (a), there is one effect and several observations are combined; variation is only a result of measurement errors. In the case of the fixed-effects model (b) the variation is attributed to sampling error. In case of between-study variation, the random-effects model is more appropriate.
In the fixed-effects model, the summary effect size uses a weighted average of a series of study estimates. The inverse of the estimates' variance is commonly used as study weight, so that larger studies tend to contribute more than smaller studies to the weighted average. The inverse-variance weighted average is calculated as:

$\hat{\theta}_{\text{fixed}} = \dfrac{\sum_{i=1}^{k} w_i \hat{y}_i}{\sum_{i=1}^{k} w_i}$  (7.4)
where
• $w_i = 1/v_i$ is the weight of the ith study.
The fixed-effects model assumes that the major source of variability that requires statistical manipulation is the chance variation between studies of the estimate of the true effect size. Hence, the variance is calculated as follows:

$\hat{S}^2_{\text{fixed}} = \dfrac{1}{\sum_{i=1}^{k} w_i}$  (7.5)
where
• $\hat{S}^2_{\text{fixed}}$ is the estimated sampling variance and
• $\hat{S}_{\text{fixed}}$ the standard error.
This approach to calculating the effect size and the variance is also known as weighted least squares. The estimate for the variance can be used to calculate confidence intervals. A usual approach for fixed-effects models is the so-called Wald test:

$\hat{\theta}_{\text{fixed}} \pm z \, \hat{S}_{\text{fixed}}$  (7.6)
where
• $z$ is a quantile of the normal distribution.
Normally, the confidence interval is set at 95%; therefore, the corresponding value for a two-tailed test is $z_{0.975}$. This value can be found in so-called z-tables and is approximately 1.96. With the z-value the upper and lower bound of the confidence interval can be determined. In practice, fixed-effects models work best with groups of similar studies addressing the same question. This means that studies in the set for meta-analysis are functionally identical (Borenstein et al. 2010, p. 105). When the set is dominated by a very large study, findings from smaller studies are practically ignored. This may be acceptable in some settings, such as similar trials of the same drug treatment, but is less defensible when a treatment or intervention is more complex or when there is a multitude of factors influencing the effect size. The second limitation is that findings are not to be generalised beyond the population of objects and subjects studied
so that a common mean can be determined (ibid.). In most systematic reviews on an intervention, policy, practice or treatment, such populations are hardly studied and a larger variety should be taken into account. This means that the fixed-effects model is more likely an ideal type for analysis than a realistic approach, due to inherent variety across studies.
NOTE: CAUTION FOR CONFIDENCE INTERVALS
Schmidt et al. (2009, p. 124) find that when using the fixed-effects model confidence intervals are too narrow. Their empirical analysis of meta-analyses shows that fixed-effects confidence intervals around mean effect sizes are, on average, 52% narrower than their actual width (ibid., pp. 119–20). In particular, the nominal 95% fixed-effects confidence intervals were found to have an actual coverage level as low as 56%, on average.
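As a concrete illustration of Eqs. 7.4 to 7.6, the following minimal sketch pools a handful of made-up study estimates with inverse-variance weights and attaches a Wald-type 95% confidence interval; the effect sizes and variances are hypothetical, and the sketch is not a substitute for dedicated meta-analysis software.

```python
import numpy as np
from scipy import stats

# Hypothetical per-study effect estimates and their within-study variances.
effects = np.array([0.42, 0.35, 0.50, 0.38])
variances = np.array([0.010, 0.025, 0.008, 0.015])

w = 1.0 / variances                              # inverse-variance weights
theta_fixed = np.sum(w * effects) / np.sum(w)    # Eq. 7.4
se_fixed = np.sqrt(1.0 / np.sum(w))              # square root of Eq. 7.5

z = stats.norm.ppf(0.975)                        # about 1.96 for a 95% two-tailed interval
lower, upper = theta_fixed - z * se_fixed, theta_fixed + z * se_fixed   # Eq. 7.6
print(round(theta_fixed, 3), (round(lower, 3), round(upper, 3)))
```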
7.4.3 Random-Effects Models
The random-effects model assumes that a population of different studies is being analysed and allows the true effect size to differ from study to study. Therefore, the random-effects model aims to estimate the average effect in a population of studies and does not assume they are all measuring the dependent variable in the same way. The model takes into account both between-study and within-study variability, expressed as:

$\hat{y}_i = \theta_{\text{random}} + \nu_i + e_i$  (7.7)

where
• $\theta_{\text{random}}$ is the effect size of the studies in a collection;
• $\nu_i$ represents the between-study (random) deviation of the ith study, with $e_i$ again the study-specific sampling error.
The variance of $\nu$, written as $\tau^2$, is known as the amount of heterogeneity or the variance component. In the case of the fixed-effects model it is assumed that $\tau^2 = 0$. However, in the case of the random-effects model, a common practice is to investigate this variation; see Section 7.7 for further detail on causes for heterogeneity, and how it is calculated and analysed. Also for the random-effects model, the inverse-variance weighted average is used for calculating the effect estimate:

$\hat{\theta}_{\text{random}} = \dfrac{\sum_{i=1}^{k} \tilde{w}_i \hat{y}_i}{\sum_{i=1}^{k} \tilde{w}_i}$  (7.8)

where
• $\tilde{w}_i = 1/(v_i + \tau^2)$ is the weight of the ith study.
The random-effects model assumes that the major source of variability that requires statistical manipulation is the chance variation between studies around the estimate for the true effect size. Hence, the variance is calculated as follows:
$\hat{S}^2_{\text{random}} = \dfrac{1}{\sum_{i=1}^{k} \tilde{w}_i}$  (7.9)
where
• $\hat{S}^2_{\text{random}}$ is the estimated sampling variance and
• $\hat{S}_{\text{random}}$ the standard error.
This means that the calculation of the variance includes both the within-study variability and the between-study variability. When applying the random-effects model to a meta-analysis, the confidence interval needs to be calculated, too. This can follow Eq. 7.6 for the Wald test, assuming a normal distribution. In addition to the Wald test, Veroniki et al. (2019) provide an overview of different methods for determining confidence intervals. They (ibid., pp. 27–8) also created an overview of which software supports which method for calculating confidence intervals. The accuracy of these methods is largely determined by how many studies the systematic review with meta-analysis covers. As a result of taking between-study variability into account, random-effects models tend to provide more ‘conservative’ variance measures than fixed-effects models. In practice, apart from wider confidence intervals, random-effects models tend to place greater emphasis on the results of smaller, outlying studies. Thus, small studies of likely lower methodological quality will be given more relative weight in a random-effects model than when a fixed-effects model is employed.
NOTE: UNRESTRICTED WEIGHTED LEAST SQUARES MODEL
Stanley and Doucouliagos (2015) draw attention to the unrestricted weighted least squares model. They (ibid., pp. 2125–6) show that this model offers a correction for the standard error in the fixed-effects model and is often superior to the conventional random-effects model; see Table 7.1 for a comparison of how variance and weight are calculated in each model. Also, the claim is made that the unrestricted weighted least squares model is easy to implement.
NOTE: PREDICTION INTERVALS INSTEAD OF CONFIDENCE INTERVALS
Instead of confidence intervals, or as a complementary test, prediction intervals can be used to express uncertainty. Whereas the confidence interval quantifies the accuracy of the mean for the effect estimate, the prediction interval quantifies the dispersion (or distribution) of effect estimates. Chiolero et al. (2012) demonstrate this for pharmacist interventions to improve the management of major cardiovascular disease risk factors in outpatients, using a random-effects model. They point out that, although the interventions are effective on average in decreasing blood pressure, the prediction intervals indicate that some of these pharmacist interventions may not be as effective as the confidence intervals suggest. Consequently, prediction intervals may be wider than confidence intervals.
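A minimal sketch of random-effects pooling follows, using made-up study data and the DerSimonian-Laird estimator for the between-study variance τ² that is discussed in Section 7.6; real analyses would normally rely on dedicated meta-analysis software rather than this hand-rolled version.

```python
import numpy as np

# Hypothetical per-study effect estimates and within-study variances.
effects = np.array([0.42, 0.35, 0.50, 0.38, 0.61])
variances = np.array([0.010, 0.025, 0.008, 0.015, 0.030])
k = len(effects)

# Fixed-effects quantities, needed for the Q statistic (see Section 7.7.2).
w = 1.0 / variances
theta_fixed = np.sum(w * effects) / np.sum(w)
Q = np.sum(w * (effects - theta_fixed) ** 2)

# DerSimonian-Laird estimate of the between-study variance tau^2, truncated at zero.
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (k - 1)) / c)

# Random-effects pooling with weights 1 / (v_i + tau^2), as in Eqs. 7.8 and 7.9.
w_tilde = 1.0 / (variances + tau2)
theta_random = np.sum(w_tilde * effects) / np.sum(w_tilde)
se_random = np.sqrt(1.0 / np.sum(w_tilde))
print(round(tau2, 4), round(theta_random, 3), round(se_random, 3))
```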
Table 7.1 Weights and variances for the fixed-effects model, random-effects model and unrestricted weighted least squares. All three models use the same estimator for the effect size (the inverse-variance weighted average) but vary in how weights and variances are calculated. In the case of the unrestricted weighted least squares, the variance involves an unknown multiplicative constant φ. Derived from Stanley and Doucouliagos (2015, p. 2117).

| Model | Weight ($w_i$) | Variance |
| Fixed-effects model | $1/s_i^2$ | $1/\sum_{i=1}^{k} w_i$ |
| Random-effects model | $1/(s_i^2 + \tau^2)$ | $1/\sum_{i=1}^{k} w_i$ |
| Unrestricted weighted least squares | $1/s_i^2$ | $\phi/\sum_{i=1}^{k} w_i$ |

7.4.4 Mixed-Effects Models
In addition to single effect size estimates that are collected from multiple independent studies, there are also approaches that include multiple outcomes and multiple variables, which are called mixed-effects models. These include, among others, multivariate mixed-effects models for pooling multiple outcomes or multi-parameter associations, multilevel mixed-effects models for hierarchically structured studies, intervention-effect (response) mixed-effects models, and longitudinal mixed-effects models for studies reporting multiple estimates at different times; see Figure 7.7 for a symbolic representation of data structures for mixed-effects models. Approaches have been developed for these mixed-effects models. For instance, Sera et al. (2019) introduce a framework for meta-analysis using mixed-effects models and present an extension to the software R to support the analysis. Typically, multilevel mixed-effects models lead to analysis of different aggregation strata; this involves the estimation of the effect size and analysis of the aggregated effect, but also requires analysis at the lower levels. An example of a mixed-effects model is the study by Polák (2017) into whether investments in information and communication technology yield improvements in productivity (the so-called productivity paradox). Multi-level analysis of 850 estimates from more than 70 studies, together with other descriptive indicators, leads to the conclusion that these investments only result in a 0.30% improvement of productivity and have no effect on the profitability of firms (ibid., pp. 47–8). According to him, this might be explained by information and communication technology being so incorporated into capital and production technology that its contribution cannot be clearly separated, which makes it difficult to find positive effects of these investments. Deliberating on why studies with positive results influence the outcomes, he suggests that this might be caused by self-censorship of authors.
Fig. 7.7 Illustrative examples of data structures for mixed-effects models. This figure shows four data structures for mixed-effects models with k studies, adapted from Sera et al. (2019, p. 5432). In the multivariate mixed-effects model (a) interrelated effects are found. This requires estimating not only the separate effect sizes but also their interrelationships. In the multilevel mixed-effects model (b) studies are divided into m clusters. For each cluster an estimate of the effect size is determined. These are combined into an estimate for the overall effect size. In the case of the intervention-effect (response) model (c), there are interrelated interventions, but these are also sequenced. How they are sequenced is then a point of analysis. The longitudinal model (d) studies how effects occur over time. Intervals may vary across studies, and thus so may the effects that are achieved.
This all indicates that mixed-effects models follow similar approaches to the random-effects model, but that they additionally require evaluating more results, both at the level of aggregate effects and at the underlying levels of data.
7.4.5 Meta-Regression
One of the methods for mixed-effects models is meta-regression, which combines the techniques of meta-analysis and linear regression (Sutton and Higgins, 2008, p. 629); this is also known as mixed-effects meta-analysis or mixed-effects meta-regression. In principle, both fixed-effects and random-effects models are suitable for meta-regression, though Baker et al. (2009, p. 1433) recommend a random-effects model when there is between-study variability. Meta-regression is similar to simple regression analysis in that the outcome variable (the effect estimate) is predicted according to one or more explanatory variables; these are often called ‘potential effect modifiers’ or covariates. The model allows the investigation of effects from continuous as well as categorical characteristics of objects and subjects (Tipton et al. 2019, p. 174). In principle, meta-regression allows the effects of multiple factors to be investigated simultaneously. Meta-regression aims at finding out the influence of variables on the effect estimate. In the case of a single variable for meta-regression, sometimes called univariate meta-regression, the following formula is used:

$\hat{y}_i = \theta_{\text{regression}} + \alpha z_i + e_i$  (7.10)
where
• $z_i$ is the (independent or mediating) variable;
• $\alpha$ represents the regression coefficient.
However, it is more likely that a number of variables need to be considered. For multiple variables or covariates, Stanley and Jarrell (2005, p. 302) provide the following formula:

$\hat{y}_i = \theta_{\text{regression}} + \sum_{j} \alpha_j z_{i,j} + e_i$  (7.11)
where
• $z_{i,j}$ is the meta-independent variable, which reflects relevant characteristics of an empirical study;
• $\alpha_j$ represents the regression coefficient for the meta-independent variable.
In this covariate meta-regression, $z_{i,j}$ might include (ibid., p. 302):
• Specified variables that account for differences in functional forms, types of regression, and data definitions or sources, etc.; see Section 7.3 for the type of data extracted from the studies considered for meta-analysis.
• Dummy variables which reflect whether potentially relevant independent variables have been omitted from (or included in) the primary study.
• Sample size of a study.
• Selected characteristics of authors of primary studies.
• Measures of research or data quality.
To explore variance across the primary studies, an R² index can be calculated to indicate the percentage of variance explained by the predictors. Meta-regression is seen as very suitable for exploring between-study variability. In this regard, it can be considered an extension of subgroup analyses (Thompson and Higgins, 2002, p. 1563). This may happen because not all required data are necessarily available in all studies; then, studies are clustered for analysis of specific variables, leading to subgroups of studies to be considered. Again, at the level of a subgroup or cluster of studies, meta-regression allows the effects of multiple factors to be investigated simultaneously. However, this is often limited by inadequate and incomplete data being available (e.g., ibid., p. 1566), which is also sometimes a consequence of inadequate reporting (Tipton et al. 2019, pp. 167–8). Furthermore, as a rule of thumb, meta-regression should not be considered if there are fewer than ten studies in a meta-analysis. Thus, the application of meta-regression is limited by the data reported in primary studies and the number of studies for analysis at aggregate or subgroup level. An example of the use of meta-regression is the systematic review by Itani et al. (2017) into the relationship between (short) sleep duration and health outcomes. They used data from 153 studies to perform a meta-regression relating the duration of short sleep to mortality and other health outcomes, among them hypertension, coronary heart diseases and depression. The independent variable sleep duration is a continuous variable, which allows the meta-regression to be undertaken, though it is measured in discrete steps in this paper. For the relationship of sleep duration to risk of mortality, the outcome is presented in a scatter plot, found in Figure 7.8; this follows the advice of Baker et al. (2009, p. 1431), and Thompson and Higgins (2002, p. 1560), that visualisation is essential to the interpretation of the results from meta-regression. They (Itani et al. 2017, p. 254) conclude that short sleep, defined as a duration of less than six hours, was associated with a significant increase in mortality, diabetes, cardiovascular disease, coronary heart disease and obesity. This example shows that meta-regression is helpful when there is a linear relationship between variables and outcomes.
NOTE: DATA DREDGING
Thompson and Higgins (2002, p. 1566) draw attention to the pitfall of ‘data dredging.’ This may occur when there are few trials but many variables that may explain variance across studies.
Fig. 7.8 Meta-regression for the duration of short sleep and mortality risk. In the study by Itani et al. (2017, p. 254)* a shorter duration in the definition of short sleep is significantly associated with an increase in mortality risk (coefficient = 0.056, standard error = 0.021 and R² analog = 0.84). The 95% confidence intervals are above zero when short sleep is defined as less than six hours. * Reprinted with permission of Elsevier.
Studies may then tend to over-fit data and results to conjectures and findings in order to report significant findings. They suggest limiting the number of variables to avoid this fallacy; van Houwelingen et al. (2002, p. 607) add that five to ten studies for each variable should be considered to avoid this scenario. In addition, Schmidt (2017, p. 470) notes that with eight variables the number of studies should be at least 150, while noting that only very few systematic reviews with meta-regression for eight variables reach this number. Though not noted by these authors, adequate modelling, as suggested in Section 4.4, may also help to stay away from data dredging.
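For illustration only, the following sketch fits a univariate meta-regression in the spirit of Eq. 7.10 by weighted least squares, using made-up study-level data and an assumed between-study variance; dedicated meta-regression routines additionally adjust the standard errors (for example with the Knapp-Hartung approach), so this is a simplified approximation.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical study-level data: effect estimates, within-study variances and one covariate.
effects = np.array([0.42, 0.35, 0.50, 0.38, 0.61, 0.28])
variances = np.array([0.010, 0.025, 0.008, 0.015, 0.030, 0.020])
covariate = np.array([10.0, 8.0, 12.0, 9.0, 15.0, 7.0])   # e.g., a study characteristic

tau2 = 0.005                         # assumed between-study variance; estimated in practice
weights = 1.0 / (variances + tau2)   # random-effects style weights

X = sm.add_constant(covariate)       # intercept plays the role of theta_regression
fit = sm.WLS(effects, X, weights=weights).fit()
print(fit.params)                    # intercept and covariate coefficient (alpha in Eq. 7.10)
print(fit.bse)                       # unadjusted standard errors
```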
7.5 Common Measures for Effect Size Used in Meta-Analysis

The effect for which meta-analysis generates an estimate, through fixed-effects models, random-effects models, mixed-effects models and meta-regression, can be measured in various ways. This section introduces some common measures (standardised mean difference, weighted mean difference, risk ratio, odds ratio) and briefly introduces three other ones: the correlation coefficient, the proportion and standardised gain scores.
7.5.1 Standardised Mean Difference
The standardised mean difference (normally abbreviated as SMD) is used when the studies assess the same continuous outcome but measure it in a variety of different ways, for example, different quality-of-life scales. To manage the different scales, the results of the studies are standardised to a uniform scale before they can be combined. Takeshima et al. (2014, p. 30/5) suggest that the standardised mean difference may be preferred over the mean difference from the viewpoint of generalisability. The standardised mean difference is calculated by dividing the difference in mean outcome between groups by the standard deviation of the outcome among participants. Therefore, studies for which the difference in means is the same proportion of the standard deviation will have the same standardised mean difference, regardless of the actual scales for the measurements. The standardised mean difference expresses the size of the intervention effect in each study relative to the variability observed in that study; see Table 7.2 for determining the effect size and approximations of the variance. The standard deviations are also used to compute study weights, so studies with small variance receive relatively higher weights. This assumes that between-study variability in standard deviations reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations. There are different ways of calculating the effect estimate with the standardised mean difference, based on which denominator is used. A common calculation is called Cohen’s d; it can be calculated as the difference between the means of the control group and the experimental group divided by the pooled standard deviation; see Lakens (2013) for a more detailed description of calculating Cohen’s d. Since it tends to overestimate the effect size, especially when sample sizes are smaller than twenty, a correction factor can be applied. The corrected standardised mean difference is called Hedges’ g. However, when the mean difference is divided not by the pooled standard deviation but by the standard deviation of the control group, it is known as Glass’ d. The latter is preferred when the intervention or treatment changes not only the mean but also the standard deviation; particularly, this applies to cases where there is a considerable difference between the standard deviation of the control group and that of the experimental group. For Glass’ d the control group serves as the point of reference. Thus, these three ways of calculating the effect estimate for the standardised mean difference (Cohen’s d, Hedges’ g and Glass’ d) are applied depending on the sample sizes involved and on whether the intervention changes the standard deviation as well as the mean.
NOTE: DATA EXTRACTION AS KEY TO ACCURACY
Gøtzsche et al. (2007) draw attention to the role of data extraction for achieving accuracy of meta-analysis using the standardised mean difference. They (ibid., pp. 435–6) find errors related to extraction of data that potentially negate or even reverse the findings of the included studies. They suggest mitigating this by calling on statistical expertise, paying attention, using double-data extraction instead of
single-data extraction, and adequate reporting. They remark that errors are more likely to occur in systematic reviews using continuous and ordinal-scale outcomes than in those with binary data for the meta-analysis. The points they note may well apply to meta-analyses that employ effect measures other than the standardised mean difference.
TIP: DIRECTION OF SCALES
An important tip to remember here is to ensure that all scales are operating in the same direction.

Table 7.2 Effect sizes and their approximate variance for common measures. Based on Cheung et al. (2012, p. 131)*, this tabulation shows the formulae for the most common measures in use for meta-analyses. * Reproduced with permission of Wiley and Sons.

| Measure | Summary statistics | Effect size | Approximate variance |
| Standardised mean difference | $\bar{X}_1, s_1^2, n_1$: sample mean, variance and size for Group 1; $\bar{X}_2, s_2^2, n_2$: sample mean, variance and size for Group 2; $s^2_{\text{pooled}} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ | $\hat{y}_{\text{SMD}} = \left(1 - \frac{3}{4(n_1+n_2)-9}\right)\frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}$ | $v_{\text{SMD}} = \frac{n_1+n_2}{n_1 n_2} + \frac{\hat{y}^2_{\text{SMD}}}{2(n_1+n_2)}$ |
| Weighted mean difference | as for the standardised mean difference | $\hat{y}_{\text{WMD}} = \bar{X}_1 - \bar{X}_2$ | $v_{\text{WMD}} = \frac{n_1+n_2}{n_1 n_2} + \frac{\hat{y}^2_{\text{WMD}}}{2(n_1+n_2)}$ |
| Risk ratio (aka relative risk) | $a$: frequency of success in Group 1; $b$: frequency of failure in Group 1; $n_1 = a + b$; $c$: frequency of success in Group 2; $d$: frequency of failure in Group 2; $n_2 = c + d$ | $\hat{y}_{\text{RR}} = \log\frac{a n_2}{c n_1}$ | $v_{\text{RR}} = \frac{1}{a} - \frac{1}{n_1} + \frac{1}{c} - \frac{1}{n_2}$ |
| Odds ratio | as for the risk ratio | $\hat{y}_{\text{OR}} = \log\frac{a d}{b c}$ | $v_{\text{OR}} = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$ |
| Correlation coefficient | $r$: sample correlation coefficient; $n$: sample size | $\hat{y}_{\text{R}} = r$ | $v_{\text{R}} = \frac{(1-r^2)^2}{n-1}$ |
| Fisher’s z transformed score | as for the correlation coefficient | $\hat{y}_{\text{z}} = 0.5 \log\frac{1+r}{1-r}$ | $v_{\text{z}} = \frac{1}{n-3}$ |
| Proportion | $a$: frequency of occurrence; $b$: frequency of non-occurrence; $n = a + b$; $p = a/n$ | $\hat{y}_{\text{P}} = \log\frac{a}{b}$ | $v_{\text{P}} = \frac{1}{a} + \frac{1}{b}$ |
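To show how the formulae in Table 7.2 are applied, the following minimal sketch computes Cohen's d, Hedges' g and Glass' d from made-up two-group summary statistics; the numbers are hypothetical, and meta-analysis software would normally perform these calculations.

```python
import numpy as np

# Hypothetical two-group summary statistics (means, standard deviations, sample sizes).
mean1, sd1, n1 = 24.3, 5.1, 40   # experimental group
mean2, sd2, n2 = 21.8, 4.8, 38   # control group

# Pooled standard deviation and Cohen's d.
s_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
cohen_d = (mean1 - mean2) / s_pooled

# Hedges' g: small-sample correction applied to the standardised mean difference.
hedges_g = (1 - 3 / (4 * (n1 + n2) - 9)) * cohen_d

# Glass' d: divide by the standard deviation of the control group instead.
glass_d = (mean1 - mean2) / sd2

# Approximate variance of the standardised mean difference (Table 7.2).
v_smd = (n1 + n2) / (n1 * n2) + hedges_g**2 / (2 * (n1 + n2))
print(round(cohen_d, 3), round(hedges_g, 3), round(glass_d, 3), round(v_smd, 4))
```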
7.5.2 Weighted Mean Difference
The weighted mean difference (normally abbreviated as WMD) follows the same principle as the standardised mean difference, but applies when all studies have used the same scale for the outcome measure. Therefore, results are in ‘real’ units of measure; for example, the weight change of patients is always measured in kilograms. In this approach, the standard deviations are used together with the sample sizes to compute the weight given to each study; see Table 7.2 for determining the effect size and approximations of the variance. Studies with small standard deviations are given relatively higher weights, whilst studies with larger standard deviations are given relatively smaller weights. An example of a systematic review with meta-analysis using both standardised and weighted mean differences is Zeng et al. (2014); they look at the effect of qigong and tai chi on health-related outcomes of cancer patients. They use the weighted mean difference for body mass index, body composition and cancer-specific quality-of-life; this is consistent with the notion that the weighted mean difference is most appropriate when studies employ the same scale for a measure. The changes in depression and anxiety scores are assessed using the standardised mean difference. They (ibid., pp. 184–5) find that qigong and tai chi had positive effects on the cancer-specific quality-of-life, fatigue, immune function and cortisol level of cancer patients. Another finding is that qigong has no effect on reducing the anxiety of cancer patients, as the pooled effect of the anxiety change score was in favour of the control groups rather than the intervention group. However, they also note a risk of bias because of inadequate allocation concealment, lack of description of random sequence generation, and variation in intervention frequency and duration across the studies. This example demonstrates that standardised and weighted mean differences are used for different effects, and can be used in conjunction with each other.
7.5.3 Odds Ratio and Risk Ratio
For meta-analysis of studies, the odds ratio and risk ratio are used when comparing groups. The odds ratio (often abbreviated to OR) measures the ratio of the odds of
an event occurring in one group versus another. It is used with binary, dichotomous data, such as pass or fail, alive or dead, and recovered or not recovered. The risk ratio (commonly abbreviated to RR) is the ratio of the probability of an event in the treatment group compared to that in the control group. See Table 7.2 for calculating the effect size and variance. The odds ratio is less intuitive than the risk ratio but has better mathematical properties for analysis. According to Bakbergenuly et al. (2019, p. 398), this leads to balancing the mathematical convenience of the odds ratio against the simpler interpretation of the risk ratio. Thus, the risk ratio and the odds ratio are closely related for analysing binary, dichotomous data, but each measure has its own advantages. Two examples of systematic reviews with meta-analysis demonstrate how the two ratios can be used. The first example is the systematic review by Lopes et al. (2019) into three interventions (warm-up, neuromuscular training and eccentric exercise [slow, lengthening muscle contractions for a specific muscle]) to reduce the incidence of injuries among athletes. Their meta-analysis based on risk ratios covers sixteen studies, of which eight are on eccentric exercise, five on warming up and three on neuromuscular training. Both fixed-effects and random-effects models are used to establish that eccentric exercise and neuromuscular training are effective in preventing lower limb muscle injury when compared with a control group, whereas regarding warm-up no significant difference was observed between groups (ibid., pp. e003224/8–9). The risk ratio is appropriate here because the control groups and interventions have different occurrences of muscle injuries. The second example is the systematic review by O’Keefe and Hale (2001) using the odds ratio to investigate the door-in-the-face technique; this is a compliance method in which a relatively large initial request is made of a person, which the person declines, followed by a smaller one, with the hope that the person’s having declined the initial request will make the person more likely to comply with the second (target) request. Their rationale for the odds ratio (ibid., pp. 31–3) is chiefly that the outcomes are dichotomous (compliance or non-compliance), whereas three preceding meta-analyses have relied on correlation coefficients to investigate variables that influence the (non-)compliance. Although not expected by them, they (ibid., p. 37) conclude that findings from their earlier meta-analysis⁴ using correlation coefficients are largely intact; the only difference was that the meta-analysis using the odds ratio for a random-effects model yielded a dependably negative effect rather than a negative mean effect. They note that normally such congruence of results between meta-analyses using the odds ratio and the correlation coefficient is not to be expected. Thus, these two examples demonstrate how the selection of the appropriate measure, risk ratio or odds ratio, is related to what is being studied.
⁴ The other two of the three preceding systematic reviews with meta-analysis were dated fifteen years before this systematic review using the odds ratio for conducting the meta-analysis.
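As a small illustration of the formulae in Table 7.2, the sketch below computes the log risk ratio and log odds ratio, with their approximate variances, from a single made-up 2 × 2 table; the counts are hypothetical.

```python
import numpy as np

# Hypothetical 2x2 table of dichotomous outcomes for one study.
a, b = 30, 70    # treatment group: events, non-events (n1 = a + b)
c, d = 45, 55    # control group:   events, non-events (n2 = c + d)
n1, n2 = a + b, c + d

log_rr = np.log((a / n1) / (c / n2))    # log risk ratio
v_rr = 1/a - 1/n1 + 1/c - 1/n2          # approximate variance (Table 7.2)

log_or = np.log((a * d) / (b * c))      # log odds ratio
v_or = 1/a + 1/b + 1/c + 1/d            # approximate variance (Table 7.2)

print(round(np.exp(log_rr), 3), round(v_rr, 4))
print(round(np.exp(log_or), 3), round(v_or, 4))
```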
7.5.4 Correlation Coefficients, Proportions and Standardised Gain Scores (Change Scores)
Less commonly used approaches include meta-analysis of correlation coefficients, proportions and standardised gain scores (the latter are also called change scores). The correlation coefficient expresses the relationship between variables. When correlation coefficients are used as the measure, the coefficients are usually converted into a standard normal metric using Fisher’s z transformation, called Fisher’s z transformed score; see Table 7.2. Also, it is possible to use Pearson’s correlation coefficient and other similar methods. Both fixed-effects and random-effects models can be applied for a meta-analysis with correlation coefficients as the measure. An instance of a systematic review with meta-analysis using correlation coefficients is the work by Kim (2005) into the relationship between creativity test scores and IQ scores. The quantitative synthesis of 21 studies used Fisher’s z transformation for estimating the effect size and a weighted linear multiple regression model for variables that were statistically significant moderators for explaining variation in the magnitude of correlation coefficients. Her conclusion (ibid., p. 65) is that ‘the negligible relationship between creativity and IQ scores indicates that even students with low IQ scores can be creative.’ As another measure for effect size, the proportion can be used for aggregated studies that provide data about individual groups with respect to a dichotomous dependent variable; see Table 7.2. Here, two variables should be specified, $a_i$ and $n_i$, denoting the number of objects or subjects experiencing the event of interest and the total number of objects or subjects within each study, respectively. Instead of specifying $a_i$, it is also possible to use $b_i$ to specify the number of objects or subjects that do not experience the event of interest. In addition to this raw proportion, sometimes the measure is modified using the sample size or variance. An example of using proportions is the study by Cortoni et al. (2017) into the proportion of sexual offenders who are female. They use grey literature (a term the authors themselves do not use; it is introduced here for consistency of terminology in the book; see Section 5.7 for a description) to provide a robust estimate of the prevalence of female sexual offending based on 17 samples from twelve countries. To overcome the limitation of the raw proportion (meta-analysis not being optimal for low-proportion events), they (ibid., p. 151) use a proposed variance formula to adjust the proportion measure. Based on this adjustment of the measure, they calculate the effect size using fixed-effects and random-effects models. One of their findings (ibid., p. 154) is that ‘while females constitute a very small proportion of sexual offenders in police reports and court cases, there exists a much larger proportion of female sexual offenders that are not reported to the police.’ A less used measure is the standardised gain score. An analysis based on changes from a baseline removes a component of variability, making it mathematically more efficient. However, in practice it may be less efficient for outcomes which are unstable or difficult to measure precisely, where
the measurement error may be larger than the true baseline variability. A case in point of change scores is the systematic review by Woodward et al. (2005) into potential cognitive benefits of antipsychotic drugs when treating schizophrenia. They performed two analyses, one using the standardised mean difference as it is called here and the other using the change score as the measure; they (ibid., p. 461) remark that both analyses cannot be compared due to the different way the studies are evaluated. The review (ibid., p. 466) finds that the second analysis (change scores) supported a finding from the first analysis, i.e., treatment with atypical antipsychotic drugs leads to indications of improvement in a wide array of cognitive functions. The three measures discussed here (correlation coefficients, proportions, and standardised gain or change scores) have their typical applications, as also shown by the examples, and care should be taken with their mathematical properties before applying them in a systematic review with meta-analysis.
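The following minimal sketch shows the Fisher's z route for correlation coefficients with made-up data: the correlations are transformed, pooled with inverse-variance weights, and transformed back to the correlation scale; the values are hypothetical.

```python
import numpy as np

# Hypothetical per-study correlation coefficients and sample sizes.
r = np.array([0.31, 0.25, 0.40, 0.18])
n = np.array([85, 120, 60, 200])

z = 0.5 * np.log((1 + r) / (1 - r))   # Fisher's z transformed scores (Table 7.2)
v = 1.0 / (n - 3)                     # approximate variance of each z score

# Fixed-effects pooling on the z scale, then back-transformation to r.
w = 1.0 / v
z_pooled = np.sum(w * z) / np.sum(w)
r_pooled = np.tanh(z_pooled)          # inverse of Fisher's transformation
print(round(z_pooled, 3), round(r_pooled, 3))
```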
7.6 Methods for Meta-Analysis
The most common methods used in meta-analysis vary according to whether a fixed-effects model or a random-effects model is applied. For the fixed-effects model, the Mantel–Haenszel method (Mantel & Haenszel, 1959) is regularly used. It is a method that generates an estimate of an association between an exposure and an outcome after adjusting for, or taking into account, confounding. The Mantel–Haenszel method may be used with ratios, typically with the odds ratio, but can be applied to the risk ratio, too. The method uses a dichotomous outcome variable and a dichotomous risk factor. The method is particularly advantageous when aggregating a large number of studies with small sample sizes. Extensions of the Mantel–Haenszel method have been made, such as by Peto and colleagues (Yusuf et al. 1985), who presented an alternative to the usual Mantel–Haenszel method for pooling odds ratios across the strata of fourfold tables. It uses an approximate method of estimating the log odds ratio, and different weights. This method is not mathematically equal to the classical odds ratio, but it has come to be known as the ‘Peto odds ratio.’ The Peto odds ratio can cause bias, especially when there is a substantial difference between the treatment and control group sizes, but it performs well in many situations. Thus, the Mantel–Haenszel method is seen as a basic method for performing a meta-analysis with a fixed-effects model, and the Peto odds ratio is an extension suitable for substantial differences between the treatment and control group. Where a random-effects model is being used, the DerSimonian-Laird method (DerSimonian & Laird, 1986), which is also called the normal mixture method, is commonly employed. Historically, it has been the most commonly implemented method in meta-analyses as it is calculated directly rather than requiring an iterative procedure. The DerSimonian-Laird method may be used with ratios (odds ratio, risk ratio) and differences (weighted mean and standardised mean difference). It uses an inverse-variance method to incorporate an assumption that the different studies are estimating different, yet related, intervention effects. The estimator in the
DerSimonian-Laird method is derived by comparing the observed value of the test statistic Q and its expectation; see Section 7.7 for the test statistic Q. When the test statistic Q is smaller than its expectation, the DerSimonian-Laird estimate shows no evidence for the presence of between-study heterogeneity and a fixed-effects model is acceptable. However, default use of the DerSimonian-Laird method has often been challenged, as it may underestimate between-study heterogeneity, leading to narrower confidence intervals for the effect estimate, specifically when between-study heterogeneity is large. For example, IntHout et al. (2014) compare it with the HKSJ method (named after Hartung and Knapp [2001] and Sidik and Jonkman [2006]). Their simulations show that the HKSJ method for random-effects meta-analysis consistently results in more appropriate error rates than the DerSimonian-Laird method, especially when there is a smaller number of studies. This means that the DerSimonian-Laird method, though regularly used in systematic reviews with meta-analysis, should be applied with caution when the number of studies is small. In addition to these well-used methods, there are extensions and other approaches. Several studies have addressed a particular concern of the DerSimonian-Laird method: it underestimates the between-study variability and consequently leads to narrower confidence intervals, which can lead to inferences that interventions, policies, practices and treatments may be effective when they are actually not. These studies include Biggerstaff and Tweedie (1997) and Viechtbauer (2007). The overview by Veroniki et al. (2019) evaluates fifteen approaches to setting confidence intervals in the context of random-effects models; they also discuss the advantages of using particular methods for determining these intervals. Other methods that are suitable for meta-analysis include Bayesian approaches (e.g., Sutton and Abrams 2001) and the beta-binomial model (Mathes & Kuss, 2018). Bayesian approaches to meta-analysis use a priori information about probability distributions for effect sizes to analyse actual distributions. Sutton and Abrams (2001) see as advantages of these methods in meta-analysis that they are better able to cope with uncertainty, to include more information and to accommodate more complex, but frequently occurring, scenarios. The methods of the Bayesian approaches can be applied to fixed-effects, random-effects and mixed-effects models, including meta-regression. A different approach to meta-analysis is the use of beta-binomial methods; binomial models have two outcomes, so they are mostly associated with proportions, odds ratios and risk ratios. For example, Bakbergenuly and Kulinskaya (2017) present a beta-binomial model for odds ratios and extend the Mantel–Haenszel method. With respect to suitability, Mathes and Kuss (2018, p. 380) point out that beta-binomial models can cover unusual circumstances, such as zero counts and rare events in studies. This means that alternative approaches to the meta-analysis may be more appropriate for setting confidence intervals and covering specific situations. The availability of multiple methods for meta-analysis has led to studies examining the appropriateness of methods and models. A case in point is Mathes and Kuss (2018) comparing the beta-binomial method with the DerSimonian-Laird, modified Hartung-Knapp, Paule-Mandel, Mantel-Haenszel and Peto odds ratio methods when the meta-analysis has few studies. They (ibid., pp. 379–80) suggest
for the fixed-effects model the classical Mantel–Haenszel and Peto odds ratio methods; if the standard fixed-effects models are not converging, for example, because of many double-zero studies, then the more robust beta-binomial model is an alternative. For the random-effects model, the modified Hartung-Knapp method, the Paule-Mandel between-study variance estimator with the HKSJ confidence intervals, and the beta-binomial method combined with confidence intervals using a t-distribution (with the degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters) are found to be more robust. Also, whether the distribution of effect sizes across studies is normally distributed may play a role in selecting the most appropriate method. For example, Kontopantelis and Reeves (2012a) compare the performance of seven methods when effect sizes are non-normally distributed: the DerSimonian-Laird, Q-based, maximum likelihood, profile likelihood, Biggerstaff-Tweedie, Sidik-Jonkman and Follmann-Proscha methods. They (ibid., p. 425) pinpoint that the DerSimonian-Laird method is not the optimal method for the random-effects model. In a follow-up study (Kontopantelis and Reeves 2012b, p. 659), they conclude that the complexity of using the restricted maximum likelihood method (the eighth method investigated) does not justify its better performance when compared with the DerSimonian-Laird method. As another point of consideration, they note (Kontopantelis & Reeves, 2012a, p. 425) that software packages tend to support fixed-effects models and the DerSimonian-Laird method but fewer other methods; this remark is dated but still applicable in some sense. The tool put forward by Suurmond et al. (2017) is an example of one restricted to the DerSimonian-Laird method, though with the HKSJ adjustment. Tools that include calculations for effect sizes include spreadsheets (for instance, Lakens, 2013; Neyeloff et al. 2012; Suurmond et al. 2017), specialist software (such as OpenMEE and RevMan [Cochrane Collaboration]) and statistical software with options for meta-analysis (for example, SPSS® and Stata). Some studies compare software for meta-analysis to provide guidance, albeit often related to specific types of meta-analysis, with Bax et al. (2007) for causal studies, and Pastor and Lazowski (2018) for multi-level meta-analysis, being cases in point. Thus, the choice of a specific method is related to the type of model, the number of studies, the statistical profile of the distribution of effect sizes and the availability of methods in software for meta-analysis.
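For illustration, the sketch below computes a Mantel–Haenszel pooled odds ratio across a few made-up 2 × 2 tables; the counts are hypothetical, and the sketch omits the variance and confidence-interval calculations that meta-analysis software would add.

```python
import numpy as np

# Hypothetical 2x2 tables (a, b, c, d) for three studies:
# a/b = events/non-events in the treatment group, c/d = events/non-events in the control group.
tables = np.array([
    [12, 88, 20, 80],
    [8, 42, 15, 35],
    [30, 170, 45, 155],
], dtype=float)

a, b, c, d = tables.T
n = a + b + c + d

# Mantel-Haenszel pooled odds ratio: ratio of weighted sums across the studies.
or_mh = np.sum(a * d / n) / np.sum(b * c / n)
print(round(or_mh, 3))
```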
7.7 Determining Between-Study Heterogeneity
Studies brought together in a systematic review or meta-analysis will inevitably differ to some extent; any kind of variability among studies is called between-study heterogeneity. Some variability may be attributed to chance, but is also influenced by a range of variables, factors and contingencies. Looking at which variables, factors and contingencies cause heterogeneity across studies is exploratory by nature, and therefore, results should be interpreted with caution and will likely identify topics that need further investigation (this can be considered hypothesis-generating).
Common characteristics to look at include years of publication, types of interventions, policies, practices and treatments, and attributes of what is studied (see Section 7.3). This is often supported by visualisations, with the forest plot, funnel plot, L’Abbé plot and the radial plot (aka Galbraith plot) being the most common ones (Anzures-Cabrera & Higgins, 2010, pp. 77–8); they also appear in the guidance by Song et al. (2001) for assessing heterogeneity. After looking in more detail at what heterogeneity means in the next subsection, the four common plots will be briefly discussed, followed by sensitivity analysis, in the subsections thereafter.
7.7.1 Distinguishing Types of Between-Study Heterogeneity
In systematic reviews there are basically three different types of between-study heterogeneity: ontological heterogeneity, methodological heterogeneity and statistical heterogeneity. Ontological heterogeneity (see Section 3.3 for more detail on ontology in the context of research paradigms) refers to differences in attributes of objects and subjects considered, settings in which studies take place, and changes and phenomena that are studied. In healthcare and medicine ontological heterogeneity is called clinical heterogeneity, which is variability in the participants, interventions, settings and outcomes studied. This is usually assessed in a qualitative or semi-quantitative manner (for example, by using subgroup or sensitivity analyses). Methodological heterogeneity is variation in the design of studies, measurement and tools for independent and dependent variables, and risk of bias; the latter refers to studies with specific methodologies and settings contributing to results, findings and conclusions; a case in point is the number of objects or subjects investigated in studies. This, too, is usually assessed in a qualitative or semi-quantitative manner (for example, by using subgroup or sensitivity analyses). Statistical heterogeneity is variation in the effects being measured in the different studies. One would expect effects to vary by chance (i.e., random error). Statistical heterogeneity is seen when the observed effects vary more from each other than one would expect due to chance alone. All three types of heterogeneity need to be explored and assessed when conducting a systematic review with meta-analysis (in some sense, these are valid for other types of systematic reviews, too, and probably also apply to the other archetypes); statistical analysis is the starting point for exploring the impact of the three types on between-study heterogeneity. Often statistical heterogeneity serves as the starting point for exploring factors that influence variance; it is often referred to simply as heterogeneity. If a meta-analysis includes a large number of studies and the difference across studies is purely due to random variation, the results of studies will be distributed around an average, and there will be fewer studies whose results are further away from the average (Song et al. 2001, pp. 127, 129). The power of statistical tests for heterogeneity depends on two factors: the number of studies included and the weight allocated to
each study. Statistical testing is used to estimate the probability of between-study variation equal to or greater than the variation observed, under the assumption that all studies are estimating the same average effect size. Thus, determining statistical heterogeneity is a key step in methods for meta-analysis and a starting point for seeking explanations for this heterogeneity.
7.7.2 Determining Statistical Heterogeneity
The result of statistical tests for heterogeneity can be presented as a p-value. A p-value is the probability of obtaining the observed effect (or larger) under a ‘null hypothesis’, which in the context of Cochrane reviews is either an assumption of ‘no effect of the intervention’ or ‘no differences in the effect of intervention between studies’ (no heterogeneity). Thus, a large p-value for between-study heterogeneity suggests that the observed variation across studies is plausibly due to chance, and therefore, the assumption that studies are estimating the same effect size cannot be rejected. Conversely, a small p-value indicates a small possibility that the observed variation between studies is due to chance, and therefore, a very small p-value indicates statistically significant heterogeneity across studies. In addition to the p-value, statistical heterogeneity is usually measured using Chi-squared tests or the I² index (Deeks et al. 2021), which assess whether observed differences in results between studies are compatible with chance alone. A large Chi-squared statistic relative to its degrees of freedom (for meta-analysis: the number of studies minus 1) provides evidence of heterogeneity; an explanation for the degrees of freedom is found in Box 7.A. A standard statistic for determining heterogeneity is the Q-test:

$Q = \sum_{i=1}^{k} w_i (\hat{y}_i - \hat{\theta})^2$  (7.12)
In the case of the fixed-effects model it is assumed that $\tau^2 = 0$, for which the Q-test can be formulated as:

$Q = \sum_{i=1}^{k} \dfrac{(\hat{y}_i - \hat{\theta}_{\text{fixed}})^2}{\sigma_i^2}$  (7.13)

Using the Q-test, it is possible to quantify inconsistency across studies in a way that assesses its impact on the meta-analysis. A useful statistic for quantifying this inconsistency is the I² index (Higgins and Thompson, 2002, p. 1545; Higgins et al. 2003, p. 558):

$I^2 = \dfrac{Q - df}{Q} = \dfrac{Q - (k-1)}{Q}$  (7.14)

where
• $df$ is the degrees of freedom.
The I² index can be interpreted as the proportion of between-study heterogeneity relative to the total variation (between-study heterogeneity plus sampling error). When the estimated I² is negative, it is truncated to zero. As a rule of thumb, an I² index of 25%, 50% and 75% can be considered as indicating low, moderate and high heterogeneity, respectively. Since the amount of sampling error depends heavily on the sample sizes used in the primary studies, the I² index will get larger when sample sizes in the primary studies are larger. In practice, often the main limitation in determining the scale of statistical heterogeneity is the small sample size available for testing.

Box 7.A Degree of Freedom for Statistics and Meta-Analysis
In statistics, the degree of freedom (often abbreviated as df) indicates the number of independent values that can vary in an analysis without breaking any constraints; see Walker (1940) about origins and an explanation. Typically, the degrees of freedom equal the sample size minus the number of parameters that will be calculated during an analysis; usually, this is a positive whole number. In meta-analysis, the effect size is the single parameter to be calculated, so the degrees of freedom normally equal the number of studies minus one. A further illustration of the concept using tables follows here. In a 2 × 2 table, if there are constraints for the variables, then only one value (here, 12) needs to be set for the values of the other cells to be determined (shown in parentheses):

| | | | Constraint A |
| Variable A | 12 | (4) | 16 |
| Variable B | (7) | (11) | 18 |
| Constraint B | 19 | 15 | 34 |

If this table is expanded with a column, then two values (here, 12 and 8) need to be set before the others are determined:

| | | | | Constraint A |
| Variable A | 12 | 8 | (4) | 24 |
| Variable B | (7) | (10) | (10) | 27 |
| Constraint B | 19 | 18 | 14 | 51 |
Thus, each increase in the number of values for variables leads to a higher degree of freedom. In statistics, this is important because the degree of freedom is an indication of randomness, i.e., to what extent chance plays a role.
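As a worked illustration of Eqs. 7.13 and 7.14, the sketch below computes Q and I² for a handful of made-up study estimates; the numbers are hypothetical.

```python
import numpy as np

# Hypothetical per-study effect estimates and within-study variances.
effects = np.array([0.42, 0.35, 0.50, 0.38, 0.61])
variances = np.array([0.010, 0.025, 0.008, 0.015, 0.030])
k = len(effects)

w = 1.0 / variances
theta_fixed = np.sum(w * effects) / np.sum(w)

# Q statistic (Eq. 7.13) and I^2 index (Eq. 7.14), truncated at zero when negative.
Q = np.sum(w * (effects - theta_fixed) ** 2)
I2 = max(0.0, (Q - (k - 1)) / Q)
print(round(Q, 3), round(100 * I2, 1))   # I^2 reported as a percentage
```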
In addition to calculating statistical heterogeneity, it can be observed through visualisations; this is presented in the next subsections, where four types of plots are presented to examine between-study heterogeneity. Of particular interest is when confidence intervals for the results of individual studies have poor overlap; a forest plot is usually used to examine this. A further point for which plots are used is identifying outliers, i.e., studies that deviate considerably from the estimate for the effect size or whose confidence intervals are unusually narrow or wide relative to sample size (normally, studies with larger samples should have narrower confidence intervals). Thus, the main purpose of visualisations is determining the degree of overlap of confidence intervals between studies and identifying studies that are outliers. When statistical heterogeneity is present, there are four approaches to consider:
• Use a random-effects model, since the assumptions for the fixed-effects model do not apply.
• Perform a cluster or subgroup analysis instead of an aggregate analysis (see the note that follows). This is particularly helpful when the degree of heterogeneity is high.
• Carry out an analysis of publication bias and a sensitivity analysis to identify causes of heterogeneity (see also Section 7.8). Even plausible sensitivity analyses are prone to post-hoc biases.
• Consider whether a meta-analysis should be abandoned. When the degree of heterogeneity is high, this is an option, particularly when studies with a high degree of precision display between-study heterogeneity (it could indicate that studies have used different measures or variables).
The choice of which of these approaches is most appropriate depends on the extent of heterogeneity discovered during the search strategy and the extraction of data.
NOTE: DEALING WITH THE MULTIPLICITY PROBLEM
Even though advice may imply using multiple methods and analyses for conducting a meta-analysis to increase its rigour, this may lead to what is called the multiplicity problem. This indicates that the more analyses are undertaken using the same data, the more likely it becomes that some tests will find statistically significant effects although there may be no true effects, or that some confidence intervals will not include the true value of the effect sizes (the latter akin to outliers). Bender et al. (2008, p. 858 ff.) state six potential causes for the problem to occur: multiple outcomes, multiple groups, multiple time points, multiple effect measures, subgroup analyses, and continuous accumulation of data (the latter happens when meta-analyses are updated). They (ibid., pp. 862–3) also provide guidance to avoid the multiplicity problem, including setting a protocol for the systematic review in advance and limiting the number of specific analyses; this corresponds with the advice to conduct a scoping study or review (see Sections 4.5, 5.9 and 6.6). In addition to setting a protocol, Tendal et al. (2011, p. d4829/4) suggest reporting and analysing all data with regard to intervention groups, time points, and measurement scales, even though this might decrease accessibility by a broader
range of readers and users. Furthermore, specific to effect sizes, López-López et al. (2018, p. 348) propose three stages for addressing multiplicity:
• Articulating the review question. This includes specifying definitions of the constructs involved in the relationship(s) of interest and clarifying the extent to which the review takes a convergent approach (in which narrow eligibility criteria are adopted to answer a highly focused question) or a divergent approach (in which wider eligibility criteria are used to bring diversity of effect sizes into the review).
• Examining the effect sizes within the set of primary studies to determine whether they can be combined. When different metrics are used, studies should not be combined statistically. Also, sources of multiplicity should be considered, for which they provide a tabulation.
• Choosing an analysis strategy to handle any multiplicity. A key decision is whether to perform separate meta-analyses to address the multiple review questions or to pool the effect sizes into a single synthesis of a broader review question.
Thus, the multiplicity problem can be addressed by considering its potential sources, defining a protocol a priori and setting out an adequate analysis strategy; to improve the transparency of systematic reviews with meta-analysis, these considerations should be reported.
NOTE: HETEROGENEITY MAY LEAD TO CLUSTER OR SUBGROUP ANALYSIS
The need to perform subgroup or cluster analysis within a meta-analysis may arise to avoid Simpson’s paradox (see Section 7.2) or to address heterogeneity (see also the previous note about multiplicity); in the latter case, there could be no meta-analysis at the aggregate level but one at the level of specific subgroups or clusters of studies. This is depicted in Figure 7.9. Subgroup analysis can be a result of studies measuring effects in different ways, for example, using different scales, or of differing research methods in the primary studies. If heterogeneity persists for a cluster or subgroup analysis, qualitative synthesis should be considered instead of meta-analysis.
TIP: AVOID ECOLOGICAL FALLACY
An ecological fallacy (aka ecological inference fallacy or population fallacy) is a formal fallacy in the interpretation of statistical data that occurs when inferences about the nature of individuals are deduced from inferences about the group to which those individuals belong. An example is given by Kaufmann et al. (2016, p. 159) about the conclusions of the study by Robinson (1950, cited by Kaufmann, ibid.). The aggregated data from the United States in the study concerning the proportion of foreign-born residents and the literacy rate at the state level implied that foreign-born immigrants were less literate than their native-born peers (the ecological fallacy). However, the average correlation between foreign birth and literacy at the individual level was, in fact, positive and much lower in magnitude, suggesting that foreign-born immigrants were, on average, more literate than native citizens. The negative correlation at the state level arose because immigrants tended
(Figure 7.9 positions the analysis options along a replication continuum: strict replications lend themselves to aggregated meta-analysis, while moving towards conceptual replications makes cluster analysis, mixed-methods synthesis and qualitative synthesis appropriate. Cluster and subgroup analysis serve the explanatory investigation of heterogeneity and help avoid Simpson's paradox; heterogeneity may necessitate meta-analysis for clusters or subgroups only.)
Fig. 7.9 Positioning of cluster and subgroup analysis within the replication continuum. As depicted in the figure, there are two separate reasons to conduct cluster or subgroup analysis. The first reason is to investigate heterogeneity between studies. Normally, this is associated with the random-effects model, but it can also occur when using the fixed-effects model. Related to this point is that cluster or subgroup analysis will avoid Simpson's paradox, mentioned in Section 7.2. The second reason to perform cluster or subgroup analysis is that heterogeneity across studies does not allow meta-analysis of fully aggregated data to draw any meaningful inferences. This can be caused by constructs being defined differently across studies, dissimilar data collection, and varying methodologies.
7.7.3
Forest Plot
Heterogeneity in meta-analysis is often explored with the support of a standard analytical graph: the forest plot. In this plot, studies are represented by their effect estimate, variance and sample size; see Figure 7.10. The horizontal axis represents the direction and size of the effect, normally with a vertical line marking the point of ‘null effect.’ Displaying the direction and size of the estimated effect of each study in combination with its sample size and confidence interval, normally set at 95%, allows judging the variance between studies.
Fig. 7.10 Symbolic representation of forest plot. A forest plot displays all relevant studies with a common measure for an effect. This allows comparing directly the results of individual studies and their confidence intervals. The dotted line is the line that represents the null effect. The squares indicate the effect size of the individual studies. The size of the squares represents the relative sample size; in this case, study A has the largest sample size of all individual studies. The horizontal lines represent the confidence interval of each study. As is normally the case, studies with larger sample sizes have narrower confidence intervals. Study B has a small sample size and a very wide confidence interval (on the left side even beyond what is shown). The diamond at the bottom is the aggregated effect size, with all studies weighted; its width indicates the aggregated confidence interval.
Small studies will normally show greater random variance, and the forest plot illustrates this pictorially. Thus, the plot is used to examine between-study heterogeneity and how studies contribute to the overall effect estimate and its confidence interval. An example of using forest plots in a systematic review with meta-analysis is the study by Lloyd et al. (2015) into whether psychological interventions reduce the need of individuals to strive for perfectionism; they undertake the study because perfectionism is associated with psychiatric disorders, hinders treatment and leads to poorer treatment outcomes. The selection process for identifying relevant publications leads to only eight studies being considered for the meta-analysis (ibid., p. 718). These selected studies use multiple scales for assessing perfectionism and specific disorders. After subjecting the eight studies to statistical analysis, forest plots (ibid., p. 723 ff.) are presented to investigate heterogeneity. Some of their plots show outliers and some show more homogeneity; an example of the latter is concern over mistakes, found in Figure 7.11. Among their findings (ibid., pp. 726–7) is that the aggregation of individual studies demonstrated large, pooled effect sizes for change between pre- and post-intervention on concern over mistakes in one of the subscales.
Fig. 7.11 Forest plot for Frost Multidimensional Perfectionism Scale: Concern over Mistakes subscale with standardised effect sizes for change between pre- and post-intervention. This forest plot (Lloyd et al. 2015, p. 723)* shows that interventions used in the six studies caused a significant change with a narrow confidence interval. The authors (ibid., 2015, p. 726) later conclude based on this plot and other plots that a cognitive behavioural approach with short interventions in adults with perfectionism significantly reduces aspects of perfectionism. * Reprinted with permission of Elsevier.
This contributes to the conclusion that the review provides initial evidence that a cognitive behavioural approach may be effective in reducing perfectionism in individuals with a psychiatric diagnosis or elevated levels of perfectionism. This example shows that forest plots are helpful for examining homogeneity and heterogeneity in the evidence base, albeit that analysis at the level of subgroups of studies or subsets of data may be necessary, as is the case here.
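For readers who wish to reproduce this kind of display, the following sketch draws a basic forest plot with matplotlib from hypothetical study summaries; it mimics the layout described for Figure 7.10 rather than reproducing any published plot.

```python
# Minimal sketch of a forest plot with hypothetical study summaries.
import numpy as np
import matplotlib.pyplot as plt

labels = ["Study A", "Study B", "Study C", "Study D", "Study E"]
effects = np.array([0.42, 0.10, 0.35, 0.55, 0.25])
se = np.array([0.08, 0.30, 0.12, 0.15, 0.10])

weights = 1.0 / se**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

fig, ax = plt.subplots(figsize=(6, 4))
y = np.arange(len(labels), 0, -1)  # studies listed from top to bottom

# 95% confidence intervals as horizontal lines, squares scaled by weight.
ax.errorbar(effects, y, xerr=1.96 * se, fmt="none", ecolor="black", capsize=3)
ax.scatter(effects, y, marker="s", s=20 + 200 * weights / weights.max(),
           color="black", zorder=3)

# Pooled estimate drawn as a diamond plus a line spanning its 95% CI.
ax.scatter([pooled], [0], marker="D", s=120, color="grey", zorder=3)
ax.plot([pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se], [0, 0],
        color="grey", linewidth=3)

ax.axvline(0.0, linestyle="--", color="black", linewidth=1)  # null effect
ax.set_yticks(list(y) + [0])
ax.set_yticklabels(labels + ["Pooled"])
ax.set_xlabel("Effect size")
plt.tight_layout()
plt.show()
```

In practice, dedicated meta-analysis software produces such plots directly; the sketch only illustrates the underlying ingredients (effect estimates, confidence intervals, weights and the pooled diamond).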
7.7.4
Funnel Plot
The funnel plot provides a graphical assessment of the effect size against the standard error of each study. It displays the results of studies on the x-axis and their precision on the y-axis; see Figure 7.12. Normally, a 95% confidence interval is indicated, and studies are expected to fall within it depending on their precision as indicated by the standard error. The plot should have a ‘funnel’ shape, with the most precise effect sizes at the top and the least precise effect sizes at the base of the funnel. One point to consider is whether studies are distributed symmetrically around the effect size estimate. In this sense, an ideal funnel plot is one where the included studies scatter on either side of the overall effect line in a symmetrical manner. In addition, the plot can be used to identify outliers, i.e., studies with an effect size outside the 95% confidence interval.
Fig. 7.12 Symbolic representation of funnel plot. A funnel plot is a scatter plot, in which the studies are positioned based on the effect size they measured (x-axis) and the standard error aka study precision (y-axis). In this figure study A represents a study with low precision; normally this is caused by chance (sampling) and a low number of samples. Study B has a higher precision and is closer to the aggregated effect size. Normally, all studies should fit within the 95% confidence interval (dashed lines).
Normally, such publications are found towards the base of the funnel. However, if there is a considerable number of outliers, the reliability of the pooled effect size and its confidence interval should be doubted, as the included studies have employed too diverse a range of settings and study designs. This effect is pronounced when there is a large span of estimated effect sizes outside the confidence interval for studies with higher precision. A case in point is the funnel plot by Govindan et al. (2020, p. 101,923/10) for their study into supply chain sustainability and performance of firms; their plot has a large number of studies with higher precision outside the confidence interval. This should have prompted an analysis, possibly qualitative, of why this extent of variety occurred at higher levels of precision; a plausible reason could be that the variance originates in the effect being measured in different ways or even being defined through inconsistent constructs and variables. Alternatively, they could have looked at cluster or subgroup analysis to find indications beyond normal expectations. Thus, the funnel plot indicates how individual studies are positioned within a confidence interval of the estimated overall effect size and can trigger investigations into why outliers occur. An example of using the funnel plot is the study by Verhaeghen (2003) on the relationship between aging and vocabulary scores. Data have been extracted from 210 publications and subjected to meta-analysis. Its funnel plot is found in Figure 7.13.
Fig. 7.13 Funnel plot of effect sizes on aging and vocabulary scores. This funnel plot in Verhaeghen (2003, p. 334)* displays studies from the journal ‘Psychology and Aging’ that (a) reported a measure of vocabulary and (b) examined a sample of younger adults (average age older than 18 and younger than 30); the period of publication covered 1986–2001. The distribution of studies in the funnel plot is very regular; that is, there are no gaps or asymmetries, with the exception of a clear outlier with a reported effect size larger than 7. * Reproduced with permission of the American Psychological Association.
In this case, the y-axis displays the number of participants in each study, a variable substituting for the standard error as the measure of study precision. The plot shows a regular distribution with only one outlier. As he (ibid., p. 333) states, the shape is very regular, with no gaps or asymmetries, except for the outlier study. This example shows that the funnel plot supports finding studies outside confidence intervals and examining the distribution of studies in terms of their precision.
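A funnel plot can be sketched along the same lines. The following minimal example, again with hypothetical data, plots effect sizes against their standard errors and adds approximate 95% pseudo-confidence limits around the pooled fixed-effect estimate.

```python
# Minimal sketch of a funnel plot with hypothetical data.
import numpy as np
import matplotlib.pyplot as plt

effects = np.array([0.35, 0.20, 0.50, 0.30, 0.05, 0.45, 0.33, 0.60])
se = np.array([0.05, 0.10, 0.15, 0.08, 0.25, 0.20, 0.06, 0.30])

weights = 1.0 / se**2
pooled = np.sum(weights * effects) / np.sum(weights)

se_grid = np.linspace(0.001, se.max() * 1.1, 100)
lower = pooled - 1.96 * se_grid   # pseudo 95% limits widen as precision drops
upper = pooled + 1.96 * se_grid

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(effects, se, color="black")
ax.plot(lower, se_grid, linestyle="--", color="grey")
ax.plot(upper, se_grid, linestyle="--", color="grey")
ax.axvline(pooled, color="grey", linewidth=1)
ax.invert_yaxis()                 # most precise studies at the top
ax.set_xlabel("Effect size")
ax.set_ylabel("Standard error (study precision)")
plt.tight_layout()
plt.show()
```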
7.7.5
L’Abbé Plot
The L’Abbé plot is used to display data in a meta-analysis of studies that compare the outcomes of an experimental and a control intervention. It is a scatter plot in which each point represents a study, with the vertical axis measuring the event rates in the treatment group and the horizontal axis the event rates in the control group (L’Abbé et al., 1987, p. 227 ff.); see Figure 7.14.
Fig. 7.14 Symbolic representation of L’Abbé plot. In a L’Abbé plot studies are positioned on two dimensions: event rates in the control group and event rates in the treatment group. Study A represents a study where the treatment had more effect on events when compared with the control group. For Study B this is reversed, meaning the treatment had a negative effect (comparatively). Study C is found on the line of equality, where the treatment has no effect, not better or worse, when compared with the control group.
L’Abbé plots can be considered for visualisation and analysis whenever each study contributes two statistically independent pieces of information that might be expected to be inherently correlated across studies, such as treatment and control group outcomes or sensitivity and specificity (Anzures-Cabrera & Higgins, 2010, p. 73). Some meta-analyses plot the points at a size proportional to weight or trial size (preferably weight) as additional information. In some cases, the axes can be reversed or defined differently, so care should be taken when interpreting this visualisation. L’Abbé plots are particularly useful, and thus informative, for studies comparing two groups. An example of using the L’Abbé plot is the meta-analysis by Shah et al. (2007) into the use of echinacea for the prevention and treatment of the common cold. The finding that echinacea is beneficial in preventing and reducing the duration of the common cold is supported by forest plots, funnel plots and a L’Abbé plot; the latter is used for comparing the incidence, see Figure 7.15. They (ibid., p. 478) claim that the results of the meta-analysis show that echinacea reduces the incidence of the common cold by 58% as well as its duration by one to four days. This example shows that the L’Abbé plot can, and arguably should, be used in conjunction with other plots to determine between-study heterogeneity.
Fig. 7.15 L’Abbé plot for the effect of echinacea on incidence of common cold. This plot (Shah et al. 2007, p. 477)* shows that included studies generally agreed on echinacea’s positive effect for the incidence of the common cold, but not the magnitude of the benefit. The authors contrast this L’Abbé plot with the Q-test statistic that indicated significant heterogeneity; therefore, this plot was informative towards the conclusions of the study. * Reprinted with permission of Elsevier.
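A minimal sketch of a L’Abbé plot, using hypothetical event rates and sample sizes, is given below; points below the line of equality correspond to studies in which the treatment group experienced fewer events than the control group.

```python
# Minimal sketch of a L'Abbe plot with hypothetical event rates.
import numpy as np
import matplotlib.pyplot as plt

control_rate = np.array([0.40, 0.55, 0.30, 0.62, 0.48])
treatment_rate = np.array([0.25, 0.35, 0.28, 0.40, 0.50])
n = np.array([120, 80, 200, 60, 150])  # total participants per study

fig, ax = plt.subplots(figsize=(4.5, 4.5))
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")        # line of equality
ax.scatter(control_rate, treatment_rate, s=n, alpha=0.6, color="black")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_xlabel("Event rate in control group")
ax.set_ylabel("Event rate in treatment group")
plt.tight_layout()
plt.show()
```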
7.7.6
(Galbraith) Radial Plot
The so-called radial plot is seen as an alternative and complement to the forest plot; it also goes under the name Galbraith plot or Galbraith radial plot. This plot helps ‘to judge visually which subsets of the estimates are consistent with each other or with some theoretical value’ (Galbraith, 1988, pp. 892–3). The original version is a radial plot in which the vertical scale (y-scale) represents the standardised log odds ratio and the horizontal scale (x-scale) the precision, calculated as the inverse of the standard error; see Figure 7.16. Each study is represented in the plot by a dot on these two scales. This means that the higher the number of observations in a study and the smaller the standard deviation of those observations, the more reliable a particular study is with regard to determining the effect size. In Figure 7.16, this means that more informative studies, with smaller standard errors, are located further away from the origin. Moreover, the extent of heterogeneity across studies can be assessed by how individual studies scatter about a straight line through the origin. A circular scale can be drawn to measure the estimates of effect size on a log scale. Often the radial scale is not displayed in these plots, but they use the same principle for visualisation.
Fig. 7.16 Symbolic representation of radial plot. The radial plot assesses the extent of heterogeneity between studies in a meta-analysis. The y-axis shows the (log-transformed) effect size divided by its standard error (z-score) and the inverse of the standard error on the x-axis. Each study is represented by a single dot and a regression line runs centrally through the plot. Parallel to the regression line, at a 2-standard-deviation distance, 2 lines create a confidence interval in which most dots would be expected to fall if the studies were estimating a single fixed parameter.
Two advantages of using radial plots (Anzures-Cabrera & Higgins, 2010, p. 72) are that it is relatively easy to detect outliers and that they can display a larger number of studies more comfortably than forest plots. Thus, using radial plots makes it easier to explain heterogeneity, identify studies with smaller standard errors and pinpoint outliers, though their different appearance and interpretation may need explanation. An example of using radial plots is the study by Göritz (2006) into using incentives for the completion of web surveys. Her study specifically focused on a meta-analysis of how different types of incentives influence consumers responding to and completing surveys. She uses odds ratios to determine whether potential participants’ engagement with invitations for web surveys depended on the incentives offered. One of the outcomes, a radial plot, is depicted in Figure 7.17. A second meta-analysis in her study considers whether incentives lead to the completion of web surveys. For both investigations—the effect of incentives on responses to and retention in web surveys—the use of a radial plot complemented that of a forest plot. One of her findings (ibid., p. 65) is that, despite the established effects on both response and retention, in absolute terms the differences incentives make are small; she also states that it took the meta-analyses to reveal this, as most individual studies were underpowered to detect the effects of incentives.
Fig. 7.17 Radial plot for the effect of material incentives on responses to web surveys. This radial plot (Göritz, 2006, p. 62)* shows the heterogeneity of a meta-analysis of 31 studies that reported responses to web surveys, investigating the effect of material incentives. There are two studies that appear as outliers. The Q-test statistic for the 31 studies is 76 (the corresponding I² index is 59%, i.e. moderate to high, not reported in the study but calculated here) and, for the 29 studies excluding the outliers, 23 (I² index: 0%, truncated). Many of the studies have low statistical power due to smaller sample sizes, partly caused by experiments that were included in the analysis. * Reproduced with permission of the author and the editors of the journal.
NOTE: CHOOSING GRAPHS FOR VISUALISATION
Some studies have looked into the suitability of specific graphs for detecting between-study heterogeneity and publication bias. For example, Bax et al. (2008, p. 254) find that the interpretation of graphs requires care, because reproducibility and validity depend heavily on the type of graph and the construct it is meant to visualise. Consequently, they state that a study using meta-analysis should be selective in the graphs chosen for the exploration of its data. Also, Anzures-Cabrera and Higgins (2010) discuss the pros and cons of specific visualisations. Whereas forest plots and funnel plots are standard techniques for examining heterogeneity, complementary visualisations should be considered.
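The quantities behind a radial plot are straightforward to compute. The sketch below, with hypothetical data, calculates each study's standardised estimate and precision, fits the least-squares line through the origin (whose slope coincides with the fixed-effect inverse-variance estimate), and flags studies whose residuals fall outside the approximate ±2 band.

```python
# Minimal sketch of the quantities behind a Galbraith (radial) plot.
import numpy as np

effects = np.array([0.30, 0.55, 0.10, 0.42, 1.10])   # hypothetical effects
se = np.array([0.10, 0.12, 0.09, 0.15, 0.20])        # their standard errors

z = effects / se          # standardised estimates (y-axis of the radial plot)
x = 1.0 / se              # precision (x-axis of the radial plot)

slope = np.sum(x * z) / np.sum(x * x)   # equals the fixed-effect estimate
residuals = z - slope * x

for i, r in enumerate(residuals):
    flag = "potential outlier" if abs(r) > 2 else "within +/-2 band"
    print(f"Study {i + 1}: z = {z[i]:.2f}, precision = {x[i]:.1f}, "
          f"residual = {r:+.2f} ({flag})")
print(f"Slope through origin (fixed-effect estimate): {slope:.3f}")
```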
7.8
Publication Bias and Sensitivity Analysis
Publication bias (sometimes called the file-drawer problem or file-drawer effect) can be defined as the tendency to publish studies with significant results. The phenomenon is believed to occur because, compared with neutral or negative
studies, those with a ‘positive’ result are more likely to be written up, presented at conferences, submitted for publication, accepted for publication and cited by other publications. This appears to occur independently of the quality and conduct of the study (see Dickersin, 1990, pp. 1385–1386, for some historical notes with regard to publication bias). In terms of the evidence base, having access to these neutral or negative studies may lead to more robust findings. This is also associated with the so-called grey literature, see Section 5.7, although the latter encompasses more publications and works than just neutral or negative studies. Where most would relate publication bias to healthcare, medicine, nursing and related disciplines, Franco et al. (2014) demonstrate that this is also the case for the social sciences, with the study of McDaniel et al. (2006) bringing similar points to the fore. With publication bias being omnipresent, Dalton et al. (2016, p. 812) plead for the need to undertake an assessment of publication bias, while also calling it ‘the elephant in the review.’ Also, in Section 5.7 reference was made to studies that asserted different views on the impact of grey literature on the outcomes of systematic reviews. Notwithstanding these views, publication bias is a concern when conducting a meta-analysis. It is necessary to consider whether, and to what extent, publication bias occurs to ensure that the results, findings, conclusions and recommendations from a systematic review are as sound as possible. The main challenge for systematic reviews is that, while the power of an analysis can be improved by meta-analysis (i.e., reducing random error), there is no protection against a systematic publication bias. For this reason, high-quality systematic reviews should have three main features:

• Adequate searches for literature. Authors should make strenuous efforts to identify all relevant studies by searching grey literature, conference abstracts, commercial databases, etc.; see Sections 5.4, 5.5, 5.6 and 5.7 for guidance.
• Testing for publication bias. Authors should test for the presence of publication bias. This section pays attention to some techniques and visualisations that can be used.
• Sensitivity analyses. This is often the only effective response to the threat of publication bias. It aims at estimating the potential impact of publication bias if it has occurred. This was also discussed in the previous section.

These features, if implemented properly, will also increase the trustworthiness of the results, conjectures, findings and recommendations in a systematic review.
7.8.1
Assessing Publication Bias
In the course of time, further methods and visualisations have been proposed to assess publication bias in meta-analysis beyond the plots mentioned in the previous section:
• Tolerance for null results (aka fail-safe N method). Rosenthal (1979) suggests determining the tolerance for null results by calculating how many studies with null results would be needed to render the combined result statistically non-significant. The higher this number, the less likely it is that (unreported) studies in the ‘file drawer’ would lead to a different insight.
• Capture-recapture. Originally found in epidemiological studies (e.g., Hook & Regal, 1995), this method can also be used to estimate how many studies are likely missing from a systematic review (Bennett et al. 2004, pp. 351–2). It is similar to determining saturation for search strategies, discussed in Section 5.6. A limitation is that it only estimates how many studies are missing (Poorolajal et al. 2010, p. 112) but does not state the impact of those studies on the results, conjectures and findings with regard to the effect size and confidence interval.
• Funnel plot. This plot, described in the previous section, is commonly used to illustrate publication bias. It plots the distribution of studies in relation to their power and effect size, and should highlight whether small negative studies are under-represented. Moreover, if the distribution around the effect size is asymmetrical, then publication bias is indicated. Although designed to show publication bias, these plots can also reveal other forms of bias, such as selective reporting of outcomes. Further methods have been developed to arrive at more robust conclusions using the funnel plot:
• Copas and Shi method. This approach to assessing publication bias (Copas & Shi, 2000, 2001) investigates whether there is a trend in the funnel plot. The trend is modelled using a range of probabilities of selection, and the size of the treatment effect is examined for different rates of study identification. This allows the robustness of the conclusions arising from the systematic review to be assessed. The model is based on identifying a population of studies that have been conducted in the area of interest; the aim is to learn about the average value of the effect size across this population. The method assumes that the studies included in the systematic review have been identified by a systematic search strategy, but that they are a selection from the total number of studies available.
• Egger regression. This test, developed by Egger, Smith, Schneider and Minder (1997), involves fitting a linear regression model with the standardised effect size as the dependent variable and the effect size’s precision as the independent variable (a minimal sketch of this test is given below). It is algebraically identical to a weighted regression of the effect size on its standard error where the weights are proportional to the inverse of the variance of the effect size. If there is no evidence of publication bias, the intercept of the regression does not differ significantly from zero.
• Funnel plot regression. This approach, suggested by Macaskill et al. (2001, pp. 644–5), fits a regression with the effect size as the dependent variable and study size as the independent variable. The observations are weighted by the inverse variance of the estimate for the effect size to allow for possible non-constant variation; that is, variances may differ across the spectrum of studies, which is not well captured by an unweighted linear regression.
• Trim-and-fill method. This method (Duval and Tweedie, 2000) is also based on asymmetry in the funnel plot, which is used as an indicator of publication bias. The trim-and-fill method assumes that, in addition to the observed studies included in a meta-analysis, there is a number of relevant studies that are not observed due to publication bias. The steps of the method are: (1) ‘trim’ (remove) smaller studies causing funnel plot asymmetry, (2) use the trimmed funnel plot to estimate the true ‘centre’ of the funnel, and (3) replace the omitted studies and their missing ‘counterparts’ around the centre (filling). In addition to providing an estimate of the number of missing studies, an adjusted intervention effect is derived by performing a meta-analysis that includes the filled studies.

Funnel plots, mathematical regression tests and capture-recapture may suffer from a lack of statistical power and can rarely be used in systematic reviews with fewer than ten studies. For this reason, it may be useful to consider multiple methods, including sensitivity analysis, for assessing publication bias. However, using methods for publication bias could also introduce this bias artificially. For example, Terrin et al. (2003, p. 2121), when looking at the trim-and-fill method, find that adjusting for publication bias spuriously adjusted the estimate of the effect when the studies were heterogeneous, if (i) the variability among studies caused some precisely estimated studies to have effects far from the global mean, simply due to chance, or (ii) an inverse relationship between treatment efficacy and sample size was introduced by the studies’ a priori power calculations. As they state, in the second case the trim-and-fill method performs extremely poorly, and in both cases performance is worse with larger random-effects variances. These results suggest that the funnel plot may be inappropriate for heterogeneous meta-analyses, because applying trim-and-fill may then impute studies in parts of the plot where none are actually missing. This study implies that heterogeneity requires some qualitative interpretation of the measures and methods in the included studies, to ensure that no adjustments for publication bias are made when they are not appropriate.
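As an illustration of the Egger regression mentioned above, the following sketch regresses the standardised effect size on precision and tests whether the intercept differs from zero; the data are hypothetical, and in practice a dedicated meta-analysis package would normally be used.

```python
# Minimal sketch of an Egger-style regression test for funnel plot asymmetry.
import numpy as np
from scipy import stats

effects = np.array([0.52, 0.38, 0.60, 0.25, 0.45, 0.70, 0.30, 0.55])
se = np.array([0.20, 0.12, 0.25, 0.08, 0.15, 0.30, 0.10, 0.22])

y = effects / se          # standardised effect sizes
x = 1.0 / se              # precision

n = len(y)
x_mean, y_mean = x.mean(), y.mean()
sxx = np.sum((x - x_mean) ** 2)
slope = np.sum((x - x_mean) * (y - y_mean)) / sxx
intercept = y_mean - slope * x_mean

residuals = y - (intercept + slope * x)
s2 = np.sum(residuals ** 2) / (n - 2)                 # residual variance
se_intercept = np.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))

t_stat = intercept / se_intercept
p_value = 2 * stats.t.sf(abs(t_stat), n - 2)

print(f"Egger intercept = {intercept:.3f} (SE {se_intercept:.3f}), "
      f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value suggests funnel plot asymmetry (possible publication bias).
```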
7.8.2
Sensitivity Analysis
In addition to using plots to detect publication bias, a more detailed sensitivity analysis can assist in detecting confounding variables and their influence. For example, Elvik (2005, p. 222) suggests that the following elements should be part of a sensitivity analysis:

• Publication bias.
• Choice of estimator of effect (if there is a choice).
• Presence of outlier bias.
• Statistical weighting of studies included in a meta-analysis.
• Assessment of study quality.

He also asserts that these analyses should be conducted in this specific order to build a rationale for the analysis. The aim is to appraise the consistency of the direction of the estimates for the effect size, the magnitude of the effect size and the precision.
Another approach is put forward by Egger, Smith and Phillips (1997, p. 1536):

• Using two or more mathematical models for statistical sensitivity analysis. In this spirit, Veroniki et al. (2019, p. 37) recommend a sensitivity analysis using at least two to three methods to assess the robustness of findings and conclusions, especially in a meta-analysis with fewer than ten studies.
• Assessment of methodological quality.
• Evaluating the presence of publication bias by stratifying the analysis by study size.
• Analysing the impact of studies that were halted early (these may contain incomplete information).

In addition to these approaches, there are complementary statistical approaches to sensitivity analysis; instances are Chootrakool et al. (2011) and Copas (2013). These should be selected with care because they often focus on specific primary studies and methods for meta-analysis.
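One simple and widely used form of sensitivity analysis, which covers only part of the broader programme outlined above, is a leave-one-out reanalysis; a minimal sketch with hypothetical data follows.

```python
# Minimal sketch of a leave-one-out sensitivity analysis: the fixed-effect
# pooled estimate is recomputed with each study omitted in turn to see how
# strongly any single study drives the result. Data are hypothetical.
import numpy as np

effects = np.array([0.40, 0.35, 0.90, 0.30, 0.45])
se = np.array([0.10, 0.12, 0.15, 0.09, 0.11])
weights = 1.0 / se**2

overall = np.sum(weights * effects) / np.sum(weights)
print(f"Pooled estimate (all studies): {overall:.3f}")

for i in range(len(effects)):
    mask = np.arange(len(effects)) != i
    loo = np.sum(weights[mask] * effects[mask]) / np.sum(weights[mask])
    print(f"Without study {i + 1}: {loo:.3f} (shift {loo - overall:+.3f})")
```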
7.9
Assessing Quality of Meta-Analysis
The quality of a systematic review using meta-analysis largely reflects its ability to mitigate bias. This can be addressed through several points:

• Develop a protocol. Its purpose is to state in advance how the meta-analysis will be carried out, following the steps in Figures 3.5 and 7.4. These protocols can be registered in databases such as PROSPERO or the Cochrane Library; see also Section 14.3. Often the PRISMA format (Preferred Reporting Items for Systematic reviews and Meta-Analyses) serves as a guideline for setting out protocols; see Section 13.4.
• Search for publications and studies. The search strategy for the systematic review should ensure extensive searching to identify literature that is often missed, e.g., non-English language publications, conference proceedings and grey literature; see also Sections 5.4, 5.5 and 5.6 for descriptions of effective search strategies and enhancements.
• Data collection. The selection of studies and extraction of data should be done by two independent reviewers using standardised data collection forms to minimise potential bias (double-data extraction); see Section 7.3.
• Data analysis. The analysis of primary and secondary outcomes should be carried out as outlined in the protocol, with an assessment of heterogeneity and reflection on whether meta-analysis is appropriate.
• Interpretation of results. The interpretation of the results should consider the potential impact of the choice of outcome measures and analysis methods, plus the presence of heterogeneity, missing data and publication bias. Sensitivity analyses can often allow exploration of these factors. Cluster or subgroup analyses should be pre-defined, limited in number, analysed appropriately and interpreted with great care.
Although there is advantage in incorporating these points into the protocol for a systematic review with meta-analysis, selection bias remains one of the most difficult points to overcome; studies with negative and null outcomes are often not published or poorly reported. Thus, access to studies and data often forms the barrier, in conjunction with how the review questions were formulated; the latter also drives the inclusion and exclusion criteria, which might prevent specific relevant studies from being found, as discussed in Sections 6.2 and 6.3. Another point of attention here is that using criteria for assessing the quality of studies and of the recommendations resulting from the review could result in a broader range of study designs being considered; see Section 6.4. All this means that performing a systematic review adequately and reflecting on its conduct form the key to a high-quality study using meta-analysis. This is also reflected in checklists for systematic reviews with meta-analysis. For example, Russo (2007, p. 638) presents a checklist for systematic reviews, which largely follows the points raised in the previous paragraph. Other checklists for meta-analysis, such as Nakagawa et al. (2017) for biology—specifically non-human species and ecosystems—Philibert et al. (2012) for agronomy, and Pigott and Polanin (2020) in general, raise similar considerations for high-quality systematic reviews using meta-analysis. An example of using such criteria is the review by Uttl et al. (2017) of previous systematic reviews with meta-analysis into the relationship between student evaluation of teaching and faculty’s teaching effectiveness. They (ibid., pp. 38–9) find that the two are not related, mainly based on deficiencies in the previous meta-analyses they considered, thus contradicting the findings of those preceding reviews. Among the shortcomings of previous systematic reviews using meta-analysis are inadequate search strategies, incorrect extraction of data from studies, not accounting for small-study effects and not investigating heterogeneity in depth. Making sure that these and other issues for systematic reviews with meta-analysis are properly addressed can be supported by consulting guidance and checklists.

These topics can also be related to research paradigms, particularly positivism and, perhaps to a lesser extent, post-positivism; see Section 3.3. This means that criteria related to these paradigms—construct validity, internal validity, external validity, reliability and objectivity—take a central role in determining the quality of a systematic review with meta-analysis. Except for construct validity, these points are covered in this chapter; construct validity can be enhanced by using theories and modelling for forming review questions, as presented in Section 4.4. In addition, attention should be paid to adequate reporting to improve transparency about how criteria related to the research paradigm are met; see Chapter 13. This means that further guidance on how best to conduct a systematic review with meta-analysis can also be drawn from the application of criteria related to the research paradigms of positivism and post-positivism.

Notwithstanding the guidance for creating a high-quality systematic review with meta-analysis, the topic of a review should provide insight beyond existing scholarly knowledge found in individual publications and preceding literature reviews, including meta-analyses.
In this sense, Guzzo et al. (1987, pp. 432–3) provide three examples of topics related to organisational behaviour where a meta-analysis did not yield insights different from those already present in narrative reviews; this could also have been interpreted as a ‘null result’ for a systematic review (thus avoiding publication bias for reviews). Therefore, in addition to seeking guidance for creating high-quality systematic reviews with meta-analysis, authors should reflect on their contribution to knowledge, or alternatively, clarify how their contribution is positioned in extant scholarly knowledge, even in the case of null or negative results from a review. TIP: MAKE STATISTICAL INTERPRETATIONS ACCESSIBLE TO POLICYMAKERS, PRACTITIONERS AND USERS Systematic reviews with meta-analysis often have implications for practice, so it is necessary to present statistical interpretations in such a way that they can be understood by policymakers, practitioners and other users. In this regard, Mavros et al. (2013, p. e47229/4) find in their experiment that even corresponding authors of articles indexed by PubMed often have difficulties interpreting the statistical outcomes of meta-analysis correctly. Thus, being clear about results and their meaning, conjectures, findings and recommendations is of utmost importance for the utility of systematic reviews with meta-analysis, not only for policymakers and practitioners but also for other readers, to avoid unintended interpretations.
7.10
Key Points
• Meta-analysis has become a very popular and effective step in the systematic review armoury. Some of its key advantages are:
  • Structured process. The stepwise approach (see Figure 7.18 and Section 3.2) aims at improving reproducibility and minimising bias.
  • Quantifying uncertainty. The improved statistical power obtained by amalgamating studies is used to quantify the direction and size of the effect, and the uncertainty around the aggregated estimate.
  • Finding relationships across studies. By including large numbers of studies, meta-analysis can help explore relationships across studies to better understand the phenomenon being tested.
  • Avoiding over-interpretation. A key skill for undertaking meta-analysis is understanding the limitations of the data and the methods for analysis. At its most extreme, considering these points can result in concluding that a meta-analysis is not appropriate because of heterogeneity or potential bias.
  • Large volume of included studies possible. Using statistical analysis makes it possible to investigate large sets of studies. This contributes to more robust results, conjectures, findings and recommendations.
(Figure 7.18 shows the sequence of steps: developing review questions; keywords and databases; inclusion and exclusion criteria; retrieval and selection of studies; extraction and coding of (quantitative) data; calculating the effect size and confidence interval; determining between-study heterogeneity; assessing publication bias; performing a sensitivity analysis; and synthesis of findings, with GRADE or alternatives for the recommendations.)
Fig. 7.18 Expanded process for systematic reviews with meta-analysis. The process for systematic reviews with meta-analysis is an expansion of Figures 3.5 and 7.4 and shows additional information for achieving rigour. The development of a review question should be supported by using models or theoretical insight. Also, using the PIO format (population-intervention-outcome) or its variants will contribute to clarifying the review question. Adequate search strategies should be developed for the protocol, including how to determine saturation. Retrieved studies should be subjected to double-data extraction to improve the quality of the data set. The selection of the appropriate model (fixed-effects, random-effects or mixed-effects model) should be explained, as should the selection of an appropriate method. Furthermore, heterogeneity should be explored by calculating the I² index and through visualisations (forest plot, funnel plot, and others if appropriate). A final step before the findings and recommendations is performing a sensitivity analysis.
• Some of the potential drawbacks of meta-analysis include:
  • Efforts and resource allocation. A large, well-conducted meta-analysis is a major undertaking for several researchers. Normally, data are extracted from the same study by two independent reviewers.
  • Focus on quantifiable variables. Inevitably, meta-analysis is limited by the outcome data available.
  • Protocol-driven rather than interpretive. The rigour required to minimise bias means that meta-analyses can be rather rigid in their interpretation. Also, systematic reviews with meta-analysis tend to take a positivist or post-positivist approach, which emphasises internal validity, reproducibility and rigour (see also Section 3.3).
  • Selection biases. All forms of information synthesis are subject to selection biases (especially publication bias). Because of the power and apparent precision of some meta-analyses, these biases can become more evident when heterogeneity is investigated.
• The focus on quantifiable results and the application of mathematical models implies that studies selected for a meta-analysis find themselves at the end of the replication continuum where studies are replications of each other. If the degree of replication decreases, then cluster or subgroup analysis, mixed-methods synthesis or qualitative synthesis should be considered.
• The selection of an appropriate mathematical model for the meta-analysis depends on the nature of the (quantitative) data and the sampling errors. When studies are replications or close to replications of each other, the fixed-effects model is appropriate. When there is more influence of sampling errors, the random-effects model should be applied. There may also be multiple factors to consider; in such cases meta-regression fits better. Data may also be available as differences between groups of objects and subjects, in which case the weighted or standardised mean difference can be used as the measure in the mathematical model. For dichotomous data, the relative risk or odds ratio are the preferred measures (a small worked sketch follows after these key points).
• In the case of multiple outcomes or variables, mixed-effects models and meta-regression are more suitable approaches to meta-analysis.
• For many meta-analyses the number of studies may be limited, for which there are also specific methods with regard to determining the direction and estimate of the effect size. In particular, the DerSimonian-Laird method may be less suitable, implying that either extensions of it or other methods should be used.
• After determining the effect size estimate and its confidence interval, heterogeneity should be investigated. In addition to the Q-test statistic and the I² index, graphical representations are used, such as the forest plot, funnel plot, L’Abbé plot and radial plot (aka Galbraith plot).
• A particular point of interest for heterogeneity is finding out the extent of publication bias. This aims at finding outliers and determining whether small studies with the chance of more articulated effect sizes influence the effect size estimate and its confidence interval. Methods include the fail-safe N method,
examining the funnel plot, the capture-recapture method, and the trim-and-fill method (the latter also based on the funnel plot). For examining the funnel plot, more advanced methods are available, among them the Copas and Shi method, Egger regression and funnel plot regression.
• Also, a sensitivity analysis is normally performed to explore confounding variables that may explain between-study heterogeneity.
• Setting out a protocol a priori for a systematic review with meta-analysis avoids the multiplicity problem and the related data dredging.
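As a small worked example of the measures for dichotomous data mentioned in the key points, the sketch below computes the odds ratio, its log-scale standard error and a 95% confidence interval from a hypothetical 2×2 table.

```python
# Minimal sketch, with a hypothetical 2x2 table, of the log odds ratio and its
# standard error as used for dichotomous outcomes in meta-analysis.
import numpy as np

# events / non-events in treatment and control groups (hypothetical counts)
a, b = 15, 85     # treatment: events, non-events
c, d = 30, 70     # control:   events, non-events

log_or = np.log((a * d) / (b * c))                    # log odds ratio
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # standard error of log OR

ci_low = np.exp(log_or - 1.96 * se_log_or)
ci_high = np.exp(log_or + 1.96 * se_log_or)

print(f"Odds ratio = {np.exp(log_or):.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f})")
```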
7.11
How to …?
7.11.1 … Choose the Most Appropriate Statistical Method for Meta-Analysis
The choice of a particular statistical method for meta-analysis to estimate the direction and size of the effect depends on the data and the variety across studies. If raw data are available from studies and these investigations have been conducted in a similar manner, then the data can be pooled for statistical analysis to determine an estimate of the effect size and variances. However, this is rarely the case, and therefore meta-analysis uses summary data from studies in almost all systematic reviews of this type. With regard to the variety of studies, the fixed-effects model is most appropriate when there is no evidence of between-study heterogeneity, based on statistical testing; otherwise, systematic reviews with meta-analysis resort to the random-effects model. In situations where there are more effects to be analysed, mixed-effects models or meta-regression are the preferred mathematical models. It could also be that extracted summary data of studies are available as differences between groups, for which the weighted or standardised mean difference is commonly used, or as dichotomous data, in which case the risk ratio or odds ratio are the measures to use in appropriate mathematical models. The selection of appropriate models is also related to how confidence intervals are calculated. Thus, the nature of the summary data—variety across studies, sampling errors—and the related review question determine the most appropriate mathematical model for the meta-analysis in a systematic review.

In addition to determining the direction and size of the effect estimate, variety is considered through specific lenses in systematic reviews with meta-analysis. The first is establishing between-study heterogeneity by calculating the I² index. This is normally followed by analysis of visualisations, which include the forest plot, funnel plot, L’Abbé plot and radial (Galbraith) plot. Their purpose is to pinpoint whether the set of studies is well distributed on either side of the effect size estimate, to identify outliers, and to show how imprecise studies influence the effect size estimate and its confidence interval. There are enhancements of the funnel plot, such as the Copas and Shi method, Egger regression and funnel plot regression,
aiming at investigating publication bias in particular. Other methods are also available, among them the fail-safe N method, examination of the funnel plot, the capture-recapture method, and the trim-and-fill method (the latter also based on the funnel plot). Sensitivity analysis is used to identify the impact of the choice of specific methods and to identify confounding variables. Thus, the meta-analysis is complemented with methods to explain heterogeneity, identify outliers, assess publication bias and pinpoint confounding variables.
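To make the pooling step concrete, the following minimal sketch implements DerSimonian-Laird random-effects pooling for hypothetical summary data; as noted in the key points, this estimator has limitations when the number of studies is small, and dedicated software offers more refined methods.

```python
# Minimal sketch of DerSimonian-Laird random-effects pooling: estimate the
# between-study variance tau^2 from Cochran's Q, re-weight the studies, and
# compute the pooled estimate with a 95% confidence interval.
import numpy as np

effects = np.array([0.30, 0.45, 0.12, 0.51, 0.28, 0.60])  # hypothetical data
se = np.array([0.10, 0.15, 0.08, 0.20, 0.12, 0.18])

w = 1.0 / se**2                                  # fixed-effect weights
fixed = np.sum(w * effects) / np.sum(w)
q = np.sum(w * (effects - fixed) ** 2)
df = len(effects) - 1
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - df) / c)                    # DerSimonian-Laird tau^2

w_star = 1.0 / (se**2 + tau2)                    # random-effects weights
pooled = np.sum(w_star * effects) / np.sum(w_star)
pooled_se = np.sqrt(1.0 / np.sum(w_star))

print(f"tau^2 = {tau2:.4f}")
print(f"Random-effects estimate = {pooled:.3f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.3f} "
      f"to {pooled + 1.96 * pooled_se:.3f})")
```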
7.11.2 … Write a Literature Review
Writing a systematic review with meta-analysis requires adequate reporting, particularly with regard to the statistical exercises undertaken. The reporting includes the justification of the mathematical models for estimating the direction and size of the effect, the rationale for the mathematical models that estimate the confidence interval, the selection of approaches to determining heterogeneity, selection bias and publication bias, and the conduct of a sensitivity analysis. These aspects of the meta-analysis are related, and this should be reflected in the rationale and the steps of the analysis. Normally, these points are also addressed in a protocol that is defined a priori. A high-quality systematic review with meta-analysis also reports how the review question was developed, which search strategy was set out, and how the quality of evidence relates to the conjectures, findings and conclusions.
References Aguinis H, Pierce CA, Bosco FA, Dalton DR, Dalton CM (2011) Debunking myths and urban legends about meta-analysis. Organ Res Methods 14(2):306–331. https://doi.org/10.1177/ 1094428110375720 Allen M, Preiss R (1993) Replication and meta-analysis: a necessary connection. J Soc Behav Pers 8(6):9–20 Animasaun IL, Ibraheem RO, Mahanthesh B, Babatunde HA (2019) A meta-analysis on the effects of haphazard motion of tiny/nano-sized particles on the dynamics and other physical properties of some fluids. Chin J Phys 60:676–687. https://doi.org/10.1016/j.cjph.2019.06.007 Anzures-Cabrera J, Higgins JPT (2010) Graphical displays for meta-analysis: an overview with suggestions for practice. Res Synth Methods 1(1):66–80. https://doi.org/10.1002/jrsm.6 Bakbergenuly I, Hoaglin DC, Kulinskaya E (2019) Pitfalls of using the risk ratio in meta-analysis. Res Synth Methods 10(3):398–419. https://doi.org/10.1002/jrsm.1347 Bakbergenuly I, Kulinskaya E (2017) Beta-binomial model for meta-analysis of odds ratios. Stat Med 36(11):1715–1734. https://doi.org/10.1002/sim.7233 Baker WL, Michael White C,Cappelleri JC, Kluger J, Coleman CI, From the Health Outcomes P, Group EC (2009) Understanding heterogeneity in meta‐analysis: the role of meta‐regression. Int J Clin Pract 63(10):1426–1434. https://doi.org/10.1111/j.1742-1241.2009.02168.x
Bax L, Ikeda N, Fukui N, Yaju Y, Tsuruta H, Moons KGM (2008) More than numbers: the power of graphs in meta-analysis. Am J Epidemiol 169(2):249–255. https://doi.org/10.1093/aje/ kwn340 Bax L, Yu L-M, Ikeda N, Moons KGM (2007) A systematic comparison of software dedicated to meta-analysis of causal studies. BMC Med Res Methodol 7(1):40. https://doi.org/10.1186/ 1471-2288-7-40 Bender R, Bunce C, Clarke M, Gates S, Lange S, Pace NL, Thorlund K (2008) Attention should be given to multiplicity issues in systematic reviews. J Clin Epidemiol 61(9):857–865. https://doi. org/10.1016/j.jclinepi.2008.03.004 Bennett DA, Latham NK, Stretton C, Anderson CS (2004) Capture-recapture is a potentially useful method for assessing publication bias. J Clin Epidemiol 57(4):349–357. https://doi.org/10. 1016/j.jclinepi.2003.09.015 Biggerstaff BJ, Tweedie RL (1997) Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Stat Med 16(7):753–768. https://doi.org/10.1002/ (SICI)1097-0258(19970415)16:73.0.CO;2-G Blyth CR (1972) On Simpson’s paradox and the sure-thing principle. J Am Stat Assoc 67(338):364–366. https://doi.org/10.1080/01621459.1972.10482387 Borenstein M, Hedges LV, Higgins JPT, Rothstein HR (2010) A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods 1(2):97–111. https://doi.org/ 10.1002/jrsm.12 Bravata DM, Olkin I (2001) Simple pooling versus combining in meta-analysis. Eval Health Prof 24(2):218–230. https://doi.org/10.1177/01632780122034885 Brown SA, Upchurch SL, Acton GJ (2003) A framework for developing a coding scheme for meta-analysis. West J Nurs Res 25(2):205–222. https://doi.org/10.1177/0193945902250038 Cheung MW-L, Ho RCM, Lim Y, Mak A (2012) Conducting a meta-analysis: basics and good practices. Int J Rheum Dis 15(2):129–135. https://doi.org/10.1111/j.1756-185X.2012.01712.x Chiolero A, Santschi V, Burnand B, Platt RW, Paradis G (2012) Meta-analyses: with confidence or prediction intervals? Eur J Epidemiol 27(10):823–825. https://doi.org/10.1007/s10654-0129738-y Chootrakool H, Shi JQ, Yue R (2011) Meta-analysis and sensitivity analysis for multi-arm trials with selection bias. Stat Med 30(11):1183–1198. https://doi.org/10.1002/sim.4143 Chow SL (1987) Meta-analysis of pragmatic and theoretical research: a critique. J Psychol 121(3):259–271. https://doi.org/10.1080/00223980.1987.9712666 Copas JB (2013) A likelihood-based sensitivity analysis for publication bias in meta-analysis. J Roy Stat Soc Ser C (Appl Stat) 62(1):47–66. https://doi.org/10.1111/j.1467-9876.2012. 01049.x Copas J, Shi JQ (2000) Meta-analysis, funnel plots and sensitivity analysis. Biostatistics 1(3):247– 262. https://doi.org/10.1093/biostatistics/1.3.247 Copas JB, Shi JQ (2001) A sensitivity analysis for publication bias in systematic reviews. Stat Methods Med Res 10(4):251–265. https://doi.org/10.1177/096228020101000402 Cortoni F, Babchishin KM, Rat C (2017) The proportion of sexual offenders who are female is higher than thought: a meta-analysis. Crim Justice Behav 44(2):145–162. https://doi.org/10. 1177/0093854816658923 Dalton JE, Bolen SD, Mascha EJ (2016) Publication bias: the elephant in the review. Anesth Analg 123(4):812–813. https://doi.org/10.1213/ane.0000000000001596 De Wolff MS, van Ijzendoorn MH (1997) Sensitivity and attachment: a meta-analysis on parental antecedents of infant attachment. Child Dev 68(4):571–591. 
https://doi.org/10.1111/j.14678624.1997.tb04218.x Deeks JJ, Higgins JPT, Altman DG (2021) Analysing data and undertaking meta-analyses. In: Higgins JP, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (eds) Cochrane
handbook for systematic reviews of interventions (Version 6.2 ed): cochrane. https://training. cochrane.org/handbook/current/chapter-10 DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7(3):177–188. https://doi.org/10.1016/0197-2456(86)90046-2 Dickersin K (1990) The existence of publication bias and risk factors for its occurrence. JAMA 263(10):1385–1389. https://doi.org/10.1001/jama.1990.03440100097014 Doucouliagos H, Ulubaşoğlu MA (2008) Democracy and economic growth: a meta-analysis. Am J Polit Sci 52(1):61–83. https://doi.org/10.1111/j.1540-5907.2007.00299.x Duval S, Tweedie R (2000) Trim and fill: a simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56(2):455–463. https://doi.org/10. 1111/j.0006-341X.2000.00455.x Egger M, Smith GD, Phillips AN (1997) Meta-analysis: principles and procedures. BMJ 315 (7121):1533–1537. https://doi.org/10.1136/bmj.315.7121.1533 Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315(7109):629–634. https://doi.org/10.1136/bmj.315.7109.629 Elvik R (2005) Can we trust the results of meta-analyses?: a systematic approach to sensitivity analysis in meta-analyses. Transp Res Rec 1908(1):221–229. https://doi.org/10.1177/ 0361198105190800127 Ewing R, Cervero R (2010) Travel and the built environment. J Am Plan Assoc 76(3):265–294. https://doi.org/10.1080/01944361003766766 Franco A, Malhotra N, Simonovits G (2014) Publication bias in the social sciences: unlocking the file drawer. Science 345(6203):1502–1505. https://doi.org/10.1126/science.1255484 Galbraith RF (1988) A note on graphical presentation of estimated odds ratios from several clinical trials. Stat Med 7(8):889–894. https://doi.org/10.1002/sim.4780070807 Glass GV (1976) Primary, secondary, and meta-analysis of research. Educ Res 5(10):3–8. https:// doi.org/10.3102/0013189X005010003 Göritz AS (2006) Incentives in web studies: methodological issues and a review. Int J Internet Sci 1(1):58–70 Gøtzsche PC, Hróbjartsson A, Marić K, Tendal B (2007) Data extraction errors in meta-analyses that use standardized mean differences. JAMA 298(4):430–437. https://doi.org/10.1001/jama. 298.4.430 Govindan K, Rajeev A, Padhi SS, Pati RK (2020) Supply chain sustainability and performance of firms: a meta-analysis of the literature. Transp Res Part E Logist Transp Rev 137:101923. https://doi.org/10.1016/j.tre.2020.101923 Guzzo RA, Jackson SE, Katzell RA (1987) Meta-analysis analysis. Res Organ Behav 9:407–442 Hartung J, Knapp G (2001) A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med 20(24):3875–3889. https://doi.org/10.1002/sim.1009 Hendrick C (1990) Replications, strict replications, and conceptual replications: are they important? J Soc Behav Personal 5(4):41–49 Higgins JPT, Thompson SG (2002) Quantifying heterogeneity in a meta-analysis. Stat Med 21(11):1539–1558. https://doi.org/10.1002/sim.1186 Higgins JPT, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327(7414):557–560. https://doi.org/10.1136/bmj.327.7414.557 Hoobler JM, Masterson CR, Nkomo SM, Michel EJ (2018) The business case for women leaders: meta-analysis, research critique, and path forward. J Manag 44(6):2473–2499. https://doi.org/ 10.1177/0149206316628643 Hook EB, Regal RR (1995) Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev 17(2):243–264. 
https://doi.org/10.1093/oxfordjournals.epirev.a036192 Howard GS, Maxwell SE (1980) Correlation between student satisfaction and grades: a case of mistaken causation? J Educ Psychol 72(6):810–820. https://doi.org/10.1037/0022-0663.72.6. 810 IntHout J, Ioannidis JPA, Borm GF (2014) The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard
DerSimonian-Laird method. BMC Med Res Methodol 14(1):25. https://doi.org/10.1186/14712288-14-25 Itani O, Jike M, Watanabe N, Kaneita Y (2017) Short sleep duration and health outcomes: a systematic review, meta-analysis, and meta-regression. Sleep Med 32:246–256. https://doi.org/ 10.1016/j.sleep.2016.08.006 Kaufmann E, Reips U-D, Maag Merki K (2016) Avoiding methodological biases in meta-analysis. Zeitschrift Für Psychologie 224(3):157–167. https://doi.org/10.1027/2151-2604/a000251 Kim KH (2005) Can only intelligent people be creative? A meta-analysis. J Second Gift Educ 16(2–3):57–66. https://doi.org/10.4219/jsge-2005-473 Kontopantelis E, Reeves D (2012) Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: a simulation study. Stat Methods Med Res 21(4):409–426. https://doi.org/10.1177/0962280210392008 Kontopantelis E, Reeves D (2012) Performance of statistical methods for meta-analysis when true study effects are non-normally distributed: a comparison between DerSimonian–Laird and restricted maximum likelihood. Stat Methods Med Res 21(6):657–659. https://doi.org/10.1177/ 0962280211413451 L’Abbé KA, Detsky AS, O’Rourke K (1987) Meta-analysis in clinical research. Ann Intern Med 107(2):224–233. https://doi.org/10.7326/0003-4819-107-2-224 Lajeunesse MJ (2016) Facilitating systematic reviews, data extraction and meta-analysis with the metagear package for R. Methods Ecol Evol 7(3):323–330. https://doi.org/10.1111/2041210X.12472 Lakens D (2013) Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front Psychol 4(863). https://doi.org/10.3389/fpsyg.2013. 00863 Lipsey MW, Wilson DB (1993) The efficacy of psychological, educational, and behavioral treatment: confirmation from meta-analysis. Am Psychol 48(12):1181–1209. https://doi.org/10. 1037/0003-066X.48.12.1181 Lloyd S, Schmidt U, Khondoker M, Tchanturia K (2015) Can psychological interventions reduce perfectionism? A systematic review and meta-analysis. Behav Cogn Psychother 43(6):705– 731. https://doi.org/10.1017/S1352465814000162 Lopes JSS, Machado AF, Cavina AP, Kirsch Michelletti J, Castilho de Almeida A, Pastre CM (2019) Specific interventions for prevention of muscle injury in lower limbs: systematic review and meta-analysis. Fisioterapia Movimento 32:e003224. https://doi.org/10.1590/1980-5918. 032.AO24 López-López JA, Page MJ, Lipsey MW, Higgins JPT (2018) Dealing with effect size multiplicity in systematic reviews and meta-analyses. Res Synth Methods 9(3):336–351. https://doi.org/10. 1002/jrsm.1310 Macaskill P, Walter SD, Irwig L (2001) A comparison of methods to detect publication bias in meta-analysis. Stat Med 20(4):641–654. https://doi.org/10.1002/sim.698 Mathes T, Kuss O (2018) A comparison of methods for meta-analysis of a small number of studies with binary outcomes. Res Synth Methods 9(3):366–381. https://doi.org/10.1002/jrsm.1296 Mantel N, Haenszel W (1959) Statistical aspects of the analysis of data from retrospective studies of disease. JNCI J Nat Cancer Inst 22(4):719–748. https://doi.org/10.1093/jnci/22.4.719 Mavros MN, Alexiou VG, Vardakas KZ, Falagas ME (2013) Understanding of statistical terms routinely used in meta-analyses: an international survey among researchers. PLoS One 8(1): e47229. https://doi.org/10.1371/journal.pone.0047229 McDaniel MA, Rothstein HR, Whetzel DL (2006) Publication bias: a case study of four test vendors. Pers Psychol 59(4):927–953. 
https://doi.org/10.1111/j.1744-6570.2006.00059.x McKenzie JE, Beller EM, Forbes AB (2016) Introduction to systematic reviews and meta-analysis. Respirology 21(4):626–637. https://doi.org/10.1111/resp.12783 McShane BB, Böckenholt U (2017) Single-paper meta-analysis: benefits for study summary, theory testing, and replicability. J Consum Res 43(6):1048–1063. https://doi.org/10.1093/jcr/ ucw085
296
7 Principles of Meta-Analysis
Munn Z, Tufanaru C, Aromataris E (2014) JBI’s systematic reviews: data extraction and synthesis. AJN Am J Nurs 114(7):49–54. https://doi.org/10.1097/01.Naj.0000451683.66447.89 Nakagawa S, Noble DWA, Senior AM, Lagisz M (2017) Meta-evaluation of meta-analysis: ten appraisal questions for biologists. BMC Biol 15(1):18. https://doi.org/10.1186/s12915-0170357-7 Neyeloff JL, Fuchs SC, Moreira LB (2012) Meta-analyses and forest plots using a microsoft excel spreadsheet: step-by-step guide focusing on descriptive data analysis. BMC Res Notes 5(1):52. https://doi.org/10.1186/1756-0500-5-52 O’Keefe DJ, Hale SL (2001) An odds-ratio-based meta-analysis of research on the door-in-the-face influence strategy. Commun Rep 14(1):31–38. https://doi.org/10.1080/ 08934210109367734 Pastor DA, Lazowski RA (2018) On the multilevel nature of meta-analysis: a tutorial, comparison of software programs, and discussion of analytic choices. Multivar Behav Res 53(1):74–89. https://doi.org/10.1080/00273171.2017.1365684 Pearson K, Lee A, Bramley-Moore L (1899) VI. Mathematical contributions to the theory of evolution–VI. Genetic (Reproductive) selection: inheritance of fertility in man, and of fecundity in thoroughbred racehorses. Philos Trans R Soc Lond 192:257–330. https://doi.org/10.1098/ rsta.1899.0006 Pedder H, Sarri G, Keeney E, Nunes V, Dias S (2016) Data extraction for complex meta-analysis (DECiMAL) guide. Syst Rev 5(1):212. https://doi.org/10.1186/s13643-016-0368-4 Philibert A, Loyce C, Makowski D (2012) Assessment of the quality of meta-analysis in agronomy. Agr Ecosyst Environ 148:72–82. https://doi.org/10.1016/j.agee.2011.12.003 Pigott TD, Polanin JR (2020) Methodological guidance paper: high-quality meta-analysis in a systematic review. Rev Educ Res 90(1):24–46. https://doi.org/10.3102/0034654319877153 Polák P (2017) The productivity paradox: a meta-analysis. Inf Econ Policy 38:38–54. https://doi. org/10.1016/j.infoecopol.2016.11.003 Poorolajal J, Haghdoost AA, Mahmoodi M, Majdzadeh R, Nasseri-Moghaddam S, Fotouhi A (2010) Capture-recapture method for assessing publication bias. J Res Med Sci 15(2):107–115 Rice K, Higgins JPT, Lumley T (2018) A re-evaluation of fixed effect(s) meta-analysis. J R Stat Soc A Stat Soc 181(1):205–227. https://doi.org/10.1111/rssa.12275 Rosenthal R (1979) The “File Drawer Problem” and tolerance for null results. Psychol Bull 86(3):638–641. https://doi.org/10.1037/0033-2909.86.3.638 Russo MW (2007) How to review a meta-analysis. Gastroenterol Hepatol 3(8):637–642 Schmid EJ, Koch GG, LaVange LM (1991) An overview of statistical issues and methods of meta-analysis. J Biopharm Stat 1(1):103–120. https://doi.org/10.1080/10543409108835008 Schmidt FL (2017) Statistical and measurement pitfalls in the use of meta-regression in meta-analysis. Career Dev Int 22(5):469–476. https://doi.org/10.1108/CDI-08-2017-0136 Schmidt FL, Oh I-S, Hayes TL (2009) Fixed-versus random-effects models in meta-analysis: model properties and an empirical comparison of differences in results. Br J Math Stat Psychol 62(1):97–128. https://doi.org/10.1348/000711007X255327 Sera F, Armstrong B, Blangiardo M, Gasparrini A (2019) An extended mixed-effects framework for meta-analysis. Stat Med 38(29):5429–5444. https://doi.org/10.1002/sim.8362 Shah SA, Sander S, White CM, Rinaldi M, Coleman CI (2007) Evaluation of echinacea for the prevention and treatment of the common cold: a meta-analysis. Lancet Infect Dis 7(7):473– 480. 
https://doi.org/10.1016/S1473-3099(07)70160-3 Sidik K, Jonkman JN (2006) Robust variance estimation for random effects meta-analysis. Comput Stat Data Anal 50(12):3681–3701. https://doi.org/10.1016/j.csda.2005.07.019 Song F, Sheldon TA, Sutton AJ, Abrams KR, Jones DR (2001) Methods for exploring heterogeneity in meta-analysis. Eval Health Prof 24(2):126–151. https://doi.org/10.1177/ 016327870102400203 Simpson EH (1951) The interpretation of interaction in contingency tables. J Roy Stat Soc Ser B (Methodol) 13(2):238–241. https://doi.org/10.1111/j.2517-6161.1951.tb00088.x
References
297
Stanley TD (2001) Wheat from chaff: meta-analysis as quantitative literature review. J Econ Perspect 15(3):131–150. https://doi.org/10.1257/jep.15.3.131 Stanley TD, Doucouliagos H (2015) Neither fixed nor random: weighted least squares meta-analysis. Stat Med 34(13):2116–2127. https://doi.org/10.1002/sim.6481 Stanley TD, Doucouliagos H, Giles M, Heckemeyer JH, Johnston RJ, Laroche P, Nelson JP, Paldam M, Poot J, Pugh G, Rosenberger RS, Rost K (2013) Meta-analysis of economics research reporting guidelines. J Econ Surv 27(2):390–394. https://doi.org/10.1111/joes.12008 Stanley TD, Jarrell SB (2005) Meta-regression analysis: a quantitative method of literature surveys. J Econ Surv 19(3):299–308. https://doi.org/10.1111/j.0950-0804.2005.00249.x Sutton AJ, Abrams KR (2001) Bayesian methods in meta-analysis and evidence synthesis. Stat Methods Med Res 10(4):277–303. https://doi.org/10.1177/096228020101000404 Sutton AJ, Higgins JPT (2008) Recent developments in meta-analysis. Stat Med 27(5):625–650. https://doi.org/10.1002/sim.2934 Suurmond R, van Rhee H, Hak T (2017) Introduction, comparison, and validation of Meta-Essentials: a free and simple tool for meta-analysis. Res Synth Methods 8(4):537–553. https://doi.org/10.1002/jrsm.1260 Takeshima N, Sozu T, Tajika A, Ogawa Y, Hayasaka Y, Furukawa TA (2014) Which is more generalizable, powerful and interpretable in meta-analyses, mean difference or standardized mean difference? BMC Med Res Methodol 14(1):30. https://doi.org/10.1186/1471-2288-14-30 Tang S-H, Hall VC (1995) The overjustification effect: a meta-analysis. Appl Cogn Psychol 9(5):365–404. https://doi.org/10.1002/acp.2350090502 Tendal B, Nüesch E, Higgins JPT, Jüni P, Gøtzsche PC (2011) Multiplicity of data in trial reports and the reliability of meta-analyses: empirical study. BMJ 343:d4829. https://doi.org/10.1136/ bmj.d4829 Terrin N, Schmid CH, Lau J, Olkin I (2003) Adjusting for publication bias in the presence of heterogeneity. Stat Med 22(13):2113–2126. https://doi.org/10.1002/sim.1461 Thompson SG, Higgins JPT (2002) How should meta-regression analyses be undertaken and interpreted? Stat Med 21(11):1559–1573. https://doi.org/10.1002/sim.1187 Tipton E, Pustejovsky JE, Ahmadi H (2019) A history of meta-regression: technical, conceptual, and practical developments between 1974 and 2018. Res Synth Methods 10(2):161–179. https://doi.org/10.1002/jrsm.1338 Uttl B, White CA, Gonzalez DW (2017) Meta-analysis of faculty’s teaching effectiveness: student evaluation of teaching ratings and student learning are not related. Stud Educ Eval 54:22–42. https://doi.org/10.1016/j.stueduc.2016.08.007 van Houwelingen HC, Arends LR, Stijnen T (2002) Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 21(4):589–624. https://doi.org/10.1002/ sim.1040 Verhaeghen P (2003) Aging and vocabulary score: a meta-analysis. Psychol Aging 18(2):332– 339. https://doi.org/10.1037/0882-7974.18.2.332 Veroniki AA, Jackson D, Bender R, Kuss O, Langan D, Higgins JPT, Knapp G, Salanti G (2019) Methods to calculate uncertainty in the estimated overall effect size from a random-effects meta-analysis. Res Synth Methods 10(1):23–43. https://doi.org/10.1002/jrsm.1319 Viechtbauer W (2007) Confidence intervals for the amount of heterogeneity in meta-analysis. Stat Med 26(1):37–52. https://doi.org/10.1002/sim.2514 Walker HM (1940) Degrees of freedom. J Educ Psychol 31(4):253–269. https://doi.org/10.1037/ h0054588 Wanous JP, Sullivan SE, Malinak J (1989) The role of judgment calls in meta-analysis. 
J Appl Psychol 74(2):259–264. https://doi.org/10.1037/0021-9010.74.2.259 Woodward ND, Purdon SE, Meltzer HY, Zald DH (2005) A meta-analysis of neuropsychological change to clozapine, olanzapine, quetiapine, and risperidone in schizophrenia. Int J Neuropsychopharmacol 8(3):457–472. https://doi.org/10.1017/s146114570500516x
298
7 Principles of Meta-Analysis
Yule GU (1903) Notes on the Theory of Association of Attributes in Statistics. Biometrika 2(2):121–134. https://doi.org/10.2307/2331677 Yusuf S, Peto R, Lewis J, Collins R, Sleight P (1985) Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis 27(5):335–371. https:// doi.org/10.1016/S0033-0620(85)80003-7 Zeng Y, Luo T, Xie H, Huang M, Cheng ASK (2014) Health benefits of qigong or tai chi for cancer patients: a systematic review and meta-analyses. Complement Ther Med 22(1):173– 186. https://doi.org/10.1016/j.ctim.2013.11.010
Chapter 8
Meta-Analysis in Action: The Cochrane Collaboration
Systematic reviews (and meta-analyses) can be seen as rather technical endeavours that are carried out within academic disciplines. However, in the area of healthcare, systematic reviews have become a crucial component in supporting practical decision-making and the delivery of high-quality evidence-based healthcare. Over the last two decades, evidence-based practice (the explicit use of research evidence to inform clinical decision-making) has become established as the standard paradigm within the healthcare systems of many developed countries (Sackett et al. 1997). In routine practice, most healthcare decisions concern the effects of interventions (treatments), and for this reason most of this chapter focuses on systematic reviews of healthcare interventions. Nevertheless, it is important to recognise that the principle of evidence-based healthcare can be applied to any type of healthcare question: What is the best diagnostic test? What is the best advice to give about recovery? What is the best strategy for disease screening?
8.1 Background
When we are particularly interested in the effects of a treatment, there is ample evidence that the most reliable primary clinical research methodology is the randomised controlled trial (Higgins et al. 2019). Randomised trials have therefore become one of the most common methodologies in healthcare research, and it has been estimated that over one and a half million randomised trials have now been published in medicine, a figure that grows every year (The Cochrane Collaboration, n.d.). However, when faced with a decision about the best treatment for a particular condition, it is still unusual to be able to find a single randomised trial that can provide a reliable and relevant answer. The answer, therefore, is to systematically review all the relevant available evidence to inform clinical decision-making. However, even when this has been done there may remain
important uncertainties about the relevance and applicability of the research evidence to a particular setting or country. For this reason, clinical practice guidelines are commonly developed to consider all the available evidence within a local context and provide practical guidance relevant to that setting. Figure 8.1 illustrates this principle: primary research (often randomised trials) is interpreted through secondary research (systematic reviews), which then forms the building blocks of decision support initiatives such as clinical practice guidelines. Ideally, any remaining uncertainties identified at each stage in the process will then inform the future agenda about what primary research and systematic reviews need to be carried out.
Fig. 8.1 Clinical research and evidence-based practice. The figure illustrates the flow of information from primary research to healthcare decisions: primary research studies (clinical trials) → interpretation of research (e.g. Cochrane systematic reviews) → decision support in local context (clinical practice guidelines) → healthcare decision making (patients, clinicians, managers). The thick arrows illustrate the most common pathway, while the dashed arrows denote less common routes for evidence-based practice in healthcare.
8.2 Cochrane Collaboration
In 1972 Archie Cochrane, a British epidemiologist, pointed out that the medical profession was very poor at basing clinical decisions on the best research evidence (evidence-based practice). He wrote that 'it is surely a great criticism of our profession that we have not organised a critical summary, by specialty or sub-specialty, adapted periodically, of all relevant randomised control trials' (Cochrane 1979, p. 11). At the same time, he criticised obstetrics and gynaecology for being the medical specialty making the least effort to seek and use good evidence. This criticism prompted Ian Chalmers, a British obstetrician, and colleagues to develop and prepare a database of all the randomised trials relevant to care during pregnancy and childbirth (Chalmers 1991). The success of this endeavour led to the establishment of The Cochrane Collaboration, an international not-for-profit initiative that aims to help people make well-informed
decisions about healthcare by preparing, maintaining, and promoting the accessibility of systematic reviews of the effects of healthcare interventions. Cochrane systematic reviews are prepared by a range of small editorial groups (called Cochrane Review Groups) that focus on specific areas of healthcare. These groups use a standard review format and methodology to support volunteer review authors in producing high-quality reviews that are published in the electronic journal, The Cochrane Library (www.thecochranelibrary.com).
8.3 Cochrane Reviews
So what is a Cochrane Review? In general, these are systematic reviews that aim to meet the needs of healthcare decision-makers (clinicians, planners, and users of healthcare) for high-quality, relevant, and up-to-date information. Most Cochrane Reviews address questions about the effects of treatments (intervention reviews), although some address questions about the accuracy of diagnostic tests (diagnostic test accuracy reviews). These reviews use rigorous methods to reduce bias, including pre-specified research questions, standard review methodologies, and ensuring that conclusions are based on reliable research. In addition, considerable efforts are made to ensure that people who might have to make decisions based on a Cochrane Review (consumers and stakeholders) have an important input into its development. Cochrane Reviews should address answerable questions that fill important gaps in knowledge, and considerable time may be required to develop an optimal review question. In recent years, priority-setting exercises involving relevant stakeholders have increasingly been used to help develop the most useful questions. Figure 8.2 outlines the standard approach to a Cochrane review.
8.3.1 Question Choice
The question to be addressed by the review is an important determinant of how the review is developed. Decisions have to be made on whether the topic should be narrow and focussed or broader and more comprehensive. Once the review question has been clarified, this then leads to defining the scope of the review using the standard format PICO (population-interventions-comparisons-outcomes, see also Section 4.4): • Population needs to be defined in terms of the people with the disease or condition under study. • The intervention and comparison components define which treatment will be considered eligible for the review and with what it will be compared.
• The outcomes section defines in advance the measures of disease or recovery that will be considered relevant to judging whether the treatment is effective and safe.
See Section 4.4 for more detail on formulating review questions using the format population-intervention-outcome and its variants.
Fig. 8.2 Process of a Cochrane systematic review: clinical question (does this treatment work?) → review question (PICO framework) → review protocol (defining how the review will be done) → literature search (identifying relevant studies) → selection and appraisal (unbiased selection of included studies) → data synthesis (including meta-analysis if appropriate) → conclusions (supported by previous steps). PICO = Population, Interventions, Comparisons, Outcomes. Developed from the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al. 2019).
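To make the PICO structure concrete, the following is a minimal, purely illustrative sketch (in Python, not part of any Cochrane tooling) that records a review question in PICO form. It echoes the stroke unit review discussed in Section 8.4; the exact wording of each component is an assumption chosen for illustration.

```python
# Hypothetical sketch: a review question recorded in PICO form.
# The wording of each component is illustrative only, loosely based on the
# stroke unit review discussed in Section 8.4.
pico = {
    "population":   "adults admitted to hospital with acute stroke",
    "intervention": "organised inpatient (stroke unit) care",
    "comparison":   "care in general medical wards",
    "outcomes":     ["death", "death or dependency", "length of hospital stay"],
}

for component, value in pico.items():
    print(f"{component:>12}: {value}")
```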
8.3.2 Identifying Relevant Studies
Having defined the relevant studies of interest, the next step is to search for and select them in a rigorous and unbiased manner. At this stage of the review, authors will usually work closely with a healthcare librarian or information specialist and will use a broad range of search strategies covering electronic health registers (such as the Cochrane Central Register of Controlled Trials, MEDLINE, and Embase) and the specialised registers of the relevant Cochrane Review Groups. In addition, registers of ongoing trials will be scrutinised for relevant information. The identified references are then screened, and two independent reviewers should decide on eligibility and extract relevant information. The reason for duplicating this process is to reduce the risk of bias from an individual reviewer. The same principle is used at the next stage, when relevant information (including numerical
data) is extracted from the identified studies. Decisions on trial eligibility and data extraction use standard report forms to ensure consistency of data recording and to allow cross-comparison between different reviewers.
8.3.3 Analysing Results
The review authors should have defined in advance how they plan to handle outcome information from the identified studies. Broadly speaking, two types of data tend to be encountered:
• Dichotomous data (where participants can be categorised into an either/or outcome, e.g., alive or dead, with or without disease recurrence), and
• Continuous data (where results are expressed and analysed on a continuous scale, e.g., blood pressure, disease rating score).
Dichotomous data tend to be analysed using methods such as the odds ratio, risk ratio, or risk difference, expressed with a 95% confidence interval. Continuous outcomes can be compared using statistics such as the weighted mean difference or standardised mean difference (Higgins et al. 2019), also expressed with a 95% confidence interval.
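As a minimal illustration of how these dichotomous effect measures are calculated (a hypothetical sketch, not code from the Cochrane Handbook or its software), the following Python function computes a risk ratio or odds ratio with a Wald-type 95% confidence interval from a single trial's 2×2 table; the counts are invented.

```python
# Hypothetical sketch: risk ratio (RR) or odds ratio (OR) with a Wald-type 95% CI
# for one trial with a dichotomous outcome. Counts below are invented.
import math

def ratio_with_ci(events_t, total_t, events_c, total_c, measure="RR"):
    a, b = events_t, total_t - events_t      # treatment arm: events / non-events
    c, d = events_c, total_c - events_c      # control arm: events / non-events
    if measure == "RR":
        est = (a / total_t) / (c / total_c)
        se_log = math.sqrt(1/a - 1/total_t + 1/c - 1/total_c)
    else:                                    # odds ratio
        est = (a * d) / (b * c)
        se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
    half_width = 1.96 * se_log               # 95% CI computed on the log scale
    low = math.exp(math.log(est) - half_width)
    high = math.exp(math.log(est) + half_width)
    return est, low, high

# 12/100 events on treatment versus 20/100 on control (invented numbers)
print(ratio_with_ci(12, 100, 20, 100, "RR"))   # about (0.60, 0.31, 1.16)
print(ratio_with_ci(12, 100, 20, 100, "OR"))   # about (0.55, 0.25, 1.19)
```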
8.3.4 Potential Sources of Bias
A key principle of a systematic review is to minimise the effects of bias. This can be bias that arises within the individual studies or from the systematic review process itself. The internal validity of studies included in a Cochrane Review is assessed using the Cochrane risk of bias tool, the key components of which are outlined in Figure 8.3. These components include:
1. Generating a truly random allocation of patients (to reduce allocation bias).
2. Reducing confounding by other treatments.
3. Minimising bias through loss of patients to follow-up (to reduce attrition bias).
4. Measuring outcomes in an unbiased manner (to avoid measurement bias).
Using this approach, the review authors can try to identify individual trials that are likely to be more or less reliable in their findings.
8.3.5 Biases in the Systematic Review Process
One of the main potential sources of bias in the systematic review process is publication bias: the selective publication of reports that have positive results, are published in English, and are more easily located in literature searches. This is the
Fig. 8.3 Sources of bias in clinical trials and their proposed solutions: allocation bias (unbalanced groups) – rigorous (concealed) randomisation process; confounding (additional interventions given) – ensure the only difference between groups is the treatment being studied; attrition bias (drop out) – ensure all participants are accounted for at the end of the trial; measurement bias – ensure outcomes are collected in an unbiased way.
main reason why great efforts are made to identify relevant studies through thorough literature searching. It is also possible to look for evidence of publication bias using a funnel plot or similar tool (see below).
8.3.6 Analysing Data (Including Meta-Analysis)
All Cochrane systematic reviews should include some form of synthesis of the information identified. In many (probably most) cases this will involve a meta-analysis, which is the statistical combination of results from two or more separate studies. However, a systematic review does not have to include a meta-analysis if that is deemed inappropriate; meta-analyses can produce misleading results, so careful consideration needs to be given to the appropriateness of that step. The key principles in a meta-analysis are to provide an average of the effects across the different studies (weighted by the amount of information each study has provided) and to indicate the uncertainty around that result. This is only valid if the studies are sufficiently similar to warrant combining them. The next consideration is therefore the degree of variation between studies (heterogeneity): if there is extensive variation between studies (i.e., they are very dissimilar), then meta-analysis may not be appropriate. Figure 8.4 shows an example of a typical Cochrane meta-analysis.
Fig. 8.4 Effect of antenatal corticosteroids given to women at risk of preterm birth on the risk of perinatal death. The forest plot shows a systematic review of the effect of antenatal corticosteroids given to women at risk of premature birth on the risk of perinatal death (Roberts et al. 2017). It lists (from left to right) the 15 studies included, the risk of perinatal death (events/total) if treated with corticosteroids (264/3,384) or control treatment (344/3,345), the risk ratio (and 95% confidence interval) for each study, and the weight (contribution) from each study, using a Mantel–Haenszel random-effects model. Heterogeneity: Tau² = 0.05, Chi² = 21.30, df = 14 (P = 0.09), I² = 34%; test for overall effect: Z = 2.99 (P = 0.003). The risk of perinatal death is reduced, with a pooled risk ratio of 0.72 (95% CI 0.58 to 0.89), in women who received antenatal corticosteroids. Reproduced with permission from the authors.
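To show the weighted-average principle in miniature (a hypothetical sketch rather than the Mantel–Haenszel random-effects analysis used in Fig. 8.4), the following Python fragment pools three invented log risk ratios with inverse-variance weights, giving a fixed-effect estimate together with Cochran's Q and I² as a simple check on heterogeneity; a random-effects analysis would additionally incorporate an estimate of between-study variance (Tau²).

```python
# Hypothetical sketch: fixed-effect (inverse-variance) pooling of log risk ratios.
import math

# (log risk ratio, standard error) for three invented trials
studies = [(-0.51, 0.34), (-0.22, 0.21), (-0.05, 0.15)]

weights = [1 / se**2 for _, se in studies]            # inverse-variance weights
pooled = sum(w * y for (y, _), w in zip(studies, weights)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))

rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
print(f"Pooled RR = {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")

# Cochran's Q and I², the heterogeneity statistics reported alongside Fig. 8.4
q = sum(w * (y - pooled) ** 2 for (y, _), w in zip(studies, weights))
i2 = max(0.0, (q - (len(studies) - 1)) / q) * 100 if q > 0 else 0.0
print(f"Q = {q:.2f}, I² = {i2:.0f}%")
```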
8.3.7 Alternatives to Meta-Analysis
Reviewers are often faced with a situation where meta-analysis is not possible, because either outcome information is incomplete or it is inappropriate to combine studies numerically. Alternative synthesis methods usually involve some form of tabulation and visual display. However, it is important to note that all methods of synthesis are subject to some form of limitation. In situations where trials are missing (for instance, due to publication bias), this can often be demonstrated using tools such as the funnel plot (see Figure 8.5).
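A funnel plot is straightforward to produce once the individual study estimates and their precisions are available. The following sketch simulates a set of trials around an assumed true odds ratio of 0.9 and crudely suppresses small unfavourable trials to mimic publication bias, then plots effect against precision in the style of Fig. 8.5; matplotlib is assumed to be available, and none of the data are real.

```python
# Hypothetical sketch: a funnel plot with simulated trials and crude publication bias.
import math
import random
import matplotlib.pyplot as plt

random.seed(1)
true_log_or = math.log(0.9)                     # assumed true effect (OR = 0.9)
sizes = [random.randint(30, 2000) for _ in range(40)]

log_ors, precisions = [], []
for n in sizes:
    se = 2 / math.sqrt(n)                       # rough SE of a log odds ratio
    est = random.gauss(true_log_or, se)
    if n < 300 and est > 0:                     # crude 'publication bias': small
        continue                                # neutral/negative trials go unpublished
    log_ors.append(est)
    precisions.append(1 / se**2)                # precision = inverse variance

plt.scatter(log_ors, precisions)
plt.axvline(true_log_or, linestyle="--")
plt.xlabel("log odds ratio")
plt.ylabel("precision (1 / variance)")
plt.title("Simulated funnel plot: gap where small unfavourable trials are missing")
plt.show()
```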
8.3.8 Summary of Findings
A key part of the synthesis of evidence in a Cochrane review is the development of a summary of findings table. This provides key information about the size of impact of a treatment and the degree of confidence which can be placed in that estimate of
Fig. 8.5 Example of a funnel plot showing evidence of publication bias. This figure* plots the result (odds ratio) against the precision (inverse of variance) of individual trials of low-dose aspirin to prevent pre-eclampsia. The largest trial (CLASP) is believed to be closest to the true estimate (odds ratio of 0.9), and the distribution of trials appears to be missing small negative and neutral trials (odds ratio ≥ 1). *Adapted from Egger et al. (1997, p. 632).
impact. In particular, it brings into consideration the risk of bias in the included studies and the degree to which the gathered evidence is clear, consistent, and reliable. The Cochrane Collaboration has adopted the GRADE approach (Grading of Recommendations, Assessment, Development and Evaluation) to assess the certainty (or quality) of a body of evidence (Higgins et al. 2019); see Section 6.4. With this tool, the certainty of evidence starts as 'high' but is downgraded according to the risk of bias within the included studies, any inconsistency, indirectness or imprecision in the evidence, and any concern about publication bias. Therefore, a rating may begin as 'high' but can be downgraded to moderate, low, or very low (Higgins et al. 2019). See Table 8.1 for an example.
8.3.9 Drawing Conclusions
The final stage in the Cochrane review process is to interpret the results of the data synthesis and communicate the conclusions of the review in an honest and transparent manner. As a general principle, authors are encouraged to present results with their confidence intervals and, where possible, to translate them into meaningful figures such as the number needed to treat (the number of people with a condition who have to be treated to produce one improved outcome). Cochrane review authors are discouraged from making direct recommendations about healthcare decisions. The main reason for this is that a healthcare decision that is appropriate in one setting (for example, a wealthy country where a drug is relatively affordable) may not be an appropriate choice in another setting (for example, a poor healthcare setting where cheaper alternatives may be more appropriate). However, the review authors are encouraged to describe the certainty of the evidence and the balance of potential benefits and harms. Review authors should also describe the implications for future research, such as whether more trials are needed and what questions these trials should address.
Table 8.1 Example of GRADE for quality of evidence (organised inpatient (stroke unit) care compared with alternative services). The assumed risk is the mean control group risk across studies. The corresponding risk is based on the assumed risk in the comparison group and the relative effect of the intervention. CI: confidence interval; N/A: not applicable; OR: odds ratio; SMD: standardised mean difference.
• Death by the end of scheduled follow-up: assumed risk 219 per 1000; corresponding risk 199 per 1000 (179 to 209); relative effect OR 0.76 (0.66 to 0.88); 5,902 participants (29 studies); quality of evidence (GRADE): moderate, downgraded for potential risk of performance bias.
• Death or dependency by the end of scheduled follow-up: assumed risk 609 per 1000; corresponding risk 549 per 1000 (519 to 567); relative effect OR 0.75 (0.66 to 0.85); 4,854 participants (24 studies); quality of evidence (GRADE): moderate, downgraded for potential risk of performance bias.
• Subjective health status score: there was a pattern of improved results among stroke unit survivors, with results attaining statistical significance in 2 individual trials; SMD 0.16 lower (0.33 lower to 0.01 higher); 843 participants (3 studies); quality of evidence (GRADE): very low, downgraded for potential risk of performance bias, unexplained heterogeneity, and imprecision.
• Length of stay (days) in a hospital or institution: mean length of stay across control groups ranged from 12.1 to 123 days; mean length of stay for intervention groups was, on average, 4.3 days less (7.9 days less to 0.7 days more); relative effect N/A; 4,162 participants (19 studies); quality of evidence (GRADE): low, downgraded for potential risk of performance bias and unexplained heterogeneity.
GRADE Working Group grades of evidence. High quality: further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: we are very uncertain about the estimate.
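Returning to the conclusions step described in Section 8.3.9, the number needed to treat is easy to calculate once a pooled effect and a baseline risk are chosen. The following is a minimal sketch with an invented baseline risk, borrowing the pooled risk ratio of 0.72 from Fig. 8.4 purely for illustration.

```python
# Hypothetical sketch: number needed to treat (NNT) = 1 / absolute risk reduction.
def nnt_from_rr(baseline_risk, risk_ratio):
    treated_risk = baseline_risk * risk_ratio
    absolute_risk_reduction = baseline_risk - treated_risk
    return 1 / absolute_risk_reduction

# Assumed baseline risk of 20% and a pooled risk ratio of 0.72 (cf. Fig. 8.4)
print(round(nnt_from_rr(0.20, 0.72)))   # about 18 people treated to avoid one event
```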
8.4 Example of a Cochrane Review
By mid-2019, the Cochrane Library contained over 8,000 Cochrane reviews, 2,400 protocols (reviews in preparation) and references to over 1.5 million trials (The Cochrane Collaboration, n.d.). Providing an individual example is one way of illustrating the nature of these reviews.
8.4.1 Organised Inpatient (Stroke Unit) Care
One early Cochrane review of a complex treatment topic is the stroke unit review (Stroke Unit Trialists' Collaboration 2013). The background to this topic is that, in the past, people who had experienced a stroke and were admitted to hospital would usually be distributed through general wards without any specialist staff input. Some pioneers had proposed that the quality of patient care (and patient outcomes) could be improved by concentrating care in a single ward with a multi-disciplinary team of nurses, doctors, and therapists who were interested in stroke and would work in a coordinated manner. This proposal was controversial, and by the early 1990s several small clinical trials had been carried out. However, there was no clear consensus as to whether the potential benefits would justify the reorganisation of services. In 1993, a small systematic review of ten clinical trials (Langhorne et al. 1993, p. 396) suggested that patients who were managed with the stroke unit model of care had a lower risk of dying in the first year after their stroke. This then led to a prolonged systematic review initiative with several objectives, in particular to:
A. Provide a clear description of stroke unit care (the intervention was not well described).
B. Provide a reliable analysis of patient outcomes (across a range of outcome measures).
C. Carry out sub-group analysis (to explore the impact in different patient and service categories).
D. Update the results within the Cochrane Library.
A collaborative group of trialists (The Stroke Unit Trialists' Collaboration) coordinated the collection of descriptive information and numerical data and has since provided five updates of this Cochrane review. The most recent version (Stroke Unit Trialists' Collaboration 2013) includes information from 29 clinical trials (5,902 participants, see Table 8.1) and provides the following:
1. A description of stroke unit care: the review incorporates a detailed description of service characteristics within stroke units and general wards, including different models of stroke unit care.
2. An analysis of the risk of bias in the included studies.
3. An analysis of the impact of stroke unit care across all patient groups. This showed that stroke patients who were admitted to a stroke unit were more likely to survive, return home, and regain independence than those managed in general wards.
4. A sensitivity analysis by trial characteristics: this explored the impact of removing less reliable trials from the analysis and showed that the conclusions were unchanged by trial characteristics.
5. A subgroup analysis by patient characteristics. This showed that the apparent benefits were seen in both younger and older patients, men and women, and across a range of stroke severities and stroke types.
6. A subgroup analysis by service characteristics: this included a more novel approach called network meta-analysis (see Section 9.1) and demonstrated that having stroke unit care focussed in a specialist ward appeared to have the greatest benefit.
By bringing together a large number of relatively small trials, this review provided robust evidence for the benefit of focussing the care of stroke patients within specialist multi-disciplinary stroke wards. The impact of this review has been substantial: it is now referenced in most major clinical practice guidelines, and stroke unit care has become standard healthcare policy in most wealthy countries. Current initiatives are exploring how to translate these benefits to allow wider implementation in low- and middle-income countries.
8.5 Advantages of Cochrane Reviews
Systematic reviews and meta-analyses have become a standard approach to synthesising healthcare information. The majority of systematic reviews are published in conventional paper-based journals, and this raises the question of whether there is any particular advantage in accessing Cochrane reviews to support healthcare decision-making. Several independent studies over a number of years indicate that Cochrane reviews, on average, carry some particular advantages.
Firstly, they provide a standard structure and format that focusses on clinical problems (www.cochranelibrary.com/cdsr), which means that it is relatively easy for the user to find the relevant information. Secondly, Cochrane reviews have been shown to be, on average, of higher methodological quality (Jadad et al. 2000, p. 539) and in particular show more comprehensive efforts to identify trials and reduce publication bias (Egger and Smith 1998, p. 65). Thirdly, Cochrane reviews should be regularly updated, so that the user has access to some of the most recent information. Finally, the focus on trial quality and assessment of bias is a key factor in Cochrane reviews; as a result, studies have shown that Cochrane reviews on average tend to have more conservative and cautious conclusions (Tricco et al. 2009, p. 385).
8.5.1 What Features Indicate that Cochrane Reviews Are Reliable?
Taking into consideration all of these factors, a high-quality review should include the following:
1. An 'a priori' design (planning in advance with publication of the protocol).
2. A duplicate study selection and data extraction process (to reduce bias).
3. A comprehensive literature search (including multiple sources and languages and access to 'grey' literature; see Sections 5.4, 5.5 and 5.6 for search strategies and Section 5.7 for grey literature).
4. A presentation of a list of studies (both included and excluded) with their key characteristics and reasons for their inclusion or exclusion (see Sections 6.2 and 6.3).
5. An assessment of the scientific quality of the included studies plus documentation and appropriate use when formulating conclusions (see Section 6.4).
6. Appropriate methods to combine the results of studies (meta-analysis or an alternative method of data synthesis).
7. An assessment of the likelihood of publication bias.
8. Clear conclusions that are supported by the quality features listed above.
8.6 Challenges for the Cochrane Collaboration
The Cochrane Collaboration has been an extremely ambitious initiative, rivalling the Human Genome Project in its scope; however, it faces some major challenges. Firstly, it is trying to implement rigorous quality standards, but it is reliant upon volunteer authors with limited time available. Secondly, reviews now have to be prioritised as resources are finite. These challenges particularly apply when keeping
large complex reviews up-to-date. Thirdly, there is the challenge of maintaining and developing review quality in the face of methodological developments. Finally, there are limited resources to support central editorial activities; in countries like the United Kingdom there has fortunately been good governmental support, but many other countries have had to develop different solutions.
8.7 Key Points
• The last three decades have seen a dramatic increase in the number of clinical trials in healthcare, but it is rare for an individual trial to produce conclusive results that change clinical practice.
• Viewing trials together (in systematic reviews) can give important insights and reassurances about the applicability of trial results to routine healthcare.
• Supported by the Cochrane Collaboration, the Cochrane Library provides a format for maintaining large complex systematic reviews and offers a wide range of high-quality reviews.
• Major future challenges remain with resourcing, prioritisation, and keeping reviews up-to-date.
References
Chalmers I (1991) Improving the quality and dissemination of reviews of clinical research. In: Locke S (ed) The future of medical journals: in commemoration of 150 years of the British Medical Journal. BMJ, London, pp 127–146
Cochrane AL (1979) 1931–1971: A critical review with particular reference to the medical profession. In: Teeling-Smith G (ed) Medicines for the year 2000. Office of Health Economics, London, pp 1–11
Egger M, Smith GD (1998) Meta-analysis: bias in location and selection of studies. BMJ 316(7124):61–66. https://doi.org/10.1136/bmj.316.7124.61
Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. BMJ 315(7109):629–634. https://doi.org/10.1136/bmj.315.7109.629
Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (2019) Cochrane handbook for systematic reviews of interventions, 2nd edn. Wiley, Chichester
Jadad AR, Moher M, Browman GP, Booker L, Sigouin C, Fuentes M, Stevens R (2000) Systematic reviews and meta-analyses on treatment of asthma: critical evaluation. BMJ 320(7234):537–540. https://doi.org/10.1136/bmj.320.7234.537
Langhorne P, Williams BO, Gilchrist W, Howie K (1993) Do stroke units save lives? The Lancet 342(8868):395–398. https://doi.org/10.1016/0140-6736(93)92813-9
Roberts D, Brown J, Medley N, Dalziel SR (2017) Antenatal corticosteroids for accelerating fetal lung maturation for women at risk of preterm birth. Cochrane Database Syst Rev (3). https://doi.org/10.1002/14651858.CD004454.pub3
Sackett DL, Richardson WS, Rosenberg W, Haynes RB (1997) Evidence-based medicine: how to practice and teach EBM. Churchill Livingstone, New York
Stroke Unit Trialists' Collaboration (2013) Organised inpatient (stroke unit) care for stroke. Cochrane Database Syst Rev (9). https://doi.org/10.1002/14651858.CD000197.pub3
The Cochrane Collaboration (n.d.) Cochrane Central Register of Controlled Trials. Accessed from: https://www.cochranelibrary.com/central/
Tricco AC, Tetzlaff J, Pham B, Brehaut J, Moher D (2009) Non-Cochrane vs. Cochrane reviews were twice as likely to have positive conclusion statements: cross-sectional study. J Clin Epidemiol 62(4):380–386.e381. https://doi.org/10.1016/j.jclinepi.2008.08.008
Chapter 9
Other Quantitative Methods
In addition to meta-analysis, as presented in the previous chapters, more methods for quantitative analysis and synthesis are available. Some of these are extensions of meta-analysis, whereas others are different ways of performing quantitative analysis. Such methods and approaches are quite varied and are used for different purposes. Therefore, this chapter looks at some methods available for quantitative analysis and synthesis other than meta-analysis and discusses their application. To this purpose, the chapter presents five methods for quantitative analysis and synthesis. Section 9.1 goes into detail about network meta-analysis, which is particularly useful for comparing interventions when some comparisons between interventions are missing; an example is discussed to show key considerations for the application of this method. Section 9.2 describes the approach of best-evidence synthesis; this method may be particularly useful for accommodating complexities and subtleties when interventions and practices are influenced by a complex of factors, and two illustrations of the method are given. Section 9.3 provides insight into the use of qualitative modelling, a method used less often for protocol-driven literature reviews, and covers 'good practice' for qualitative modelling. It is followed by the use of bibliometric analysis for literature reviews in Section 9.4; common methods for bibliometric analysis are presented, some caveats are described, and attention is paid to the need to complement this type of analysis with other methods. In Section 9.5, a method for the systematic quantitative literature review consisting of 15 steps is set out; this type of literature review works well when meta-analysis is not possible due to the diversity of the literature, and the method aims at locating where literature exists and identifying gaps. By presenting these five methods for quantitative synthesis, the chapter extends meta-analysis to more complex comparisons with incomplete information (i.e., network meta-analysis) and offers four alternative approaches for undertaking systematic literature reviews, depending on the purpose of the literature review.
9.1 Network Meta-Analysis: Making More Than One Comparison
The standard approach to systematic review and meta-analysis involves a single comparison of two groups (for example, a drug treatment group and a control group). The question being asked is then simply whether one treatment is superior to the other. However, many research questions involve more complex comparisons (for example, which is the best of all the available treatments?), and this requires more sophisticated meta-analysis approaches. A hypothetical example would be where clinical trials have compared two treatments (A vs. B), but as research has progressed people have begun to compare treatment B with a new alternative (trials of B vs. treatment C). However, someone who wishes to make decisions about the most appropriate treatment would also want to know how A compares with C. Network meta-analysis is a technique for comparing three or more treatments by examining a network of studies that incorporate a range of comparisons (Chaimani et al. 2019). The standard approach is to use both 'direct' comparisons (where two treatments have been directly compared within a trial) and 'indirect' comparisons (where the comparison must be inferred from other studies that have not directly compared the two treatments). This network of studies is then used to estimate the relative effects for each pair of treatment comparisons. Such an approach can provide more estimates of treatment effects as well as a ranking of the different treatments from the most effective to the least effective. Figure 9.1 illustrates a hypothetical example where clinical trials have carried out direct comparisons of A versus B and of B versus C, but any comparison of A versus C requires an indirect estimate using the other studies. This method of estimating effects where no direct study is available has advantages but has also received criticism (e.g., Kanters et al. 2016). The main advantage of network meta-analysis is the ability to use all the available evidence (from both direct and indirect comparisons) to provide more precise estimates of the effects of a treatment and also to rank treatments from the most to the least effective. The main criticism of network meta-analysis is that its key principle is the use of indirect comparisons, which are not based on a randomised controlled trial comparing the two treatments; instead, it uses a number of assumptions to estimate the result of a comparison that has never been carried out in any empirical research study. Therefore, network meta-analysis incorporates a number of concepts and assumptions that need to be understood and considered.
9.1.1 Key Considerations in Network Meta-Analysis
The first consideration is whether indirect comparisons are valid. As mentioned above, the key characteristic of a network meta-analysis is the use of a network of
Fig. 9.1 Simple network of comparisons between treatments A, B and C. Solid lines denote direct comparisons made in trials (A versus B and B versus C); a dashed line denotes the indirect comparison (inferred from A vs B and B vs C). This diagram shows the principle of comparing treatments: trials have taken place to compare treatment A with B and B with C, and an indirect comparison of treatment A with C is possible because the difference can be deduced by comparing the effects of B versus A and B versus C.
studies directly comparing different treatments (direct comparisons) to generate estimates for indirect comparisons where no such study has taken place. So, in Figure 9.1 there are direct comparisons of A versus B and of B versus C. The estimate (indirectly) for the effect of A versus C is as follows:

Effect of A versus C = (effect of A versus B) + (effect of B versus C)   (9.1)
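Equation 9.1 holds on a scale where treatment effects add, such as the log odds ratio scale (equivalently, odds ratios multiply). As a minimal sketch with invented numbers (not taken from any real review), the following Python fragment forms an indirect A-versus-C estimate from two direct estimates and shows how the uncertainty accumulates:

```python
# Hypothetical sketch: an indirect comparison following Eq. 9.1 on the log OR scale,
# where effects add and variances sum. All numbers below are invented.
import math

log_or_ab, se_ab = math.log(0.80), 0.12   # direct A vs B from trials of A vs B
log_or_bc, se_bc = math.log(0.90), 0.15   # direct B vs C from trials of B vs C

log_or_ac = log_or_ab + log_or_bc               # indirect effect of A vs C
se_ac = math.sqrt(se_ab**2 + se_bc**2)          # uncertainty accumulates

or_ac = math.exp(log_or_ac)
ci = (math.exp(log_or_ac - 1.96 * se_ac), math.exp(log_or_ac + 1.96 * se_ac))
print(f"Indirect OR (A vs C) = {or_ac:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

Because the two variances are added, the indirect estimate is always less precise than either of the direct estimates that feed into it.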
Therefore, the indirect comparisons provide observational evidence across randomised controlled trials, which will be subject to all the potential confounding and biases of observational comparisons (Chaimani et al. 2019). The validity of an indirect comparison depends on the different sets of trials in the network of studies being broadly similar in all important factors; the term used to describe this is 'transitivity'. Transitivity is the second consideration for network meta-analysis. It means that intervention A is consistent with respect to characteristics that might affect its effectiveness, regardless of whether it appears in studies of A versus B or studies of A versus C. For example, if treatment A is a drug that is used in higher doses when compared with drug B but lower doses when compared with drug C, then the reviewers would need to be convinced that the comparisons were sufficiently similar to justify combining them in the same network. Another way of describing this is to state that all competing interventions should be 'jointly randomisable'. This means that if they were all included in a single (three-arm) study, then missing interventions would only occur through chance and not through some process of study design. There is no simple rule to demonstrate whether the transitivity assumption is valid in a review, but reviewers must consider whether key characteristics that might modify the effectiveness of a treatment (effect modifiers) are evenly spread throughout the different trials in the network.
A third consideration is consistency (also called coherence). It indicates that, if the network meta-analysis is valid, the results of direct comparisons between two treatments and indirect comparisons of those two treatments should differ only by chance. In other words, where there are both direct and indirect estimates comparing two treatments, the indirect estimates should be consistent with the direct estimates. Furthermore, this requirement should hold for every part (loop) within the network of studies. In practice, this is tested by comparing the results (and confidence intervals) for direct and indirect comparisons, and then examining the difference between those two estimates.
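A minimal sketch of such a consistency check, again with invented numbers (and assuming the direct and indirect estimates come from independent sets of trials):

```python
# Hypothetical sketch: comparing a direct and an indirect estimate of the same
# comparison (both on the log odds ratio scale). Numbers are invented.
import math

direct, se_direct = math.log(0.70), 0.18        # direct A vs C estimate
indirect, se_indirect = math.log(0.72), 0.19    # indirect A vs C estimate (cf. Eq. 9.1)

diff = direct - indirect
se_diff = math.sqrt(se_direct**2 + se_indirect**2)  # assumes independent estimates
z = diff / se_diff                                   # large |z| suggests inconsistency

print(f"difference in log OR = {diff:.3f}, z = {z:.2f}")
```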
9.1.2 Example of Network Meta-Analysis
These concepts are illustrated using an example of a network meta-analysis. Section 8.4 introduced the Cochrane review of organised